Open Science · Field Report 01

The Peptide Forums, Cross-Examined

Thousands of people log what they took and what it did to them. We had a model read 2,979 of those reports on BPC-157, Selank and Semax — then checked the community's claims against the published science.

Community pharmacovigilance · method after Sehgal et al. (2026)
Source: Arctic Shift Reddit archive · r/Peptides, r/Nootropics, r/bpc_157 · encoded with qwen3.5:9b on local Ollama (zero API cost)
Compiled by Matilde, an agentic open-science research assistant · 23 June 2026

2,981

Posts scraped

2,979

LLM-encoded

620

Verified self-reports

3,657

Effects extracted

12

Papers cited

⚠ Not medical advice. This is a research prototype that visualizes self-reported experiences from online communities. Self-reports cannot establish causation, prevalence, or safety, and these peptides lack FDA approval for human use. Selection bias is heavy: people with dramatic effects are far likelier to post. Read this as an exploratory pharmacovigilance signal, not clinical evidence.

Overview

We analyzed 2,981 Reddit posts across three experimental peptides, encoding each through a local language model to find genuine self-use reports and extract their effects with directional valence — helped versus worsened. Then we asked a harder question: when the community says something works, does the literature agree?

After filtering pre-use questions and advice-seeking posts, 620 verified self-reports yielded 3,657 discrete effect mentions across BPC-157 (n=344), Semax (n=143), and Selank (n=125). We compared those community signals against 12 published studies — largely Russian clinical trials and preclinical work — and tagged each comparison as concordant, discordant, novel, or mixed.

Six findings worth your attention

Concordant · Semax

The strongest nootropic signal in the set

+55 focus/cognition reports against just −8 negative — concordant with fMRI evidence of altered default-mode network in healthy volunteers (Lebedeva 2018) and direct BDNF upregulation in the basal forebrain (Dolotov 2006).

Discordant · BPC-157

The anxiogenic paradox

The community reports net anxiety worsening (+9 / −16) — directly against animal work showing "anxiolytic effects comparable to diazepam" (Yuan 2026). An animal-to-human gap worth investigating.

Novel · BPC-157

Fatigue, with no literature precedent

+12 helped vs −33 worsened for energy/fatigue, plus novel anhedonia reports (−11) — a neuropsychiatric burden the preclinical "healing peptide" framing never captured.

Concordant · Selank

Anxiolytic, confirmed (with caveats)

+30 anxiety relief, concordant with the GAD trial showing equivalence to medazepam (Zozulya 2008). But −22 worsening reports are absent from clinical data — likely dose and source variability in the wild.

Novel · Semax

Hair loss — mechanistically plausible

−8 hair/skin reports with zero published precedent. MC1R expression in hair follicles (2023) and Semax's melanocortin activity give a plausible but unconfirmed pathway.

Concordant · BPC-157

Pain relief, validated

+154 pain/healing reports — the single strongest signal in the dataset — concordant with a systematic review finding 87% pain relief in the one human trial (Vukojevic 2025).

Limitations, stated plainly

Self-selection inflates dramatic effects — people who felt nothing rarely post. We cannot control for dose, source purity, route, concurrent substances, or pre-existing conditions. The classifier (qwen3.5:9b, a 9-billion-parameter model) was not validated against human annotation for this domain; a 15-post spot check found 60% fully correct, 27% partially, 13% wrong, mostly over-attributing effects in multi-substance stacks. The literature itself leans on small Russian trials (n=40–62) with no placebo controls. Effect normalization into 17 categories loses granularity. And yet — the concordance between community signals and published mechanisms suggests this approach can surface real pharmacovigilance signals worth formal study.

Where this could go next

(1) A formal human-annotation study to validate the classifier against expert labels. (2) Dose-response analysis by extracting dosing from posts. (3) Temporal tracking of users who post repeatedly. (4) Comment-level extraction — comments hold the richest back-and-forth. (5) Cross-platform replication on forums, Discord, Telegram. (6) A prospective community trial with standardized self-report instruments to attack selection and recall bias.

Effect valence — all peptides

Helped: reported as beneficial · Worsened: reported as adverse

BPC-157 — effect profile

Literature comparison

Cited references

Selank — effect profile

Literature comparison

Cited references

Semax — effect profile

Literature comparison

Cited references

Cross-peptide comparison

Effect	BPC-157 +	BPC-157 −	Selank +	Selank −	Semax +	Semax −

All cited references

Sample self-use reports

Methodology

This proof-of-concept adapts the pharmacovigilance approach from Sehgal, Tronieri, Ungar & Guntuku (2026), "Self-Reported Side Effects of Semaglutide and Tirzepatide in Online Communities," arXiv:2603.12341 / medRxiv. DOI: 10.64898/2026.03.12.26348253

Pipeline

Data collection. Reddit posts scraped from r/Peptides, r/Nootropics, r/bpc_157 via the Arctic Shift archive API (the same source as Sehgal et al.). Queries: BPC-157, BPC157, selank, semax. Timestamp-cursor pagination. 2,981 posts collected.
LLM encoding (pass 1). Each post was run through qwen3.5:9b on local Ollama and classified as a self-report (yes/no); the peptides used and each effect (name, valence, confidence, MedDRA-style category) were extracted. Thinking mode was disabled for speed. 2,979 posts were encoded (2 failures) in ~250 minutes.
Self-report filtering (pass 2). A second pass re-validated all 1,777 candidate self-reports, removing pre-use questions, dosage inquiries, and advice-seeking. 620 retained, 1,157 filtered, ~106 minutes.
Effect normalization. 3,657 raw mentions collapsed into 17 canonical groups, valence preserved (helped = benefit, worsened = adverse).
Literature comparison. 12 papers identified and verified (DOIs checked via Crossref), direct quotes extracted, each comparison tagged concordant / discordant / novel / mixed.
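
As a rough illustration of the encoding step above, here is a minimal sketch of how one model reply might be parsed into a structured record. The JSON field names (is_self_report, peptides, effects), the 0 to 1 confidence scale, and the helper names are assumptions made for this example; the published pipeline may use a different schema. A reply that fails to parse is treated as an encoding failure and returned as None.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

VALENCES = {"helped", "worsened"}


@dataclass
class Effect:
    name: str
    valence: str       # "helped" or "worsened"
    confidence: float  # assumed scale: 0.0 to 1.0
    category: str      # MedDRA-style category label


@dataclass
class EncodedPost:
    post_id: str
    is_self_report: bool
    peptides: List[str] = field(default_factory=list)
    effects: List[Effect] = field(default_factory=list)


def _as_bool(value) -> bool:
    """Accept JSON booleans or yes/no style strings; reject anything else."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"yes", "true"}:
            return True
        if text in {"no", "false"}:
            return False
    raise ValueError(f"not a yes/no value: {value!r}")


def parse_encoding(post_id: str, raw: str) -> Optional[EncodedPost]:
    """Parse one model reply; return None when it is malformed."""
    try:
        data = json.loads(raw)
        effects = [
            Effect(
                name=str(e["name"]).strip().lower(),
                valence=str(e["valence"]).strip().lower(),
                confidence=float(e.get("confidence", 0.0)),
                category=str(e.get("category", "uncategorized")),
            )
            for e in data.get("effects", [])
        ]
        # Only the two directional valences are kept.
        if any(e.valence not in VALENCES for e in effects):
            return None
        return EncodedPost(
            post_id=post_id,
            is_self_report=_as_bool(data["is_self_report"]),
            peptides=[str(p) for p in data.get("peptides", [])],
            effects=effects,
        )
    except (json.JSONDecodeError, KeyError, TypeError, ValueError, AttributeError):
        return None
```

Returning None rather than raising lets a batch run continue past a bad reply and count failures at the end, which is one way to arrive at a figure like "2 failures" without aborting the run.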

Reading valence

Helped means the user framed the effect as a benefit ("my anxiety went away"). Worsened means they framed it as adverse ("I developed anxiety"). The counts are directional tallies of effect mentions, not of distinct people: for Anxiety, +30 helped means 30 mentions described relief from anxiety; −22 worsened means 22 described anxiety as caused or increased.
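
The tallying described here can be sketched in a few lines. The CANONICAL map below is a hypothetical fragment, not the actual 17-group taxonomy, and the function names are illustrative; the point is only to show how raw mentions collapse into groups while valence stays separate.

```python
from collections import Counter

# Hypothetical fragment of a raw-mention -> canonical-group map.
CANONICAL = {
    "anxiety": "Anxiety",
    "panic": "Anxiety",
    "nervousness": "Anxiety",
    "focus": "Focus/Cognition",
    "brain fog": "Focus/Cognition",
    "fatigue": "Energy/Fatigue",
    "tiredness": "Energy/Fatigue",
}


def tally(mentions):
    """Count (peptide, group, valence) triples.

    `mentions` is an iterable of (peptide, raw_effect, valence) tuples.
    Unmapped raw effects fall into an "Other" group.
    """
    counts = Counter()
    for peptide, raw_effect, valence in mentions:
        group = CANONICAL.get(raw_effect.strip().lower(), "Other")
        counts[(peptide, group, valence)] += 1
    return counts


def signal(counts, peptide, group):
    """Return the "+helped / -worsened" label and the net difference."""
    helped = counts[(peptide, group, "helped")]
    worsened = counts[(peptide, group, "worsened")]
    return f"+{helped} / \u2212{worsened}", helped - worsened


toy = [
    ("Selank", "anxiety", "helped"),
    ("Selank", "Panic", "helped"),
    ("Selank", "nervousness", "worsened"),
    ("Semax", "focus", "helped"),
]
label, net = signal(tally(toy), "Selank", "Anxiety")
print(label, net)
```

Keeping helped and worsened as separate counts, rather than storing only the net, preserves cases where a strongly positive and a strongly negative signal coexist for the same effect.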

Key limitations

Selection bias — dramatic effects are overrepresented.
No causal inference — timing correlation, not causation.
No demographic controls — age, sex, dose, purity, co-substances unaccounted for.
Classifier unvalidated — qwen3.5:9b spot-check (n=15): 60% correct, 27% partial, 13% wrong; main error is over-attribution in multi-substance contexts.
Posts only — comments couldn't be retrieved from Arctic Shift at collection time.
Literature bias — research concentrates in Russian institutions, small samples, no placebo controls.
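
To put the n=15 spot check in perspective, the sketch below computes a 95% Wilson score interval for each reported proportion. The counts 9, 4, and 2 are inferred from the reported 60%, 27%, and 13% of 15 posts and are not stated in the report itself; the interval method is a choice made for this example.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Counts inferred from the reported percentages of n=15 (an assumption).
for label, k in [("fully correct", 9), ("partially correct", 4), ("wrong", 2)]:
    low, high = wilson_interval(k, 15)
    print(f"{label}: {k}/15 = {k / 15:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

For the fully correct rate, the interval runs from roughly 36% to 80%, which is why the spot check is best read as a rough indication rather than a validation.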

Reproducibility

All artifacts are preserved and published: raw scraped posts, LLM-encoded records, filtered records, and the normalized aggregation. The pipeline can be re-run with different models, taxonomies, or additional peptides. Total compute: ~6 hours on a single consumer GPU running Ollama. Zero API fees.

This wasn't one query. Over 20.6 hours Matilde answered 42 prompts with 351 autonomous tool calls — and most of them went to reading the literature, not crunching Reddit. The full reasoning trace is published; this is its shape.

351

Tool calls

42

Human prompts

8.4

Calls / prompt

Distinct tools

20.6h

Span of work

Where the effort went

Every tool, by call count

What the shape tells you

Research dwarfs everything. 230 of 351 tool calls — two-thirds of the work — were literature and web research (190 searches, 40 extractions). Encoding the Reddit corpus was the easy part; the rigor was in checking every community signal against the published record.

It worked like an agent, not a query. 42 human prompts produced 351 tool calls — roughly 8 autonomous actions per ask — plus 56 exposed reasoning blocks and one delegate_task that fanned out a parallel literature researcher per peptide.

The heavy compute never touched a paid API. The 2,979 model inferences that encoded the corpus all ran on a local GPU (qwen3.5:9b via Ollama) at zero API cost. The agent loop ran on a frontier model; the grunt work ran in-house.

Read the raw trace

None of this is reconstructed after the fact. The complete working session — prompts, replies, tool calls, tool results, and reasoning — is published as a readable transcript and machine-readable JSONL, alongside the full dataset.
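
A reader who wants to re-derive the headline process figures from the JSONL trace could start from something like the sketch below. The event schema is hypothetical: the field names type and tool and the values tool_call and human_prompt are assumptions, and the published trace may be structured differently.

```python
import json
from collections import Counter


def summarize_trace(lines):
    """Count tool calls per tool and human prompts from JSONL lines.

    Assumes each line is a JSON object with a "type" field and, for
    tool calls, a "tool" field (hypothetical schema).
    """
    tools = Counter()
    prompts = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        kind = event.get("type")
        if kind == "tool_call":
            tools[event.get("tool", "unknown")] += 1
        elif kind == "human_prompt":
            prompts += 1
    total_calls = sum(tools.values())
    calls_per_prompt = total_calls / prompts if prompts else float("nan")
    return tools, prompts, calls_per_prompt


sample = [
    '{"type": "human_prompt"}',
    '{"type": "tool_call", "tool": "web_search"}',
    '{"type": "tool_call", "tool": "web_search"}',
    '{"type": "tool_call", "tool": "web_extract"}',
    '',
    '{"type": "human_prompt"}',
]
tools, prompts, per_prompt = summarize_trace(sample)
print(tools.most_common(), prompts, per_prompt)
```

Applied to the real trace, the same ratio is what the report states as 351 tool calls over 42 prompts, or about 8.4 calls per prompt.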