Thousands of people log what they took and what it did to them. We had a model read 2,979 of those reports on BPC-157, Selank and Semax — then checked the community's claims against the published science.
We analyzed 2,979 Reddit posts across three experimental peptides, encoding each through a local language model to find genuine self-use reports and extract their effects with directional valence: helped versus worsened. Then we asked a harder question: when the community says something works, does the literature agree?
After filtering out pre-use questions and advice-seeking posts, 620 verified self-reports yielded 3,657 discrete effect mentions across BPC-157 (n=344), Semax (n=143), and Selank (n=125). We compared those community signals against 12 published studies, largely Russian clinical trials and preclinical work, and tagged each comparison as concordant, discordant, novel, or mixed.
+55 focus/cognition reports against just −8 negative — concordant with fMRI evidence of altered default-mode network in healthy volunteers (Lebedeva 2018) and direct BDNF upregulation in the basal forebrain (Dolotov 2006).
The community reports net anxiety worsening (+9 / −16) — directly against animal work showing "anxiolytic effects comparable to diazepam" (Yuan 2026). An animal-to-human gap worth investigating.
+12 helped vs −33 worsened for energy/fatigue, plus novel anhedonia reports (−11) — a neuropsychiatric burden the preclinical "healing peptide" framing never captured.
+30 anxiety relief, concordant with the GAD trial showing equivalence to medazepam (Zozulya 2008). But −22 worsening reports are absent from clinical data — likely dose and source variability in the wild.
−8 hair/skin reports with zero published precedent. MC1R expression in hair follicles (2023) and Semax's melanocortin activity give a plausible but unconfirmed pathway.
+154 pain/healing reports — the single strongest signal in the dataset — concordant with a systematic review finding 87% pain relief in the one human trial (Vukojevic 2025).
Self-selection inflates dramatic effects — people who felt nothing rarely post. We cannot control for dose, source purity, route, concurrent substances, or pre-existing conditions. The classifier (qwen3.5:9b, a 9-billion-parameter model) was not validated against human annotation for this domain; a 15-post spot check found 60% fully correct, 27% partially, 13% wrong, mostly over-attributing effects in multi-substance stacks. The literature itself leans on small Russian trials (n=40–62) with no placebo controls. Effect normalization into 17 categories loses granularity. And yet — the concordance between community signals and published mechanisms suggests this approach can surface real pharmacovigilance signals worth formal study.
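To show how little a 15-post spot check can pin down, here is a minimal sketch that puts a 95% Wilson score interval around the "fully correct" rate. It assumes that 60% of 15 posts corresponds to 9 posts; that count and the choice of interval are illustrative assumptions, not part of the published analysis.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Assumption: 60% of the 15 spot-checked posts means 9 fully correct labels.
low, high = wilson_interval(9, 15)
print(f"fully correct: 60% (95% Wilson interval {low:.0%} to {high:.0%})")
```

Under these assumptions the interval runs from roughly 36% to 80%, which is why a formal annotation study is the natural next step.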
(1) A formal human-annotation study to validate the classifier against expert labels. (2) Dose-response analysis by extracting dosing from posts. (3) Temporal tracking of users who post repeatedly. (4) Comment-level extraction — comments hold the richest back-and-forth. (5) Cross-platform replication on forums, Discord, Telegram. (6) A prospective community trial with standardized self-report instruments to attack selection and recall bias.
| Effect | BPC-157 + | BPC-157 − | Selank + | Selank − | Semax + | Semax − |
|---|---|---|---|---|---|---|
This proof-of-concept adapts the pharmacovigilance approach from Sehgal, Tronieri, Ungar & Guntuku (2026), "Self-Reported Side Effects of Semaglutide and Tirzepatide in Online Communities," arXiv:2603.12341 / medRxiv. DOI: 10.64898/2026.03.12.26348253
Helped means the user framed the effect as a benefit ("my anxiety went away"). Worsened means they framed it as adverse ("I developed anxiety"). It's directional: for Anxiety, +30 helped means 30 people said it relieved their anxiety; −22 worsened means 22 said it caused or increased it.
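The directional counting described above can be sketched as a simple tally. The record shape (peptide, effect, valence) and the toy rows are hypothetical, chosen only to illustrate the notation; they are not drawn from the published dataset.

```python
from collections import Counter

# Hypothetical records: one tuple per extracted effect mention.
# valence is "helped" or "worsened", as defined in the text above.
toy_records = [
    ("Selank", "Anxiety", "helped"),
    ("Selank", "Anxiety", "helped"),
    ("Selank", "Anxiety", "worsened"),
    ("BPC-157", "Energy/Fatigue", "worsened"),
]


def tally(records):
    """Count mentions per (peptide, effect, valence) triple."""
    counts = Counter()
    for peptide, effect, valence in records:
        if valence not in ("helped", "worsened"):
            raise ValueError(f"unknown valence: {valence!r}")
        counts[(peptide, effect, valence)] += 1
    return counts


def helped_worsened(counts, peptide, effect):
    """Return (helped, worsened) counts for one peptide/effect pair."""
    return (
        counts[(peptide, effect, "helped")],
        counts[(peptide, effect, "worsened")],
    )


def as_signed(pair):
    """Format a (helped, worsened) pair in the article's +N / −M notation."""
    helped, worsened = pair
    return f"+{helped} / \u2212{worsened}"


counts = tally(toy_records)
print(as_signed(helped_worsened(counts, "Selank", "Anxiety")))
```

Keeping the two directions as separate counts, instead of collapsing them into one net score, preserves cases where a peptide both relieves and provokes the same symptom in different users.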
All artifacts are preserved and published: raw scraped posts, LLM-encoded records, filtered records, and the normalized aggregation. The pipeline can be re-run with different models, taxonomies, or additional peptides. Total compute: ~6 hours on a single consumer GPU running Ollama. Zero API fees.
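For readers who want a feel for the encoding step, the following is a minimal sketch of sending one post to a local Ollama server and parsing a structured reply. The prompt wording and the output fields (`is_self_report`, `effects`) are hypothetical; the published pipeline's actual prompt and schema may differ. The endpoint is Ollama's default local `/api/generate` route, and the model tag is the one named in this article.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen3.5:9b"  # model tag named in the article

# Hypothetical instruction; the real pipeline's prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "Read the Reddit post below. Reply with JSON containing "
    '"is_self_report" (true or false) and "effects", a list of objects with '
    '"effect" and "valence" ("helped" or "worsened").\n\nPOST:\n{post}'
)


def build_payload(post_text: str) -> dict:
    """Build a non-streaming request body that asks Ollama for JSON output."""
    return {
        "model": MODEL,
        "prompt": PROMPT_TEMPLATE.format(post=post_text),
        "stream": False,
        "format": "json",
    }


def parse_reply(body: bytes) -> dict:
    """Extract the model's JSON record from an Ollama /api/generate response."""
    outer = json.loads(body)
    record = json.loads(outer["response"])  # generated text is itself JSON
    if not isinstance(record.get("is_self_report"), bool):
        raise ValueError("missing or non-boolean is_self_report")
    record.setdefault("effects", [])
    return record


def encode_post(post_text: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> dict:
    """Send one post to the local model and return the parsed record."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(post_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_reply(response.read())
```

Calling `encode_post(text)` requires a running Ollama instance with the model pulled. Records where `is_self_report` is false would then be dropped before aggregation, mirroring the filtering step described earlier.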
This wasn't one query. Over 20.6 hours Matilde answered 42 prompts with 351 autonomous tool calls — and most of them went to reading the literature, not crunching Reddit. The full reasoning trace is published; this is its shape.
Research dwarfs everything. 230 of 351 tool calls — two-thirds of the work — were literature and web research (190 searches, 40 extractions). Encoding the Reddit corpus was the easy part; the rigor was in checking every community signal against the published record.
It worked like an agent, not a query. 42 human prompts produced 351 tool calls — roughly 8 autonomous actions per ask — plus 56 exposed reasoning blocks and one delegate_task that fanned out a parallel literature researcher per peptide.
The heavy compute never touched a paid API. The 2,979 model inferences that encoded the corpus all ran on a local GPU (qwen3.5:9b via Ollama) at zero API cost. The agent loop ran on a frontier model; the grunt work ran in-house.
None of this is reconstructed after the fact. The complete working session — prompts, replies, tool calls, tool results, and reasoning — is published as a readable transcript and machine-readable JSONL, alongside the full dataset.