How cite8 verification works
"Verified by cite8" means a specific, reproducible thing. This page documents what cite8 checks, what it doesn't claim, and how anyone — a reader, a competitor, a search-engine crawler — can re-run the same audit against the same content and get the same result.
What "verified" means
When a health-content article is "verified by cite8," it means cite8 took the article's markdown body, extracted every sentence carrying a peer-reviewed citation (PMID:NNNNN, a PubMed URL, or a DOI URL), resolved each citation against either the curated cite8 corpus or a live PubMed lookup, and ran the claim and the resolved source's abstract through a per-citation verifier. The verifier emits one of five verdicts for each pair: supports, partial, contradicts, unrelated, or unknown.
An article "passes" cite8's editorial gate if and only if its per-citation audit contains zero contradicts verdicts, zero unrelated verdicts, and zero unresolved citations. The unrelated clause is what makes the gate meaningful for catching fabricated PMIDs: a citation that resolves to a real paper on an off-topic subject is the smoking gun for "wrong paper attached to this claim."
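The passing rule above can be sketched as a small predicate. This is an illustration only, not cite8's implementation: the function name is hypothetical, and "unresolved" is modeled here as one more per-citation status alongside the verdicts, which is an assumption about how the audit represents it.

```python
from collections import Counter

# Statuses that fail the gate, per the rule stated on this page.
# Treating "unresolved" as a per-citation status is an assumption.
BLOCKING_STATUSES = ("contradicts", "unrelated", "unresolved")


def passes_gate(statuses):
    """Return True iff no citation carries a blocking status.

    `statuses` is a list with one entry per claim/citation pair.
    "unknown" is not named in the passing rule, so this sketch
    does not block on it.
    """
    counts = Counter(statuses)
    return all(counts[status] == 0 for status in BLOCKING_STATUSES)


if __name__ == "__main__":
    print(passes_gate(["supports", "partial", "supports"]))
    print(passes_gate(["supports", "unrelated"]))
```

Under this reading, a single unrelated verdict is enough to fail an otherwise clean article, which is the behavior the fabricated-PMID case relies on.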
The pipeline
PMID:NNNNN, a PubMed URL, or a DOI URL.
passing: true|false flag.
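As a rough illustration of the extraction step, the three citation forms named on this page can be picked out of article text with regular expressions. The patterns below are assumptions (cite8's actual extractor is not published here), the function name is hypothetical, and the identifiers in the usage example are placeholders rather than real references.

```python
import re

# Assumed patterns for the three citation forms; not cite8's own.
PMID_TAG = re.compile(r"\bPMID:\s*(\d+)")
PUBMED_URL = re.compile(r"https?://pubmed\.ncbi\.nlm\.nih\.gov/(\d+)")
DOI_URL = re.compile(r"https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/[^\s)\]>]+)")


def extract_citations(text):
    """Return (kind, identifier) pairs in order of appearance."""
    hits = []
    for match in PMID_TAG.finditer(text):
        hits.append((match.start(), "pmid", match.group(1)))
    for match in PUBMED_URL.finditer(text):
        hits.append((match.start(), "pmid", match.group(1)))
    for match in DOI_URL.finditer(text):
        # Drop sentence punctuation that trails the URL.
        hits.append((match.start(), "doi", match.group(1).rstrip(".,;")))
    hits.sort(key=lambda hit: hit[0])
    return [(kind, ident) for _, kind, ident in hits]


if __name__ == "__main__":
    sample = (
        "Exercise helps (PMID:12345). See "
        "https://pubmed.ncbi.nlm.nih.gov/67890/ and "
        "https://doi.org/10.1000/xyz123."
    )
    print(extract_citations(sample))
```

A production extractor would also need to keep the sentence each citation is attached to, since the verifier works on claim and source pairs.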
The verifier model is Claude Haiku 4.5. The generation model (used for the synthesis surface, not the verifier) is Claude Sonnet 4.6. Embeddings are Voyage voyage-3-large; reranking is Voyage rerank-2. Material changes to the verifier prompt or the passing rule are listed in the methodology changelog below.
Corpus scope and freshness
Cite8's corpus is a curated PubMed slice, not an exhaustive index. The current corpus spans roughly 20,000 documents across:
- Musculoskeletal clinical practice — low back pain, neck pain, knee/hip/shoulder/ankle conditions, spinal manipulation, exercise therapy, manual therapies, electrotherapy, taping, dry needling, acupuncture, traction, McKenzie, and adjacent modalities (shockwave, laser/PBM, cupping, IASTM, foam rolling, percussive therapy).
- Clinical practice guidelines and clinical prediction rules — ACP, NICE, JOSPT, CCGPP, OPTIMa, plus the canonical MSK CPRs (Ottawa Ankle, Canadian C-Spine, STarT Back, etc.).
- Diagnostic and evaluation evidence — orthopedic special tests (Lachman, McMurray, Hawkins-Kennedy, Spurling, etc.) with sensitivity/specificity studies, imaging-appropriateness guidelines, red-flag screening literature.
- Behavioral and psychological interventions — cognitive behavioral therapy, motivational interviewing, biopsychosocial frameworks, pain neuroscience education, mindfulness-based approaches.
- Medicolegal expert-witness contexts — whiplash biomechanics, motor-vehicle accident injury science, causation methodology (Bradford Hill, differential diagnosis), permanent impairment rating, workers' compensation outcomes, functional capacity evaluation.
- Consumer / health-content topics — supplements (glucosamine, curcumin, omega-3, vitamin D, etc.), dietary patterns (Mediterranean, anti-inflammatory), sleep and pain, obesity and MSK, posture and ergonomics.
- Populations — pediatric, geriatric (falls, sarcopenia, osteoporosis), athletic (return-to-play criteria), and post-surgical rehabilitation.
Two corpus tags exist today: pubmed-msk (intentionally ingested through targeted MeSH-aware queries) and live-cache (papers live-fetched from PubMed during article verification and persisted for subsequent queries). Future medicolegal corpora (amaguides, nhtsa, optima) will be added as named corpus tags without changing the API contract. Each document also carries a domains[] array of slug-like tags (e.g. lbp, exercise, cbt, whiplash) for fine-grained filtering.
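To make the tagging model concrete, here is a minimal sketch of filtering documents by corpus tag and domain slugs. The record shape and the helper are hypothetical; only the tag names (pubmed-msk, live-cache) and the example domain slugs come from this page, and the PMIDs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class CorpusDocument:
    """Hypothetical record shape; the real schema may differ."""
    pmid: str
    corpus: str                       # e.g. "pubmed-msk" or "live-cache"
    domains: list = field(default_factory=list)


def filter_documents(docs, corpus=None, domains=()):
    """Keep documents with the given corpus tag that carry every requested domain."""
    wanted = set(domains)
    return [
        doc for doc in docs
        if (corpus is None or doc.corpus == corpus)
        and wanted.issubset(doc.domains)
    ]


if __name__ == "__main__":
    docs = [
        CorpusDocument("111", "pubmed-msk", ["lbp", "exercise"]),
        CorpusDocument("222", "live-cache", ["whiplash"]),
        CorpusDocument("333", "pubmed-msk", ["cbt"]),
    ]
    matches = filter_documents(docs, corpus="pubmed-msk", domains=["lbp"])
    print([doc.pmid for doc in matches])
```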
Scope matters for interpreting verdicts. A citation whose PMID isn't in the corpus is resolved via a live PubMed E-utilities lookup at verification time; the verdict is computed against that live-fetched abstract. A claim that's off-topic from the corpus's coverage entirely (e.g. an oncology-pharmacology question against the current MSK + behavioral + medicolegal corpus) will typically return unrelated verdicts — not because the underlying claim is wrong, but because the cited evidence isn't on-topic to the corpus's domain. An unrelated verdict means "wrong paper attached to this claim," not "this claim is false."
Freshness. The corpus is updated by on-demand ingest. Response headers on POST /api/v1/verify-* endpoints include X-Cite8-Corpus-As-Of, the timestamp of the most recently ingested document, so consumers can distinguish "this citation is likely stale" from "this citation is likely fabricated." Articles verified before a corpus update can be re-verified against the current corpus at any time via the API.
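One way a consumer might use the freshness header is to compare it with the time an article was last verified and re-verify when the corpus has moved on. The sketch below assumes the header value is an ISO 8601 timestamp, which this page does not specify, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 string (assumed format), defaulting to UTC."""
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def corpus_updated_since(headers, verified_at):
    """Return True if the corpus is newer than `verified_at`.

    Returns None when the header is absent. Header names are matched
    case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-cite8-corpus-as-of")
    if raw is None:
        return None
    return parse_timestamp(raw) > parse_timestamp(verified_at)


if __name__ == "__main__":
    headers = {"X-Cite8-Corpus-As-Of": "2026-05-27T00:00:00Z"}
    print(corpus_updated_since(headers, "2026-05-01T00:00:00Z"))
```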
What cite8 does not claim
Verification is a structural check on cited evidence, not a statement about the article's overall correctness. Specifically:
📌 Cite8 does not certify that the article is true. A passing audit means every cited paper actually supports (or partially supports) the claim it's attached to. It does not mean the underlying claim is correct in absolute terms.
📌 Cite8 does not certify that the cited paper is the best available evidence. A more recent or more rigorous study may exist that wasn't cited. The gate doesn't search for contrary evidence not referenced in the article.
📌 Cite8 does not guarantee no contrary evidence exists. If the article cites studies that support a claim, and an opposing meta-analysis exists but isn't cited, cite8's per-citation gate won't surface that gap. (Multi-source synthesis is a future capability, not a current one.)
📌 Cite8 verifies abstracts, not full text. Technique-specific details that aren't in the abstract (precise dosing, instrumentation, sub-population effects) may not be checkable at all.
📌 The verifier is an LLM. It's not infallible. We publish eval results — see methodology results below.
📌 The corpus is curated, not exhaustive. Cite8's corpus is built from targeted PubMed ingests. Citations that fall outside the corpus get resolved via a live PubMed lookup, but topical breadth is constrained by what's been ingested.
Methodology results
Cite8 maintains two published eval sets and reports results against each. Real numbers, no rounding for marketing.
MSK V1 — domain-coverage set
msk_v1.json — 20 hand-written clinical questions across the corpus's coverage with ground-truth abstention labels (does the question have sufficient corpus evidence to answer?). Last full run on the deployed verifier:
- Abstention precision: 1.00 (the verifier abstains exactly when it should)
- Abstention recall: 1.00 (no missed abstentions)
- Citation resolution rate: 1.00 (every [N] marker in an answer resolves to a real PMID in the cited list)
These are precision/recall on the abstention decision, not "answer correctness." Per-question quality grades from a clinician's manual review are a separate metric tracked alongside.
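For readers unfamiliar with how precision and recall apply to an abstention decision, the calculation can be sketched as follows. The function is hypothetical and the labels in the usage example are toy values for illustration, not data from msk_v1.json.

```python
def abstention_metrics(should_abstain, did_abstain):
    """Precision and recall for the abstain/answer decision.

    Both arguments are equal-length lists of booleans, one per question.
    A metric is None when its denominator is zero.
    """
    pairs = list(zip(should_abstain, did_abstain))
    true_pos = sum(1 for truth, pred in pairs if truth and pred)
    false_pos = sum(1 for truth, pred in pairs if not truth and pred)
    false_neg = sum(1 for truth, pred in pairs if truth and not pred)

    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else None
    recall = true_pos / actual if actual else None
    return precision, recall


if __name__ == "__main__":
    # Toy labels only: one correct abstention, one spurious, one missed.
    truth = [True, True, False, False]
    preds = [True, False, True, False]
    print(abstention_metrics(truth, preds))
```

A score of 1.00 on both means the predicted list matched the ground-truth list exactly on every question where either one says "abstain."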
Adversarial V1 — hand-crafted stress cases
adversarial_v1.json — 12 hand-crafted claims across five stress categories: recency reversal, positive-bias, real-topic-wrong-study, fabricated PMIDs, and evidence-of-absence reasoning. Last live run on the deployed verifier:
- 8 of 12 passing. Zero positive-bias supports failures: the highest-concern miss mode (verifier credulously affirming a popular-but-unsupported claim) did not happen.
- One prompt-issue failure on same-intervention/different-outcome claims (the partial-vs-unrelated distinction). Fix shipped in a subsequent verifier prompt update; expected 9/12 on re-run.
- Two failures are corpus-bound (shockwave for plantar fasciitis, and vertebral-artery-stroke incidence): the verifier honestly returned unrelated because the relevant evidence isn't in the deployed corpus. Resolvable by ingesting the relevant PubMed slices.
- One failure is structurally hard for the V1 architecture (a positive-bias claim where the correct verdict requires reasoning about absence of evidence, which per-source verification can't do alone). Deferred to a future multi-source synthesis capability.
The breakdown matters: of the four misses, one was a prompt issue that has since been fixed, two are corpus-resolvable, and one is architecturally deferred. None is a case of the verifier affirming a positive-bias claim with a false supports verdict. That "no false supports on positive-bias claims" property is the headline result.
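The arithmetic behind the breakdown can be laid out explicitly. The failure counts below come from the list above; the 11/12 figure after corpus ingest is a projection implied by "resolvable by ingesting," not a reported result, and the function name is hypothetical.

```python
def projected_scores(total, failures):
    """Tally the current pass count and what each remediation would add."""
    passing = total - sum(failures.values())
    after_prompt_fix = passing + failures.get("prompt_issue", 0)
    # Projection only: assumes both corpus-bound cases pass once ingested.
    after_corpus_ingest = after_prompt_fix + failures.get("corpus_bound", 0)
    return {
        "current": passing,
        "after_prompt_fix": after_prompt_fix,
        "after_corpus_ingest": after_corpus_ingest,
    }


if __name__ == "__main__":
    adversarial_v1 = {"prompt_issue": 1, "corpus_bound": 2, "architectural": 1}
    print(projected_scores(12, adversarial_v1))
```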
Reproducibility
The integrity of "verified by cite8" rests on this property: any consumer of a verified article can re-run the same audit and get the same result. The verifier output is deterministic given a fixed model version + fixed input; cite8's API is openly callable with a key, so the re-verification path is available to any party who wants to audit a claim — a reader, a competitor, a publisher's QA team, a search-engine evaluator.
Re-verifying an article
With a cite8 API key (request access via [email protected]):
curl -X POST https://cite8.dev/api/v1/verify-article \
-H "Authorization: Bearer cite8_sk_..." \
-H "Content-Type: application/json" \
-d '{"markdown": "<article body with embedded PMIDs>"}'
The response is a per-claim audit with the same verdicts the original verification produced. The audit shape, parameters, and field semantics are documented in the OpenAPI 3.1 spec.
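The same request can be issued from Python with only the standard library. This is a minimal sketch mirroring the curl call above: the endpoint, headers, and request body come from this page, while the helper names are hypothetical and the response is returned as parsed JSON without assuming any particular field names (those are defined in the OpenAPI spec).

```python
import json
import urllib.request

VERIFY_ARTICLE_URL = "https://cite8.dev/api/v1/verify-article"


def build_verify_request(markdown, api_key):
    """Build the POST request without sending it."""
    body = json.dumps({"markdown": markdown}).encode("utf-8")
    return urllib.request.Request(
        VERIFY_ARTICLE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def verify_article(markdown, api_key, timeout=30):
    """Send the request and return the decoded JSON audit."""
    request = build_verify_request(markdown, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the first half testable offline; `verify_article` needs a valid key and network access.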
For batch claim verification (without the markdown extraction step), see POST /api/v1/verify. For an editorial CI gate, partners integrate the API directly into their publishing pipeline; integration notes are provided with key issuance.
Schema.org reference
Health publishers integrating cite8 verification into their structured data should reference cite8's stable Organization @id:
{
"@context": "https://schema.org",
"@type": "MedicalWebPage",
"headline": "Article title",
"author": { "@type": "Person", "name": "Author Name" },
"reviewedBy": {
"@type": "Organization",
"@id": "https://cite8.dev/#org"
},
"lastReviewed": "2026-05-27"
}
The Organization entity itself is defined at cite8.dev/about. The @id is stable across hostname migrations and corporate changes — reference the @id, not the URL.
Methodology changelog
Material changes to the verifier prompt, the passing rule, or the verdict taxonomy are versioned and listed below. Articles verified before a methodology change can be re-verified against the current methodology at any time via the API.
- 2026-05-26: Passing rule tightened to require zero unrelated verdicts in addition to zero contradicts and zero unresolved. Catches fabricated/transcription-error PMIDs that the previous rule mis-passed.
- 2026-05-26: Verifier prompt refined so that same-intervention/different-outcome claims now correctly map to partial rather than unrelated.
- 2026-05-26: Verdict taxonomy split from YES/NO/PARTIAL/UNKNOWN to the public five-way supports / partial / contradicts / unrelated / unknown.