KB Build Note
This directory is a wiki-style synthesis layer over the canonical by-photo corpus. It is not a replacement for the corpus.
The corpus lives in data/processed/markdown/by-photo/ and holds 447
direct-transcription Markdown files (one per source photo). The KB exists
to make those 447 pages discoverable, navigable, and cross-linked without
summarising away their detail.
What This KB Adds
| Layer | Path | Role |
|---|---|---|
| Index / sidebar | index.md, _sidebar.md | Landing page and navigation. |
| Maps | maps/ | Whole-corpus indexes (document-outline, nav-path-index, evidence-map, pilot-source-coverage) and topic-specific planning maps (alpha-synuclein-source-boundary). |
| Sections | sections/ | Topic-cluster aggregator pages. Each section lists every source assigned to it, grouped by nav_path root. |
| Source catalog | maps/source-catalog.md | Flat catalog of all 447 source notes in chronological capture order. |
| Source notes | sources/<stem>.md | One stub per by-photo file, carrying provenance fields (page label, nav path, headings, uncertain spans). |
| Topics | topics/ | Narrative synthesis pages. GBA-PD pilot range: gba-pd, gba-therapeutics, biomarkers. Whole-section / cross-section syntheses: parkin, inflammation, mitochondria, biomarkers-outcomes, pet-imaging, alpha-synuclein (Tier 1; see boundary map for Tier 2 / Tier 3 delegation), therapeutic-programs (program-routing map). |
| Per-nav indexes | topics/by-nav/ | 154 generated indexes, one per first-level nav_path value. Complements sections/: section pages aggregate by topic cluster, by-nav pages aggregate by exact Word heading. |
| Entities | entities/compounds/, entities/programs/ | Per-entity pages. Compounds: eliglustat, ambroxol, venglustat. Programs: pr001, parkn-gt (PFR-4249-100), nlrp3-inhibitor (Marianthi, PFR-4231-100). |
| Templates | templates/ | Boilerplate for new topic pages. |
| Log | log.md | KB build log. |
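The per-nav indexes in the table above group sources by their first-level nav_path value. A minimal sketch of that grouping follows, assuming each source's nav_path is available as a list of heading strings; the function name, the stems, and the example headings are hypothetical and do not describe the actual generator.

```python
from collections import defaultdict

def group_by_first_level(sources: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each first-level nav_path value to the sorted stems filed under it.

    `sources` maps a by-photo stem to its nav_path, assumed here to be a
    list of headings ordered from outermost to innermost.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for stem, nav_path in sources.items():
        # Sources without a nav_path get a placeholder bucket (an assumption).
        root = nav_path[0] if nav_path else "(no nav_path)"
        groups[root].append(stem)
    return {root: sorted(stems) for root, stems in sorted(groups.items())}

# Hypothetical stems and headings, for illustration only.
example = {
    "IMG_0002": ["Pipeline of PD", "Overview"],
    "IMG_0001": ["Pipeline of PD"],
    "IMG_0003": ["Pipeline MSA", "Pathology"],
}
for root, stems in group_by_first_level(example).items():
    print(root, stems)
```

Each key of the result would correspond to one page under topics/by-nav/, while section pages would merge several such keys into a topic cluster.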
Build Rules
- Canonical content stays in by-photo Markdown. KB pages link back; they do not copy table content verbatim and they do not paraphrase it. When a table or figure matters, link the by-photo file and let the reader open it.
- No new operator narration. Section pages, source notes, and the catalog only state what is observable from front matter (page_label, nav_path, source_headings, related_photos, quality_metrics) or the by-photo body’s own headings. They never describe the photo, the work, or the worker.
- Provenance is required. Every section row, every source-note entry, and every catalog row links to a by-photo file via data/processed/markdown/by-photo/<stem>.md.
- No raw photo staging. data/raw/photos/ is gitignored. KB pages may reference the raw-photo path as text metadata, but never copy the .jpg into the asset folder or commit it.
- No image re-embedding. KB pages do not re-embed figure assets; the embedded-image policy lives at the by-photo level (docs/decisions/2026-04-29-body-purity-and-figure-only-embeds.md).
- Section assignment is heuristic, not authoritative. Sources are bucketed into a single primary section based on their nav_path root and selected keywords. The full nav_path is preserved in the source note and in maps/nav-path-index.md so a reader can disambiguate.
- Synthesis is opt-in. topics/ and entities/ pages contain narrative synthesis only when a human or future review has actually read across the sources. New synthesis follows the template at templates/topic.md and keeps source links next to claims.
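The provenance rule can be checked mechanically. The sketch below flags rows that carry no by-photo link; it assumes links contain the literal path data/processed/markdown/by-photo/<stem>.md and that stems use only word characters, dots, and hyphens. The function names and sample rows are hypothetical, not part of the existing build.

```python
import re

# Assumed link shape: any path containing by-photo/<stem>.md.
BY_PHOTO_LINK = re.compile(r"data/processed/markdown/by-photo/([\w.-]+)\.md")

def by_photo_stems(text: str) -> set[str]:
    """Return the by-photo stems referenced anywhere in `text`."""
    return set(BY_PHOTO_LINK.findall(text))

def rows_missing_provenance(rows: list[str]) -> list[str]:
    """Return the rows (table rows or list entries) that have no by-photo link."""
    return [row for row in rows if not BY_PHOTO_LINK.search(row)]

# Hypothetical rows, for illustration only.
rows = [
    "| p.12 | [IMG_0001](../data/processed/markdown/by-photo/IMG_0001.md) |",
    "| p.13 | (link missing) |",
]
for row in rows_missing_provenance(rows):
    print("no by-photo link:", row)
```

A check like this only confirms that a link is present; it does not verify that the linked file exists or that the stem matches the row.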
Section Catalogue
Eighteen sections cover all 447 sources. The mapping is in
_sidebar.md. Boundaries follow the document’s own nav_path clusters
rather than an externally imposed taxonomy:
- gba-pd-asyn (198) - the dominant Pipeline of GD & GBA-PD arc and its α-synuclein supplement (animal models, antibodies, postmortem, propagation, biobanks/CEI).
- parkin (46) - Parkin protein / PD / pS65-Ub, PARKN GT (PFR-4249-100), GAPFREE3, PINK-1.
- inflammation (42) - Pipeline of Inflammation, NLRP3, pyroptosis, CAPS, Complement / C5aR1, Havrda, In Vivo strategy (Katy), 4 LPS.
- biomarkers-outcomes (27) - [BIOMARKER] validation/qualification, clinical scales (UPDRS, MoCA, H&Y, SCOPA-AUT, RBD/RBDQ), NFL, SILK, retina, synaptic change.
- mitochondria (19) - mtDNA, mitophagy, 31P MRS, MC1 PET, structure / Complex I / MAM, MEG / metabolomics / MIBG, assessment summary.
- molecular-biology (18) - [MOLECULAR BIOLOGY] / [Protein], proteomics, transcriptome, omics, assays of protein.
- pet-imaging (16) - PET / tracer / DATscan / neuromelanin / 7T MRI / VMAT-2, immunoPET, PET for astrocyte.
- operations (14) - FY budgets, KPI-linked projects, milestones, workflow, sharefolder organisation, reactome safety, phospholipidosis.
- pk-gt-pharmacology (11) - [PK] / [PHARMACOLOGY] / [GT], AAV / capsid / promoter, ICM / route of administration, life-cycle.
- clinical-pd (9) - diagnosis of PD, prodromal PD, psychosis, dyskinesia, Pipeline of PD overview.
- samples-collaborations (9) - MJF / brain banks / biobanks, NDU, P2P, Burton/Greenamyre/Pittsburgh labs, shipment, secondment.
- genetics-pathway (9) - pathogenicity of variant, GWAS, PRS, eQTL, pathway analysis, genetic testing.
- msa (9) - Diagnosis / outcome measures (UMSARS), aSyn in MSA, pathology, Pipeline MSA.
- other-mechanisms (7) - 기타 MOA들 ("other MOAs": TREM2, TAU, TDP43, TMEM, σ1R, TRAP1, UPS, PGRN).
- lysosome-autophagy (5) - lysosomal enzymes, macro / micro / CMA, TRPML1, Niemann-Pick / NPC.
- lrrk2 (4) - DNL201 / DNL151, other LRRK2 pipeline.
- cgas-cgamp (2) - cGAS / cGAMP / AGS / senescence.
- microglia-imaging (2) - microglial imaging / TSPO.
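The per-section counts above should add up to the stated totals, since each source has a single primary section. A small sanity check, using the counts exactly as listed in this catalogue, might look like the following; the constant and function names are illustrative only.

```python
# Per-section source counts, copied from the Section Catalogue list.
SECTION_COUNTS = {
    "gba-pd-asyn": 198, "parkin": 46, "inflammation": 42,
    "biomarkers-outcomes": 27, "mitochondria": 19, "molecular-biology": 18,
    "pet-imaging": 16, "operations": 14, "pk-gt-pharmacology": 11,
    "clinical-pd": 9, "samples-collaborations": 9, "genetics-pathway": 9,
    "msa": 9, "other-mechanisms": 7, "lysosome-autophagy": 5,
    "lrrk2": 4, "cgas-cgamp": 2, "microglia-imaging": 2,
}

def catalogue_totals(counts: dict[str, int]) -> tuple[int, int]:
    """Return (number of sections, total number of sources)."""
    return len(counts), sum(counts.values())

n_sections, n_sources = catalogue_totals(SECTION_COUNTS)
# Fail loudly if the catalogue drifts away from the stated totals.
assert n_sections == 18, f"expected 18 sections, found {n_sections}"
assert n_sources == 447, f"expected 447 sources, found {n_sources}"
print(f"{n_sections} sections, {n_sources} sources")  # 18 sections, 447 sources
```

If a source is reassigned or a section is added, the assertions fail until the counts and the stated totals are updated together.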
Maintenance Rules
- When the canonical by-photo file is edited, the source note in sources/<stem>.md should be checked for nav_path / heading drift.
- When new section or topic pages are added, update _sidebar.md and index.md.
- Do not add evidence_images_not_embedded paths or helper crops here; the corpus-only baseline rule is in docs/decisions/2026-05-01-audit-status.md.
- Do not add operator narration to KB pages either. The body-purity rule applies upstream (in by-photo Markdown), but the KB inherits the same spirit: report observable provenance, don’t editorialise.
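The nav_path drift check in the first rule could be sketched as below. This assumes that both the by-photo file and its source note begin with a front-matter block delimited by --- lines, and that nav_path is stored there as a single-line scalar. Neither assumption is confirmed by this note; the helper names and the example values are hypothetical.

```python
from typing import Optional

def front_matter_field(text: str, key: str) -> Optional[str]:
    """Return a single-line front-matter value, or None if absent.

    Assumes the file starts with a block delimited by '---' lines and
    that the field is written as `key: value` on one line.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front-matter block
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("\"'")
    return None

def nav_path_drifted(by_photo_text: str, source_note_text: str) -> bool:
    """True when the two files disagree on nav_path (a missing field counts)."""
    return (front_matter_field(by_photo_text, "nav_path")
            != front_matter_field(source_note_text, "nav_path"))

# Hypothetical file contents, for illustration only.
by_photo = "---\nnav_path: Pipeline of PD > Overview\n---\nbody text\n"
note = "---\nnav_path: Pipeline of PD\n---\n"
print(nav_path_drifted(by_photo, note))
```

Heading drift would need a similar comparison against the by-photo body's headings, which this sketch does not cover.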
References
- Workflow: docs/workflow/direct-transcription.md
- Body-purity decision: docs/decisions/2026-04-29-body-purity-and-figure-only-embeds.md
- Audit status / corpus-only baseline (also defines the Uncertain Spans retention policy that KB pages preserve as review targets): docs/decisions/2026-05-01-audit-status.md
- KB wiki v1 status / audit note (scope, completed inventory, verification snapshot, remaining follow-up): docs/decisions/2026-05-03-kb-wiki-v1-status.md
- Repo guide: AGENTS.md