What to build
Brandon's contribution. Write the LLM prompt that powers the extractor module (M4).
The prompt receives:
A transcript chunk (or full episode — your call on which is better)
The known vocabulary: list of teacher names + slugs (from data/teachers/*.json) and tradition names + slugs (from data/traditions/*.mdx)
An alias table you maintain (e.g., "Joko" → joko-beck, "HHDL" → dalai-lama-14, "Luang Por" → ajahn-chah, "Adya" → adyashanti)
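A minimal sketch of how the alias table might be consulted, assuming aliases.json is a flat object mapping a surface form to a canonical slug (the issue does not fix the file's shape). The names AliasTable, normalizeSurfaceForm, buildAliasIndex, and resolveAlias are hypothetical; the sample entries are the four examples given above.

```typescript
// Assumed shape for data/podcasts/aliases.json: surface form -> canonical slug.
type AliasTable = Record<string, string>;

// Sample entries copied from the examples in the issue text.
const sampleAliases: AliasTable = {
  "Joko": "joko-beck",
  "HHDL": "dalai-lama-14",
  "Luang Por": "ajahn-chah",
  "Adya": "adyashanti",
};

// Normalize a surface form so lookups ignore case and repeated whitespace.
function normalizeSurfaceForm(text: string): string {
  return text.trim().replace(/\s+/g, " ").toLowerCase();
}

// Build the normalized index once so each lookup is a single Map read.
function buildAliasIndex(aliases: AliasTable): Map<string, string> {
  const index = new Map<string, string>();
  for (const [surface, slug] of Object.entries(aliases)) {
    index.set(normalizeSurfaceForm(surface), slug);
  }
  return index;
}

// Returns the canonical slug, or undefined when the surface form is unknown.
function resolveAlias(index: Map<string, string>, surface: string): string | undefined {
  return index.get(normalizeSurfaceForm(surface));
}
```

Case-insensitive matching is a design choice here, not a requirement from the issue. For short aliases it may cost precision, in which case exact-case lookup could be the safer default.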
It returns structured JSON:
type ExtractedMention = {
  entity_type: 'teacher' | 'tradition' | 'concept';
  entity_slug: string; // canonical slug; concept mentions can use a free-form slug
  surface_form: string; // the exact text matched
  confidence: number; // 0..1
}
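The inputs and output described above could be wired together roughly as follows. This is a sketch under stated assumptions, not the intended prompt: the Vocabulary shape, the prompt wording, and the helpers isExtractedMention and parseMentions are all hypothetical. Only the buildExtractionPrompt(chunk, vocabulary, aliases) signature and the ExtractedMention fields come from the issue.

```typescript
type ExtractedMention = {
  entity_type: "teacher" | "tradition" | "concept";
  entity_slug: string;
  surface_form: string;
  confidence: number; // 0..1
};

// Assumed vocabulary shape; the real loaders for data/teachers and data/traditions may differ.
type VocabularyEntry = { name: string; slug: string };
type Vocabulary = { teachers: VocabularyEntry[]; traditions: VocabularyEntry[] };

function buildExtractionPrompt(
  chunk: string,
  vocabulary: Vocabulary,
  aliases: Record<string, string>,
): string {
  const list = (entries: VocabularyEntry[]): string =>
    entries.map((e) => `- ${e.name} (${e.slug})`).join("\n");
  const aliasLines = Object.entries(aliases)
    .map(([surface, slug]) => `- "${surface}" -> ${slug}`)
    .join("\n");
  // Placeholder wording; the real prompt text is the deliverable of this issue.
  return [
    "Extract mentions of teachers, traditions, and concepts from the transcript.",
    "Prefer precision over recall: omit a mention when unsure.",
    "Return only a JSON array of objects with the keys",
    "entity_type, entity_slug, surface_form, confidence (0..1).",
    "Teachers:",
    list(vocabulary.teachers),
    "Traditions:",
    list(vocabulary.traditions),
    "Aliases:",
    aliasLines,
    "Transcript:",
    chunk,
  ].join("\n");
}

// Runtime check that a parsed value matches the ExtractedMention shape.
function isExtractedMention(value: unknown): value is ExtractedMention {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const confidence = v["confidence"];
  const entityType = v["entity_type"];
  return (
    (entityType === "teacher" || entityType === "tradition" || entityType === "concept") &&
    typeof v["entity_slug"] === "string" &&
    typeof v["surface_form"] === "string" &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1
  );
}

// Parse the model's reply and silently drop entries that fail the shape check.
function parseMentions(raw: string): ExtractedMention[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array");
  return parsed.filter(isExtractedMention);
}
```

Dropping malformed entries instead of failing the whole chunk is one possible policy. A stricter pipeline might log or reject them so that prompt regressions stay visible.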
Why it matters
The vocabulary IS the moat. Your domain knowledge of how teachers are referenced in dharma talks (nicknames, honorifics, transliteration variants) is what separates this from a generic NER pass. No amount of model upgrades will fix a bad prompt or missing alias table.
Acceptance criteria
buildExtractionPrompt function exists and is unit-tested for shape
aliases.json has at least 20 entries
docs/podcast/extraction.md records the posture and known limitations
labeled-chunks.json has 20 hand-labeled chunks for evaluation
One end-to-end smoke test feeds a labeled chunk through Claude and prints the result for manual review
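For the evaluation criterion, scoring a chunk against its hand labels might look like the sketch below. It assumes each mention is reduced to a string key such as "teacher:joko-beck", and that an empty prediction set counts as precision 1; neither convention is specified in the issue, and MentionKey, Score, and scoreChunk are hypothetical names.

```typescript
// Assumed key format: "<entity_type>:<entity_slug>", e.g. "teacher:joko-beck".
type MentionKey = string;

type Score = { precision: number; recall: number; truePositives: number };

// Compare predicted mentions with hand labels for one chunk.
// Duplicates are collapsed, so each distinct entity counts once per chunk.
function scoreChunk(predicted: MentionKey[], expected: MentionKey[]): Score {
  const predictedSet = new Set(predicted);
  const expectedSet = new Set(expected);
  let truePositives = 0;
  predictedSet.forEach((key) => {
    if (expectedSet.has(key)) truePositives += 1;
  });
  // Assumption: with nothing predicted (or nothing expected), the ratio is defined as 1.
  const precision = predictedSet.size === 0 ? 1 : truePositives / predictedSet.size;
  const recall = expectedSet.size === 0 ? 1 : truePositives / expectedSet.size;
  return { precision, recall, truePositives };
}
```

Summing true positives and set sizes across all 20 chunks before dividing would give a single corpus-level precision to compare against the ≥90% target stated in the deliverables.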
Dependencies
None — can be drafted in parallel; #309 needs it to ship
Deliverables
src/lib/extract/prompt.ts exports buildExtractionPrompt(chunk, vocabulary, aliases).
data/podcasts/aliases.json: your curated alias table for things you know about.
docs/podcast/extraction.md explaining the precision/recall posture (we want ≥90% precision, accept lower recall).
20 hand-labeled chunks (tests/fixtures/labeled-chunks.json) so #309 (Implement entity-extraction pipeline, writes to mentions table) can score the prompt against ground truth.