refactor: unify ProvenanceMetadata across ExtractedConcept and WikiFrontmatter #51

Merged
ethanj merged 2 commits into main from refactor/shared-provenance-metadata
May 1, 2026

ethanj commented May 1, 2026

Second of four pre-0.6.0 audit-fix PRs.

What

ExtractedConcept and WikiFrontmatter independently re-declared the same four optional fields (confidence, provenanceState, contradictedBy, inferredParagraphs). They are intentionally a pipeline-boundary pair — extraction-time concept records vs. on-disk page frontmatter — but the lack of a shared shape meant any future field addition or rename risked drifting one surface from the other.

Define a single exported ProvenanceMetadata interface in src/utils/types.ts and compose both surfaces from it via interface extension. TypeScript erases the indirection at compile time, so the JSON shapes serialised on disk and over the LLM tool boundary stay byte-identical to the previous flat layout — pure refactor, no behaviour change.
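
As a rough illustration of the shape described above, here is a minimal sketch. The PR names the four shared fields but not their types, so the field types below, along with the `name` and `title` members, are assumptions made only to keep the example self-contained; the real declarations in src/utils/types.ts may differ.

```typescript
// Shared shape. Field names come from the PR description;
// the field types are hypothetical placeholders.
interface ProvenanceMetadata {
  confidence?: number;
  provenanceState?: string;
  contradictedBy?: string[];
  inferredParagraphs?: number;
}

// Extraction-time concept record (`name` is illustrative only).
interface ExtractedConcept extends ProvenanceMetadata {
  name: string;
}

// On-disk page frontmatter (`title` is illustrative only).
interface WikiFrontmatter extends ProvenanceMetadata {
  title: string;
}

const page: WikiFrontmatter = {
  title: "Example",
  confidence: 0.9,
  contradictedBy: ["other-page"],
};

// Interfaces do not exist at runtime, so the serialised object stays flat:
// there is no nested "provenance" key introduced by the extension.
const serialised = JSON.stringify(page);
console.log(serialised);
// prints {"title":"Example","confidence":0.9,"contradictedBy":["other-page"]}
```

Because `extends` only affects type checking, an object typed with the composed interface serialises exactly as one typed with the earlier flat declaration would.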

Also drops the duplicate private ProvenanceMetadata interface that lived in src/utils/markdown.ts (used by parseProvenanceMetadata and the lint rules) and re-uses the canonical exported type instead, so the lint surface tracks the same shared shape.

Test plan

  • New compile-time pin in test/provenance-metadata-shape.test.ts asserts that both ExtractedConcept and WikiFrontmatter remain assignable to ProvenanceMetadata — future drift fails npx tsc --noEmit rather than silently re-creating the gap
  • npx tsc --noEmit clean
  • npm run build succeeds
  • npm test — 632 pass / 3 skipped (smoke), no regressions
  • npm run fallow:ci — 0 issues above threshold

Up next (remaining audit follow-ups)

  • Derive inferredParagraphs from rendered page bodies/citations rather than trusting extraction-time metadata
  • Dedupe checkSchemaCrossLinks / checkPageCrossLinks shared logic (lower priority)
  • Surface seed pages in generation.pages so downstream consumers know they changed (lower priority)

ethanj added 2 commits April 30, 2026 18:24
…ontmatter

Codex's post-merge schema-overlap audit flagged that ExtractedConcept
and WikiFrontmatter independently re-declared the same four optional
fields (confidence, provenanceState, contradictedBy,
inferredParagraphs). They are intentionally a pipeline-boundary pair
— extraction-time concept records vs. on-disk page frontmatter — but
the lack of a shared shape meant any future field addition or rename
risked drifting one surface from the other.

Define a single exported `ProvenanceMetadata` interface in
src/utils/types.ts and compose both surfaces from it via interface
extension. TypeScript erases the indirection at compile time, so the
JSON shapes serialised on disk and over the LLM tool boundary stay
byte-identical to the previous flat layout — pure refactor, no
behaviour change.

Also drops the duplicate private `ProvenanceMetadata` interface that
lived in src/utils/markdown.ts (used by parseProvenanceMetadata and
the lint rules) and re-uses the canonical exported type instead, so
the lint surface tracks the same shared shape.

A new compile-time pin in test/provenance-metadata-shape.test.ts
asserts that both ExtractedConcept and WikiFrontmatter remain
assignable to ProvenanceMetadata; future drift would fail
`npx tsc --noEmit` rather than silently re-creating the gap.
…-level pin

Codex review on PR #51 flagged two low-priority wording / strength
issues:

1. The ProvenanceMetadata jsdoc said "Composed (not extended) into …"
   but the implementation uses `interface … extends ProvenanceMetadata`.
   Comment now reads "Extended by … via `interface … extends
   ProvenanceMetadata`" so it matches the code.

2. The previous test assertion (`const x: ProvenanceMetadata =
   concept`) was weaker than its own comment claimed: every
   ProvenanceMetadata field is optional, so the assignment alone
   wouldn't have caught a future drop of a key from
   ExtractedConcept / WikiFrontmatter. Adds two type-level conditional
   assertions that compile only when every key on ProvenanceMetadata
   remains present on each of the consumer interfaces — a removal
   would break `npx tsc --noEmit`. Comment updated to match the new
   guarantee.
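
The difference between the two kinds of pin can be sketched as follows. This is not the code from test/provenance-metadata-shape.test.ts; the `HasAllKeys` helper, the `DriftedConcept` interface, and the field types are hypothetical and exist only to show why plain assignability misses a dropped key while a key-level conditional type catches it.

```typescript
// Hypothetical field types; only the field names come from the PR.
interface ProvenanceMetadata {
  confidence?: number;
  provenanceState?: string;
  contradictedBy?: string[];
  inferredParagraphs?: number;
}

interface ExtractedConcept extends ProvenanceMetadata {
  name: string;
}

// Hypothetical drifted interface that re-declares the fields by hand
// and has lost `inferredParagraphs`.
interface DriftedConcept {
  name: string;
  confidence?: number;
  provenanceState?: string;
  contradictedBy?: string[];
}

// Weak pin: still compiles for the drifted shape, because every
// ProvenanceMetadata field is optional.
const drifted: DriftedConcept = { name: "x" };
const stillAssignable: ProvenanceMetadata = drifted;

// Stronger pin: resolves to `true` only when no key of Base is missing
// from T. The tuple wrapping avoids distribution over `never`.
type HasAllKeys<T, Base> =
  [Exclude<keyof Base, keyof T>] extends [never] ? true : false;

// Compiles only while ExtractedConcept keeps every shared key.
const conceptHasAllKeys: HasAllKeys<ExtractedConcept, ProvenanceMetadata> = true;

// For the drifted shape the type resolves to `false`, so assigning `true`
// here would be a compile error; `false` is the only value that type-checks.
const driftedHasAllKeys: HasAllKeys<DriftedConcept, ProvenanceMetadata> = false;

console.log(conceptHasAllKeys, driftedHasAllKeys, stillAssignable.confidence);
```

In a real test file only the `= true` form would be kept for each consumer interface, so that removing a key turns the assignment into a type error under `npx tsc --noEmit`.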
ethanj merged commit 2409468 into main May 1, 2026
3 checks passed
ethanj deleted the refactor/shared-provenance-metadata branch May 1, 2026 07:11