fix(review): run provenance lint against compile --review candidates #50

Merged: ethanj merged 2 commits into main from fix/review-provenance-lint on May 1, 2026

ethanj commented on May 1, 2026

First of four PRs bundling the post-merge audit fixes ahead of 0.6.0.

The bug

`compile --review` previously attached only schema-cross-link violations to candidates. Citation-level lint (malformed claim citations, broken-source / out-of-bounds line spans) only fired on the post-promotion compile, so reviewers approved candidates without seeing these issues — they showed up later as fresh findings on the next normal compile.

The fix

Refactored the citation rules in `src/linter/rules.ts` to expose pure-body helpers that the existing on-disk lint walker now calls into:

  • `checkPageMalformedCitations(content, filePath) → LintResult[]`
  • `checkPageBrokenCitations(content, filePath, sourcesDir, cache?) → Promise<LintResult[]>`
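A minimal sketch of the pure-body split, under loud assumptions: the `[src:<path>:<start>-<end>]` citation syntax, the `LintResult` shape, the `malformed-citation` rule id, and the `lintPageOnDisk` wrapper are hypothetical stand-ins, not the project's code. What it illustrates is the structure the PR describes: the check takes a string, so the same function can lint a page read from disk or a review candidate that exists only in memory.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical result shape; the real LintResult in src/linter/rules.ts may differ.
interface LintResult {
  rule: string;
  file: string;
  line: number; // 1-based line within the page body
  message: string;
}

// Assumed citation syntax for this sketch only: [src:<path>:<start>-<end>]
const CITATION_PREFIX = "[src:";
const WELL_FORMED = /^\[src:[^\]:\s]+:\d+-\d+\]/;

// Pure helper: no file I/O, so an in-memory candidate body works as well as a
// page that was read from disk.
function checkPageMalformedCitations(content: string, filePath: string): LintResult[] {
  const results: LintResult[] = [];
  content.split("\n").forEach((text, i) => {
    let at = text.indexOf(CITATION_PREFIX);
    while (at !== -1) {
      const rest = text.slice(at);
      if (!WELL_FORMED.test(rest)) {
        results.push({
          rule: "malformed-citation", // hypothetical rule id
          file: filePath,
          line: i + 1,
          message: `Malformed claim citation near "${rest.slice(0, 30)}"`,
        });
      }
      at = text.indexOf(CITATION_PREFIX, at + 1);
    }
  });
  return results;
}

// Hypothetical on-disk wrapper: read the page, then delegate to the pure helper.
async function lintPageOnDisk(filePath: string): Promise<LintResult[]> {
  const content = await readFile(filePath, "utf8");
  return checkPageMalformedCitations(content, filePath);
}
```

With this split, the compile path can call `checkPageMalformedCitations(candidateBody, candidatePath)` directly, and nothing about the rule has to be duplicated for review mode.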

`persistReviewCandidate` (`src/compiler/index.ts`) runs both helpers plus the existing `checkPageCrossLinks` against the in-memory candidate body and persists the citation findings on a new `ReviewCandidate.provenanceViolations` field. `review show` surfaces them in a "Provenance violations" block parallel to the existing "Schema violations" block. The field is omitted when no findings exist, so existing on-disk candidates round-trip unchanged.
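Omitting the field when it is empty is what keeps older candidate files stable. One way to get that behavior in TypeScript is a conditional object spread, sketched below; the trimmed `ReviewCandidate` and `LintResult` shapes and the `buildCandidate` helper are hypothetical simplifications, not the code in `src/compiler/index.ts`.

```typescript
// Hypothetical, trimmed-down shapes; the real types carry more fields.
interface LintResult {
  rule: string;
  file: string;
  line: number;
  message: string;
}

interface ReviewCandidate {
  slug: string;
  body: string;
  schemaViolations?: LintResult[];
  provenanceViolations?: LintResult[];
}

// Attach provenance findings only when there are some. Spreading {} adds no key,
// so a clean candidate serializes exactly as it did before the field existed
// (no "provenanceViolations": [] appears in the JSON).
function buildCandidate(slug: string, body: string, provenance: LintResult[]): ReviewCandidate {
  return {
    slug,
    body,
    ...(provenance.length > 0 ? { provenanceViolations: provenance } : {}),
  };
}
```

Writing `provenanceViolations: []` unconditionally would also type-check, but it would add a new key to every clean candidate's JSON.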

Test plan

  • Unit: per-page lint helpers detect malformed claim citations and broken / out-of-bounds spans on in-memory bodies
  • Unit: `writeCandidate` persists/omits `provenanceViolations`, `readCandidate` round-trips them
  • Unit: `review show` prints / suppresses the Provenance violations block based on candidate state
  • aimock CLI integration: `compile --review` with stubbed body containing malformed citations / missing-source citation produces a candidate JSON whose `provenanceViolations` matches the expected rule, and a clean body produces no `provenanceViolations` field
  • `npx tsc --noEmit` clean
  • `npm run build` succeeds
  • `npm test` — 630 pass / 3 skipped (smoke), no failures
  • `npm run fallow:ci` — 0 issues above threshold

Up next (other audit follow-ups)

  • Shared `ProvenanceMetadata` interface across `ExtractedConcept` and `WikiFrontmatter` to prevent drift
  • Derive `inferredParagraphs` from rendered page bodies/citations rather than trusting extraction-time metadata
  • Dedupe `checkSchemaCrossLinks` / `checkPageCrossLinks` shared logic (lower priority)
  • Surface seed pages in `generation.pages` so downstream consumers know they changed (lower priority)

ethanj added 2 commits April 30, 2026 18:06
The compile --review path used to attach only schema-cross-link
violations to candidates. Citation-level lint (malformed claim
citations, broken-source / out-of-bounds line spans) only ran on the
post-promotion compile, so reviewers approved candidates without
seeing these issues — they showed up later as fresh findings on a
subsequent compile.

Refactor the citation rules in src/linter/rules.ts to expose pure-body
helpers that the on-disk lint walker calls into:
  - checkPageMalformedCitations(content, filePath) → LintResult[]
  - checkPageBrokenCitations(content, filePath, sourcesDir, cache?) →
    Promise<LintResult[]>
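A sketch of how a broken-source / out-of-bounds check with an optional cache might look, so each source file is read at most once per lint run. The citation syntax, the rule ids `broken-source` and `out-of-bounds-span`, and the `sourceLineCount` helper are assumptions for illustration; only the parameter list mirrors the signature given above.

```typescript
import { readFile } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical result shape and rule ids, for illustration only.
interface LintResult {
  rule: string;
  file: string;
  line: number;
  message: string;
}

// Look up a source's line count, reading the file at most once per lint run.
// null records a missing or unreadable source so it is not retried.
async function sourceLineCount(
  source: string,
  sourcesDir: string,
  cache: Map<string, number | null>,
): Promise<number | null> {
  const cached = cache.get(source);
  if (cached !== undefined) return cached;
  let count: number | null;
  try {
    const text = await readFile(join(sourcesDir, source), "utf8");
    const parts = text.split("\n");
    count = text === "" ? 0 : text.endsWith("\n") ? parts.length - 1 : parts.length;
  } catch {
    count = null;
  }
  cache.set(source, count);
  return count;
}

// Assumed citation syntax: [src:<path>:<start>-<end>], 1-based inclusive lines.
async function checkPageBrokenCitations(
  content: string,
  filePath: string,
  sourcesDir: string,
  cache: Map<string, number | null> = new Map(),
): Promise<LintResult[]> {
  const results: LintResult[] = [];
  const lines = content.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const re = /\[src:([^\]:\s]+):(\d+)-(\d+)\]/g; // fresh regex per line, so lastIndex starts at 0
    let m: RegExpExecArray | null;
    while ((m = re.exec(lines[i])) !== null) {
      const [, source, startText, endText] = m;
      const start = Number(startText);
      const end = Number(endText);
      const count = await sourceLineCount(source, sourcesDir, cache);
      if (count === null) {
        results.push({ rule: "broken-source", file: filePath, line: i + 1, message: `Cited source ${source} not found` });
      } else if (start < 1 || end < start || end > count) {
        results.push({
          rule: "out-of-bounds-span",
          file: filePath,
          line: i + 1,
          message: `Span ${start}-${end} outside ${source} (${count} lines)`,
        });
      }
    }
  }
  return results;
}
```

Passing the same `Map` across every page in a run lets the on-disk walker and the review path share cached line counts.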

persistReviewCandidate (src/compiler/index.ts) now runs both helpers
plus the existing checkPageCrossLinks against the in-memory candidate
body and persists the citation findings on a new
ReviewCandidate.provenanceViolations field. `review show` surfaces
them in a "Provenance violations" block parallel to the existing
"Schema violations" block. The field is omitted when no findings
exist, so existing on-disk candidates round-trip unchanged.
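One possible shape for the parallel blocks in `review show`, as a sketch: the line format, the `renderCandidate` and `renderViolationBlock` names, and the trimmed `ReviewCandidate` type are hypothetical. Both blocks go through one renderer that prints nothing when its list is absent, so a clean candidate shows neither header.

```typescript
// Hypothetical, trimmed-down shapes for illustration.
interface LintResult {
  rule: string;
  file: string;
  line: number;
  message: string;
}

interface ReviewCandidate {
  slug: string;
  schemaViolations?: LintResult[];
  provenanceViolations?: LintResult[];
}

// Render one titled block, or no lines at all when the list is absent or empty.
function renderViolationBlock(title: string, items: LintResult[] | undefined): string[] {
  if (!items || items.length === 0) return [];
  return [`${title}:`, ...items.map((v) => `  - [${v.rule}] ${v.file}:${v.line} ${v.message}`)];
}

// Schema and provenance findings share the same renderer, so the blocks stay parallel.
function renderCandidate(c: ReviewCandidate): string {
  return [
    `Candidate: ${c.slug}`,
    ...renderViolationBlock("Schema violations", c.schemaViolations),
    ...renderViolationBlock("Provenance violations", c.provenanceViolations),
  ].join("\n");
}
```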

Tests:
  - Unit: per-page lint helpers detect malformed claim citations and
    broken/out-of-bounds spans on in-memory bodies.
  - Unit: writeCandidate persists / omits provenanceViolations,
    readCandidate round-trips them.
  - Unit: review show prints / suppresses the Provenance violations
    block based on whether the candidate has them.
  - aimock CLI integration: compile --review with stubbed body
    containing malformed citations / missing-source citation produces
    a candidate JSON whose provenanceViolations matches the expected
    rule, and a clean body produces no provenanceViolations field.

Addresses the review-mode lint gap identified in the post-merge
schema-overlap audit.

CI's auto-changed-since fallow flagged two clone groups my local pass
missed:

  - 12-line captureShowOutput helper duplicated between
    test/schema-violations.test.ts and test/provenance-violations.test.ts.
    Hoisted into test/fixtures/review-show-helpers.ts.
  - The "stub aimock + makeWorkspace + runCLI compile --review +
    readOnlyCandidate" boilerplate triplicated across the three
    integration tests. Hoisted into compileReviewWithStubbedBody().
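A common way to write an output-capture helper like `captureShowOutput` is to swap out `console.log`, run the command, and restore the original in a `finally`. A minimal sketch, assuming `review show` prints through `console.log`; the signature is hypothetical and may not match the helper in `test/fixtures/review-show-helpers.ts`.

```typescript
// Hypothetical signature; the real fixture may capture output differently.
async function captureShowOutput(run: () => Promise<void> | void): Promise<string> {
  const lines: string[] = [];
  const original = console.log;
  console.log = (...args: unknown[]) => {
    // Approximates console.log's formatting for simple arguments.
    lines.push(args.map(String).join(" "));
  };
  try {
    await run();
  } finally {
    console.log = original; // always restore, even if run() throws
  }
  return lines.join("\n");
}
```

Keeping the restore in `finally` matters once the helper is shared: a failing test in one file must not leave `console.log` patched for the next one.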

Also addresses the codex low-priority finding that
`expect(candidate.body).toContain("")` was a no-op assertion. Replaced
it with `expect(candidate.body).toContain("Body without any citation markers")`
so the test actually verifies that the stubbed body content round-trips
through the candidate write.
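Why the old assertion could never fail: Jest-style `toContain` on a string is a substring check, and every string, including the empty one, contains the empty string. The plain-TypeScript sketch below uses `String.prototype.includes` as a stand-in for the matcher; `containsSubstring` is a hypothetical name.

```typescript
// Stand-in for a substring matcher such as toContain applied to a string.
function containsSubstring(haystack: string, needle: string): boolean {
  return haystack.includes(needle);
}

const stubbedBody = "Body without any citation markers";

// The empty needle matches everything, even a body that never round-tripped.
const oldCheckOnEmpty = containsSubstring("", ""); // true: cannot fail
// The specific needle only matches when the stubbed text actually made it through.
const newCheckOnEmpty = containsSubstring("", stubbedBody); // false: catches the loss
const newCheckOnBody = containsSubstring(stubbedBody, stubbedBody); // true
```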
ethanj merged commit c485145 into main on May 1, 2026
3 checks passed
ethanj deleted the fix/review-provenance-lint branch on May 1, 2026 at 01:21