
feat: Support tracking issue impact over time (#156)#177

Open
duncanleo wants to merge 102 commits into main from duncanleo/data-overhaul

Conversation

@duncanleo
Member

@duncanleo duncanleo commented Mar 21, 2026

Summary

Overhauls mrtdown-data into the canonical reviewed data repository for MRTDown, with typed packages, deterministic file-backed tooling, and a migrated append-only issue dataset.

What Changed

  • Migrated canonical data into the new data/ layout:
    • static entities under data/{station,line,service,operator,town,landmark}
    • issues under data/issue/YYYY/MM/<issue_id>/
    • append-only evidence.ndjson and impact.ndjson per issue
  • Introduced the monorepo package structure:
    • @mrtdown/core for schemas and shared period/state helpers
    • @mrtdown/fs for file-backed repositories and writers
    • @mrtdown/cli for creation, validation, manifest, listing, show, and repair tooling
    • @mrtdown/triage for LLM-assisted evidence triage and replay utilities
  • Removed the old API/database runtime from this repo; runtime serving and Postgres import now belong in mrtdown-site.
  • Added GitHub Pages publishing for generated manifests and downloadable data archives.
  • Added Changesets/npm publishing workflow for shared packages.
  • Added architecture docs describing the canonical-data vs runtime-data split and the two-repo mrtdown-data / mrtdown-site model.
  • Replayed and normalized issue impact data, including repairs for empty impacts, degraded-service extraction, recurring maintenance periods, and legacy issue normalization.
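
As a sketch of how a consumer might read one of the per-issue append-only NDJSON files, assuming a hypothetical record shape (the real schemas live in @mrtdown/core; the field names below are illustrative only):

```typescript
// Minimal sketch of reading an append-only NDJSON file such as
// data/issue/YYYY/MM/<issue_id>/impact.ndjson.
// ImpactRecord is a stand-in shape, not the actual @mrtdown/core schema.
interface ImpactRecord {
  issueId: string;
  kind: string; // e.g. a period start/end or state change
  at: string;   // ISO-8601 timestamp
}

// One JSON document per line; blank lines (e.g. a trailing newline) are skipped.
function parseNdjson(text: string): ImpactRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as ImpactRecord);
}
```

Appending a record is then a single `JSON.stringify(record) + "\n"` write, which is what keeps the per-issue evidence and impact files safely append-only.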

Breaking Changes

  • This repo no longer ships the old Hono/API/DuckDB runtime.
  • Consumers should use the new packages and/or the generated GitHub Pages data artifacts instead of importing from the previous src/api, src/db, or legacy schema paths.

Review Notes

This is a large structural migration. The most important review areas are:

  • package boundaries and exports
  • canonical data layout compatibility with mrtdown-site
  • CLI validation behavior
  • generated manifest/archive workflow
  • issue impact replay correctness

Validation

TODO before merge:

  • npm ci
  • npm run build
  • npm run test
  • npm run cli -- -- validate
  • Confirm Pages artifact generation with npm run cli -- -- manifest and npm run cli -- -- pages-index

Fixes #156

@duncanleo duncanleo self-assigned this Mar 21, 2026
@duncanleo duncanleo changed the title feat: Support tracking issue impact over time feat: Support tracking issue impact over time (#156) Mar 21, 2026
@duncanleo duncanleo force-pushed the duncanleo/data-overhaul branch 13 times, most recently from ca15f1a to 965bee3 on March 22, 2026 16:43
@duncanleo duncanleo force-pushed the duncanleo/data-overhaul branch from 0b0ff52 to 78cdc9d on March 23, 2026 16:07
@duncanleo duncanleo force-pushed the duncanleo/data-overhaul branch from 2e64332 to 46a53bb on April 4, 2026 10:27
duncanleo and others added 26 commits May 2, 2026 21:43
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Add one-off collapse tooling that clusters maintenance/infra issues by slug, type, and overlapping or adjacent time ranges, then merges into deterministic canonical issues. Apply the merge to current duplicate groups and persist audit artifacts for traceability.

Made-with: Cursor
Drop the temporary collapse scripts and generated reports now that the one-off maintenance/infra merge has been applied to the data files.

Made-with: Cursor
Update multilingual render text and translation source metadata for evidence rows that were rewritten from verifiable source articles, and keep deduped entries consistent with the revised text.

Made-with: Cursor
Ensure service-impact claims without time hints are anchored to the evidence timestamp so impact events are not dropped, while still allowing maintenance/infra metadata-only cause updates to persist without requiring service periods.

Made-with: Cursor
Add a deterministic relative-date tool with RRULE-style weekday inputs and tighten extraction guidance so planned weekend constraints become explicit timed claims instead of being dropped as advisory text. Also bump the OpenAI SDK to the latest workspace version.

Made-with: Cursor
Persist regenerated impact events from the latest empty-impact replay run, including recovered weekend maintenance windows and related issue updates that now produce concrete events.

Made-with: Cursor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Delete the three CCL no-mobile-signals issue bundles to simplify the current cleanup pass and reduce unresolved issue validation failures.

Made-with: Cursor
Strengthen extraction guidance for single-platform degraded-service evidence, replay the affected bucket 2/3 issues, and update the temporary validation report to reflect the reduced failure count and remaining buckets.

Made-with: Cursor
Remove the temporary validation findings report from version control so it stays local-only and out of committed changes.

Made-with: Cursor
Prevent `mrtdown-cli show` from crashing when derived service/facility state omits a `causes` array by defaulting to an empty list before rendering.

Made-with: Cursor
Capture the source article's station-by-station rollout schedule in evidence and align impact periods so facilities are modeled in phased windows rather than one concurrent multi-year window.

Made-with: Cursor
@duncanleo
Member Author

@codex review

@chatgpt-codex-connector

💡 Codex Review

```ts
const DATA_DIR = resolve(import.meta.dirname, '../../../../data');
```

P1: Point ingestContent to the repository data directory

ingestContent resolves DATA_DIR with four .. segments from packages/triage/src/util/ingestContent, which lands at <repo>/packages/data instead of <repo>/data; that directory does not exist in this repo, so the first repository read (via FileStore.listDir in triage flow) will throw ENOENT and break webhook ingestion. This is production-impacting because ingestViaWebhook imports and executes this code path for every incoming message.
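
A minimal sketch of the directionality of this fix, with the module directory hardcoded for illustration (the real code would keep using `import.meta.dirname`):

```typescript
import { resolve } from "node:path";

// Illustrative stand-in for import.meta.dirname inside
// packages/triage/src/util/ingestContent.
const moduleDir = "/repo/packages/triage/src/util/ingestContent";

// Four ".." segments stop at <repo>/packages, so "data" resolves to a
// directory that does not exist in this repo.
const wrong = resolve(moduleDir, "../../../../data");    // /repo/packages/data

// Five segments reach the repo root, where the canonical data/ layout lives.
const right = resolve(moduleDir, "../../../../../data"); // /repo/data
```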


```ts
'impact.scopeItems.serviceId',
'impact.scopeItems.stationId',
'impact.scopeItems.fromStationId',
'impact.scopeItems.toStationId',
```

P2: Query against actual impact event fields in issue search

The Fuse search keys target impact.scopeItems.*, but IssueRepository.get() stores impact data under impactEvents with event-specific fields (serviceScopes, entity.serviceId, etc.), so these clauses never match and searches by affected services/stations silently fail. This degrades FindIssuesTool matching and can cause triage to miss existing related issues.
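
To illustrate why the stale keys silently match nothing, here is a minimal dotted-path lookup run against the stored shape the review describes (field names beyond impactEvents and entity.serviceId are hypothetical; Fuse's own key traversal also descends into arrays without an explicit index, which the explicit "0" below stands in for):

```typescript
// Walk a dotted key path through nested objects/arrays, returning
// undefined as soon as any segment is missing.
function getByPath(obj: unknown, path: string): unknown {
  return path
    .split(".")
    .reduce<any>((o, k) => (o == null ? undefined : o[k]), obj);
}

// Illustrative issue shape per the review: impact data lives under
// impactEvents, not impact.scopeItems.
const issue = {
  impactEvents: [{ entity: { serviceId: "NSL" } }],
};

const stale = getByPath(issue, "impact.scopeItems.serviceId");    // undefined
const actual = getByPath(issue, "impactEvents.0.entity.serviceId"); // "NSL"
```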


