
fix(afdocs): add llms.txt directive to OpenAPI mirrors; drop mcp.json from llms.txt#6

Merged
ethanj merged 1 commit into main from docs/afdocs-round3 on May 3, 2026

ethanj commented May 2, 2026

Round 3 of AFDocs fixes — directive coverage

Overview

The real AFDocs scorecard against the live site (after PR #5) shows 93/100 (A) with five remaining issues. This PR clears the two directive-coverage warnings, which are quick fixes. The other three (content-start-position, the markdown-content-parity residual, and content-negotiation) need deeper work and are explicitly out of scope; see below.

| Check | Before | This PR |
|---|---|---|
| llms-txt-directive-md | WARN 28/49 (21 missing) | All mirrors now have the directive |
| llms-txt-directive-html | WARN 49/50 (1 missing) | False positive removed (mcp.json was treated as a doc page) |
| content-start-position | FAIL 16/50, worst 69% | Out of scope: DOM swizzle |
| markdown-content-parity | FAIL 9/49, avg 6% | Out of scope: residual is code-block whitespace |
| content-negotiation | FAIL | Out of scope: F1 hosting migration |

Changes

Add directive to OpenAPI mirrors

`scripts/mirror-markdown.mjs`: `renderOpenapiOperation` and `renderOpenapiInfo` now emit `> Agent index: [llms.txt](/llms.txt)` as the first blockquote (the same wording the remark plugin injects on hand-authored pages). The "this page mirrors operation X" note moves to a regular paragraph below it.

OpenAPI .md mirrors render from the vendored YAML, so the remark plugin's directive injection (which runs on MDX) never reached them. ~30 API operation pages plus the rolled-up info page were missing the directive in their .md form.
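A minimal sketch of the new ordering, assuming a simplified operation object with `summary`, `method`, and `path`. `renderOperationPreamble` and `firstBlockquote` are hypothetical helpers for illustration, not the actual code in `scripts/mirror-markdown.mjs`, and the note wording is illustrative:

```javascript
// Hypothetical sketch; the real renderOpenapiOperation may be structured differently.
const AGENT_DIRECTIVE = "> Agent index: [llms.txt](/llms.txt)";

function renderOperationPreamble(op) {
  const endpoint = `${op.method.toUpperCase()} ${op.path}`;
  return [
    `# ${op.summary}`,
    "",
    // The llms.txt directive is emitted as the first blockquote on the page.
    AGENT_DIRECTIVE,
    "",
    // The source-of-truth note is a plain paragraph, so it can no longer be
    // the first blockquote that a directive check inspects.
    `This page mirrors operation \`${endpoint}\` from the vendored OpenAPI spec.`,
  ].join("\n");
}

// Returns the first line that starts a blockquote, or null if there is none.
function firstBlockquote(markdown) {
  const line = markdown.split("\n").find((l) => l.startsWith(">"));
  return line === undefined ? null : line;
}

const md = renderOperationPreamble({
  summary: "Get a memory", // illustrative title
  method: "get",
  path: "/v1/memories/{id}",
});
console.log(firstBlockquote(md) === AGENT_DIRECTIVE); // true
```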

Drop .well-known/mcp.json from llms.txt

`scripts/build-llms-txt.mjs`: the "Optional" section listed `.well-known/mcp.json` as a markdown link, so AFDocs walked it as a doc page and flagged it for missing the HTML directive. Now a non-link prose line in llms.txt notes that the descriptor exists; agents that care will look at the well-known path directly.
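The difference between a link and a prose mention can be sketched as follows. This is a hypothetical illustration: the entry shape, the `isDocPage` flag, and the example entries are assumptions, not the real `build-llms-txt.mjs` code. Only markdown link targets are collected by `linkedUrls`, which is what a link-following checker would crawl:

```javascript
// Hypothetical sketch of the llms.txt "Optional" section.
function renderOptionalSection(entries) {
  const lines = ["## Optional", ""];
  for (const entry of entries) {
    if (entry.isDocPage) {
      // Doc pages stay as markdown links, so link-following checkers crawl them.
      lines.push(`- [${entry.title}](${entry.url})`);
    } else {
      // Non-page artifacts are mentioned as plain text with no link target.
      lines.push(`- ${entry.title}: \`${entry.url}\``);
    }
  }
  return lines.join("\n");
}

// Collects the targets of inline markdown links, i.e. what a crawler would follow.
function linkedUrls(markdown) {
  return [...markdown.matchAll(/\[[^\]]*\]\(([^)\s]+)\)/g)].map((m) => m[1]);
}

const section = renderOptionalSection([
  { title: "Full docs", url: "/llms-full.txt", isDocPage: true }, // illustrative entry
  { title: "MCP descriptor", url: "/.well-known/mcp.json", isDocPage: false },
]);
console.log(linkedUrls(section)); // -> [ '/llms-full.txt' ]
```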

Out of scope

content-start-position (16/50, worst 69%)

Almost entirely OpenAPI reference pages. Diagnosis:

DOM order:
  <nav>     at  3818  (navbar)
  <aside>   at  7889  (sidebar — lists ALL ~30 API operations)
  <main>    at 14569
  <article> at 14724  (page body)

For short operation pages (e.g. GET /v1/memories/{id}) the sidebar's text content outweighs the article body, so the first meaningful content element lands past 50% in the converted text.

The fix requires a Docusaurus theme swizzle of the docs layout that renders `<article>` before `<aside>` in the DOM, with CSS preserving the visual sidebar-on-left layout. That's a separate, larger change touching theme components.
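A rough model of why the reorder helps, assuming the check measures where the article's text begins within the page's converted text (AFDocs' exact algorithm may differ). The region lengths are illustrative, not measurements from the live site:

```javascript
// Each region is a top-level landmark in DOM order with the length of its text.
function contentStartPercent(regions) {
  const total = regions.reduce((sum, r) => sum + r.textLength, 0);
  let offset = 0;
  for (const region of regions) {
    if (region.isContent) return (offset * 100) / total;
    offset += region.textLength;
  }
  return 100; // no content region found
}

const nav = { name: "nav", textLength: 400, isContent: false };
const aside = { name: "aside", textLength: 2400, isContent: false }; // long sidebar
const article = { name: "article", textLength: 1200, isContent: true }; // short op page

console.log(contentStartPercent([nav, aside, article])); // 70 (past the 50% mark)
console.log(contentStartPercent([nav, article, aside])); // 10 (article before aside)
```

Only the DOM order changes between the two calls; the text lengths are identical, which is why CSS alone can keep the sidebar visually on the left.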

markdown-content-parity (9/49, avg 6%, max 42%)

PR #5 already brought the average from 13% to 6% by switching to HTML→md rendering with cheerio + turndown + GFM. The residual gap is dominated by whitespace tokenization in code blocks: turndown joins Prism `.token-line` children with `\n`, but inline tokens don't always have spaces between them, so a word-level diff over-counts gaps.
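A simplified word-level parity metric shows how spacing alone can register as missing content. The metric here (share of HTML-side words absent from the markdown) is an assumption for illustration, not AFDocs' exact formula:

```javascript
// Fraction of whitespace-separated HTML-side words that never appear in the markdown.
function missingWordRatio(htmlText, mdText) {
  const words = (s) => s.split(/\s+/).filter(Boolean);
  const mdWords = new Set(words(mdText));
  const htmlWords = words(htmlText);
  if (htmlWords.length === 0) return 0;
  const missing = htmlWords.filter((w) => !mdWords.has(w));
  return missing.length / htmlWords.length;
}

// The same line of code, extracted with and without spaces between token spans.
const spaced = "client . search ( query )";
const joined = "client.search(query)";

console.log(missingWordRatio(spaced, joined)); // 1: every word counts as "missing"
console.log(missingWordRatio(spaced, spaced)); // 0
```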

Further wins from here need either turndown rule tuning per code-language, or theme-level data-markdown-ignore attributes on elements that exist only in HTML. Bigger investment than the round-3 directive fixes.

content-negotiation (FAIL)

GH Pages cannot honor Accept: text/markdown. F1 follow-up: migrate hosting to Cloudflare Pages / Vercel / Netlify with an edge function for the rewrite.
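The rewrite such an edge function would perform can be sketched as a platform-neutral helper. `acceptsMarkdown` and `resolveMirrorPath` are hypothetical names; the sketch assumes each page's mirror lives at `<path>.md` (as with `build/platform/observability.md`) and ignores q-value ranking beyond treating `q=0` as a refusal. A real Cloudflare Pages, Vercel, or Netlify handler would wrap this logic in its own request API:

```javascript
// True if the Accept header lists text/markdown with a non-zero quality value.
function acceptsMarkdown(acceptHeader) {
  if (!acceptHeader) return false;
  return acceptHeader.split(",").some((part) => {
    const [type, ...params] = part.split(";").map((s) => s.trim());
    if (type.toLowerCase() !== "text/markdown") return false;
    const q = params.find((p) => p.startsWith("q="));
    return q === undefined || Number(q.slice(2)) > 0;
  });
}

// Maps a page path to its .md mirror when markdown was requested.
function resolveMirrorPath(pathname, acceptHeader) {
  if (!acceptsMarkdown(acceptHeader)) return pathname;
  const trimmed = pathname.replace(/\/+$/, "");
  if (trimmed.endsWith(".md")) return trimmed;
  // Assumption: the home page mirror is /index.md.
  return `${trimmed || "/index"}.md`;
}

console.log(resolveMirrorPath("/platform/observability", "text/markdown"));
// -> /platform/observability.md
console.log(resolveMirrorPath("/platform/observability", "text/html"));
// -> /platform/observability
```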

Verification

  • npm run build clean
  • npm run typecheck clean
  • All 4 AFDocs static artifacts present
  • 7/7 platform pages + 31/31 API ref pages + 1 home page have the directive in HTML
  • All 31 API ref .md mirrors now lead with > Agent index: [llms.txt](/llms.txt)
  • llms.txt "Optional" section no longer links .well-known/mcp.json

🤖 Generated with Claude Code

… from llms.txt

Clears two AFDocs warnings flagged by
`npx afdocs check https://docs.atomicmemory.ai` against PR #5 (round 2):

## llms-txt-directive-md (was 28/49 pages, 21 missing)

OpenAPI page mirrors are rendered from the vendored OpenAPI YAML in
`scripts/mirror-markdown.mjs`, not from rendered HTML, so the remark
plugin's directive injection never reached them. Their existing
"Machine-readable: this page mirrors operation X" blockquote also
didn't satisfy the AFDocs spec, which requires a blockquote that
links to llms.txt.

Both `renderOpenapiOperation` and `renderOpenapiInfo` now emit
`> Agent index: [llms.txt](/llms.txt)` as their first blockquote
(matching what the remark plugin injects on hand-authored pages),
and move the operation/info source-of-truth note to a regular
paragraph below.

## llms-txt-directive-html (was 49/50 pages, 1 missing)

The "missing" page was `/.well-known/mcp.json` — a JSON file that
AFDocs followed because llms.txt linked it as a documentation entry.
JSON files don't have an HTML directive blockquote, so listing it as
a doc link was a false positive against the directive check.

`build-llms-txt.mjs` no longer emits `.well-known/mcp.json` as a
markdown link in the Optional section. Discovery for the MCP
descriptor is via the well-known path itself; agents that care will
look at `/.well-known/mcp.json` directly. A short non-link line in
llms.txt still tells humans/agents the descriptor exists.

## Out of scope

- content-start-position (16 of 50 pages past 50%, all OpenAPI ref
  pages): the DOM has `<aside>` (sidebar) before `<article>`, and
  for short OpenAPI operation pages the sidebar text outweighs the
  article body — pushing the first content element past 50% in the
  HTML→text conversion. Real fix is a Docusaurus theme swizzle to
  reorder DOM (CSS keeps visual layout). Not in this PR.
- markdown-content-parity (~9 of 49 pages, avg 6% missing,
  down from PR #5's avg 13%): residual gap is mostly
  whitespace tokenization differences in code blocks; the spec
  threshold is 5% pass / 20% warn / >20% fail. Real wins from here
  require either turndown rule tuning or `data-markdown-ignore`
  annotations on theme elements. Not in this PR.
- content-negotiation (FAIL): GH Pages cannot honor
  `Accept: text/markdown`. Tracked as F1 (hosting migration).
ethanj added a commit that referenced this pull request May 3, 2026
…in mirror

Three blockers caught in PR review against the AFDocs work merged in
PR #5. Each lands with a focused fix.

## Blocker 1: AFDocs runtime deps removed from package.json

`6a97c0b` dropped `cheerio`, `turndown`, `turndown-plugin-gfm`,
`js-yaml`, `@types/js-yaml`, `@types/turndown` from devDependencies.
Those are required by `scripts/mirror-markdown.mjs`, so `npm run build`
fails with `Cannot find module 'turndown'` from the
`llms-and-mirror-plugin.mjs` postBuild path.

Restored all six in devDependencies (matching origin/main's pinned
versions).
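A small guard along these lines could catch the same regression before `npm run build` does. `missingDevDeps` is a hypothetical helper, not part of the repo; in practice `pkg` would be the parsed `package.json`:

```javascript
// Packages the markdown-mirror build step depends on (from the list above).
const REQUIRED_DEV_DEPS = [
  "cheerio",
  "turndown",
  "turndown-plugin-gfm",
  "js-yaml",
  "@types/js-yaml",
  "@types/turndown",
];

// Returns the required packages that package.json no longer declares.
function missingDevDeps(pkg, required = REQUIRED_DEV_DEPS) {
  const declared = pkg.devDependencies || {};
  return required.filter(
    (name) => !Object.prototype.hasOwnProperty.call(declared, name),
  );
}

// Illustrative partial package.json; "*" stands in for the real pinned versions.
const pkg = { devDependencies: { cheerio: "*", "js-yaml": "*" } };
console.log(missingDevDeps(pkg));
// -> [ 'turndown', 'turndown-plugin-gfm', '@types/js-yaml', '@types/turndown' ]
```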

## Blocker 2: prebuild/prestart auto-regen reintroduced

`6a97c0b` re-added `prestart`/`prebuild` hooks that run
`write:docs-mode && regen:api`. PR #5's commit `08c5312` explicitly
removed `regen:api` from those hooks because it dirtied the 29
committed `.api.mdx` files on every build (non-deterministic
re-encoding of the compressed `api:` blob). Verified the regression by
running `npm run build` on this branch and seeing 6 .api.mdx files
modified in the worktree afterward.

Kept `write:docs-mode` (it's needed for the new docs-mode flag) but
dropped the `regen:api` chain. Spec refresh stays the explicit
`vendor:spec → regen:api → commit` flow that
`scripts/vendor-core-spec.mjs:14` already documents.
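A CI check could keep the hooks from drifting back. This is a hypothetical sketch: `hooksChainingRegen` is not in the repo, and the example script strings only approximate the real `package.json` entries:

```javascript
// Returns the lifecycle hooks whose command line invokes regen:api.
function hooksChainingRegen(scripts, hooks = ["prebuild", "prestart"]) {
  return hooks.filter((hook) => /\bregen:api\b/.test(scripts[hook] || ""));
}

// Illustrative scripts block: prestart has regressed, prebuild has not.
const scripts = {
  prebuild: "npm run write:docs-mode",
  prestart: "npm run write:docs-mode && npm run regen:api",
};
console.log(hooksChainingRegen(scripts)); // -> [ 'prestart' ]
```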

## Blocker 3: custom Heading swizzle leaked into the markdown mirror

The new `<Heading>` component (`src/theme/Heading/index.tsx`) wraps
each heading's text in an anchor link with a `#` icon span, a
copy-to-clipboard affordance for humans. The wrapping anchor uses
CSS-module class names (`headingLink_no4V`, `headingIcon_Pk3T`) that
the mirror's noise-selector strip list (which targets the default
`a.hash-link`) doesn't match. Result: every heading in every mirror
rendered as

    # [#Observability](#observability "Copy link to Observability")

instead of

    # Observability

Two-part fix:

1. `src/theme/Heading/index.tsx` — add `data-markdown-ignore` to the
   `#` icon span. AFDocs treats this attribute as the spec-compliant
   marker for HTML-only content that should not appear in markdown
   parity comparison; tooling that converts the page to markdown
   should also strip it.

2. `scripts/mirror-markdown.mjs` —
   - Add `[data-markdown-ignore]` to `ARTICLE_NOISE_SELECTORS` so
     the icon span is removed before turndown runs (defense for
     other tooling that might emit the same attribute).
   - Add a `clean-heading-text` turndown rule that intercepts
     `<h1>`–`<h6>` and emits clean `# Title` from `node.textContent`
     (with leading `#` chars stripped). This works for both default
     Docusaurus headings and the swizzled component, so the mirror
     no longer carries the wrapping anchor link as part of the
     heading text.

Verified: `npm run build` produces clean `# Observability` /
`## The summary shapes` in `build/platform/observability.md`.
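For reference, a rule of this kind in turndown's `addRule` shape (`filter` plus `replacement(content, node)`) might look like the sketch below. The body is inferred from the description above rather than copied from `scripts/mirror-markdown.mjs`, and the plain-object node in the usage line stands in for a real DOM node:

```javascript
// Emits an ATX heading from the element's plain text, ignoring inner markup
// such as the swizzled component's wrapping anchor link.
const cleanHeadingRule = {
  filter: ["h1", "h2", "h3", "h4", "h5", "h6"],
  replacement(content, node) {
    const level = Number(node.nodeName.charAt(1)); // "H2" -> 2
    // Strip leading "#" characters left by the icon span, then trim whitespace.
    const text = node.textContent.replace(/^\s*#+\s*/, "").trim();
    return `\n\n${"#".repeat(level)} ${text}\n\n`;
  },
};

// Registration in the mirror script would look like:
//   turndownService.addRule("clean-heading-text", cleanHeadingRule);

const out = cleanHeadingRule.replacement("", {
  nodeName: "H1",
  textContent: "#Observability",
});
console.log(JSON.stringify(out)); // "\n\n# Observability\n\n"
```

Because the rule reads `textContent` directly, it bypasses turndown's markdown escaping and flattens inline formatting (a code span inside a heading becomes plain text). That trade-off suits headings that are plain titles.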

## Verification

- `npm run typecheck`: pass
- `npm run build`: pass; worktree has 0 .api.mdx files dirtied
  (down from 6 before this commit)
- 7/7 platform pages + 31/31 API reference pages have the llms.txt
  directive in HTML
- All four AFDocs artifacts present: `llms.txt`, `llms-full.txt`,
  `skill.md`, `.well-known/mcp.json`
- Sample mirror (`build/platform/observability.md`) shows clean
  ATX headings with no anchor-link leak

## Note for the round-3 PR

PR #6 also touches `scripts/mirror-markdown.mjs` and
`scripts/build-llms-txt.mjs`. The two PRs don't overlap on the same
lines, but whichever lands second will need a small rebase.
ethanj merged commit 88c283f into main May 3, 2026
ethanj deleted the docs/afdocs-round3 branch May 3, 2026 07:17