fix(afdocs): mirror parity via HTML→md, shrink directive, fix broken links #5
Merged
…; fix broken links

Three targeted fixes for the remaining AFDocs failures (post PR #3, score 93/100):

## Markdown Content Parity (was 6/15 pages, avg 13% missing)

The .md mirror was reading from source MDX with JSX intact, so pages with `<Tabs>`, `<TabItem>`, code-block titles, mermaid blocks, or custom JSX components emitted markdown that didn't match the rendered HTML. Five pages had >15% gaps (`platform/observability` 28%, `platform/scope` 24%, `platform/providers` 20%, `sdk/quickstart` 25%, `integrations/frameworks/vercel-ai-sdk` 16%).

`scripts/mirror-markdown.mjs` now reads each route's rendered HTML from `build/<route>.html`, extracts the `<article>` body via cheerio, strips Docusaurus / OpenAPI plugin chrome (breadcrumbs, "On this page" TOC, copy buttons, hash-link anchors, edit-this-page link, pagination nav), and converts to markdown via turndown + `turndown-plugin-gfm` for proper pipe-table rendering. A custom rule preserves fenced code blocks with language hints by reading `<code class="language-X">` `textContent` (so Prism's per-token `<span>` wrappers collapse cleanly back to source).

By character count the mirrors now match the rendered HTML article within ±2% for most pages and within ±11% for the rest (the residual deltas are mostly newline / whitespace differences inside code blocks). The 28% gap on `platform/observability` is gone.

OpenAPI operation pages and the rolled-up API info page still render from the vendored OpenAPI YAML directly: those rendered cleanly already, and the OpenAPI plugin's HTML is heavy with `<MethodEndpoint>` / `<ParamsDetails>` JSX that doesn't round-trip nicely through turndown.

New devDeps: `turndown`, `turndown-plugin-gfm`, `cheerio`, `@types/turndown`. (Removed: `node-html-parser`; cheerio handles Docusaurus's HTML output more reliably.)
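The character-count comparison quoted above can be sketched as follows. This is only an illustration of a length-based delta, not how AFDocs itself scores parity; the function names `normalizeWhitespace`, `parityDelta`, and `withinTolerance` are hypothetical and do not come from `scripts/mirror-markdown.mjs`.

```javascript
// Hypothetical helpers: none of these names appear in the repository.

// Collapse runs of whitespace so newline-only differences (for example
// inside code blocks) do not inflate the delta.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Signed percentage difference in character count between a markdown
// mirror and the text of the rendered article. Negative means the
// mirror is shorter than the article text.
function parityDelta(markdownText, articleText) {
  const md = normalizeWhitespace(markdownText);
  const article = normalizeWhitespace(articleText);
  if (article.length === 0) {
    throw new RangeError('articleText must contain non-whitespace text');
  }
  return ((md.length - article.length) * 100) / article.length;
}

// Assumed threshold: 15 echoes the ">15% gaps" figure in the commit
// message, not a documented AFDocs constant.
function withinTolerance(markdownText, articleText, tolerancePct = 15) {
  return Math.abs(parityDelta(markdownText, articleText)) <= tolerancePct;
}
```

Normalizing whitespace first is a design choice for this sketch: it keeps the number focused on missing content rather than on line-break differences.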
## Content Start Position (was 7/15 pages past 50%, worst 77%)

The directive blockquote was 12 words ("Machine-readable index: [llms.txt] · [llms-full.txt] · [skill.md]"), which pushed the substantive content past the 50% mark on short pages. Compressed to 4 words ("Agent index: [llms.txt]"). This still satisfies the AFDocs "blockquote near the top pointing at llms.txt" requirement, but takes up much less above-the-fold room on short pages.

## LLMS Full Links Resolve

Two real broken markdown links in the corpus:

1. `https://github.com/atomicmemory/atomicmemory-integrations/tree/main/plugins/cursor`: 404 because the cursor plugin folder hasn't been added to the integrations repo yet. Both occurrences (`docs/integrations/overview.md`, `docs/integrations/coding-agents/cursor.md`) now point at the integrations repo top-level, which exists, with prose that still tells readers where the plugin will live.
2. `https://openai.com/index/codex/` (in `docs/integrations/coding-agents/codex.md`): 403 because openai.com bot-blocks HEAD requests. Replaced with `https://github.com/openai/codex`, the actual repo for the OpenAI Codex CLI the docs page is about. More accurate target (the docs talk about the CLI, not the marketing page), and the GitHub URL doesn't bot-block.

The previously-flagged URLs that have trailing punctuation (`'http://localhost:8000',`, `https://*.atomicmem.ai`, `https://core.example.com'`) are inside fenced code blocks. Those aren't markdown links; a strict link checker doesn't follow them, and replacing illustrative URLs in code samples would be lossy.

## Out of scope for this PR

- Content Negotiation (`Accept: text/markdown`): GH Pages still cannot honor the header. Tracked as F1 (hosting migration).
- MCP Server Discoverable: needs a hosted HTTP MCP endpoint. Tracked as F2.
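For context on the deferred Content Negotiation item, the path rewrite it needs (`/foo` → `/foo.md` when the client sends `Accept: text/markdown`) might look roughly like the pure function below. This is a sketch under assumptions: `resolveMarkdownPath` is a hypothetical name, no particular edge runtime is implied, q-values in the header are ignored, and the site root is left unchanged because the source does not say how it should be handled.

```javascript
// Hypothetical helper for a future edge function; not part of this PR.
// Returns the path that should be served for the given request.
function resolveMarkdownPath(pathname, acceptHeader) {
  // Split the Accept header into media types, dropping parameters such
  // as ";q=0.9". Quality values are deliberately ignored in this sketch.
  const wantsMarkdown = (acceptHeader || '')
    .split(',')
    .some((part) => part.split(';')[0].trim().toLowerCase() === 'text/markdown');

  if (!wantsMarkdown) return pathname;
  if (pathname.endsWith('.md')) return pathname; // already a mirror URL

  // Drop trailing slashes so "/foo/" and "/foo" map to the same mirror.
  const trimmed = pathname.replace(/\/+$/, '');
  if (trimmed === '') return pathname; // site root: unchanged (assumption)

  return trimmed + '.md';
}
```

Keeping the decision in a pure function means it can be unit-tested without a hosting platform, whichever of the candidate hosts ends up being used.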
Round-2 follow-up from Codex review of PR #5: code blocks in the generated `.md` mirrors were collapsing onto one line because Prism represents line breaks as HTML structure (each line wrapped in a `<span class="token-line">`), not as `\n` characters in text content. `codeEl.textContent` therefore returned all tokens concatenated with no newline boundaries.

Fix: when the `<code>` element contains `.token-line` children, walk them and join with `\n`. Falls back to `textContent` for code blocks without Prism markup. Also pull the language class from both `<pre>` and `<code>` (Docusaurus puts it on either depending on config).

Verified: `build/sdk/quickstart.md` now emits properly fenced multi-line bash and typescript blocks; `npm run build` + typecheck clean.
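The fix described above can be sketched against a minimal DOM-like interface (`querySelectorAll`, `textContent`, `className`). The real script uses cheerio and a turndown rule, whose APIs differ, so treat this as an illustration of the logic only; `extractCodeText`, `detectLanguage`, and `toFencedBlock` are hypothetical names.

```javascript
// Hypothetical helpers operating on DOM-like objects, not cheerio nodes.

const FENCE = '`'.repeat(3);

// Join Prism's per-line wrappers with "\n"; fall back to textContent
// when the block carries no Prism line markup.
function extractCodeText(codeEl) {
  const lines = Array.from(codeEl.querySelectorAll('.token-line'));
  if (lines.length === 0) return codeEl.textContent;
  return lines
    // Defensive: strip a newline a renderer may store inside a line.
    .map((line) => line.textContent.replace(/\n$/, ''))
    .join('\n');
}

// Look for a "language-X" class on <pre> first, then on <code>.
function detectLanguage(preEl, codeEl) {
  for (const el of [preEl, codeEl]) {
    const match = /(?:^|\s)language-(\S+)/.exec((el && el.className) || '');
    if (match) return match[1];
  }
  return '';
}

// Build a fenced markdown block with the detected language hint.
function toFencedBlock(preEl, codeEl) {
  const body = extractCodeText(codeEl).replace(/\n+$/, '');
  return FENCE + detectLanguage(preEl, codeEl) + '\n' + body + '\n' + FENCE;
}
```

Because the helpers only rely on three properties, they can be exercised with plain objects in a test, independent of whichever HTML parser supplies the elements.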
Round 2 of AFDocs fixes (93 → ≥97 expected)
Overview
Post-PR #3 the live AFDocs score is 93/100 (grade A) with 6 remaining failures. This PR lands fixes for the 3 we can address without infra changes; the other 3 are deferred follow-ups (one of which — Content Negotiation — was always known to be infra-dependent).
Key Features
📝 Markdown mirror rendered from built HTML
- `scripts/mirror-markdown.mjs` switches from source-MDX-strip to HTML→markdown via cheerio + turndown + turndown-plugin-gfm.
- Reads `build/<route>.html`, extracts the `<article>` body, strips Docusaurus chrome (breadcrumbs, "On this page" TOC, copy buttons, edit-this-page, pagination nav, hash-link anchors), then runs turndown.
- A custom code-block rule reads `<code class="language-X">` `textContent` so Prism's per-token `<span>` wrappers collapse back to plain source.

✂️ Directive compressed
- `src/remark/llms-directive.mjs` shrinks the per-page directive blockquote from 12 words ("Machine-readable index: [llms.txt] · [llms-full.txt] · [skill.md]") to 4 ("Agent index: [llms.txt]"). Still satisfies the AFDocs "blockquote near the top pointing at llms.txt" requirement, but takes much less above-the-fold room on short pages.

🔗 Two real broken links fixed
- `docs/integrations/overview.md` and `docs/integrations/coding-agents/cursor.md`: the `plugins/cursor` folder doesn't exist yet on the integrations repo. Both link targets now point at the integrations repo top-level with prose telling readers where the plugin will live.
- `docs/integrations/coding-agents/codex.md`: `https://openai.com/index/codex/` was returning 403 (CDN bot-blocks HEAD). Replaced with `https://github.com/openai/codex`, a more accurate target since the docs page is about the CLI, not the marketing page, and the GitHub URL doesn't bot-block.

Implementation Details
Modified Files
- `scripts/mirror-markdown.mjs`: HTML→markdown pipeline, chrome-stripping selectors, custom code-block rule
- `src/remark/llms-directive.mjs`: compact directive
- `docs/integrations/overview.md`, `docs/integrations/coding-agents/{cursor,codex}.md`: link fixes
- `package.json` + `package-lock.json`: new devDeps (`turndown`, `turndown-plugin-gfm`, `cheerio`, `@types/turndown`); removed `node-html-parser`

Code Quality
Metrics
Testing
- `npm run build` succeeds with `onBrokenLinks: 'throw'` intact, no `.api.mdx` worktree drift
- `npm run typecheck` clean
- `llms.txt`, `llms-full.txt`, `skill.md`, `.well-known/mcp.json`
- `.md` mirror files (68 routes × 2 URL shapes)

Out of scope (deferred follow-ups)
- F1 (hosting migration): move `docs.atomicmemory.ai` to Cloudflare Pages / Vercel / Netlify; add edge function for `Accept: text/markdown` rewriting `/foo` → `/foo.md`. Closes Content Negotiation.
- F2 (hosted MCP endpoint): serve `@atomicmemory/mcp-server` over Streamable HTTP at `mcp.atomicmemory.ai`; update `mcp.json` `transport` to `http`. Closes any strict reading of MCP Server Discoverable.

🤖 Generated with Claude Code