feat(afdocs): make docs.atomicmemory.ai agent-ready (llms.txt, mirror, skill.md, MCP) #3

Open
ethanj wants to merge 3 commits into main from docs/agent-ready-afdocs

ethanj commented May 1, 2026

Make docs.atomicmemory.ai agent-ready (AFDocs)

Overview

The docs site at https://docs.atomicmemory.ai currently scores 59/100 (grade F) against the AFDocs agent-readiness checks (https://afdocs.dev); only 7 of 29 checks pass. This PR fixes seven of the eight failing checks while staying on GitHub Pages. The remaining failure (Accept: text/markdown content negotiation) cannot be fixed without moving off GH Pages and is split out as follow-up F1. A second follow-up (F2) covers hosting the MCP server at a live HTTP endpoint, in case AFDocs interprets "MCP Server Discoverable" strictly.

Target post-merge score: ≥85/100 when AFDocs runs against https://docs.atomicmemory.ai.

Key Features

🤖 Per-page llms.txt directive

  • Remark plugin (src/remark/llms-directive.mjs) injects a blockquote linking to /llms.txt, /llms-full.txt, and /skill.md immediately after the first H1. It handles both markdown headings and the OpenAPI plugin's <Heading as="h1"> JSX, and is idempotent across HMR and multi-pass builds.
  • The links are absolute URLs, which sidesteps Docusaurus's onBrokenLinks: 'throw' (it validates against route metadata, not postBuild artifacts). siteUrl is passed in as an explicit option because Docusaurus doesn't auto-inject siteConfig into remark plugins. URLs are built with new URL() so a trailing slash on siteUrl cannot produce a double slash.
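
A minimal sketch of how such a directive plugin could look, assuming an mdast tree where the H1 is a top-level node. The marker name, the link text, and the helper names are illustrative and are not taken from src/remark/llms-directive.mjs.

```javascript
// Hypothetical sketch; the real plugin may be structured differently.
const MARKER = 'llmsDirective'; // assumed flag used to detect a prior injection

function buildDirective(siteUrl) {
  // new URL(path, base) normalizes slashes, so a trailing "/" on siteUrl is harmless.
  // Note: this assumes the site is served from the origin root.
  const link = (path) => ({
    type: 'link',
    url: new URL(path, siteUrl).href,
    children: [{ type: 'text', value: path }],
  });
  return {
    type: 'blockquote',
    data: { [MARKER]: true },
    children: [{
      type: 'paragraph',
      children: [
        { type: 'text', value: 'For agents: ' },
        link('/llms.txt'),
        { type: 'text', value: ', ' },
        link('/llms-full.txt'),
        { type: 'text', value: ', ' },
        link('/skill.md'),
      ],
    }],
  };
}

function isH1(node) {
  if (node.type === 'heading' && node.depth === 1) return true;
  // OpenAPI pages express the title as <Heading as="h1"> (an MDX JSX element).
  const isJsx = node.type === 'mdxJsxFlowElement' || node.type === 'mdxJsxTextElement';
  return isJsx && node.name === 'Heading' &&
    (node.attributes || []).some((attr) => attr.name === 'as' && attr.value === 'h1');
}

function llmsDirective(options = {}) {
  // No fallback value: fail loudly when siteUrl is missing.
  if (!options.siteUrl) throw new Error('llms-directive: siteUrl option is required');
  return (tree) => {
    const nodes = tree.children;
    if (nodes.some((n) => n.data && n.data[MARKER])) return; // already injected
    const index = nodes.findIndex(isH1);
    if (index === -1) return; // no H1 on this page, nothing to do
    nodes.splice(index + 1, 0, buildDirective(options.siteUrl));
  };
}
```

Running the transformer twice on the same tree leaves a single blockquote, which is the idempotency property the bullet above describes.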

📑 llms.txt + llms-full.txt

  • build/llms.txt (≈12 KB) — spec-compliant per llmstxt.org. H1 + blockquote summary + H2 sections (Get started / Platform / SDK / Integrations / API Reference) + bulleted markdown links with one-sentence descriptions.
  • build/llms-full.txt (≈243 KB, ~6.2k lines) — full corpus, one entry per route. Driven by the canonical-mirror map so the <route>.md / <route>/index.md mirror pair never produces duplicate corpus entries.
  • Sections derived from a hand-maintained URL-prefix → label map (small, stable, doesn't depend on internals).
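
The grouping described above can be sketched roughly as follows. The section labels come from this PR; the URL prefixes, the page object shape, and the function names are assumptions made for illustration and may not match scripts/build-llms-txt.mjs.

```javascript
// Assumed prefix → label pairs; only the labels are taken from the PR text.
const SECTIONS = [
  ['/platform/', 'Platform'],
  ['/sdk/', 'SDK'],
  ['/integrations/', 'Integrations'],
  ['/api-reference/', 'API Reference'],
  ['/', 'Get started'], // catch-all, must stay last
];
const ORDER = ['Get started', 'Platform', 'SDK', 'Integrations', 'API Reference'];

function sectionFor(route) {
  const match = SECTIONS.find(([prefix]) => route.startsWith(prefix));
  if (!match) throw new Error(`no llms.txt section for ${route}`);
  return match[1];
}

function mirrorUrl(route, siteUrl) {
  // Every bullet points at the markdown mirror; the site root maps to /index.md.
  const path = route === '/' ? '/index.md' : `${route.replace(/\/+$/, '')}.md`;
  return new URL(path, siteUrl).href;
}

function renderLlmsTxt({ title, summary, siteUrl, pages }) {
  const groups = new Map(ORDER.map((label) => [label, []]));
  for (const page of pages) {
    const bullet = `- [${page.title}](${mirrorUrl(page.route, siteUrl)}): ${page.description}`;
    groups.get(sectionFor(page.route)).push(bullet);
  }
  const lines = [`# ${title}`, '', `> ${summary}`];
  for (const [label, bullets] of groups) {
    if (bullets.length === 0) continue; // skip empty sections
    lines.push('', `## ${label}`, '', ...bullets);
  }
  return lines.join('\n') + '\n';
}
```

Keeping the map as an ordered list of pairs makes the "first matching prefix wins" rule explicit, which is why the catch-all entry sits last.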

📝 Markdown URL mirror

  • scripts/mirror-markdown.mjs writes both build/<path>.md and build/<path>/index.md for every route. Hand-authored pages copy the MDX source (front-matter / import / MDX-comment lines stripped). API reference pages render from vendor/atomicmemory-core-openapi.yaml directly (joined to slugs via kebabCase(operationId)) so the mirror carries real parameters, request body, and response shape rather than stripped JSX.
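
As a rough illustration of the two URL shapes and the source clean-up, here is a sketch under stated assumptions: the helper names are hypothetical, only single-line MDX comments are handled, and the import filter does not check whether a line sits inside a code fence. The real scripts/mirror-markdown.mjs may handle these cases differently.

```javascript
// Hypothetical helper: the two mirror files written for one route, relative to build/.
function mirrorTargets(route) {
  const clean = route.replace(/^\/+|\/+$/g, '');
  // The PR does not describe root-route handling, so this sketch refuses to guess.
  if (clean === '') throw new Error('root route handling is not covered by this sketch');
  return [`${clean}.md`, `${clean}/index.md`];
}

// Hypothetical helper: drop front matter, import lines, and MDX comment lines.
function stripMdxNoise(source) {
  const withoutFrontMatter = source.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n/, '');
  return withoutFrontMatter
    .split('\n')
    .filter((line) => !/^import\s.+from\s+['"].+['"];?\s*$/.test(line))
    .filter((line) => !/^\s*\{\/\*.*\*\/\}\s*$/.test(line))
    .join('\n')
    .replace(/^\n+/, ''); // trim blank lines left at the top
}
```

For example, mirrorTargets('/platform/stores/') yields platform/stores.md and platform/stores/index.md, matching the "2 URL shapes" per route counted in the Testing section.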

🛠️ Agent skill + MCP descriptor

  • static/skill.md: Mintlify-style operating guide for agents reading the docs (when to read, how to navigate via /llms.txt / /llms-full.txt / .md URL convention, citation guidance).
  • static/.well-known/mcp.json + static/mcp.json: MCP descriptor stub for the local-install @atomicmemory/mcp-server. transport.type: "stdio", status: "local-install-only", hostedEndpoint: null pending follow-up F2.
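
To make the descriptor stub concrete, the sketch below builds an object with the three fields named above and runs a small consistency check on it. The `name` key and the checking rules are assumptions for illustration; the actual static/mcp.json may carry other fields.

```javascript
// transport.type, status, and hostedEndpoint come from the PR text; `name` is a guess.
const descriptor = {
  name: '@atomicmemory/mcp-server',
  status: 'local-install-only',
  transport: { type: 'stdio' },
  hostedEndpoint: null,
};

// Hypothetical consistency check between transport type and hosted endpoint.
function checkDescriptor(d) {
  const problems = [];
  if (!d.transport || typeof d.transport.type !== 'string') {
    problems.push('transport.type is required');
    return problems;
  }
  if (d.transport.type === 'stdio' && d.hostedEndpoint !== null) {
    problems.push('stdio descriptors should keep hostedEndpoint null');
  }
  if (d.transport.type === 'http' && typeof d.hostedEndpoint !== 'string') {
    problems.push('http descriptors need a hostedEndpoint URL');
  }
  return problems;
}

// The same content would be written to both mcp.json locations.
const json = JSON.stringify(descriptor, null, 2);
```

A check like this would start failing once the transport switches to http without a live URL, which is the change the hosted-MCP follow-up describes.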

✏️ Content Start Position fixes

  • Moved > Disambiguation blockquotes on docs/platform/providers.md and docs/sdk/concepts/provider-model.md to a ## Naming section near the bottom of each page.
  • Removed lede-blocking MDX authoring comments ({/* … */}) from introduction.md, platform/architecture.md, platform/composition.md, platform/observability.md, platform/scope.md, platform/stores.md.

Build pipeline integration

prebuild (regen:api)
  → docusaurus build
       ├── beforeDefaultRemarkPlugins:[llms-directive]   (per-page directive)
       ├── allContentLoaded({ allContent })             (custom plugin captures docs map)
       ├── HTML emission
       └── postBuild plugin:
              1. mirror-markdown.mjs    (build/<route>.md + build/<route>/index.md)
              2. build-llms-txt.mjs     (build/llms.txt + build/llms-full.txt)
              3. build-skill-md.mjs     (verify static/skill.md landed in build/)
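
The plugin shell in this pipeline can be sketched as below. It assumes the docs plugin content is found under allContent['docusaurus-plugin-content-docs'].default and that the three generators are passed in as an option; both the option name and the argument shape handed to each generator are hypothetical.

```javascript
// Hypothetical sketch of the plugin shell; generators are injected for clarity.
function llmsAndMirrorPlugin(context, options) {
  const { generators } = options;
  if (!Array.isArray(generators) || generators.length === 0) {
    throw new Error('llms-and-mirror-plugin: generators option is required');
  }
  let docs = null; // captured in allContentLoaded, consumed in postBuild

  return {
    name: 'llms-and-mirror-plugin',

    async allContentLoaded({ allContent }) {
      const byId = allContent['docusaurus-plugin-content-docs'];
      if (!byId || !byId.default) throw new Error('docs plugin content not found');
      docs = byId.default;
    },

    async postBuild({ outDir }) {
      if (!docs) throw new Error('postBuild ran before docs content was captured');
      // Run sequentially: llms-full.txt is driven by the mirror map, so order matters.
      for (const generate of generators) {
        await generate({ docs, outDir, siteUrl: context.siteConfig.url });
      }
    },
  };
}
```

Errors thrown by a generator propagate out of postBuild, so a failed mirror step fails the build instead of being swallowed.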

GH Actions workflow needs no changes.

Implementation Details

New Files

  • src/remark/llms-directive.mjs — directive injection remark plugin
  • src/plugins/llms-and-mirror-plugin.mjs — Docusaurus plugin shell that captures docs content via allContentLoaded and dispatches generators in postBuild
  • scripts/mirror-markdown.mjs — markdown URL mirror generator
  • scripts/build-llms-txt.mjs — llms.txt + llms-full.txt generator
  • scripts/build-skill-md.mjs — skill.md guard
  • static/skill.md, static/.well-known/mcp.json, static/mcp.json

Modified Files

  • docusaurus.config.ts — register the custom plugin + remark plugin; centralize SITE_URL / BASE_URL constants
  • package.json + package-lock.json — add js-yaml + @types/js-yaml (devDeps)
  • 8 docs pages — Content Start Position editorial fixes

Code Quality

Metrics

  • Files Changed: 19
  • Insertions: +876 lines
  • Deletions: -84 lines
  • All scripts < 400 LOC, all functions < 40 LOC (per workspace rules)

Workspace rules observed

  • No timing-based solutions
  • No fallback values (build fails loud if a required option is missing)
  • No silent error catching (mirror operations propagate failures)
  • Markdown / config files exempt from the 400-line code limit

Testing

  • npm run build succeeds with onBrokenLinks: 'throw' intact
  • npm run typecheck clean
  • 7/7 platform pages contain the directive
  • 31/31 API reference pages contain the directive
  • build/llms.txt is spec-compliant (H1 + blockquote + H2 sections)
  • build/llms-full.txt is 243 KB, 6167 lines, one entry per route
  • build/.well-known/mcp.json parses as valid JSON
  • 136 .md mirror files (68 routes × 2 URL shapes)

Verification expectations

Local build verifies artifact correctness only: presence of files, directive in HTML, mirror routes, valid JSON, shape of llms.txt. The directive emits production absolute URLs (https://docs.atomicmemory.ai/llms.txt) regardless of where the build is served, so AFDocs scoring against a local serve is not authoritative: the directive host would not match the served host.

Authoritative AFDocs scoring runs against the live site after merge:

npx @afdocs/cli check https://docs.atomicmemory.ai

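Expected residual failures: Content Negotiation (deferred to F1) and, if AFDocs reads it strictly, MCP Server Discoverable (deferred to F2).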

Follow-ups (separate scope, not this PR)

  • F1: Hosting migration. Move docs.atomicmemory.ai to Cloudflare Pages or Vercel; add an edge function that handles Accept: text/markdown by rewriting /foo to /foo.md. Closes Content Negotiation.
  • F2: Hosted MCP. Deploy @atomicmemory/mcp-server over Streamable HTTP at https://mcp.atomicmemory.ai; update the mcp.json transport to http with the live URL. Closes any strict reading of MCP Server Discoverable.
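
The rewrite rule behind F1 could look roughly like the sketch below. It is host-agnostic on purpose: the function names are hypothetical, q-values in the Accept header are ignored, and wiring it into a specific edge runtime is left out.

```javascript
// Hypothetical helper: true when the Accept header lists text/markdown.
// Simplification: quality values (q=...) are ignored.
function prefersMarkdown(acceptHeader) {
  return (acceptHeader || '')
    .split(',')
    .map((part) => part.trim().split(';')[0].toLowerCase())
    .includes('text/markdown');
}

// Hypothetical helper: map a requested path to its markdown mirror when asked for.
function negotiatePath(pathname, acceptHeader) {
  if (!prefersMarkdown(acceptHeader)) return pathname; // serve HTML as usual
  if (pathname.endsWith('.md')) return pathname;       // already a mirror URL
  if (pathname === '/') return '/index.md';
  return `${pathname.replace(/\/+$/, '')}.md`;
}
```

Because the mirror already publishes a .md file for every route, the edge logic only has to rewrite the path; it never needs to render markdown itself.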

Reviewer note

The plan went through five rounds of Codex review before approval; the resulting decisions are documented in inline code comments where they're load-bearing (broken-link-checker workaround, source vs sourceFilePath field disambiguation, siteUrl injection contract, new URL() vs string-concat for slash normalization, slug ↔ operationId join key).

🤖 Generated with Claude Code

ethanj added 3 commits May 1, 2026 01:58
…iptor

Make docs.atomicmemory.ai agent-ready against the AFDocs checks
(https://afdocs.dev). The site previously scored 59/100 (grade F);
this PR lands seven of the eight failing checks. The remaining
failure (Accept: text/markdown content negotiation) requires moving
off GitHub Pages and is tracked as a follow-up.

What landed:

- **Per-page llms.txt directive** via remark plugin
  (src/remark/llms-directive.mjs). Inserts a blockquote with absolute
  URLs to /llms.txt, /llms-full.txt, /skill.md right after the first
  H1. Handles both markdown headings and the OpenAPI plugin's
  `<Heading as="h1">` JSX. Idempotent. Absolute URLs sidestep
  Docusaurus's onBrokenLinks: 'throw' (which validates against route
  metadata, not postBuild artifacts). siteUrl is passed in as an
  explicit option since Docusaurus does not auto-inject siteConfig
  into remark plugins; URLs constructed via new URL() to avoid
  trailing-slash double-slashes.

- **Custom Docusaurus plugin** (src/plugins/llms-and-mirror-plugin.mjs)
  that captures docs-plugin content via allContentLoaded and dispatches
  three generators in postBuild.

- **Markdown URL mirror** (scripts/mirror-markdown.mjs) — for each
  route writes both build/<path>.md and build/<path>/index.md.
  Hand-authored pages copy from the MDX source (front-matter / import
  / MDX comment lines stripped). API reference pages render directly
  from vendor/atomicmemory-core-openapi.yaml (joined to slugs via
  kebabCase(operationId)) so the mirrored markdown carries real
  parameters, request body, and response shape rather than stripped
  JSX.

- **llms.txt and llms-full.txt** (scripts/build-llms-txt.mjs) — index
  is grouped by a hand-maintained URL-prefix → H2-label map, and
  llms-full is driven from the canonical-mirror map (one entry per
  route — never walks the filesystem, so the
  <route>.md / <route>/index.md mirror pair never produces duplicate
  corpus entries).

- **skill.md guard** (scripts/build-skill-md.mjs) — verifies the
  hand-authored static/skill.md landed in build/.

- **Static artifacts** — static/skill.md (agent operating guide),
  static/.well-known/mcp.json + static/mcp.json (MCP descriptor stub
  for the local-install @atomicmemory/mcp-server; transport.type:
  stdio, hostedEndpoint: null pending follow-up infra work).

- **Editorial fixes** for content-start-position offenders — moved
  Disambiguation blockquotes on docs/platform/providers.md and
  docs/sdk/concepts/provider-model.md to a `## Naming` section near
  the bottom; removed lede-blocking MDX authoring comments from
  introduction.md, platform/architecture.md, platform/composition.md,
  platform/observability.md, platform/scope.md, and platform/stores.md.

Build pipeline order:
prebuild (regen:api) → docusaurus build → beforeDefaultRemarkPlugins
[llms-directive] → allContentLoaded (capture docs map) → HTML → postBuild
{mirror-markdown, build-llms-txt, build-skill-md}.

Out of scope (deferred):
- Accept: text/markdown content negotiation — needs Cloudflare Pages /
  Vercel / Netlify with edge logic.
- Hosted HTTP MCP endpoint — needs deploying @atomicmemory/mcp-server
  over Streamable HTTP. mcp.json declares status: local-install-only
  for now.

Verification:
- npm run build succeeds with onBrokenLinks: 'throw' intact
- npm run typecheck clean
- 7/7 platform pages and 31/31 API reference pages have the directive
- build/llms.txt is spec-compliant (H1 + blockquote + H2 sections)
- build/llms-full.txt = 243KB, 6167 lines, one entry per route
- build/.well-known/mcp.json parses
- 136 .md mirror files (68 routes × 2 URL shapes)

… at /index.md

Two follow-ups from Codex review on PR #3:

1. `atomicmemory-http-api.md` (the rolled-up API overview page) was
   falling through to readSourceMdx() which leaves JSX intact. The
   source `.info.mdx` is almost entirely <span> / <Heading> / <div>
   JSX, producing a useless mirror. Detect `*.info.mdx` source files
   and render directly from `spec.info` in the vendored OpenAPI YAML
   (title, version, description, license, optional contact).

2. The Introduction entry in llms.txt linked to the bare site URL
   (https://docs.atomicmemory.ai/) instead of the markdown mirror.
   Since content negotiation is deferred, that entry would still
   require HTML scraping. Point it at /index.md so every llms.txt
   bullet is consistently a `.md` link.
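
A sketch of rendering the overview page from spec.info, as item 1 describes. The field names follow the OpenAPI Info Object; the output layout and the function name are assumptions and may differ from what the mirror script emits.

```javascript
// Hypothetical renderer for the *.info.mdx mirror, fed by spec.info from the vendored YAML.
function renderInfoMarkdown(info) {
  if (!info || !info.title || !info.version) {
    throw new Error('spec.info must provide title and version');
  }
  const lines = [`# ${info.title}`, '', `Version: ${info.version}`];
  if (info.description) lines.push('', info.description.trim());
  if (info.license && info.license.name) {
    const license = info.license.url
      ? `[${info.license.name}](${info.license.url})`
      : info.license.name;
    lines.push('', `License: ${license}`);
  }
  if (info.contact) {
    // contact is optional, and so is each of its fields
    const parts = [info.contact.name, info.contact.email, info.contact.url].filter(Boolean);
    if (parts.length > 0) lines.push('', `Contact: ${parts.join(', ')}`);
  }
  return lines.join('\n') + '\n';
}
```

Rendering from the spec avoids the JSX-heavy source entirely, so the mirror stays readable even if the generated .info.mdx markup changes.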

Codex flagged that `npm run build` was dirtying the 29 committed
generated `*.api.mdx` files because `prebuild` ran `regen:api` on
every build. The diff was only the compressed `api:` frontmatter blob
(non-deterministic re-encoding), but it broke build reproducibility:
running the documented build path against a clean tree always left
the worktree modified.

The committed `.api.mdx` files are intended artifacts — refreshed
explicitly via `npm run vendor:spec` + `npm run regen:api` whenever
the upstream OpenAPI spec changes (documented in scripts/vendor-core-
spec.mjs:14, "Next: run 'npm run regen:api' and commit the refreshed
.mdx artifacts"). Drop the auto-regen hooks so `npm run build` and
`npm run start` run only Docusaurus, leaving committed sources alone.

Spec-refresh workflow stays the same:
  1. npm install @atomicmemory/atomicmemory-core@<version>
  2. npm run vendor:spec
  3. npm run regen:api
  4. Commit the refreshed `vendor/atomicmemory-core-openapi.yaml` +
     `docs/api-reference/http/*.api.mdx` together.

Verification: a second `npm run build` against a clean tree leaves
the worktree clean (only this `package.json` change dirty against
HEAD). The directive still lands on all 31 API reference pages.