
docs: migrate documentation from MkDocs to Fern#581

Open
lbliii wants to merge 10 commits into main from lbliii/ctf

Conversation


@lbliii lbliii commented Apr 28, 2026

📋 Summary

Adds a Fern Docs build under fern/ alongside the existing MkDocs site. Production target is docs.nvidia.com/nemo/datadesigner with a floating-latest pointer (latest.yml symlink) at v0.5.8. All concept, recipe, plugin, dev-note, and tutorial pages are migrated to MDX with the NVIDIA theme; tutorial notebooks render via <NotebookViewer> with captured outputs (text, DataFrames, inline images).

🔄 Changes

✨ Added

  • fern/ Fern Docs build: docs.yml, fern.config.json, NVIDIA theme CSS, version-pinned nav (v0.5.8.yml), 64 MDX pages organized into Concepts, Tutorials, Recipes, Plugins, Code Reference, Dev Notes, plus landing.
  • Floating-latest version layout: versions/latest.yml is a Unix symlink to v0.5.8.yml; same MDX backs both slug: latest and slug: v0.5.8. New releases re-target the symlink (recipe in fern/README.md).
  • Custom MDX components: Authors, BadgeLinks, CustomCard, CustomFooter, ExpandableCode, MetricsTable, NotebookViewer, Tag, TrajectoryViewer — plus paired CSS in fern/styles/.
  • Dev Notes blog kit: 11-author registry (.authors.yml + typed authors-data.ts), each post now uses <Authors ids={[...]} />; deep-research-trajectories renders 31 turns / 50 tool calls via <TrajectoryViewer> driven by typed example data; benchmark tables in text-to-sql/rqa/structured-outputs-from-nemotron/search-agent use <MetricsTable> with auto best-value highlighting.
  • Notebook docs pipeline: fern/scripts/ipynb-to-fern-json.py converts .ipynb → fern/components/notebooks/*.{json,ts}, auto-stripping the leading Colab badge cell; new make targets generate-fern-notebooks and generate-fern-notebooks-with-outputs drive the .py → executed .ipynb → Fern JSON+TS chain.
  • Python API reference via Fern libraries: pointing at packages/data-designer-config/src/data_designer/config; output (fern/code-reference/) is gitignored and regenerated locally with fern docs md generate.
  • fern/README.md maintainer guide: prereqs, first-time setup, versioning recipe, folder layout.
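
A minimal sketch of the "strip the leading Colab badge cell" step mentioned in the notebook pipeline item above. The PR does not show the converter's detection regex, so the pattern and the function name `strip_leading_colab_badge` here are assumptions for illustration, not the converter's actual code.

```python
import json
import re

# Assumption: a badge cell is a markdown cell that links to colab.research.google.com.
COLAB_BADGE = re.compile(r"colab\.research\.google\.com", re.IGNORECASE)


def strip_leading_colab_badge(notebook: dict) -> dict:
    """Return a shallow copy of the notebook without a leading Colab badge cell."""
    cells = list(notebook.get("cells", []))
    if cells:
        first = cells[0]
        source = first.get("source", "")
        # nbformat stores cell source as either a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if first.get("cell_type") == "markdown" and COLAB_BADGE.search(text):
            cells = cells[1:]
    return {**notebook, "cells": cells}


example = {
    "cells": [
        {
            "cell_type": "markdown",
            "source": ["[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](#)"],
        },
        {"cell_type": "code", "source": "print('hi')"},
    ]
}
print(json.dumps(strip_leading_colab_badge(example)["cells"], indent=2))
```

Only the first cell is inspected, so a Colab link that appears later in a tutorial is left alone.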

🔧 Changed

  • Makefile docs section pins to Python 3.13 via DOCS_PYTHON ?= 3.13 (default Python untouched) so pyarrow resolves wheels on machines running Python 3.14+; certifi-routed CA bundle (DOCS_CERTS) fixes SSL on python.org-installer Pythons; convert-execute-notebooks loops per-file with || failed=... so one notebook's missing API key doesn't kill the chain; generate-fern-notebooks reads from docs/notebooks/ (executed, outputs preserved) when present, falling back to docs/colab_notebooks/ otherwise.
  • .gitignore adds fern/code-reference/ (Fern libraries output).
  • docs/colab_notebooks/*.ipynb regenerated through convert-execute-notebooks → generate-colab-notebooks so they reflect the latest source.
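
The per-file loop with `|| failed=...` described in the Makefile item above follows a "record the failure and keep going" pattern. A rough Python equivalent, assuming a caller-supplied runner (the names `run_each` and `run_command` are hypothetical; the real logic lives in the Makefile recipe):

```python
import subprocess
from typing import Callable, Iterable


def run_each(items: Iterable[str], run_one: Callable[[str], int]) -> list[str]:
    """Run run_one for every item and return the items that exited non-zero."""
    failed: list[str] = []
    for item in items:
        if run_one(item) != 0:
            failed.append(item)  # remember the failure, keep processing the rest
    return failed


def run_command(command: list[str]) -> int:
    """Run one external command and return its exit code without raising."""
    return subprocess.run(command).returncode


# Demo with a fake runner: the middle item fails, the third still runs.
failures = run_each(["1.py", "2.py", "3.py"], lambda name: 1 if name == "2.py" else 0)
print(f"{len(failures)} failed: {failures}")
```

The caller decides at the end whether any failure should fail the whole target, which is what lets one notebook's missing API key leave the other notebooks intact.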

🧪 Testing

  • fern check passes (0 errors, 1 unrelated theme contrast warning)
  • make generate-fern-notebooks runs idempotently end-to-end; auto-detects docs/notebooks/ source
  • make generate-fern-notebooks-with-outputs produces real LLM outputs for tutorials 1–4 (12.3k captured cells across DataFrame HTML / plain text / inline image MIME types); 5–6 fall back to existing snapshots without OPENROUTER_API_KEY
  • fern docs dev boots locally; spot-checked landing, Dev Notes index, tutorials, recipes, deep-research-trajectories
  • make test not run (docs-only change; no Python source touched)

✅ Checklist

  • Follows commit message conventions (docs: prefix, imperative, ≤72 char subject)
  • Commits are signed off (DCO)
  • Architecture docs updated — N/A, docs-only

🤖 Generated with Claude Code

Adds a Fern Docs build under fern/ alongside the existing mkdocs site.
Production target docs.nvidia.com/nemo/datadesigner with floating-latest
pointer (latest.yml symlink) at v0.5.8. Migrated all concept, recipe, plugin,
dev-note, and tutorial pages to MDX with NVIDIA theme and custom components
(Authors, MetricsTable, TrajectoryViewer, NotebookViewer, BadgeLinks).
Tutorial notebooks now render via NotebookViewer with captured outputs (text,
DataFrames, inline images) - new make targets generate-fern-notebooks and
generate-fern-notebooks-with-outputs drive the .py -> executed .ipynb -> Fern
JSON+TS pipeline, pinning docs to Python 3.13 to dodge pyarrow wheel issues
on 3.14. Python API reference is configured via Fern libraries: pointing at
data-designer-config; output is gitignored and regenerated locally with
'fern docs md generate'.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
@lbliii lbliii requested a review from a team as a code owner April 28, 2026 18:51

github-actions Bot commented Apr 28, 2026

Thank you for your submission! We ask that you all sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by adding a comment below using this text:


I have read the DCO document and I hereby sign the DCO.


1 out of 2 committers have signed the DCO.
✅ [kirit93](https://github.com/kirit93)
@lbliii
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the DCO Assistant Lite bot.


github-actions Bot commented Apr 28, 2026

Docs preview: https://be7a8a93.dd-docs-preview.pages.dev

Notebook tutorials are placeholder-only in previews.

@github-actions

PR #581 Review — docs: migrate documentation from MkDocs to Fern

Summary

This PR stands up a parallel Fern Docs build under fern/ alongside the existing MkDocs site, targeting docs.nvidia.com/nemo/datadesigner. It migrates all concept/recipe/plugin/dev-note/tutorial pages to MDX, adds 10 custom MDX components (rendered in Fern's sandbox), ships a notebook→Fern JSON converter, and pins the docs toolchain to Python 3.13 so pyarrow wheels resolve. The change is docs-only: no touches to packages/ or src/. 195 files changed, the great majority auto-generated (MDX pages, notebook JSON/TS pairs, static assets) — so the review focuses on the new machinery (Makefile, converter script, TSX components, Fern config) rather than the migrated prose content.

Scope is large but well-bounded. The diff is too large for gh pr diff to return in a single pass, so this review inspects the critical non-asset files and samples the rest.

Findings

Correctness & bugs

  • fern/versions/_nav_order.yml points at a non-existent directory. All 82 entries reference ./versions/latest/pages/..., but the directory on disk is versions/v0.5.8/pages/... (with versions/latest.yml being a symlink, not a directory). Either this file is stale metadata from a migration script and should be removed, or the paths need to be rewritten to v0.5.8/pages/.... No other config (docs.yml, v0.5.8.yml) appears to consume it, which suggests it's dead. Either way it's confusing — please delete it or document what reads it.

  • fern/components/BadgeLinks.tsx still contains scaffold placeholders (your-org/your-repo, your-package). If this component is actually rendered on a page, it will ship the placeholders to production; if it isn't used, delete the component so nothing picks it up accidentally. grep for <BadgeLinks across fern/versions/ before merging.

  • NotebookViewer.tsx custom markdown renderer is fragile. The ^@BR^@ sentinel in renderMarkdown (components/NotebookViewer.tsx:222-226) will collide if any notebook markdown cell literally contains that string. The renderer also doesn't handle fenced code blocks, tables, or nested lists. Notebook markdown is usually simple, so this is a latent risk rather than a current break, but the failure mode (silent corruption) is unpleasant. Consider swapping in a proper markdown library (Fern's bundler almost certainly ships one) or narrowing the collision risk with a more distinctive sentinel, such as an escape sequence.

  • Include.tsx also hand-rolls a markdown parser (lines 624+). Two independent mini-parsers in one PR is a maintenance smell — if a bug shows up in one, you'll fix it twice. Same recommendation: adopt a tested library.

  • ExpandableCode.tsx copy-button handler captures btn.textContent across an async setTimeout. If React re-renders the button during the 1500 ms window (e.g. route change), restoring orig can produce stale content. Minor, but easy to fix by using a local ref or deriving the original text from props instead of reading the live node.

  • fern/versions/latest.yml is a Unix symlink (target v0.5.8.yml, no trailing newline). Windows developers cloning the repo without core.symlinks=true will see a plain text file containing the string v0.5.8.yml rather than a resolved symlink, and Fern will fail with a confusing error. This is likely acceptable given that Fern/docs contributors all work on macOS/Linux, but the fern/README.md versioning section should at least call out the Windows caveat.
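
To make the Windows symlink caveat above concrete, here is a small preflight sketch that tells a real symlink apart from the plain-text stub Git writes when symlinks are disabled. The function name and the heuristic (a single line ending in `.yml`) are assumptions for illustration; nothing like this ships in the PR.

```python
from pathlib import Path


def check_latest_pointer(latest: Path) -> str:
    """Classify latest.yml as 'symlink', 'text-stub', 'regular-file', or 'missing'.

    A 'text-stub' is a regular file whose only content is the link target,
    which is what a checkout without symlink support leaves behind.
    """
    if latest.is_symlink():
        return "symlink"
    if latest.is_file():
        content = latest.read_text(encoding="utf-8").strip()
        # Heuristic (assumption): a one-line file naming another .yml is a stub.
        if "\n" not in content and content.endswith(".yml"):
            return "text-stub"
        return "regular-file"
    return "missing"
```

A docs build wrapper could call `check_latest_pointer(Path("fern/versions/latest.yml"))` and print a hint about `core.symlinks` when the result is `"text-stub"`, instead of letting Fern fail later with a less obvious error.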

Security

  • NotebookViewer.tsx injects output cells of format: "html" via dangerouslySetInnerHTML without sanitization (components/NotebookViewer.tsx:493-497). Those are pandas DataFrame HTML reprs today, which is safe — but the source is whatever runs in docs/notebook_source/*.py at pipeline time. If a future notebook ever renders LLM-generated HTML or user-controlled content, this is an XSS sink. Since fern/components/notebooks/*.json is checked in, review happens at commit time, so this is a soft constraint rather than a live vulnerability. Worth a comment in the converter documenting the trust model.

  • Include.tsx markdown renderer escapes &<>" but not ' (line 514-520). Given content comes from build-time imports of MDX files in this repo, it's fine today — but the component is documented as reusable (“Copy to your repo's …”), so callers who hand it untrusted content will get partial escaping. Consider including ' or documenting the trusted-input assumption.

  • No secrets or tokens appear in the diff. The Adobe DTM launch-*.min.js script loaded via docs.yml is the standard NVIDIA analytics bundle — expected.
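
For reference on the partial-escaping point above, this is what full escaping of the five characters looks like. The sketch uses Python's standard `html.escape` purely as an illustration; the component under review is TypeScript and would need its own equivalent.

```python
import html


def escape_for_html(text: str) -> str:
    """Escape &, <, >, double quotes, and single quotes for HTML output."""
    # quote=True is what adds the two quote characters to the escaped set.
    return html.escape(text, quote=True)


print(escape_for_html("<a href='x' title=\"y\">Tom & Jerry</a>"))
```

Escaping the single quote matters mainly when the escaped text ends up inside a single-quoted attribute value.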

Makefile / build pipeline

  • DOCS_PYTHON ?= 3.13 and the DOCS_CERTS certifi routing are well-commented and clearly motivated. The per-notebook loop with || failed=... is a real UX improvement over the prior "one failure kills the chain" behavior.

  • Cleanup runs unconditionally after the loop: rm -rf docs/notebook_source/artifacts and rm -f docs/notebook_source/*.csv (Makefile:498-499). If no notebooks ran at all (say, uv failed upstream), this still fires — harmless given the paths, but the mv docs/notebook_source/*.ipynb … line before it (Makefile:497) will silently do nothing on total failure and leave behind any partially-executed files in docs/notebook_source/. Minor; current behavior is probably fine.

  • generate-fern-notebooks shells out to the converter in a loop inside a single @... recipe. If any single invocation fails, make will stop. However, the auto-detection (if ls docs/notebooks/*.ipynb) falls back silently to colab_notebooks/ on any non-zero exit, not just "no files." Consider [ -f docs/notebooks/1-the-basics.ipynb ] for a more specific probe.
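
The difference between "a listing command exited non-zero" and "a specific file exists" can be sketched as follows. This is a Python illustration of the probe logic only; the directory names come from the PR, while `pick_notebook_source` is a hypothetical helper.

```python
from pathlib import Path


def pick_notebook_source(repo_root: Path, probe: str = "1-the-basics.ipynb") -> Path:
    """Prefer executed notebooks when the probe file exists, else fall back."""
    executed = repo_root / "docs" / "notebooks"
    fallback = repo_root / "docs" / "colab_notebooks"
    # Testing one known file avoids treating unrelated errors as "no notebooks".
    return executed if (executed / probe).is_file() else fallback
```

An empty `docs/notebooks/` directory still selects the fallback here, which matches the intent of "use executed outputs only when they are actually present."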

fern/scripts/ipynb-to-fern-json.py

  • Clean, typed, SPDX-headered, respects the project's from __future__ import annotations convention. Good.

  • Manual sys.argv parsing (if "-o" in args: idx = args.index("-o")) instead of argparse. Trivial, but argparse would give you --help, error messages, and type-checking for free.

  • No tests. The converter has real logic (colab-badge detection regex, output-type dispatch, HTML/plain-text selection). A couple of unit tests against a fixture .ipynb would pay for themselves the next time someone tweaks extract_outputs. Not blocking for a docs PR, but worth a follow-up.

  • get_language treats language: "python3" as "python", but any other value passes through verbatim ("julia-1.9" → "julia-1.9"). If Pygments doesn't have a lexer for the unnormalized name, highlight_code returns None. The viewer then falls back via source_html ?? escapeHtml(cell.source), so the source is escaped before it reaches dangerouslySetInnerHTML. Good.
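
A sketch of what the argparse suggestion above could look like. Only the `-o` flag is confirmed by the review; the positional `notebook` argument, the `--output` long form, and the default output path are assumptions made for this example.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative CLI parser (not the converter's actual interface)."""
    parser = argparse.ArgumentParser(description="Convert a notebook to Fern JSON.")
    parser.add_argument("notebook", type=Path, help="path to the input .ipynb file")
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="path for the generated JSON")
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse argv and fill in a default output path next to the input."""
    args = build_parser().parse_args(argv)
    if args.output is None:
        # Assumption: default to the input path with a .json suffix.
        args.output = args.notebook.with_suffix(".json")
    return args


print(parse_args(["tutorial.ipynb", "-o", "out.json"]))
```

With this in place, `--help` output and "missing argument" errors come from argparse rather than from hand-written index checks.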

Fern config (docs.yml, fern.config.json)

  • Redirect rules are thorough (Sphinx index.html → slug, foo.html → foo, both versioned and unversioned). Order is correct — :path*/index.html must precede :path*.html, which it does.

  • versions: registers both latest and v0.5.8 slugs that point to the same content through the symlink. Nice trick, well documented in fern/README.md.

  • fern.config.json pins fern-api@4.106.0. Pinning is good for reproducibility; make sure CI (or whoever publishes) uses the same version.
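
Why the redirect order noted above matters can be shown with a toy first-match-wins router. The regexes below stand in for the `:path*/index.html` and `:path*.html` patterns; treating Fern's matcher as first-match-wins is an assumption of this sketch, not something the PR states.

```python
import re

# Hypothetical stand-ins for the two redirect patterns, most specific first.
RULES = [
    (re.compile(r"^(?P<path>.*)/index\.html$"), r"\g<path>"),
    (re.compile(r"^(?P<path>.*)\.html$"), r"\g<path>"),
]


def redirect(url: str, rules=RULES) -> str:
    """Apply the first matching rule; return the URL unchanged if none match."""
    for pattern, target in rules:
        if pattern.match(url):
            return pattern.sub(target, url)
    return url


print(redirect("/concepts/index.html"))                        # specific rule wins
print(redirect("/concepts/index.html", list(reversed(RULES))))  # generic rule wins
```

With the generic rule first, an index page would redirect to a path ending in `/index` instead of the directory slug.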

Style / minor

  • SPDX headers on NotebookViewer.tsx, Authors.tsx, CustomCard.tsx, etc. say (c) 2024 NVIDIA, but the files are new in 2026. Project convention seems to accept either, but it's worth aligning.

  • Tag.tsx uses React.ReactNode in its signature without a type-only import; other components in the same directory do import type { ReactNode } from "react". Inconsistent. Either works under Fern's JSX runtime, but pick one.

  • Many components inline style={{ padding: "0.75rem 1rem", ... }} objects for error states (IncludeError, NotebookViewerError). Those would be better in styles/ CSS files alongside the other styling, matching the pattern already established for non-error cells.

  • ExpandableCode.tsx and NotebookViewer.tsx both implement copy-to-clipboard with near-identical logic. Pull into a shared useCopyButton helper or component once a third caller appears.

  • fern/components/CustomCard.tsx uses Tailwind utility classes (block p-6 rounded-lg border …). This assumes Fern's bundler ships Tailwind. It apparently does, but worth confirming via fern docs dev that these render as intended, since other components in the PR use plain class names + CSS files.

Test coverage

  • Testing section in the PR body is honest: fern check passes; make generate-fern-notebooks* runs idempotently; fern docs dev renders. make test appropriately skipped (no Python runtime code touched). Good.

Verdict

Approve with follow-ups. The migration is well-structured, the build tooling is solid, and the custom components are scoped, typed, and commented. The main items to clean up:

  1. Resolve fern/versions/_nav_order.yml — delete it or fix the latest/pages vs v0.5.8/pages discrepancy.
  2. Confirm BadgeLinks.tsx is either used (with real values substituted) or removed.
  3. Document or narrow the XSS trust boundary on NotebookViewer's HTML output-cell rendering.

Nothing in this PR touches production Python code, so the blast radius of issues is confined to the docs site. Ship it after (1) is resolved; (2) and (3) can be fast-follows.

lbliii and others added 2 commits April 28, 2026 14:56
Captures the patterns established in the Fern migration so agents (and humans)
can maintain fern/ confidently. Modeled after NVIDIA-NeMo/Gym's
nemo-gym-docs SKILL.md, adapted for our floating-latest versioning,
notebook-with-outputs pipeline, dev-notes kit components, and the MDX gotchas
hit during migration (pymdown attr_list, --8<-- snippet syntax, frontmatter
authors-as-JSX-scope-variable, etc.). Routes triggers like "edit docs", "add
doc page", "regenerate notebooks", "update dev note", "add API reference" to
this skill.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
- Delete stale fern/versions/_nav_order.yml (references non-existent
  ./versions/latest/pages/ — paths were never updated when latest/ was
  renamed to v0.5.8/, no consumer found in docs.yml or v0.5.8.yml).
- Remove unused custom components: Tag.tsx, CustomCard.tsx, Include.tsx
  (had its own untested markdown parser), ExpandableCode.tsx (broken in
  Fern SSR runtime). Drop expandable-code.css from docs.yml. Authors,
  BadgeLinks, MetricsTable, NotebookViewer, TrajectoryViewer remain
  (each has at least one call site).
- BadgeLinks: remove DEFAULT_BADGES with placeholder URLs; make `badges`
  prop required so we can never accidentally ship 'your-org/your-repo'.
- NotebookViewer: document the XSS trust boundary on output cells of
  format: "html". Outputs flow .py source → jupytext --execute → committed
  *.ts (review boundary). Add an inline comment at the dangerouslySetInnerHTML
  call site pointing back to the trust-model section.
- README: add Windows caveat on the latest.yml symlink — Windows users need
  core.symlinks=true before clone or Fern will reject the version config.
- Makefile: tighten generate-fern-notebooks source probe from `ls .../*.ipynb`
  (which can return success on non-file errors) to `[ -f docs/notebooks/1-the-basics.ipynb ]`,
  matching the reviewer's suggestion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

@aschilling-nv aschilling-nv left a comment


General Fern feedback

Comment thread fern/docs.yml Outdated
Comment thread fern/docs.yml Outdated
Comment thread fern/docs.yml
Comment on lines +42 to +44
experimental:
  mdx-components:
    - ./components

Suggested change
- experimental:
-   mdx-components:
-     - ./components
+ experimental:
+   mdx-components:
+     - ./components
+   basepath-aware: true

Three suggestions from the Fern review, all matching Curator's docs.yml
conventions, plus a related redirects fix:

- instances[0].url: drop the https:// protocol prefix to match Curator's
  shape (e.g. nemo-curator.docs.buildwithfern.com/nemo/curator).
- logo.href: was '/'; now points at /nemo/datadesigner/getting-started/welcome
  (the actual landing page) so clicking the logo lands on real content
  instead of the bare basepath.
- experimental.basepath-aware: true — opts into Fern's basepath-aware
  routing so internal links don't double-prefix the /nemo/datadesigner
  segment.
- redirects: also fix /nemo/datadesigner/index.html → getting-started/welcome
  (was bouncing to /latest, which is just the version slug); add
  /getting-started → /getting-started/welcome to mirror Curator's
  /home → /home/welcome convention.
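
The double-prefix problem that basepath-aware routing addresses can be illustrated with an idempotent prefixing function. This is a Python sketch only; `with_basepath` is a hypothetical name and says nothing about how Fern implements the option.

```python
BASEPATH = "/nemo/datadesigner"


def with_basepath(href: str, basepath: str = BASEPATH) -> str:
    """Prefix a site-absolute href with the basepath exactly once."""
    if not href.startswith("/") or href.startswith("//"):
        return href  # relative, external, or protocol-relative: leave untouched
    if href == basepath or href.startswith(basepath + "/"):
        return href  # already prefixed, so do not add the segment again
    return basepath + href


print(with_basepath("/getting-started/welcome"))
print(with_basepath("/nemo/datadesigner/getting-started/welcome"))
```

Both calls print the same path, which is the property a link rewriter needs to avoid `/nemo/datadesigner/nemo/datadesigner/...` URLs.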

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
lbliii and others added 6 commits April 28, 2026 15:29
Signed-off-by: Kirit93 <kthadaka@nvidia.com>
Made-with: Cursor
Replaces the generic <CardGroup>/<Card> grid (same green icon × 10, date
glued to bottom of description) with a purpose-built BlogCard for the
dev-notes landing page.

Each card now has:
- Hero image (16:9, lazy-loaded, click-to-zoom via Fern's rmiz wrapper)
- ALL-CAPS date eyebrow as proper subtitle styling
- Title, 3-line clamped description
- Author byline at the bottom: avatar stack (overlapping) + first author
  name + "+N", pulling from the existing devnotes/.authors.yml registry
- Hover: NVIDIA-green border + subtle lift

Posts without a hero image fall back to a deterministic hash-based
gradient placeholder + monogram (DJB2 hash of href → HSL hue, with the
muddy-yellow band 40–90° remapped). Same post always gets the same look.
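
A rough sketch of the deterministic placeholder hue described above, written in Python for illustration (the component itself is TSX). DJB2 is the standard `hash * 33 + char` recurrence seeded with 5381; how the PR remaps the 40–90° band is not shown, so the `+60` shift below is an assumption.

```python
def djb2(text: str) -> int:
    """Classic DJB2 string hash, kept within 32 bits."""
    h = 5381
    for ch in text:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h


def placeholder_hue(href: str) -> int:
    """Map an href to an HSL hue in [0, 360), avoiding the 40-90 degree band."""
    hue = djb2(href) % 360
    if 40 <= hue <= 90:
        # Assumption: shift the band out of the way; the real remap may differ.
        hue = (hue + 60) % 360
    return hue


href = "/dev-notes/example-post"  # hypothetical href
print(placeholder_hue(href) == placeholder_hue(href))
```

Because the hue depends only on the href, the same post gets the same gradient on every build.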

Notes:
- Image prop is React.ReactNode (not string) — pass <img> JSX from MDX
  so Fern's link rewriter can resolve the src to /_local/... in dev and
  /nemo/datadesigner/assets/... in prod. Raw string props bypass the
  rewriter and 404 in dev.
- Card href runs through a small withBasepath() helper since the <a>
  also bypasses Fern's link rewriter.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Fern's prose stylesheet applies a top margin to <img> tags, and the
click-to-zoom wrapper Fern injects around each image (<span data-rmiz>)
inherits that margin too. Result: a ~1rem gap between the card's top
edge and the hero image.

Reset margin/padding on the rmiz wrapper spans + the img itself inside
.blog-card__media so the image renders flush against the top border.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
When an <img> appears in MDX, Fern auto-wraps it with a click-to-zoom
shell (<span data-rmiz>...). On the dev-notes index that shell intercepts
clicks meant for the card's <a> wrapper, so clicking a hero opens a
lightbox AND tries to navigate.

Set pointer-events: none on the rmiz spans + img inside .blog-card__media
so clicks bubble straight to the parent <a> and the card behaves as a
single, predictable link target. Hover still works because pointer-events
on children doesn't block :hover on the ancestor <a>.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Replaces NotebookViewer's hand-rolled JS markdown parser (the one with
the ^@br^@ sentinel the reviewer flagged as fragile) with build-time
rendering in the converter.

ipynb-to-fern-json.py now uses markdown-it-py (CommonMark + tables +
strikethrough + raw HTML) to render each markdown cell's source into
source_html, mirroring how code cells already store Pygments-highlighted
source_html. NotebookViewer's markdown branch becomes a single
dangerouslySetInnerHTML on the pre-rendered HTML, with a plain-escape
fallback for old snapshots.

Removes the dead JS helpers (renderMarkdown, isSafeUrl, UL_CLASS,
OL_CLASS) — ~60 lines of brittle regex-based markdown parsing.

Fixes broken rendering of:
- Blockquotes (showed literal > characters before)
- Nested content inside blockquotes (e.g. blockquote with bullet list)
- Fenced code blocks
- Tables
- Multi-paragraph list items

Includes regenerated fern/components/notebooks/*.{json,ts} for all 6
tutorials.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
lbliii added a commit to NVIDIA-NeMo/Gym that referenced this pull request May 1, 2026
## Summary
- Mirrors review nits from NVIDIA-NeMo/DataDesigner#581 on Fern setup.
- Drops `https://` from `instances.url` (Fern expects a host, not a full
URL).
- Adds `experimental.basepath-aware: true` so links resolve correctly
under the `/nemo/gym` basepath.

## Test plan
- [x] `fern check` error count unchanged from main (61 pre-existing
broken-link errors in `versions/latest/pages/index.mdx`, not introduced
here)
- [ ] Verify Fern preview/build succeeds in CI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>