[Docs] Close late code-freeze docs and release-note gaps #2144

@lbliii

Context

Child of #2118. The original epic used the v1.2.0 publication timestamp as its cutoff. Sara's follow-up inventory identified release-relevant changes merged after the 26.04 code freeze but before that publication timestamp. These commits are reachable from v1.2.0..main and need a final documentation/release disposition.

This issue bundles the smaller gaps that do not justify independent feature pages.

Documentation update

Semantic deduplication

Document #1927 in the semantic-deduplication guide:

  • KMeansStage.fit_data_fraction
  • SemanticDeduplicationWorkflow.fit_data_fraction
  • TextSemanticDeduplicationWorkflow.kmeans_fit_data_fraction
  • Centroid cache_path behavior
  • Two-pass memory/IO tradeoffs, file-level sampling, validation, and output artifact
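For the guide, file-level sampling and fraction validation could be illustrated with a small standalone sketch like the one below. It is not the #1927 implementation: `sample_files_for_fit`, the `seed` argument, and the round-up rule are hypothetical and only show the general idea of fitting on a subset of input files and validating that the fraction lies in (0, 1].

```python
import math
import random


def sample_files_for_fit(files, fit_data_fraction, seed=0):
    """Pick a reproducible subset of input files to fit centroids on.

    Hypothetical helper for illustration only; the real stage may
    sample, round, and validate differently.
    """
    # Validation: the fraction must select something, and at most everything.
    if not 0.0 < fit_data_fraction <= 1.0:
        raise ValueError("fit_data_fraction must be in the interval (0, 1]")
    if not files:
        return []

    # Sort first so the result does not depend on directory listing order.
    ordered = sorted(files)

    # Round up so that a small fraction still selects at least one file.
    count = max(1, math.ceil(len(ordered) * fit_data_fraction))

    # A seeded generator keeps the selection stable across runs.
    rng = random.Random(seed)
    return sorted(rng.sample(ordered, count))


files = [f"part-{i:03d}.parquet" for i in range(10)]
fit_subset = sample_files_for_fit(files, 0.25)
```

In a two-pass layout, the first pass would fit centroids on `fit_subset` only and the second pass would assign every file to those centroids. This lowers fitting memory at the cost of reading the sampled files twice.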

26.06 release-note coverage

Add or verify concise entries for:

  • #1746 — repository Getting Started agent skill
  • #1762 — JsonlWriter preserves UTF-8 by default
  • #1763 — distinct Nemotron-CC stage names in metrics and diagnostics
  • #1774 — interleaved image-text getting-started workflow
  • #1870 — interactive DNS Challenge Read Speech tutorial
  • #1880 — corrected URL matching for URL-ratio/repetition filters
  • #1888 — lazy text-classifier imports and lower cold-import cost
  • #1890 — lazy OpenAI client initialization
  • #1927 — memory-efficient KMeans fitting and optional centroid caching
  • #1895 / #1957 — final CUDA, Ray, PyTorch, vLLM, HAProxy, and ai-dynamo dependency/runtime changes
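The #1762 entry may be easier to write with a concrete picture of what "preserves UTF-8" means for JSONL output. The sketch below uses only the Python standard library. `write_jsonl` is a hypothetical helper, not the JsonlWriter API, and the actual option name and default in #1762 should be confirmed against main.

```python
import io
import json


def write_jsonl(records, stream, ensure_ascii=False):
    """Write one JSON object per line (hypothetical helper, not JsonlWriter)."""
    for record in records:
        # ensure_ascii=False keeps non-ASCII characters as-is instead of
        # emitting \uXXXX escape sequences.
        stream.write(json.dumps(record, ensure_ascii=ensure_ascii) + "\n")


records = [{"text": "café"}]

preserved = io.StringIO()
write_jsonl(records, preserved)

escaped = io.StringIO()
write_jsonl(records, escaped, ensure_ascii=True)
```

Both outputs decode to the same record; the difference is readability and size of the file on disk, which is the user-visible change a release note would describe.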

Existing workstream attribution

Update the existing child issues and release notes rather than creating duplicate feature pages:

  • #1427 belongs to the Nemotron-CLIMB/Megatron output workstream
  • #1679 belongs to the audio-tagging workstream
  • #1844 belongs to the translation workstream
  • #1855 belongs to the 26.06 Python migration

Acceptance criteria

  • Semantic-deduplication docs explain all #1927 controls and their memory/IO consequences
  • Every PR listed above has an explicit release-note or existing-workstream disposition
  • Release dependency versions reflect final main, not an intermediate PR description
  • Existing interleaved and Read Speech pages are linked rather than duplicated
  • #1427, #1679, #1844, and #1855 are recorded on their existing child issues
  • The parent epic records the supplemental code-freeze audit
  • Fern validation and link checks pass

Related PRs: #1427, #1679, #1746, #1762, #1763, #1774, #1844, #1855, #1870, #1880, #1888, #1890, #1895, #1927, #1957

Labels: documentation
