Context
Child of #2118. The original epic used the v1.2.0 publication timestamp as its cutoff. Sara's follow-up inventory identified release-relevant changes merged after the 26.04 code freeze but before that publication timestamp. These commits are reachable from `v1.2.0..main` and need a final documentation/release disposition.
This issue bundles the smaller gaps that do not justify independent feature pages.
Documentation update
Semantic deduplication
Document #1927 in the semantic-deduplication guide:
- `KMeansStage.fit_data_fraction`
- `SemanticDeduplicationWorkflow.fit_data_fraction`
- `TextSemanticDeduplicationWorkflow.kmeans_fit_data_fraction`
- Centroid `cache_path` behavior
- Two-pass memory/IO tradeoffs, file-level sampling, validation, and output artifacts
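As a starting point for the guide, the toy sketch below illustrates the two-pass idea the #1927 items describe: fit centroids on a file-level sample, optionally cache them, then assign every row in a second pass. It assumes (not confirmed here) that a fit fraction selects a share of input *files* for fitting and that the cache simply stores centroids for reuse. `two_pass_cluster`, `kmeans`, `nearest`, and the JSON cache format are hypothetical names for illustration only, not the `KMeansStage` API or its clustering backend. Inner lists stand in for input files, and the clustering is plain Lloyd's k-means with a deterministic farthest-first initialization so the toy output is reproducible.

```python
import json
import random
from pathlib import Path


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the closest centroid (the first index wins ties)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, iters=50):
    """Lloyd's k-means on in-memory tuples, with farthest-first initialization."""
    if k > len(points):
        raise ValueError("need at least k points to fit")
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to its cluster mean; keep it in place if the cluster is empty.
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids


def two_pass_cluster(files, k, fit_data_fraction=1.0, cache_path=None, seed=0):
    """Hypothetical flow: pass 1 fits on sampled files, pass 2 assigns all rows.

    `files` is a list of lists of vectors; each inner list stands in for one
    input file that a real pipeline would read from storage.
    """
    if not 0.0 < fit_data_fraction <= 1.0:
        raise ValueError("fit_data_fraction must be in (0, 1]")
    cache = Path(cache_path) if cache_path is not None else None

    if cache is not None and cache.exists():
        # Reuse cached centroids instead of refitting (k is not re-checked here).
        centroids = [tuple(c) for c in json.loads(cache.read_text())]
    else:
        # Pass 1: file-level sampling, so only the chosen files are held for fitting.
        n_fit = max(1, round(len(files) * fit_data_fraction))
        chosen = random.Random(seed).sample(range(len(files)), n_fit)
        fit_points = [p for i in chosen for p in files[i]]
        centroids = kmeans(fit_points, k)
        if cache is not None:
            cache.write_text(json.dumps(centroids))

    # Pass 2: read every file again and assign each row to its nearest centroid.
    assignments = [[nearest(p, centroids) for p in f] for f in files]
    return centroids, assignments
```

In this toy, lowering `fit_data_fraction` shrinks the data held in memory during fitting, at the cost of centroids that may underrepresent unsampled files; pass 2 still reads every file, which is the IO side of the tradeoff the guide should explain.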
26.06 release-note coverage
Add or verify concise entries for:
- #1746 — repository Getting Started agent skill
- #1762 — `JsonlWriter` preserves UTF-8 by default
- #1763 — distinct Nemotron-CC stage names in metrics and diagnostics
- #1774 — interleaved image-text getting-started workflow
- #1870 — interactive DNS Challenge Read Speech tutorial
- #1880 — corrected URL matching for URL-ratio/repetition filters
- #1888 — lazy text-classifier imports and lower cold-import cost
- #1890 — lazy OpenAI client initialization
- #1927 — memory-efficient KMeans fitting and optional centroid caching
- #1895 / #1957 — final CUDA, Ray, PyTorch, vLLM, HAProxy, and ai-dynamo dependency/runtime changes
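To make the #1762 entry concrete for readers, the sketch below shows the general difference between ASCII-escaped and UTF-8-preserving JSONL output using only Python's standard `json` module. `write_jsonl` is a hypothetical helper, and this sketch does not claim that `JsonlWriter` is implemented this way; it only illustrates what "preserves UTF-8" looks like in the written lines.

```python
import io
import json


def write_jsonl(records, stream, ensure_ascii=False):
    """Hypothetical helper: write one JSON object per line to a text stream."""
    for record in records:
        # ensure_ascii=True escapes every non-ASCII character as \uXXXX.
        stream.write(json.dumps(record, ensure_ascii=ensure_ascii) + "\n")


records = [{"text": "naïve café 東京"}]

escaped = io.StringIO()
write_jsonl(records, escaped, ensure_ascii=True)
preserved = io.StringIO()
write_jsonl(records, preserved)

print(escaped.getvalue(), end="")    # {"text": "na\u00efve caf\u00e9 \u6771\u4eac"}
print(preserved.getvalue(), end="")  # {"text": "naïve café 東京"}
```

Both lines decode to the same record; only the on-disk form differs. When writing real files this way, open them with `encoding="utf-8"` so the preserved characters are not re-encoded with a platform default.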
Existing workstream attribution
Update the existing children and release notes rather than create duplicate feature pages:
- #1427 belongs to the Nemotron-CLIMB/Megatron output workstream
- #1679 belongs to the audio-tagging workstream
- #1844 belongs to the translation workstream
- #1855 belongs to the 26.06 Python migration
Acceptance criteria
- … `main`, not an intermediate PR description
Related PRs: #1427, #1679, #1746, #1762, #1763, #1774, #1844, #1855, #1870, #1880, #1888, #1890, #1895, #1927, #1957