[codex] docs: expand Nemotron-Parse tuning guidance by lbliii · Pull Request #2131 · NVIDIA-NeMo/Curator

lbliii · 2026-06-29T20:03:28Z

What changed

- document stage-level engine_kwargs tuning with gpu_memory_utilization and max_num_batched_tokens
- clarify which controls belong to the vLLM initializer versus vllm.LLM
- explain PDFPartitioningStage fanout behavior with Ray Data and its relationship to pdfs_per_task
- add a complete vLLM/HF metric reference and TaskPerfUtils aggregation example
- show end-to-end pages/s and output-tokens/s calculations using pipeline wall time
- distinguish startup port-collision retries from inference-engine resets
- expand benchmark guidance without depending on internal nightly infrastructure
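
The relationship between fanout and pdfs_per_task listed above can be pictured as a plain chunking step. The sketch below is illustrative only: `group_pdfs` is a hypothetical helper, not Curator's implementation, and it assumes that each group of at most `pdfs_per_task` paths becomes one downstream task.

```python
from typing import List


def group_pdfs(pdf_paths: List[str], pdfs_per_task: int) -> List[List[str]]:
    """Split a flat list of PDF paths into groups of at most pdfs_per_task.

    Hypothetical helper: each returned group stands in for one fanned-out task.
    """
    if pdfs_per_task < 1:
        raise ValueError("pdfs_per_task must be >= 1")
    return [
        pdf_paths[i : i + pdfs_per_task]
        for i in range(0, len(pdf_paths), pdfs_per_task)
    ]


# Placeholder file names, not real data.
paths = [f"doc_{i}.pdf" for i in range(5)]
groups = group_pdfs(paths, pdfs_per_task=2)
print([len(g) for g in groups])
```

Under this assumption, a smaller pdfs_per_task yields more, smaller tasks for downstream stages to process in parallel, and a larger value yields fewer, larger tasks.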

Why

PR #2054 added vLLM engine passthrough, additive inference metrics, Ray Data fanout, and broader engine-startup retry handling. The published Nemotron-Parse guide described none of those behaviors, leaving users without a supported tuning or observability path.

User impact

Users can tune vLLM memory and batching, understand backend-specific metrics, measure throughput correctly under parallel execution, and diagnose retry exhaustion without masking non-retryable GPU or configuration errors.
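
As a rough illustration of why throughput under parallel execution should be divided by pipeline wall time rather than by summed per-task time, consider the sketch below. The function name and all numbers are made up for the example; they are not measured results and not part of the Curator API.

```python
def end_to_end_throughput(total_pages, total_output_tokens, wall_seconds):
    """Compute pages/s and output-tokens/s from totals and pipeline wall time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return {
        "pages_per_s": total_pages / wall_seconds,
        "output_tokens_per_s": total_output_tokens / wall_seconds,
    }


# Illustrative numbers: four tasks of 60 s each that ran concurrently,
# so the pipeline as a whole took about 60 s of wall time.
per_task_seconds = [60.0, 60.0, 60.0, 60.0]
wall_seconds = 60.0
pages = 2400
tokens = 1_200_000

stats = end_to_end_throughput(pages, tokens, wall_seconds)
# Dividing by summed task time counts overlapping work four times over.
naive_pages_per_s = pages / sum(per_task_seconds)
print(stats, naive_pages_per_s)
```

In this toy case the summed-task-time figure is four times lower than the wall-time figure, which is the discrepancy the wall-time calculation avoids.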

Validation

- npm run check from fern/: 0 errors
- fern docs broken-links: no errors in the changed page; 22 existing errors remain in older API-reference pages
- git diff --check
- targeted source tests could not import on this macOS host because NeMo Curator intentionally supports Linux only; the documented behaviors are covered by existing unit tests in test_stages.py and test_vllm_utils.py

Closes #2128
Parent tracking issue: #2118

Signed-off-by: Lawrence Lane <llane@nvidia.com>

copy-pr-bot · 2026-06-29T20:03:31Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-07-02T14:59:11Z

Greptile Summary

This documentation-only PR expands the Nemotron-Parse PDF guide to cover behaviors added in PR #2054: stage-level engine_kwargs vLLM tuning, PDFPartitioningStage Ray Data fanout semantics, additive inference metrics with TaskPerfUtils aggregation, and a detailed vLLM retry-path breakdown.

- Adds a "Tune the vLLM Engine" section with a complete pipeline composition example, a tuning control table, and a <Note> explaining engine_kwargs precedence over max_num_seqs/enforce_eager.
- Adds an "Inspect Inference Metrics" section with a working code sample and a full metric reference table covering both the vLLM and HF backends.
- Adds a "vLLM Retry Behavior" section distinguishing startup port-collision retries from inference-engine resets, and updates the retry description and default model path in the existing reference table.
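
The engine_kwargs precedence mentioned in the first item can be sketched as a dictionary merge in which the passthrough values are applied last. This is an assumption about the shape of the logic, not the actual Curator code: `build_engine_args` is a hypothetical helper, and every value shown is a placeholder rather than a documented default.

```python
from typing import Any, Dict, Optional


def build_engine_args(
    max_num_seqs: int,
    enforce_eager: bool,
    engine_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge stage-level settings with engine_kwargs; engine_kwargs wins on conflict."""
    merged: Dict[str, Any] = {
        "max_num_seqs": max_num_seqs,
        "enforce_eager": enforce_eager,
    }
    # Applied last, so any overlapping key replaces the stage-level value.
    merged.update(engine_kwargs or {})
    return merged


args = build_engine_args(
    max_num_seqs=64,       # placeholder value
    enforce_eager=False,   # placeholder value
    engine_kwargs={
        "gpu_memory_utilization": 0.85,   # placeholder value
        "max_num_batched_tokens": 16384,  # placeholder value
        "max_num_seqs": 128,              # overrides the stage-level 64
    },
)
print(args)
```

With this merge order, a key set in both places resolves to the engine_kwargs value, while settings that appear only at the stage level pass through unchanged.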

Confidence Score: 5/5

Documentation-only change with no runtime impact; all added descriptions, metric names, default values, and code examples were verified against the implementation.

Every factual claim in the new sections was cross-checked against the source: metric key construction in TaskPerfUtils.aggregate_task_metrics, StagePerfStats.items() custom prefix, default values in create_vllm_llm and NemotronParseInferenceStage, the 9000-token SamplingParams limit, retry counts and jitter range, the IS_FANOUT_STAGE marker on PDFPartitioningStage, and the InterleavedParquetWriterStage import path. No discrepancies were found.
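
To make the retry distinction concrete, the sketch below retries only a startup port collision and lets any other error propagate immediately. Everything here is hypothetical: `PortCollisionError`, `start_with_port_retries`, and the jitter range are stand-ins, and treating max_port_retries as a total attempt count is an assumption rather than something the PR states.

```python
import random
import time


class PortCollisionError(RuntimeError):
    """Hypothetical stand-in for a retryable startup port collision."""


def start_with_port_retries(start_engine, max_port_retries=3, jitter=(0.0, 0.01)):
    """Retry only port collisions; every other error propagates at once."""
    last_error = None
    # Assumption: max_port_retries is the total number of attempts.
    for _ in range(max_port_retries):
        try:
            return start_engine()
        except PortCollisionError as err:
            last_error = err
            # Small random delay (placeholder range) so workers do not retry in lockstep.
            time.sleep(random.uniform(*jitter))
    raise RuntimeError(
        f"engine failed to start after {max_port_retries} attempts"
    ) from last_error


calls = {"n": 0}


def flaky_start():
    """Fails twice with a port collision, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise PortCollisionError("address already in use")
    return "engine"


engine = start_with_port_retries(flaky_start)
print(engine, calls["n"])
```

Catching only the retryable exception type is what keeps a configuration or GPU error from being hidden behind repeated attempts.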

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| fern/versions/main/pages/curate-text/load-data/nemotron-parse-pdf.mdx | Documentation expansion adding three new sections (vLLM tuning, Ray Data fanout, inference metrics) and updating retry/model-path descriptions. All metric names, key structures, default values, and API shapes verified against the implementation. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant User
    participant Pipeline
    participant Partition as PDFPartitioningStage
    participant Preprocess as PDFPreprocessStage
    participant Inference as NemotronParseInferenceStage
    participant Postprocess as NemotronParsePostprocessStage
    participant TPU as TaskPerfUtils

    User->>Pipeline: run(executor)
    Pipeline->>Partition: process(EmptyTask)
    Note over Partition: 1 worker, IS_FANOUT_STAGE=True
    Partition-->>Pipeline: list of FileGroupTask (one per pdfs_per_task group)
    Pipeline->>Preprocess: process(FileGroupTask) in parallel blocks
    Preprocess-->>Pipeline: InterleavedBatch with rendered page images
    Pipeline->>Inference: process(InterleavedBatch) on GPU
    Note over Inference: create_vllm_llm retries port collisions (max_port_retries=3)
    Note over Inference: vLLM generate retried up to 3x with engine reset on failure
    Inference-->>Pipeline: InterleavedBatch with raw model text + custom metrics
    Pipeline->>Postprocess: process(InterleavedBatch)
    Postprocess-->>Pipeline: InterleavedBatch with interleaved rows
    Pipeline-->>User: list of Task with _stage_perf
    User->>TPU: "aggregate_task_metrics(results, prefix="task")"
    TPU-->>User: "task_nemotron_parse_inference_custom.<metric>_sum/mean/std"
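
The last two messages in the diagram suggest aggregated keys of the form `task_nemotron_parse_inference_custom.<metric>_sum/mean/std`. The sketch below mimics that key layout in plain Python so the naming is easier to follow. It is not TaskPerfUtils: `aggregate_custom_metrics` is a hypothetical function, `num_pages` is a made-up metric name, and the use of population standard deviation is an assumption.

```python
import statistics
from typing import Dict, List


def aggregate_custom_metrics(
    per_task: List[Dict[str, float]], stage: str, prefix: str = "task"
) -> Dict[str, float]:
    """Aggregate per-task custom metrics into <prefix>_<stage>_custom.<metric>_<stat> keys."""
    by_metric: Dict[str, List[float]] = {}
    for metrics in per_task:
        for name, value in metrics.items():
            by_metric.setdefault(name, []).append(value)

    out: Dict[str, float] = {}
    for name, values in by_metric.items():
        base = f"{prefix}_{stage}_custom.{name}"
        out[f"{base}_sum"] = sum(values)
        out[f"{base}_mean"] = statistics.fmean(values)
        # Assumption: population standard deviation; the real utility may differ.
        out[f"{base}_std"] = statistics.pstdev(values)
    return out


# Made-up per-task metrics for two tasks.
per_task = [{"num_pages": 10.0}, {"num_pages": 30.0}]
agg = aggregate_custom_metrics(per_task, stage="nemotron_parse_inference")
print(sorted(agg))
```

Because the metrics are additive, the `_sum` entry is the one to divide by wall time when computing end-to-end rates.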


Reviews (2): Last reviewed commit: "Merge branch 'main' into codex/docs-nemo..."

docs: expand Nemotron-Parse tuning guidance

e400a83

Signed-off-by: Lawrence Lane <llane@nvidia.com>

lbliii self-assigned this Jun 29, 2026

lbliii mentioned this pull request Jun 30, 2026

[codex] publish 26.06 release notes and migration checklist #2143

Open

lbliii marked this pull request as ready for review July 2, 2026 14:53

lbliii requested a review from a team as a code owner July 2, 2026 14:53

lbliii requested review from suiyoubi and removed request for a team July 2, 2026 14:53

Merge branch 'main' into codex/docs-nemotron-parse-tuning

2141d4e

lbliii wants to merge 2 commits into NVIDIA-NeMo:main from lbliii:codex/docs-nemotron-parse-tuning
