[codex] docs: add audio tagging pipeline guide #2135
Signed-off-by: Lawrence Lane <llane@nvidia.com>
| Additional split signals | Pause longer than `max_pause`; bandwidth change | Randomized duration boundary |
| Reproducibility | Deterministic for an input entry | Randomized boundary is deterministically seeded from audio path or ID |
| `terminal_punct_marks` defaults to the value in the YAML, `. ! ? 。 ? ! 。` without spaces. If `punctuation_split_only: true`, the stage returns no prepared segments when it cannot find a punctuation boundary. With the supplied `false`, duration and TTS pause/bandwidth heuristics remain available. |
`terminal_punct_marks` is displayed with spaces between each character in the inline code (`. ! ? 。 ? ! 。`), then immediately qualified with "without spaces". These two parts contradict each other and will confuse readers trying to copy the value. The actual YAML string is `.!?。?!。`; use that verbatim so the clarification is unnecessary.
| `terminal_punct_marks` defaults to the value in the YAML, `. ! ? 。 ? ! 。` without spaces. If `punctuation_split_only: true`, the stage returns no prepared segments when it cannot find a punctuation boundary. With the supplied `false`, duration and TTS pause/bandwidth heuristics remain available. |
| `terminal_punct_marks` defaults to the value in the YAML, `.!?。?!。`. If `punctuation_split_only: true`, the stage returns no prepared segments when it cannot find a punctuation boundary. With the supplied `false`, duration and TTS pause/bandwidth heuristics remain available. |
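The `punctuation_split_only` behavior described in this row can be illustrated with a minimal sketch. `split_on_terminal_punct` is a hypothetical helper written for this comment, not the stage's actual code, and the non-strict fallback here simply returns the whole text as a stand-in for the duration and pause/bandwidth heuristics.

```python
# Hypothetical illustration only; not the stage's implementation.
# Mark string copied verbatim from the suggested table row above.
TERMINAL_PUNCT_MARKS = ".!?。?!。"


def split_on_terminal_punct(text, marks=TERMINAL_PUNCT_MARKS,
                            punctuation_split_only=False):
    """Split text after each terminal punctuation mark.

    When no punctuation boundary exists:
      - punctuation_split_only=True  -> return no segments
      - punctuation_split_only=False -> return the whole text, standing in
        for the other heuristics the real stage would apply (assumption).
    """
    segments = []
    start = 0
    for i, ch in enumerate(text):
        if ch in marks:
            piece = text[start:i + 1].strip()
            if piece:
                segments.append(piece)
            start = i + 1

    if not segments:
        if punctuation_split_only:
            return []
        whole = text.strip()
        return [whole] if whole else []

    # Keep any trailing text that follows the last punctuation mark.
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments
```

Because the marks are stored as one contiguous string, a per-character membership test is enough, which is why copying the value without spaces matters.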
| `metrics.wer.wer`, `metrics.cer.cer` | Second-pass disagreement ratios, not percentages. Each object also records token and edit rates. |
| `metrics.start_cer.cer`, `metrics.end_cer.cer` | CER at the configured beginning and ending character windows. |
The output table below the JSON example omits `word_rate` and `char_rate`, which are visible in the example object. A reader seeing these fields in a real manifest will have no documentation to explain them. Both are produced by `ComputeWERStage` (words and characters per second, respectively, derived from the hypothesis text and segment duration).
| `metrics.wer.wer`, `metrics.cer.cer` | Second-pass disagreement ratios, not percentages. Each object also records token and edit rates. |
| `metrics.start_cer.cer`, `metrics.end_cer.cer` | CER at the configured beginning and ending character windows. |
| `metrics.wer.wer`, `metrics.cer.cer` | Second-pass disagreement ratios, not percentages. Each object also records token and edit rates. |
| `metrics.start_cer.cer`, `metrics.end_cer.cer` | CER at the configured beginning and ending character windows. |
| `metrics.word_rate` | Words per second for the segment, computed from the hypothesis text and segment duration. |
| `metrics.char_rate` | Characters per second for the segment, computed from the hypothesis text and segment duration. |
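To make the two rate fields concrete, here is a rough sketch of how words and characters per second could be derived from a hypothesis string and a segment duration. `speaking_rates` is a hypothetical function, whitespace tokenization is an assumption, and excluding whitespace from the character count is also an assumption; `ComputeWERStage` may count differently.

```python
def speaking_rates(hypothesis, duration_s):
    """Return (word_rate, char_rate) in units per second.

    Assumptions (not confirmed against ComputeWERStage):
      - words are whitespace-separated tokens
      - characters exclude whitespace
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    n_words = len(hypothesis.split())
    n_chars = sum(1 for ch in hypothesis if not ch.isspace())
    return n_words / duration_s, n_chars / duration_s


word_rate, char_rate = speaking_rates("hello world again", 2.0)
print(word_rate, char_rate)  # 1.5 7.5
```

Like the WER and CER values, these are plain ratios, so a value of 1.5 means one and a half words per second rather than a percentage.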
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Summary
- `PrepareModuleSegmentsStage`, bandwidth, TorchSQUIM, second-pass WER/CER, inverse text normalization, and Chinese conversion

PnC and normalization implementation boundary
Issue #2123 was written from the description of #1863, but the final merged source does not contain `PNCwithvLLMInferenceStage`, `CleanLLMOutputStage`, `VLLMInference`, or an Arabic diacritic-removal stage. This guide does not invent configuration for those unavailable APIs. It explicitly documents that:
- `ComputeWERStage.compute_pnc_wer` evaluates punctuation-sensitive agreement but does not generate PnC

This keeps the published workflow runnable against the implementation on `main` while still addressing the operational questions in the issue.

Validation
- `fern check`: 0 errors (103 existing warnings)
- `fern docs broken-links`: no errors in the changed pages; reports 22 pre-existing API-reference errors elsewhere
- `python3 -m py_compile tutorials/audio/tagging/main.py`
- `git diff --cached --check`

Closes #2123