docs(voice-clone): clarify --prompt-text scope, length tuning, profile reuse (#9) by MukundaKatta · Pull Request #13 · OpenMOSS/MOSS-TTS-Nano

MukundaKatta · 2026-04-15T06:26:09Z

Summary

Closes #9.

The reporter asked three concrete questions about voice cloning that the docs don't currently answer cleanly. This PR addresses all three.

1. `--prompt-text` help was misleading

In both `infer.py` and `moss_tts_nano/cli.py`, the help text on `--prompt-text` said it was "used by continuation mode". But `model.inference` actually receives `prompt_text` regardless of mode (`infer.py:344-362`), and supplying it improves voice-clone quality because the model can text-align the prompt clip.

→ Updated both CLI help strings to make this explicit.

2. Length recommendation

We don't enforce a hard limit on prompt-audio length, but the reporter found that 3 s gave decent results while 2 / 6 / 10 / 30 s did not. Added an honest note in both READMEs:

Empirically, short clips (≈ 3–10 seconds) of clean speech tend to give the best results: long clips spend more of the model's prompt budget on acoustic context, and very short ones (< 2 s) often don't carry enough timbre. If you see degraded output, try clipping a clean, single-speaker passage at around 5 seconds.

3. Voice-profile caching

There's no profile abstraction in the codebase yet, so the cleanest pattern is to keep the model in process and call `inference()` repeatedly with the same prompt args. Documented exactly that, plus the three obvious ways to do it (`python -i infer.py`, `moss-tts-nano serve`, `MossTtsNanoRuntime` reuse).

Files changed

`infer.py` — corrected `--prompt-text` / `--prompt-text-file` help.
`moss_tts_nano/cli.py` — same correction on the packaged CLI.
`README.md` — new "Voice cloning details" subsection under Voice Clone with infer.py.
`README_zh.md` — same subsection translated.

Test plan

`python3 -c "import ast; ast.parse(open('infer.py').read()); ast.parse(open('moss_tts_nano/cli.py').read())"` — syntax clean.
No behavioural change; help-text + docs only.

AI-assisted disclosure

Drafted with Claude Code. I traced `prompt_text` through `infer.py:300-362` to confirm it reaches `model.inference` for both modes, then wrote the docs around what the code actually does (rather than asserting a length recipe I can't ground — the README phrasing is intentionally hedged, citing the issue reporter's empirical observation).

…e reuse (OpenMOSS#9) The CLI help on `--prompt-text` (in both `infer.py` and `moss_tts_nano/cli.py`) said it was "used by continuation mode" — but `model.inference` accepts `prompt_text` for voice_clone mode too, and supplying it improves cloning quality. Update the help to reflect that. Also adds a "Voice cloning details" subsection to README.md and README_zh.md that addresses the three questions from OpenMOSS#9 directly: 1. Yes, you can pass the source audio's transcript via --prompt-text / --prompt-text-file. It works for both modes. 2. Reference audio length: no enforced limit, but ~3–10 seconds of clean single-speaker speech tends to give the best results. Acknowledges the empirical observation that very short or very long clips degrade output, with a concrete suggestion (clip ~5 s). 3. There's no separate "voice profile" cache yet — keep the model loaded in process (via `python -i infer.py`, `moss-tts-nano serve`, or a reused `MossTtsNanoRuntime`) and call inference repeatedly with the same prompt args. No behavioural change; help text + docs only. Closes OpenMOSS#9.

MukundaKatta force-pushed the docs/voice-cloning-clarifications branch from 72ba67f to 3e7465f Compare April 18, 2026 09:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(voice-clone): clarify --prompt-text scope, length tuning, profile reuse (#9)#13

docs(voice-clone): clarify --prompt-text scope, length tuning, profile reuse (#9)#13
MukundaKatta wants to merge 1 commit intoOpenMOSS:mainfrom
MukundaKatta:docs/voice-cloning-clarifications

MukundaKatta commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MukundaKatta commented Apr 15, 2026

Summary

1. `--prompt-text` help was misleading

2. Length recommendation

3. Voice-profile caching

Files changed

Test plan

AI-assisted disclosure

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant