feat: add MiniMax TTS provider support #430
- Add MiniMax cloud TTS backend (`backend/backends/minimax_backend.py`) using the MiniMax T2A v2 API with SSE streaming and PCM audio output
- Register `minimax` engine in `TTS_ENGINES` and `get_tts_backend_for_engine`
- Add `minimax` to `GenerationRequest` engine field validation
- Add `minimax` preset voice ID support in the profile service
- Add comprehensive unit tests (25 tests) including a live integration test
- No local model download required; uses the `MINIMAX_API_KEY` env variable

API docs:
- TTS: https://platform.minimax.io/docs/api-reference/speech-t2a-http
📝 Walkthrough

A new MiniMax TTS backend is added as a cloud-only speech synthesis engine. The integration includes registry updates for engine discovery, a complete backend implementation with PCM audio decoding and SSE stream parsing, model validation support, voice profile integration, and comprehensive test coverage.
Sequence Diagram

```mermaid
sequenceDiagram
    actor Client
    participant GenerationRequest
    participant BackendRegistry
    participant MiniMaxBackend
    participant MiniMaxAPI
    Client->>GenerationRequest: POST /generate with engine="minimax"
    GenerationRequest->>BackendRegistry: validate & get_tts_backend_for_engine("minimax")
    BackendRegistry->>MiniMaxBackend: instantiate MiniMaxTTSBackend
    MiniMaxBackend->>MiniMaxBackend: load_model() - verify API key
    MiniMaxBackend->>MiniMaxBackend: create_voice_prompt() - return preset voice
    MiniMaxBackend->>MiniMaxAPI: _generate_sync(text, voice_id, model)
    MiniMaxAPI-->>MiniMaxBackend: SSE stream with audio chunks
    MiniMaxBackend->>MiniMaxBackend: _pcm_bytes_to_numpy() - decode hex PCM to float32
    MiniMaxBackend-->>GenerationRequest: return (audio_array, sample_rate)
    GenerationRequest-->>Client: audio response
```
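The `_pcm_bytes_to_numpy()` decode step in the diagram can be sketched as below. This assumes the stream carries hex-encoded 16-bit little-endian PCM; the PR only states that hex-encoded PCM chunks are decoded to float32, so the sample width and helper name here are assumptions, not the repo's actual code.

```python
import numpy as np

def pcm_hex_to_float32(hex_chunk: str) -> np.ndarray:
    """Decode one hex-encoded PCM chunk to float32 samples in [-1.0, 1.0]."""
    raw = bytes.fromhex(hex_chunk)
    # Interpret the bytes as 16-bit little-endian signed integers.
    pcm = np.frombuffer(raw, dtype="<i2")
    # Scale int16 range [-32768, 32767] into float32 [-1.0, 1.0).
    return pcm.astype(np.float32) / 32768.0
```

Decoded chunks from successive SSE events would then be concatenated into the final audio array returned at the backend's sample rate.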
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/models.py`:
- Around line 79-81: The model selection for MiniMax is never exposed: the
Pydantic Field model_size only allows local-model values and
backend/routes/generations.py clears model_size for engines without sizes, while
MiniMaxTTSBackend.generate() falls back to MINIMAX_TTS_DEFAULT_MODEL when
voice_prompt["tts_model"] is missing; fix by (1) updating the model schema (the
model_size Field or add a new tts_model optional field) to accept MiniMax model
names like "speech-2.8-hd" and "speech-2.8-turbo", (2) stop clearing/preserving
that value in backend/routes/generations.py for engine == "minimax" so the
client-provided model passes through, and (3) ensure
MiniMaxTTSBackend.generate() uses voice_prompt["tts_model"] if present (and only
falls back to MINIMAX_TTS_DEFAULT_MODEL when absent) so requests can select
speech-2.8-hd vs speech-2.8-turbo.
In `@backend/tests/test_minimax_backend.py`:
- Around line 374-404: Change the live MiniMax test
(test_integration_generate_speech) so it only runs when explicitly opted-in:
instead of skipping based solely on _load_api_key() returning falsy, require an
additional opt-in flag (e.g., environment variable RUN_MINIMAX_INTEGRATION or a
pytest marker) and check that flag alongside MINIMAX_API_KEY before calling
pytest.skip; update the test to read that flag and call pytest.skip unless both
the API key and the opt-in flag/marker are present, and document the new opt-in
requirement in the test comment.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4138bf72-bf10-4682-8b08-0d6f53a0fca7
📒 Files selected for processing (5)
- backend/backends/__init__.py
- backend/backends/minimax_backend.py
- backend/models.py
- backend/services/profiles.py
- backend/tests/test_minimax_backend.py
```diff
  model_size: Optional[str] = Field(default="1.7B", pattern="^(1\\.7B|0\\.6B|1B|3B)$")
  instruct: Optional[str] = Field(None, max_length=500)
- engine: Optional[str] = Field(default="qwen", pattern="^(qwen|qwen_custom_voice|luxtts|chatterbox|chatterbox_turbo|tada|kokoro)$")
+ engine: Optional[str] = Field(default="qwen", pattern="^(qwen|qwen_custom_voice|luxtts|chatterbox|chatterbox_turbo|tada|kokoro|minimax)$")
```
MiniMax model selection still isn't exposed.
This change allows engine="minimax", but clients still have no way to pick speech-2.8-hd vs speech-2.8-turbo: model_size only accepts the local-model values, backend/routes/generations.py clears it for engines without model sizes, and MiniMaxTTSBackend.generate() falls back to MINIMAX_TTS_DEFAULT_MODEL when voice_prompt["tts_model"] is absent. As written, every MiniMax request is effectively pinned to speech-2.8-hd.
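A minimal sketch of one possible fix, adding an optional `tts_model` field as the review suggests. The field name, allowed values, and class shape below follow the review comment, not the repository's actual `backend/models.py`:

```python
# Hypothetical schema sketch: exposing MiniMax model selection via an
# optional tts_model field. Not the repo's actual GenerationRequest.
from typing import Optional
from pydantic import BaseModel, Field

class GenerationRequest(BaseModel):
    engine: Optional[str] = Field(
        default="qwen",
        pattern="^(qwen|qwen_custom_voice|luxtts|chatterbox|chatterbox_turbo|tada|kokoro|minimax)$",
    )
    # Cloud model name; only meaningful when engine == "minimax".
    # When absent, the backend would still fall back to its default model.
    tts_model: Optional[str] = Field(
        default=None, pattern="^(speech-2\\.8-hd|speech-2\\.8-turbo)$"
    )
```

With this shape, `GenerationRequest(engine="minimax", tts_model="speech-2.8-turbo")` validates, and an unknown model name is rejected at the schema layer instead of being silently replaced by the default.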
```python
@pytest.mark.asyncio
async def test_integration_generate_speech():
    """
    Live integration test: call MiniMax TTS API and verify we get audio back.

    Skipped automatically when MINIMAX_API_KEY is not configured.
    """
    from backend.backends.minimax_backend import MiniMaxTTSBackend, _load_api_key, _pcm_bytes_to_numpy, MINIMAX_TTS_SAMPLE_RATE

    api_key = _load_api_key()
    if not api_key:
        pytest.skip("MINIMAX_API_KEY not configured")

    backend = MiniMaxTTSBackend()
    await backend.load_model()

    voice_prompt = {
        "voice_type": "preset",
        "preset_engine": "minimax",
        "preset_voice_id": "English_Graceful_Lady",
    }

    audio, sr = await backend.generate(
        text="Hello, this is a MiniMax TTS integration test.",
        voice_prompt=voice_prompt,
    )

    assert isinstance(audio, np.ndarray)
    assert audio.dtype == np.float32
    assert len(audio) > 0
    assert sr == MINIMAX_TTS_SAMPLE_RATE
```
Make the live MiniMax test explicitly opt-in.
This runs on any machine or CI job that happens to have MINIMAX_API_KEY, which makes the default test suite non-hermetic and can introduce network flakiness plus paid API usage. Please gate it behind an explicit marker or a second env flag instead of enabling it solely from credential presence.
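One way to implement the opt-in gate the review asks for. The flag name `RUN_MINIMAX_INTEGRATION` comes from the review suggestion and is not an existing variable in the repo:

```python
import os

def live_minimax_enabled(environ=None):
    """True only when both the API key and an explicit opt-in flag are set.

    RUN_MINIMAX_INTEGRATION is the reviewer's suggested flag name, not a
    variable that already exists in the codebase.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("MINIMAX_API_KEY")) and env.get("RUN_MINIMAX_INTEGRATION") == "1"
```

The live test would then begin with `if not live_minimax_enabled(): pytest.skip("set MINIMAX_API_KEY and RUN_MINIMAX_INTEGRATION=1 to run")`, so CI jobs that merely have credentials configured never hit the paid API by accident.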
Summary
This PR adds MiniMax TTS (Text-to-Audio) as a new voice generation backend for Voicebox.
What's added

- `backend/backends/minimax_backend.py` — Cloud TTS backend that calls the MiniMax T2A v2 API. Streams PCM audio via SSE, decodes hex-encoded chunks directly to NumPy float32 — no local model download required.
- `backend/backends/__init__.py` — Registers `minimax` in `TTS_ENGINES` and `get_tts_backend_for_engine`.
- `backend/models.py` — Adds `minimax` to the `GenerationRequest.engine` validation pattern.
- `backend/services/profiles.py` — Exposes MiniMax preset voice IDs for profile validation.
- `backend/tests/test_minimax_backend.py` — 25 unit tests covering API key loading, SSE parsing, error handling, PCM decoding, engine registration, and profile service integration. Includes a live integration test (auto-skipped when `MINIMAX_API_KEY` is absent).

How it works
MiniMax is a cloud API backend (no local model download). Users create a preset voice profile using one of the MiniMax system voices, then generate speech via the Voicebox UI or API — the backend streams PCM audio from `api.minimax.io` and returns it as a NumPy array at 32 kHz.

Environment variable: `MINIMAX_API_KEY` (set in your environment or `~/.env.local`).

Available models: `speech-2.8-hd` (default, highest quality), `speech-2.8-turbo` (faster).

Available built-in voices (English + Chinese): `English_Graceful_Lady`, `English_Insightful_Speaker`, `English_radiant_girl`, `English_Persuasive_Man`, `English_Lucky_Robot`, `English_expressive_narrator`, `Chinese_Gentle_and_Clear`, `Chinese_Energetic_Boy`, `Chinese_Elegant_Lady`, and more.

Key design decisions

- `load_model()` validates the API key on first call and is otherwise a no-op (no download progress tracking needed).
- `CLONING_ENGINES`.

API documentation

- TTS: https://platform.minimax.io/docs/api-reference/speech-t2a-http
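The SSE streaming described in this section could be parsed roughly as follows. The `data.audio` payload shape and the `[DONE]` terminator are assumptions based on common SSE conventions and the T2A v2 API description, not verified against the repo's `minimax_backend.py`:

```python
# Hypothetical sketch of extracting hex audio chunks from SSE lines.
import json

def iter_audio_hex(sse_lines):
    """Yield hex-encoded audio chunks from SSE 'data: {...}' lines."""
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip comments, event names, and keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        audio_hex = (event.get("data") or {}).get("audio")
        if audio_hex:
            yield audio_hex
```

Each yielded chunk would then be hex-decoded to PCM bytes and converted to float32 before concatenation into the final audio array.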