feat: add MiniMax TTS provider support #430
- Add MiniMax cloud TTS backend (`backend/backends/minimax_backend.py`) using the MiniMax T2A v2 API with SSE streaming and PCM audio output
- Register `minimax` engine in `TTS_ENGINES` and `get_tts_backend_for_engine`
- Add `minimax` to `GenerationRequest` engine field validation
- Add `minimax` preset voice ID support in the profile service
- Add comprehensive unit tests (25 tests) including a live integration test
- No local model download required; uses the `MINIMAX_API_KEY` env variable

API docs:
- TTS: https://platform.minimax.io/docs/api-reference/speech-t2a-http
📝 Walkthrough

A new MiniMax TTS backend is added as a cloud-only speech synthesis engine. The integration includes registry updates for engine discovery, a complete backend implementation with PCM audio decoding and SSE stream parsing, model validation support, voice profile integration, and comprehensive test coverage.
Sequence Diagram

```mermaid
sequenceDiagram
    actor Client
    participant GenerationRequest
    participant BackendRegistry
    participant MiniMaxBackend
    participant MiniMaxAPI
    Client->>GenerationRequest: POST /generate with engine="minimax"
    GenerationRequest->>BackendRegistry: validate & get_tts_backend_for_engine("minimax")
    BackendRegistry->>MiniMaxBackend: instantiate MiniMaxTTSBackend
    MiniMaxBackend->>MiniMaxBackend: load_model() - verify API key
    MiniMaxBackend->>MiniMaxBackend: create_voice_prompt() - return preset voice
    MiniMaxBackend->>MiniMaxAPI: _generate_sync(text, voice_id, model)
    MiniMaxAPI-->>MiniMaxBackend: SSE stream with audio chunks
    MiniMaxBackend->>MiniMaxBackend: _pcm_bytes_to_numpy() - decode hex PCM to float32
    MiniMaxBackend-->>GenerationRequest: return (audio_array, sample_rate)
    GenerationRequest-->>Client: audio response
```
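The `_pcm_bytes_to_numpy()` decode step in the diagram can be sketched as below. This assumes the stream carries hex-encoded 16-bit little-endian PCM; the PR only states that hex-encoded PCM chunks are decoded to float32, so the sample width and helper name here are assumptions, not the repo's actual code.

```python
import numpy as np

def pcm_hex_to_float32(hex_chunk: str) -> np.ndarray:
    """Decode one hex-encoded PCM chunk to float32 samples in [-1.0, 1.0]."""
    raw = bytes.fromhex(hex_chunk)
    # Interpret the bytes as 16-bit little-endian signed integers.
    pcm = np.frombuffer(raw, dtype="<i2")
    # Scale int16 range [-32768, 32767] into float32 [-1.0, 1.0).
    return pcm.astype(np.float32) / 32768.0
```

Decoded chunks from successive SSE events would then be concatenated into the final audio array returned at the backend's sample rate.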
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/models.py`:
- Around line 79-81: The model selection for MiniMax is never exposed: the
Pydantic Field model_size only allows local-model values and
backend/routes/generations.py clears model_size for engines without sizes, while
MiniMaxTTSBackend.generate() falls back to MINIMAX_TTS_DEFAULT_MODEL when
voice_prompt["tts_model"] is missing; fix by (1) updating the model schema (the
model_size Field or add a new tts_model optional field) to accept MiniMax model
names like "speech-2.8-hd" and "speech-2.8-turbo", (2) stop clearing/preserving
that value in backend/routes/generations.py for engine == "minimax" so the
client-provided model passes through, and (3) ensure
MiniMaxTTSBackend.generate() uses voice_prompt["tts_model"] if present (and only
falls back to MINIMAX_TTS_DEFAULT_MODEL when absent) so requests can select
speech-2.8-hd vs speech-2.8-turbo.
In `@backend/tests/test_minimax_backend.py`:
- Around line 374-404: Change the live MiniMax test
(test_integration_generate_speech) so it only runs when explicitly opted-in:
instead of skipping based solely on _load_api_key() returning falsy, require an
additional opt-in flag (e.g., environment variable RUN_MINIMAX_INTEGRATION or a
pytest marker) and check that flag alongside MINIMAX_API_KEY before calling
pytest.skip; update the test to read that flag and call pytest.skip unless both
the API key and the opt-in flag/marker are present, and document the new opt-in
requirement in the test comment.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4138bf72-bf10-4682-8b08-0d6f53a0fca7
📒 Files selected for processing (5)
- backend/backends/__init__.py
- backend/backends/minimax_backend.py
- backend/models.py
- backend/services/profiles.py
- backend/tests/test_minimax_backend.py
```diff
  model_size: Optional[str] = Field(default="1.7B", pattern="^(1\\.7B|0\\.6B|1B|3B)$")
  instruct: Optional[str] = Field(None, max_length=500)
- engine: Optional[str] = Field(default="qwen", pattern="^(qwen|qwen_custom_voice|luxtts|chatterbox|chatterbox_turbo|tada|kokoro)$")
+ engine: Optional[str] = Field(default="qwen", pattern="^(qwen|qwen_custom_voice|luxtts|chatterbox|chatterbox_turbo|tada|kokoro|minimax)$")
```
MiniMax model selection still isn't exposed.
This change allows engine="minimax", but clients still have no way to pick speech-2.8-hd vs speech-2.8-turbo: model_size only accepts the local-model values, backend/routes/generations.py clears it for engines without model sizes, and MiniMaxTTSBackend.generate() falls back to MINIMAX_TTS_DEFAULT_MODEL when voice_prompt["tts_model"] is absent. As written, every MiniMax request is effectively pinned to speech-2.8-hd.
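A minimal sketch of one possible fix, adding an optional `tts_model` field as the review suggests. The field name, allowed values, and class shape below follow the review comment, not the repository's actual `backend/models.py`:

```python
# Hypothetical schema sketch: exposing MiniMax model selection via an
# optional tts_model field. Not the repo's actual GenerationRequest.
from typing import Optional
from pydantic import BaseModel, Field

class GenerationRequest(BaseModel):
    engine: Optional[str] = Field(
        default="qwen",
        pattern="^(qwen|qwen_custom_voice|luxtts|chatterbox|chatterbox_turbo|tada|kokoro|minimax)$",
    )
    # Cloud model name; only meaningful when engine == "minimax".
    # When absent, the backend would still fall back to its default model.
    tts_model: Optional[str] = Field(
        default=None, pattern="^(speech-2\\.8-hd|speech-2\\.8-turbo)$"
    )
```

With this shape, `GenerationRequest(engine="minimax", tts_model="speech-2.8-turbo")` validates, and an unknown model name is rejected at the schema layer instead of being silently replaced by the default.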
```python
@pytest.mark.asyncio
async def test_integration_generate_speech():
    """
    Live integration test: call MiniMax TTS API and verify we get audio back.

    Skipped automatically when MINIMAX_API_KEY is not configured.
    """
    from backend.backends.minimax_backend import MiniMaxTTSBackend, _load_api_key, _pcm_bytes_to_numpy, MINIMAX_TTS_SAMPLE_RATE

    api_key = _load_api_key()
    if not api_key:
        pytest.skip("MINIMAX_API_KEY not configured")

    backend = MiniMaxTTSBackend()
    await backend.load_model()

    voice_prompt = {
        "voice_type": "preset",
        "preset_engine": "minimax",
        "preset_voice_id": "English_Graceful_Lady",
    }

    audio, sr = await backend.generate(
        text="Hello, this is a MiniMax TTS integration test.",
        voice_prompt=voice_prompt,
    )

    assert isinstance(audio, np.ndarray)
    assert audio.dtype == np.float32
    assert len(audio) > 0
    assert sr == MINIMAX_TTS_SAMPLE_RATE
```
Make the live MiniMax test explicitly opt-in.
This runs on any machine or CI job that happens to have MINIMAX_API_KEY, which makes the default test suite non-hermetic and can introduce network flakiness plus paid API usage. Please gate it behind an explicit marker or a second env flag instead of enabling it solely from credential presence.
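One way to implement the opt-in gate the review asks for. The flag name `RUN_MINIMAX_INTEGRATION` comes from the review suggestion and is not an existing variable in the repo:

```python
import os

def live_minimax_enabled(environ=None):
    """True only when both the API key and an explicit opt-in flag are set.

    RUN_MINIMAX_INTEGRATION is the reviewer's suggested flag name, not a
    variable that already exists in the codebase.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("MINIMAX_API_KEY")) and env.get("RUN_MINIMAX_INTEGRATION") == "1"
```

The live test would then begin with `if not live_minimax_enabled(): pytest.skip("set MINIMAX_API_KEY and RUN_MINIMAX_INTEGRATION=1 to run")`, so CI jobs that merely have credentials configured never hit the paid API by accident.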
Summary
This PR adds MiniMax TTS (Text-to-Audio) as a new voice generation backend for Voicebox.
What's added

- `backend/backends/minimax_backend.py` — Cloud TTS backend that calls the MiniMax T2A v2 API. Streams PCM audio via SSE, decodes hex-encoded chunks directly to NumPy float32 — no local model download required.
- `backend/backends/__init__.py` — Registers `minimax` in `TTS_ENGINES` and `get_tts_backend_for_engine`.
- `backend/models.py` — Adds `minimax` to the `GenerationRequest.engine` validation pattern.
- `backend/services/profiles.py` — Exposes MiniMax preset voice IDs for profile validation.
- `backend/tests/test_minimax_backend.py` — 25 unit tests covering API key loading, SSE parsing, error handling, PCM decoding, engine registration, and profile service integration. Includes a live integration test (auto-skipped when `MINIMAX_API_KEY` is absent).

How it works
MiniMax is a cloud API backend (no local model download). Users create a preset voice profile using one of the MiniMax system voices, then generate speech via the Voicebox UI or API — the backend streams PCM audio from `api.minimax.io` and returns it as a NumPy array at 32 kHz.

Environment variable: `MINIMAX_API_KEY` (set in your environment or `~/.env.local`).

Available models: `speech-2.8-hd` (default, highest quality), `speech-2.8-turbo` (faster).

Available built-in voices (English + Chinese): `English_Graceful_Lady`, `English_Insightful_Speaker`, `English_radiant_girl`, `English_Persuasive_Man`, `English_Lucky_Robot`, `English_expressive_narrator`, `Chinese_Gentle_and_Clear`, `Chinese_Energetic_Boy`, `Chinese_Elegant_Lady`, and more.

Key design decisions

- `load_model()` validates the API key on first call and is otherwise a no-op (no download progress tracking needed).
- `CLONING_ENGINES`.

API documentation

- TTS: https://platform.minimax.io/docs/api-reference/speech-t2a-http
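The SSE streaming described in this section could be parsed roughly as follows. The `data.audio` payload shape and the `[DONE]` terminator are assumptions based on common SSE conventions and the T2A v2 API description, not verified against the repo's `minimax_backend.py`:

```python
# Hypothetical sketch of extracting hex audio chunks from SSE lines.
import json

def iter_audio_hex(sse_lines):
    """Yield hex-encoded audio chunks from SSE 'data: {...}' lines."""
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip comments, event names, and keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        audio_hex = (event.get("data") or {}).get("audio")
        if audio_hex:
            yield audio_hex
```

Each yielded chunk would then be hex-decoded to PCM bytes and converted to float32 before concatenation into the final audio array.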