Docs add NVIDIA Magpie Multilingual tts service #216

37 changes: 19 additions & 18 deletions server/services/supported-services.mdx
@@ -52,24 +52,25 @@ description: "AI services integrated with Pipecat and their setup requirements"

## Text-to-Speech

| Service | Setup |
| -------------------------------------------------- | -------------------------------------- |
| [Amazon Polly](/server/services/tts/aws) | `pip install "pipecat-ai[aws]"` |
| [Azure](/server/services/tts/azure) | `pip install "pipecat-ai[azure]"` |
| [Cartesia](/server/services/tts/cartesia) | `pip install "pipecat-ai[cartesia]"` |
| [Deepgram](/server/services/tts/deepgram) | `pip install "pipecat-ai[deepgram]"` |
| [ElevenLabs](/server/services/tts/elevenlabs) | `pip install "pipecat-ai[elevenlabs]"` |
| [Fish](/server/services/tts/fish) | `pip install "pipecat-ai[fish]"` |
| [Google](/server/services/tts/google) | `pip install "pipecat-ai[google]"` |
| [Groq](/server/services/tts/groq) | `pip install "pipecat-ai[groq]"` |
| [LMNT](/server/services/tts/lmnt) | `pip install "pipecat-ai[lmnt]"` |
| [Neuphonic](/server/services/tts/neuphonic) | `pip install "pipecat-ai[neuphonic]"` |
| [NVIDIA FastPitch](/server/services/tts/fastpitch) | `pip install "pipecat-ai[riva]"` |
| [OpenAI](/server/services/tts/openai) | `pip install "pipecat-ai[openai]"` |
| [Piper](/server/services/tts/piper) | No dependencies required |
| [PlayHT](/server/services/tts/playht) | `pip install "pipecat-ai[playht]"` |
| [Rime](/server/services/tts/rime) | `pip install "pipecat-ai[rime]"` |
| [XTTS](/server/services/tts/xtts) | `pip install "pipecat-ai[xtts]"` |
| Service | Setup |
| --------------------------------------------------------- | -------------------------------------- |
| [Amazon Polly](/server/services/tts/aws) | `pip install "pipecat-ai[aws]"` |
| [Azure](/server/services/tts/azure) | `pip install "pipecat-ai[azure]"` |
| [Cartesia](/server/services/tts/cartesia) | `pip install "pipecat-ai[cartesia]"` |
| [Deepgram](/server/services/tts/deepgram) | `pip install "pipecat-ai[deepgram]"` |
| [ElevenLabs](/server/services/tts/elevenlabs) | `pip install "pipecat-ai[elevenlabs]"` |
| [Fish](/server/services/tts/fish) | `pip install "pipecat-ai[fish]"` |
| [Google](/server/services/tts/google) | `pip install "pipecat-ai[google]"` |
| [Groq](/server/services/tts/groq) | `pip install "pipecat-ai[groq]"` |
| [LMNT](/server/services/tts/lmnt) | `pip install "pipecat-ai[lmnt]"` |
| [Neuphonic](/server/services/tts/neuphonic) | `pip install "pipecat-ai[neuphonic]"` |
| [NVIDIA FastPitch](/server/services/tts/fastpitch) | `pip install "pipecat-ai[riva]"` |
| [NVIDIA Magpie Multilingual](/server/services/tts/magpie) | `pip install "pipecat-ai[riva]"` |
| [OpenAI](/server/services/tts/openai) | `pip install "pipecat-ai[openai]"` |
| [Piper](/server/services/tts/piper) | No dependencies required |
| [PlayHT](/server/services/tts/playht) | `pip install "pipecat-ai[playht]"` |
| [Rime](/server/services/tts/rime) | `pip install "pipecat-ai[rime]"` |
| [XTTS](/server/services/tts/xtts) | `pip install "pipecat-ai[xtts]"` |

## Speech-to-Speech

162 changes: 162 additions & 0 deletions server/services/tts/magpie.mdx
@@ -0,0 +1,162 @@
---
title: "NVIDIA Magpie"
description: "Text-to-speech service implementation using NVIDIA’s Magpie model"
---

## Overview

`MagpieTTSService` converts text to speech using NVIDIA's Riva `magpie-tts-multilingual` model. It provides high-quality text-to-speech synthesis with configurable voice options, including multilingual voices.

## Installation

To use `MagpieTTSService`, install the required dependencies:

```bash
pip install "pipecat-ai[riva]"
```

You'll also need to set your NVIDIA API key in the `NVIDIA_API_KEY` environment variable.
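
As a minimal sketch, the key can be read once at startup so that a missing variable fails early with a clear message. The helper name `load_nvidia_api_key` is hypothetical and not part of Pipecat; the only thing taken from this page is the `NVIDIA_API_KEY` variable name.

```python
import os


def load_nvidia_api_key(env=None):
    """Return the NVIDIA API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    # Allow a mapping to be injected so the helper is easy to test.
    env = os.environ if env is None else env
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("NVIDIA_API_KEY is not set")
    return key
```

The returned string can then be passed as the `api_key` constructor parameter.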

## Configuration

### Constructor Parameters

<ParamField path="api_key" type="str" required>
Your NVIDIA API key
</ParamField>

<ParamField path="server" type="str" default="grpc.nvcf.nvidia.com:443">
NVIDIA Riva server address
</ParamField>

<ParamField path="voice_id" type="str" default="English-US.Female-1">
Voice identifier to use for synthesis
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
Output audio sample rate in Hz
</ParamField>

<ParamField
path="function_id"
type="str"
default="0149dedb-2be8-4195-b9a0-e57e0e14f972"
>
NVIDIA function identifier for the TTS service
</ParamField>

<ParamField path="params" type="InputParams" default="InputParams()">
Additional configuration parameters (language and quality)
</ParamField>

### InputParams

<ParamField path="language" type="Language" default="Language.EN_US">
The language for TTS generation
</ParamField>

<ParamField path="quality" type="int" default="20">
Quality level for the generated audio
</ParamField>

## Input

The service accepts text input through its TTS pipeline.

## Output Frames

### TTSStartedFrame

Signals the start of audio generation.

### TTSAudioRawFrame

Contains generated audio data:

<ParamField path="audio" type="bytes">
Raw audio data chunk
</ParamField>

<ParamField path="sample_rate" type="int">
Audio sample rate
</ParamField>

<ParamField path="num_channels" type="int">
Number of audio channels (1 for mono)
</ParamField>
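
The three fields above are enough to work out how much playback time a chunk represents. The sketch below assumes 16-bit PCM samples (2 bytes per sample), which this page does not state, and uses a hypothetical `AudioChunk` stand-in rather than the real frame class.

```python
from dataclasses import dataclass


@dataclass
class AudioChunk:
    """Hypothetical stand-in mirroring the TTSAudioRawFrame fields listed above."""

    audio: bytes
    sample_rate: int
    num_channels: int


def chunk_duration_seconds(chunk, bytes_per_sample=2):
    """Playback duration of a chunk, assuming 16-bit PCM by default."""
    # One audio frame holds one sample per channel.
    bytes_per_frame = bytes_per_sample * chunk.num_channels
    return len(chunk.audio) / (bytes_per_frame * chunk.sample_rate)


# 32,000 bytes of mono 16-bit audio at 16 kHz is one second of sound.
one_second = AudioChunk(audio=b"\x00" * 32000, sample_rate=16000, num_channels=1)
duration = chunk_duration_seconds(one_second)
```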

### TTSStoppedFrame

Signals the completion of audio generation.

## Methods

See the [TTS base class methods](/server/base-classes/speech#ttsservice) for additional functionality.

## Language Support

Magpie TTS supports the following languages:

| Language Code    | Description     | Service Code |
| ---------------- | --------------- | ------------ |
| `Language.EN_US` | English (US)    | `en-US`      |
| `Language.ES_US` | Spanish (US)    | `es-US`      |
| `Language.FR_FR` | French (France) | `fr-FR`      |
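
If language codes arrive from user input or configuration, it can help to normalize them against the service codes in the table before building the service. The following is a sketch only: `SUPPORTED_SERVICE_CODES` and `normalize_service_code` are hypothetical names, and the set simply restates the table rather than querying the service.

```python
# Service codes copied from the table above; names here are hypothetical.
SUPPORTED_SERVICE_CODES = {"en-US", "es-US", "fr-FR"}


def normalize_service_code(code):
    """Normalize e.g. 'es_us' or 'ES-us' to 'es-US'; return None if unsupported."""
    parts = code.replace("_", "-").split("-")
    if len(parts) != 2:
        return None
    candidate = f"{parts[0].lower()}-{parts[1].upper()}"
    return candidate if candidate in SUPPORTED_SERVICE_CODES else None
```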

## Usage Example

```python
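import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.riva.tts import MagpieTTSService
from pipecat.transcriptions.language import Language

# Configure the service; the API key is read from the NVIDIA_API_KEY environment variable
tts = MagpieTTSService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    voice_id="Magpie-Multilingual.ES-US.Male.Male-1",
    params=MagpieTTSService.InputParams(
        language=Language.ES_US,
        quality=20,
    ),
)

# Use in a pipeline (llm and transport are assumed to be configured elsewhere)
pipeline = Pipeline([
    ...,
    llm,
    tts,
    transport.output(),
])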
```

## Frame Flow

```mermaid
graph TD
    A[TextFrame] --> B[MagpieTTSService]
    B --> C[TTSStartedFrame]
    B --> D[TTSAudioRawFrame]
    B --> E[TTSStoppedFrame]
    B --> F[ErrorFrame]
```

## Metrics Support

The service supports metrics collection:

- Time to First Byte (TTFB)
- TTS usage metrics
- Processing duration
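
To illustrate what the TTFB metric captures, the sketch below times how long a stream takes to deliver its first audio chunk. It does not show how Pipecat collects metrics internally; `fake_tts_stream` and `measure_ttfb` are hypothetical names, and the stream is simulated with `asyncio.sleep`.

```python
import asyncio
import time


async def fake_tts_stream(chunks, delay=0.01):
    """Hypothetical stand-in for a streaming TTS response."""
    for chunk in chunks:
        await asyncio.sleep(delay)  # simulate synthesis and network latency
        yield chunk


async def measure_ttfb(stream):
    """Return (seconds until the first chunk arrived, total bytes received)."""
    start = time.perf_counter()
    ttfb = None
    total_bytes = 0
    async for chunk in stream:
        if ttfb is None:
            # The first chunk marks the time to first byte.
            ttfb = time.perf_counter() - start
        total_bytes += len(chunk)
    return ttfb, total_bytes


ttfb, total_bytes = asyncio.run(measure_ttfb(fake_tts_stream([b"ab", b"cd"])))
```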

## Audio Processing

- Processes audio through the Riva API
- Generates mono audio output
- Handles asynchronous audio streaming
- Configurable sampling rate

## Notes

- Uses NVIDIA's Riva AI Services platform
- Streams audio in chunks
- Requires a valid NVIDIA API key
- Thread-safe processing with asyncio