diff --git a/mint.json b/mint.json
index 01d1efe..adddb42 100644
--- a/mint.json
+++ b/mint.json
@@ -162,8 +162,8 @@
"server/services/stt/gladia",
"server/services/stt/google",
"server/services/stt/groq",
+ "server/services/stt/riva",
"server/services/stt/openai",
- "server/services/stt/parakeet",
"server/services/stt/ultravox",
"server/services/stt/whisper"
]
@@ -198,12 +198,12 @@
"server/services/tts/cartesia",
"server/services/tts/deepgram",
"server/services/tts/elevenlabs",
- "server/services/tts/fastpitch",
"server/services/tts/fish",
"server/services/tts/google",
"server/services/tts/groq",
"server/services/tts/lmnt",
"server/services/tts/neuphonic",
+ "server/services/tts/riva",
"server/services/tts/openai",
"server/services/tts/piper",
"server/services/tts/playht",
diff --git a/server/services/stt/parakeet.mdx b/server/services/stt/parakeet.mdx
deleted file mode 100644
index dc5096f..0000000
--- a/server/services/stt/parakeet.mdx
+++ /dev/null
@@ -1,188 +0,0 @@
----
-title: "NVIDIA Parakeet"
-description: "Speech-to-text service implementation using NVIDIA’s Parakeet speech recognition model"
----
-
-## Overview
-
-`ParakeetSTTService` provides real-time speech-to-text capabilities using NVIDIA's Riva Parakeet model. It supports interim results and configurable recognition parameters for enhanced accuracy.
-
-## Installation
-
-To use `ParakeetSTTService`, install the required dependencies:
-
-```bash
-pip install "pipecat-ai[riva]"
-```
-
-You'll also need to set up your NVIDIA API key as an environment variable: `NVIDIA_API_KEY`.
-
-
- You can obtain an NVIDIA API key by signing up through [NVIDIA's developer
- portal](https://developer.nvidia.com).
-
-
-## Configuration
-
-### Constructor Parameters
-
-
- Your NVIDIA API key
-
-
-
- NVIDIA Riva server address
-
-
-
- NVIDIA function identifier for the STT service
-
-
-
- Audio sample rate in Hz
-
-
-
- Additional configuration parameters
-
-
-### InputParams
-
-
- The language for speech recognition
-
-
-## Input
-
-The service processes audio frames containing:
-
-- Raw PCM audio data
-- 16-bit depth
-- Single channel (mono)
-
-## Output Frames
-
-### TranscriptionFrame
-
-Generated for final transcriptions, containing:
-
-
- Transcribed text
-
-
-
- User identifier
-
-
-
- ISO 8601 formatted timestamp
-
-
-
- Language used for transcription
-
-
-### InterimTranscriptionFrame
-
-Generated during ongoing speech, containing same fields as TranscriptionFrame but with preliminary results.
-
-## Methods
-
-See the [STT base class methods](/server/base-classes/speech#methods) for additional functionality.
-
-## Usage Example
-
-```python
-from pipecat.services.riva.stt import ParakeetSTTService
-from pipecat.transcriptions.language import Language
-
-# Configure service
-stt = ParakeetSTTService(
- api_key="your-nvidia-api-key",
- params=ParakeetSTTService.InputParams(
- language=Language.EN_US
- )
-)
-
-# Use in pipeline
-pipeline = Pipeline([
- transport.input(),
- stt,
- llm,
- ...
-])
-```
-
-## Language Support
-
-Parakeet STT primarily supports English with various regional accents:
-
-| Language Code | Description | Service Codes |
-| ---------------- | ------------ | ------------- |
-| `Language.EN_US` | English (US) | `en-US` |
-
-## Frame Flow
-
-```mermaid
-graph TD
- A[InputAudioRawFrame] --> B[ParakeetSTTService]
- B --> C[InterimTranscriptionFrame]
- B --> D[TranscriptionFrame]
- B --> E[ErrorFrame]
- C --> F[Real-time Processing]
- D --> G[Final Processing]
-```
-
-## Advanced Configuration
-
-The service supports several advanced configuration options that can be adjusted:
-
-
- Filter profanity from transcription
-
-
-
- Automatically add punctuation
-
-
-
- Whether to disable verbatim transcripts
-
-
-
- List of words to boost in the language model
-
-
-
- Score applied to boosted words
-
-
-## Example with Advanced Configuration
-
-```python
-# Configure service with advanced parameters
-stt = ParakeetSTTService(
- api_key="your-nvidia-api-key",
- params=ParakeetSTTService.InputParams(
- language=Language.EN_US
- )
-)
-
-# Configure advanced options
-stt._profanity_filter = True
-stt._automatic_punctuation = True
-stt._boosted_lm_words = ["PipeCat", "AI", "speech"]
-```
-
-## Notes
-
-- Uses NVIDIA's Riva AI Services platform
-- Handles streaming audio input
-- Provides real-time transcription results
-- Manages connection lifecycle
-- Uses asyncio for asynchronous processing
-- Automatically cleans up resources on stop/cancel
diff --git a/server/services/stt/riva.mdx b/server/services/stt/riva.mdx
new file mode 100644
index 0000000..03eddad
--- /dev/null
+++ b/server/services/stt/riva.mdx
@@ -0,0 +1,292 @@
+---
+title: "NVIDIA Riva"
+description: "Speech-to-text service implementation using NVIDIA Riva"
+---
+
+## Overview
+
+`RivaSTTService` provides real-time speech-to-text capabilities using NVIDIA's Riva Parakeet model. It supports interim results and configurable recognition parameters for enhanced accuracy. `RivaSegmentedSTTService` provides speech-to-text capabilities via NVIDIA's Riva Canary model.
+
+## Installation
+
+To use `RivaSTTService` or `RivaSegmentedSTTService`, install the required dependencies:
+
+```bash
+pip install "pipecat-ai[riva]"
+```
+
+You'll also need to set up your NVIDIA API key as an environment variable: `NVIDIA_API_KEY`.
+
+
+ You can obtain an NVIDIA API key by signing up through [NVIDIA's developer
+ portal](https://developer.nvidia.com).
+
+
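+As a quick sketch (the key value below is a placeholder, not a real credential), you can export the variable in your shell before starting your bot:
+
+```shell
+# Placeholder value -- substitute the API key from your NVIDIA account
+export NVIDIA_API_KEY="your-nvidia-api-key"
+```
+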
+## RivaSTTService
+
+### Configuration
+
+
+ Your NVIDIA API key
+
+
+
+ NVIDIA Riva server address
+
+
+
+ A mapping that pairs the NVIDIA function identifier for the STT service with its model name.
+
+
+
+ Audio sample rate in Hz
+
+
+
+ Additional configuration parameters
+
+
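+As an illustration of the expected shape (the function ID below is a placeholder, not a real deployment), `model_function_map` pairs a function identifier with its model name:
+
+```python
+# Hypothetical values -- replace "your-function-id" with the function ID
+# from your own NVIDIA deployment
+model_function_map = {
+    "function_id": "your-function-id",
+    "model_name": "parakeet-ctc-1.1b-asr",
+}
+
+# The mapping is then passed to the service constructor, e.g.:
+# stt = RivaSTTService(api_key=..., model_function_map=model_function_map)
+```
+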
+#### InputParams
+
+
+ The language for speech recognition
+
+
+### Input
+
+The service processes audio frames containing:
+
+- Raw PCM audio data
+- 16-bit depth
+- Single channel (mono)
+
+### Output Frames
+
+#### TranscriptionFrame
+
+Generated for final transcriptions, containing:
+
+
+ Transcribed text
+
+
+
+ User identifier
+
+
+
+ ISO 8601 formatted timestamp
+
+
+
+ Language used for transcription
+
+
+#### InterimTranscriptionFrame
+
+Generated during ongoing speech, containing the same fields as TranscriptionFrame but with preliminary results.
+
+## RivaSegmentedSTTService
+
+### Configuration
+
+
+ Your NVIDIA API key
+
+
+
+ NVIDIA Riva server address
+
+
+
+ A mapping that pairs the NVIDIA function identifier for the STT service with its model name.
+
+
+
+ Audio sample rate in Hz
+
+
+
+ Additional configuration parameters
+
+
+#### InputParams
+
+
+ The language for speech recognition
+
+
+### Input
+
+The service processes audio frames containing:
+
+- Raw audio bytes in WAV format
+
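+For example, a minimal mono 16-bit WAV clip can be assembled in memory with Python's standard `wave` module (illustrative only; not tied to any Pipecat API):
+
+```python
+import io
+import wave
+
+# Build one second of 16 kHz mono silence as WAV-formatted bytes
+buf = io.BytesIO()
+with wave.open(buf, "wb") as wf:
+    wf.setnchannels(1)       # single channel (mono)
+    wf.setsampwidth(2)       # 16-bit samples
+    wf.setframerate(16000)   # 16 kHz sample rate
+    wf.writeframes(b"\x00\x00" * 16000)
+wav_bytes = buf.getvalue()  # bytes beginning with the RIFF header
+```
+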
+### Output Frames
+
+#### TranscriptionFrame
+
+Generated for final transcriptions, containing:
+
+
+ Transcribed text
+
+
+
+ User identifier
+
+
+
+ ISO 8601 formatted timestamp
+
+
+
+ Language used for transcription
+
+
+#### InterimTranscriptionFrame
+
+Generated during ongoing speech, containing the same fields as TranscriptionFrame but with preliminary results.
+
+## Methods
+
+See the [STT base class methods](/server/base-classes/speech#methods) for additional functionality.
+
+## Models
+
+| Model | Pipecat Class | Model Card Link |
+| ------------------------- | ----------------------- | ------------------------------------------------------------------------------------ |
+| `parakeet-ctc-1.1b-asr` | RivaSTTService | [NVIDIA Model Card](https://build.nvidia.com/nvidia/parakeet-ctc-1_1b-asr/modelcard) |
+| `canary-1b-asr` | RivaSegmentedSTTService | [NVIDIA Model Card](https://build.nvidia.com/nvidia/canary-1b-asr/modelcard) |
+
+## Usage Examples
+
+### RivaSTTService
+
+```python
+from pipecat.services.riva.stt import RivaSTTService
+from pipecat.transcriptions.language import Language
+
+# Configure service
+stt = RivaSTTService(
+ api_key="your-nvidia-api-key",
+ params=RivaSTTService.InputParams(
+ language=Language.EN_US
+ )
+)
+
+# Use in pipeline
+pipeline = Pipeline([
+ transport.input(),
+ stt,
+ llm,
+ ...
+])
+```
+
+### RivaSegmentedSTTService
+
+```python
+from pipecat.services.riva.stt import RivaSegmentedSTTService
+from pipecat.transcriptions.language import Language
+
+# Configure service
+stt = RivaSegmentedSTTService(
+ api_key="your-nvidia-api-key",
+ params=RivaSegmentedSTTService.InputParams(
+ language=Language.EN_US
+ )
+)
+
+# Use in pipeline
+pipeline = Pipeline([
+ transport.input(),
+ stt,
+ llm,
+ ...
+])
+```
+
+## Language Support
+
+The default Riva model, `parakeet-ctc-1.1b-asr`, primarily supports English with various regional accents:
+
+| Language Code | Description | Service Codes |
+| ---------------- | ------------ | ------------- |
+| `Language.EN_US` | English (US) | `en-US` |
+
+## Frame Flow
+
+```mermaid
+graph TD
+ A[InputAudioRawFrame] --> B[RivaSTTService]
+ B --> C[InterimTranscriptionFrame]
+ B --> D[TranscriptionFrame]
+ B --> E[ErrorFrame]
+ C --> F[Real-time Processing]
+ D --> G[Final Processing]
+```
+
+## Advanced Configuration
+
+The service supports several advanced configuration options that can be adjusted:
+
+
+ Filter profanity from transcription
+
+
+
+ Automatically add punctuation
+
+
+
+ Whether to disable verbatim transcripts
+
+
+
+ List of words to boost in the language model
+
+
+
+ Score applied to boosted words
+
+
+## Example with Advanced Configuration
+
+```python
+# Configure service with advanced parameters
+stt = RivaSTTService(
+ api_key="your-nvidia-api-key",
+ params=RivaSTTService.InputParams(
+ language=Language.EN_US
+ )
+)
+
+# Configure advanced options (these set private attributes on the service)
+stt._profanity_filter = True
+stt._automatic_punctuation = True
+stt._boosted_lm_words = ["PipeCat", "AI", "speech"]
+```
+
+## Notes
+
+- Uses NVIDIA's Riva AI Services platform
+- Handles streaming audio input
+- Provides real-time transcription results
+- Manages connection lifecycle
+- Uses asyncio for asynchronous processing
+- Automatically cleans up resources on stop/cancel
diff --git a/server/services/supported-services.mdx b/server/services/supported-services.mdx
index 6d45937..175597f 100644
--- a/server/services/supported-services.mdx
+++ b/server/services/supported-services.mdx
@@ -14,19 +14,19 @@ description: "AI services integrated with Pipecat and their setup requirements"
## Speech-to-Text
-| Service | Setup |
-| ------------------------------------------------ | -------------------------------------- |
-| [AssemblyAI](/server/services/stt/assemblyai) | `pip install "pipecat-ai[assemblyai]"` |
-| [Azure](/server/services/stt/azure) | `pip install "pipecat-ai[azure]"` |
-| [Deepgram](/server/services/stt/deepgram) | `pip install "pipecat-ai[deepgram]"` |
-| [Fal Wizper](/server/services/stt/fal) | `pip install "pipecat-ai[fal]"` |
-| [Gladia](/server/services/stt/gladia) | `pip install "pipecat-ai[gladia]"` |
-| [Google](/server/services/stt/google) | `pip install "pipecat-ai[google]"` |
-| [Groq (Whisper)](/server/services/stt/groq) | `pip install "pipecat-ai[groq]"` |
-| [NVIDIA Parakeet](/server/services/stt/parakeet) | `pip install "pipecat-ai[riva]"` |
-| [OpenAI (Whisper)](/server/services/stt/openai) | `pip install "pipecat-ai[openai]"` |
-| [Ultravox](/server/services/stt/ultravox) | `pip install "pipecat-ai[ultravox]"` |
-| [Whisper](/server/services/stt/whisper) | `pip install "pipecat-ai[whisper]"` |
+| Service | Setup |
+| ----------------------------------------------- | -------------------------------------- |
+| [AssemblyAI](/server/services/stt/assemblyai) | `pip install "pipecat-ai[assemblyai]"` |
+| [Azure](/server/services/stt/azure) | `pip install "pipecat-ai[azure]"` |
+| [Deepgram](/server/services/stt/deepgram) | `pip install "pipecat-ai[deepgram]"` |
+| [Fal Wizper](/server/services/stt/fal) | `pip install "pipecat-ai[fal]"` |
+| [Gladia](/server/services/stt/gladia) | `pip install "pipecat-ai[gladia]"` |
+| [Google](/server/services/stt/google) | `pip install "pipecat-ai[google]"` |
+| [Groq (Whisper)](/server/services/stt/groq) | `pip install "pipecat-ai[groq]"` |
+| [NVIDIA Riva](/server/services/stt/riva) | `pip install "pipecat-ai[riva]"` |
+| [OpenAI (Whisper)](/server/services/stt/openai) | `pip install "pipecat-ai[openai]"` |
+| [Ultravox](/server/services/stt/ultravox) | `pip install "pipecat-ai[ultravox]"` |
+| [Whisper](/server/services/stt/whisper) | `pip install "pipecat-ai[whisper]"` |
## Large Language Models
@@ -52,24 +52,24 @@ description: "AI services integrated with Pipecat and their setup requirements"
## Text-to-Speech
-| Service | Setup |
-| -------------------------------------------------- | -------------------------------------- |
-| [Amazon Polly](/server/services/tts/aws) | `pip install "pipecat-ai[aws]"` |
-| [Azure](/server/services/tts/azure) | `pip install "pipecat-ai[azure]"` |
-| [Cartesia](/server/services/tts/cartesia) | `pip install "pipecat-ai[cartesia]"` |
-| [Deepgram](/server/services/tts/deepgram) | `pip install "pipecat-ai[deepgram]"` |
-| [ElevenLabs](/server/services/tts/elevenlabs) | `pip install "pipecat-ai[elevenlabs]"` |
-| [Fish](/server/services/tts/fish) | `pip install "pipecat-ai[fish]"` |
-| [Google](/server/services/tts/google) | `pip install "pipecat-ai[google]"` |
-| [Groq](/server/services/tts/groq) | `pip install "pipecat-ai[groq]"` |
-| [LMNT](/server/services/tts/lmnt) | `pip install "pipecat-ai[lmnt]"` |
-| [Neuphonic](/server/services/tts/neuphonic) | `pip install "pipecat-ai[neuphonic]"` |
-| [NVIDIA FastPitch](/server/services/tts/fastpitch) | `pip install "pipecat-ai[riva]"` |
-| [OpenAI](/server/services/tts/openai) | `pip install "pipecat-ai[openai]"` |
-| [Piper](/server/services/tts/piper) | No dependencies required |
-| [PlayHT](/server/services/tts/playht) | `pip install "pipecat-ai[playht]"` |
-| [Rime](/server/services/tts/rime) | `pip install "pipecat-ai[rime]"` |
-| [XTTS](/server/services/tts/xtts) | `pip install "pipecat-ai[xtts]"` |
+| Service | Setup |
+| --------------------------------------------- | -------------------------------------- |
+| [Amazon Polly](/server/services/tts/aws) | `pip install "pipecat-ai[aws]"` |
+| [Azure](/server/services/tts/azure) | `pip install "pipecat-ai[azure]"` |
+| [Cartesia](/server/services/tts/cartesia) | `pip install "pipecat-ai[cartesia]"` |
+| [Deepgram](/server/services/tts/deepgram) | `pip install "pipecat-ai[deepgram]"` |
+| [ElevenLabs](/server/services/tts/elevenlabs) | `pip install "pipecat-ai[elevenlabs]"` |
+| [Fish](/server/services/tts/fish) | `pip install "pipecat-ai[fish]"` |
+| [Google](/server/services/tts/google) | `pip install "pipecat-ai[google]"` |
+| [Groq](/server/services/tts/groq) | `pip install "pipecat-ai[groq]"` |
+| [LMNT](/server/services/tts/lmnt) | `pip install "pipecat-ai[lmnt]"` |
+| [Neuphonic](/server/services/tts/neuphonic) | `pip install "pipecat-ai[neuphonic]"` |
+| [NVIDIA Riva](/server/services/tts/riva) | `pip install "pipecat-ai[riva]"` |
+| [OpenAI](/server/services/tts/openai) | `pip install "pipecat-ai[openai]"` |
+| [Piper](/server/services/tts/piper) | No dependencies required |
+| [PlayHT](/server/services/tts/playht) | `pip install "pipecat-ai[playht]"` |
+| [Rime](/server/services/tts/rime) | `pip install "pipecat-ai[rime]"` |
+| [XTTS](/server/services/tts/xtts) | `pip install "pipecat-ai[xtts]"` |
## Speech-to-Speech
diff --git a/server/services/tts/fastpitch.mdx b/server/services/tts/riva.mdx
similarity index 54%
rename from server/services/tts/fastpitch.mdx
rename to server/services/tts/riva.mdx
index 5a409ec..4373c07 100644
--- a/server/services/tts/fastpitch.mdx
+++ b/server/services/tts/riva.mdx
@@ -1,15 +1,15 @@
---
-title: "NVIDIA FastPitch"
-description: "Text-to-speech service implementation using NVIDIA’s FastPitch model"
+title: "NVIDIA Riva"
+description: "Text-to-speech service implementation using NVIDIA Riva"
---
## Overview
-`FastPitchTTSService` converts text to speech using NVIDIA's Riva FastPitch TTS model. It provides high-quality text-to-speech synthesis with configurable voice options.
+`RivaTTSService` converts text to speech using NVIDIA Riva. It provides high-quality text-to-speech synthesis with configurable voice options, including multilingual voices.
## Installation
-To use `FastPitchTTSService`, install the required dependencies:
+To use `RivaTTSService`, install the required dependencies:
```bash
pip install "pipecat-ai[riva]"
@@ -29,7 +29,7 @@ You'll also need to set up your NVIDIA API key as an environment variable: `NVID
NVIDIA Riva server address
-
+
Voice identifier to use for synthesis
@@ -38,11 +38,14 @@ You'll also need to set up your NVIDIA API key as an environment variable: `NVID
- NVIDIA function identifier for the TTS service
+ A mapping that pairs the NVIDIA function identifier for the TTS service with its model name.
@@ -93,26 +96,37 @@ Signals the completion of audio generation.
See the [TTS base class methods](/server/base-classes/speech#ttsservice) for additional functionality.
+## Models
+
+| Model | Model Card Link |
+| ------------------------- | -------------------------------------------------------------------------------------- |
+| `magpie-tts-multilingual` | [NVIDIA Model Card](https://build.nvidia.com/nvidia/magpie-tts-multilingual/modelcard) |
+| `fastpitch-hifigan-tts` | [NVIDIA Model Card](https://build.nvidia.com/nvidia/fastpitch-hifigan-tts/modelcard) |
+
## Language Support
-FastPitch TTS primarily supports English with various regional accents:
+The default Riva model, `magpie-tts-multilingual`, supports English, Spanish, and French:
+
+| Language Code | Description | Service Codes |
+| ---------------- | --------------- | ------------- |
+| `Language.EN_US` | English (US) | `en-US` |
+| `Language.ES_US` | Spanish (US)    | `es-US`       |
+| `Language.FR_FR` | French (France) | `fr-FR`       |
-| Language Code | Description | Service Codes |
-| ---------------- | ------------ | ------------- |
-| `Language.EN_US` | English (US) | `en-US` |
+## Usage Examples
-## Usage Example
+### TTS Language and Voice Configuration
```python
-from pipecat.services.riva.tts import FastPitchTTSService
+from pipecat.services.riva.tts import RivaTTSService
from pipecat.transcriptions.language import Language
# Configure service
-tts = FastPitchTTSService(
+tts = RivaTTSService(
api_key="your-nvidia-api-key",
- voice_id="English-US.Female-1",
- params=FastPitchTTSService.InputParams(
- language=Language.EN_US,
+ voice_id="Magpie-Multilingual.FR-FR.Louise",
+ params=RivaTTSService.InputParams(
+ language=Language.FR_FR,
quality=20
)
)
@@ -126,11 +140,33 @@ pipeline = Pipeline([
])
```
+### Model, Function ID, and Voice Configuration
+
+```python
+# Configure TTS with a specific model and function ID
+tts = RivaTTSService(
+ api_key="your-nvidia-api-key",
+ voice_id="English-US.Female-1",
+ model_function_map={
+ "function_id": "0149dedb-2be8-4195-b9a0-e57e0e14f972",
+ "model_name": "fastpitch-hifigan-tts",
+ }
+)
+
+# Use in pipeline
+pipeline = Pipeline([
+ ...,
+ llm,
+ tts,
+ transport.output(),
+])
+```
+
## Frame Flow
```mermaid
graph TD
- A[TextFrame] --> B[FastPitchTTSService]
+ A[TextFrame] --> B[RivaTTSService]
B --> C[TTSStartedFrame]
B --> D[TTSAudioRawFrame]
B --> E[TTSStoppedFrame]