GitHub - kyutai-labs/invincible-voice: To bring back voice to those who lost it

InvincibleVoice is a real-time voice communication system designed to help people who cannot speak communicate naturally with others. The system uses speech-to-text (STT), large language model (LLM) and text-to-speech (TTS) technologies to enable fluid conversations.

Get more context about this project on our official page.

How it works

It is very similar to the Unmute project; The main difference is that, instead of having the TTS reads out whatever the LLM answers, we ask the LLM to provide multiple possible answers and the TTS only utters the one selected by the user. This experimental system has more features than just this, in particular in the ways the LLM can be personnalized and the UI allows additional guidance of its answers, but this is the core idea.

🚀 Getting Started

We plan to open soon an online version of the system, with TTS freely provived by Gradium. In the meantime, we explain below how to insall and run it locally, for developpers to start exploring it right away.

We provide two ways of doing this (see below); we recommend starting with the Gradium STT/TTS + Cerebras LLM option, as it is easier to set up, and moving to the fully self-hosted option (Kyutai STT/TTS + vLLM) once you are more confortable with the project.

Using Gradium for STT/TTS and an LLM service compatible with the OpenAI API

This is the easiest way to get started since it doesn't require a GPU nor much setup.

The LLM service

For the LLM, we recommend grabbing a key from Cerebras as their service has a very low latency and high throughput API, which is great for fast suggestions in the InvincibleVoice UI. The free tier is enough to get you started. We recommend:

export KYUTAI_LLM_URL=https://api.cerebras.ai/v1
export KYUTAI_LLM_MODEL=qwen-3-235b-a22b-instruct-2507
export KYUTAI_LLM_API_KEY=<your_cerebras_api_key>

Of course anything else works as long as it is OpenAI compatible. Latency and throughput are important here to have a fluid experience, keep that in mind when choosing your provider and model.

The audio services

You can use Gradium for STT and TTS by grabbing an API key from them, the free tier should be enough to get you started. Then you need to set the following environment variables:

export GRADIUM_API_KEY=<your_gradium_api_key>
export TTS_VOICE_ID=<desired_voice_id_it_is_optional>
export TTS_SERVER=https://eu.api.gradium.ai/api/
export TTS_IS_GRADIUM=true
export KYUTAI_STT_URL=wss://eu.api.gradium.ai/api/speech/asr
export STT_IS_GRADIUM=true

and then

docker compose up

Using STT/TTS from Kyutai along with a self-hosted LLM

If you have enough compute power, you can run the entire stack locally. This has the advantage of complete privacy and complete ownership of stack and data, but is more involved. This is nice for commercial use.

The LLM service

We recommend using vLLM or an equivalent LLM engine. The environment variables you need to set are:

export KYUTAI_LLM_URL=http://localhost:8000
export KYUTAI_LLM_MODEL=qwen-3-235b-a22b-instruct-2507  # or similar

The audio services

You'll need to set up both STT and TTS servers from Kyutai. For the moment we only support the server of Delayed Stream Modelling. Then you need to set the following environment variables:

export STT_IS_GRADIUM=false
export TTS_IS_GRADIUM=false
export TTS_SERVER=<your_tts_server_url>
export KYUTAI_STT_URL=<your_stt_server_url>
export KYUTAI_API_KEY=<your_kyutai_api_key>

You can then start the project with docker:

docker compose up

Getting involved with the project

We welcome contributions from everyone! Whether you're a seasoned developer or new to open source, there are many ways to get involved. We recommend heading to the issues page to see what needs to be done. When we see that something is a good fit for the project, we'll tag it with the "Help wanted" label. Issues that can be done by newcomers will be tagged with the "Good first issue" label. Also don't hesitate to open issues yourself if you see something that could be improved or if you have ideas for new features.

Name		Name	Last commit message	Last commit date
Latest commit History 54 Commits
.github/workflows		.github/workflows
images		images
services		services
.dockerignore		.dockerignore
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
AGENTS.md		AGENTS.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
swarm-config.yml		swarm-config.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

How it works

🚀 Getting Started

Using Gradium for STT/TTS and an LLM service compatible with the OpenAI API

The LLM service

The audio services

Using STT/TTS from Kyutai along with a self-hosted LLM

The LLM service

The audio services

Getting involved with the project

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

License

kyutai-labs/invincible-voice

Folders and files

Latest commit

History

Repository files navigation

How it works

🚀 Getting Started

Using Gradium for STT/TTS and an LLM service compatible with the OpenAI API

The LLM service

The audio services

Using STT/TTS from Kyutai along with a self-hosted LLM

The LLM service

The audio services

Getting involved with the project

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages