A fully offline voice assistant that runs entirely on your local network. It captures audio via an ESP32, streams it over UDP to a Laptop over the network, and processes it using OpenAI Whisper (Speech-to-Text), Gemma 2 / Qwen (LLM), and Kokoro (Text-to-Speech).
Author: GameoCoder
- Local & Private: Audio is not sent to the cloud. Everything runs on your hardware.
- CUDA Enabled: Uses
faster-whisperand CUDA 12 for near-instant transcription. - OpenWakeWord: Listens for wake words (like "Hey Jarvis", "Alexa" etc.) efficiently using
openwakeword(ONNX). - AI Model for Text-To-Speech: Speaks back using Kokoro v1.0 (ONNX), a high-quality, lightweight 82M parameter voice model.
- UDP Audio Streaming: Real-time raw audio transmission from ESP32 to Python.
- Self-Correcting Environment: Automatically fixes NVIDIA library paths and symlinks on startup.
- Microcontroller: ESP32 (I have the DOIT DEVKIT V1)
- Microphone: INMP441 (I2S Omnidirectional Mic)
- Server PC:
- OS: Arch Linux (Recommended) / Linux
- GPU: NVIDIA RTX 20 Series or higher (Tested on RTX 2050 with 4gb of VRAM)
- RAM: 16GB+ recommended
Before setting up the project, ensure you have the following installed on your system:
You need espeak-ng for the TTS engine to work.
sudo pacman -S python base-devel cuda espeak-ng ffmpegThis project uses PDM for dependency management.
pip install pdmYou need Ollama running to handle the intelligence.
- Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh - Pull the recommended models (Gemma 2 is used for more interactive chats):
ollama pull gemma2:2b- (NOTE) I used the Gemma2 model with version
gemma-2-2b-it-Q4_K_Mand manually installed it in ollama by creating a Modelfile.
- Start the server:
ollama servegit clone https://github.com/GameoCoder/ESP32-ChatBot.git
cd ESP32-ChatBotpdm init
⚠️ Important - Select Python 3.12 when prompted
onnxruntime so we need to install openwakeword carefully to avoid conflicts with tflite-runtime on Linux. Follow the steps exactly:
- After doing
pdm init, pyproject.toml will be created, where you have to paste this block of code (if it exists, skip to step 2) to excludetflite-runtime
[tool.pdm.resolution]
excludes = ["tflite-runtime"]- Now, install the necessary modules
pdm add faster-whisper ollama nvidia-cublas-cu12 nvidia-cudnn-cu12 onnxruntime kokoro-onnx soundfile openwakewordYou need to manually download the Kokoro v1.0 model files.
mkdir models
# Model
wget -P models/ https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
# Voices
wget -P models/ https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin- Open the
arduino_code/folder in Arduino IDE. - Update the
ssidandpasswordwith your WiFi credentials. - Update the
UDP_IPwith your Laptop's Local IP (Runip addrto find it). - Upload to ESP32.
Ensure your ESP32 is powered on and your computer is connected to the same network your ESP32 is connected on. After verifying everything start the server.
pdm run startNote: The script includes a self-correction block. If it's the first run, it may restart itself once to apply NVIDIA driver paths.
It should look like this after all the initial setup steps
.
├── arduino_code/ # C++ code for ESP32
│ └── mic_stream.ino # Handles I2S mic & UDP streaming
├── models/ # AI Model files
│ ├── kokoro-v1.0.onnx # TTS Model
│ └── voices-v1.0.bin # TTS Voices
├── microphone.py # Main Python server (Wake Word + Logic + TTS)
├── pyproject.toml # PDM dependency file
├── pdm.lock # Locked versions
└── README.md # You are here
1. "Library libcublas.so.12 not found"
- Fix: The script auto-fixes this. If it persists, ensure you have run
pdm add nvidia-cublas-cu12.
2. "Wake word triggers repeatedly"
- Fix: Ensure
ww_model.reset()is called in the Python script logic.
3. "TTS Error / No Audio"
- Fix: Ensure you installed
espeak-ngvia pacman. Also check thatffplayis installed (sudo pacman -S ffmpeg).