ESP32: Local AI Chatbot

A fully offline voice assistant that runs entirely on your local network. It captures audio via an ESP32, streams it over UDP to a Laptop over the network, and processes it using OpenAI Whisper (Speech-to-Text), Gemma 2 / Qwen (LLM), and Kokoro (Text-to-Speech).

Author: GameoCoder

Features

Local & Private: Audio is not sent to the cloud. Everything runs on your hardware.
CUDA Enabled: Uses faster-whisper and CUDA 12 for near-instant transcription.
OpenWakeWord: Listens for wake words (like "Hey Jarvis", "Alexa" etc.) efficiently using openwakeword (ONNX).
AI Model for Text-To-Speech: Speaks back using Kokoro v1.0 (ONNX), a high-quality, lightweight 82M parameter voice model.
UDP Audio Streaming: Real-time raw audio transmission from ESP32 to Python.
Self-Correcting Environment: Automatically fixes NVIDIA library paths and symlinks on startup.

Hardware Requirements

Microcontroller: ESP32 (I have the DOIT DEVKIT V1)
Microphone: INMP441 (I2S Omnidirectional Mic)
Server PC:
- OS: Arch Linux (Recommended) / Linux
- GPU: NVIDIA RTX 20 Series or higher (Tested on RTX 2050 with 4gb of VRAM)
- RAM: 16GB+ recommended

Software Requirements

Before setting up the project, ensure you have the following installed on your system:

1. System Dependencies (Arch Linux)

You need espeak-ng for the TTS engine to work.

sudo pacman -S python base-devel cuda espeak-ng ffmpeg

2. Python Package Manager

This project uses PDM for dependency management.

pip install pdm

3. LLM Backend (Ollama)

You need Ollama running to handle the intelligence.

Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
Pull the recommended models (Gemma 2 is used for more interactive chats):

ollama pull gemma2:2b

(NOTE) I used the Gemma2 model with version gemma-2-2b-it-Q4_K_M and manually installed it in ollama by creating a Modelfile.

Start the server:

ollama serve

Installation & Setup

1. Clone the Repository

git clone https://github.com/GameoCoder/ESP32-ChatBot.git
cd ESP32-ChatBot

2. Initialize Python Environment

pdm init

⚠️ Important - Select Python 3.12 when prompted

3. Install Python Dependencies

⚠️ Important: I have used the onnxruntime so we need to install openwakeword carefully to avoid conflicts with tflite-runtime on Linux. Follow the steps exactly:

After doing pdm init, pyproject.toml will be created, where you have to paste this block of code (if it exists, skip to step 2) to exclude tflite-runtime

[tool.pdm.resolution]
excludes = ["tflite-runtime"]

Now, install the necessary modules

pdm add faster-whisper ollama nvidia-cublas-cu12 nvidia-cudnn-cu12 onnxruntime kokoro-onnx soundfile openwakeword

4. Download TTS Models

You need to manually download the Kokoro v1.0 model files.

mkdir models

# Model
wget -P models/ https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx

# Voices
wget -P models/ https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin

How to Run

1. Flash the ESP32

Open the arduino_code/ folder in Arduino IDE.
Update the ssid and password with your WiFi credentials.
Update the UDP_IP with your Laptop's Local IP (Run ip addr to find it).
Upload to ESP32.

2. Start the Python Server

Ensure your ESP32 is powered on and your computer is connected to the same network your ESP32 is connected on. After verifying everything start the server.

pdm run start

Note: The script includes a self-correction block. If it's the first run, it may restart itself once to apply NVIDIA driver paths.

Project Structure

It should look like this after all the initial setup steps

.
├── arduino_code/           # C++ code for ESP32
│   └── mic_stream.ino      # Handles I2S mic & UDP streaming
├── models/                 # AI Model files
│   ├── kokoro-v1.0.onnx    # TTS Model
│   └── voices-v1.0.bin     # TTS Voices
├── microphone.py           # Main Python server (Wake Word + Logic + TTS)
├── pyproject.toml          # PDM dependency file
├── pdm.lock                # Locked versions
└── README.md               # You are here

Troubleshooting

1. "Library libcublas.so.12 not found"

Fix: The script auto-fixes this. If it persists, ensure you have run pdm add nvidia-cublas-cu12.

2. "Wake word triggers repeatedly"

Fix: Ensure ww_model.reset() is called in the Python script logic.

3. "TTS Error / No Audio"

Fix: Ensure you installed espeak-ng via pacman. Also check that ffplay is installed (sudo pacman -S ffmpeg).

Built with ❤️ by GameoCoder

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ESP32: Local AI Chatbot

Features

Hardware Requirements

Software Requirements

1. System Dependencies (Arch Linux)

2. Python Package Manager

3. LLM Backend (Ollama)

Installation & Setup

1. Clone the Repository

2. Initialize Python Environment

3. Install Python Dependencies

4. Download TTS Models

How to Run

1. Flash the ESP32

2. Start the Python Server

Project Structure

Troubleshooting

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
arduino_code		arduino_code
README.md		README.md
microphone.py		microphone.py
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

ESP32: Local AI Chatbot

Features

Hardware Requirements

Software Requirements

1. System Dependencies (Arch Linux)

2. Python Package Manager

3. LLM Backend (Ollama)

Installation & Setup

1. Clone the Repository

2. Initialize Python Environment

3. Install Python Dependencies

4. Download TTS Models

How to Run

1. Flash the ESP32

2. Start the Python Server

Project Structure

Troubleshooting

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages