GameoCoder/ESP32-ChatBot

ESP32: Local AI Chatbot


A fully offline voice assistant that runs entirely on your local network. It captures audio with an ESP32, streams it over UDP to a laptop on the same network, and processes it there with OpenAI Whisper (speech-to-text), Gemma 2 / Qwen (LLM), and Kokoro (text-to-speech).

Author: GameoCoder


Features

  • Local & Private: Audio is never sent to the cloud. Everything runs on your own hardware.
  • CUDA Enabled: Uses faster-whisper with CUDA 12 for GPU-accelerated, low-latency transcription.
  • OpenWakeWord: Listens for wake words (such as "Hey Jarvis" or "Alexa") using openwakeword (ONNX).
  • AI Model for Text-to-Speech: Speaks back using Kokoro v1.0 (ONNX), a lightweight, high-quality 82M-parameter voice model.
  • UDP Audio Streaming: Real-time raw audio transmission from the ESP32 to the Python server.
  • Self-Correcting Environment: Automatically fixes NVIDIA library paths and symlinks on startup.
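
The UDP streaming feature can be pictured with a minimal receiver sketch. This is not the code in microphone.py; it assumes the ESP32 sends raw 16-bit little-endian mono PCM, and the port number and sample rate below are hypothetical placeholders that must match whatever the Arduino sketch uses.

```python
import socket
import struct

SAMPLE_RATE = 16000  # assumed sample rate; must match the ESP32 sketch
LISTEN_PORT = 12345  # hypothetical port; must match the ESP32 sketch


def open_receiver(host="0.0.0.0", port=LISTEN_PORT, timeout=1.0):
    """Bind a UDP socket that waits for audio packets from the ESP32."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.settimeout(timeout)
    return sock


def decode_pcm16(packet: bytes) -> list[int]:
    """Convert raw bytes into signed 16-bit samples (little-endian assumed)."""
    usable = len(packet) - (len(packet) % 2)  # drop a trailing odd byte
    count = usable // 2
    return list(struct.unpack(f"<{count}h", packet[:usable]))


def receive_samples(sock, max_bytes=4096):
    """Return the samples from one datagram, or an empty list on timeout."""
    try:
        packet, _addr = sock.recvfrom(max_bytes)
    except socket.timeout:
        return []
    return decode_pcm16(packet)
```

Calling receive_samples(open_receiver()) in a loop would yield chunks of samples that can be buffered and passed on to wake word detection. UDP gives no delivery guarantee, so an occasional dropped packet shows up as a short gap in the audio rather than a stall.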

Hardware Requirements

  • Microcontroller: ESP32 (tested with the DOIT DEVKIT V1)
  • Microphone: INMP441 (I2S Omnidirectional Mic)
  • Server PC:
    • OS: Arch Linux (Recommended) / Linux
    • GPU: NVIDIA RTX 20 Series or newer (tested on an RTX 2050 with 4 GB of VRAM)
    • RAM: 16GB+ recommended

Software Requirements

Before setting up the project, ensure you have the following installed on your system:

1. System Dependencies (Arch Linux)

You need espeak-ng for the TTS engine to work.

sudo pacman -S python base-devel cuda espeak-ng ffmpeg

2. Python Package Manager

This project uses PDM for dependency management.

pip install pdm

3. LLM Backend (Ollama)

Ollama must be running to serve the LLM.

  1. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
  2. Pull the recommended model (Gemma 2 is used for more interactive chats):
ollama pull gemma2:2b
  • (NOTE) I used the Gemma 2 build gemma-2-2b-it-Q4_K_M and installed it in Ollama manually by creating a Modelfile.
  3. Start the server:
ollama serve
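
Once the server is up, a quick way to confirm that the model responds is to call Ollama's local REST endpoint directly. The sketch below uses only the standard library and is meant as a smoke test, not as the project's own integration (the project installs the ollama Python package, which is presumably what microphone.py uses). It assumes Ollama's default port 11434 and the gemma2:2b tag pulled above.

```python
import json
import urllib.request

# Ollama listens on localhost:11434 by default; change this if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gemma2:2b"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="gemma2:2b", timeout=60):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Running print(ask("Say hello in one sentence.")) should print a short reply if the server and model are set up correctly. If you installed the model through a Modelfile under a different name, pass that name as model.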

Installation & Setup

1. Clone the Repository

git clone https://github.com/GameoCoder/ESP32-ChatBot.git
cd ESP32-ChatBot

2. Initialize Python Environment

pdm init

⚠️ Important: Select Python 3.12 when prompted.

3. Install Python Dependencies

⚠️ Important: This project uses onnxruntime, so openwakeword must be installed carefully to avoid conflicts with tflite-runtime on Linux. Follow these steps exactly:

  1. Running pdm init creates pyproject.toml. Paste the following block into it to exclude tflite-runtime (if the block already exists, skip to step 2):
[tool.pdm.resolution]
excludes = ["tflite-runtime"]
  2. Now install the required packages:
pdm add faster-whisper ollama nvidia-cublas-cu12 nvidia-cudnn-cu12 onnxruntime kokoro-onnx soundfile openwakeword

4. Download TTS Models

You need to manually download the Kokoro v1.0 model files.

mkdir models

# Model
wget -P models/ https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx

# Voices
wget -P models/ https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin

How to Run

1. Flash the ESP32

  1. Open the arduino_code/ folder in Arduino IDE.
  2. Set ssid and password to your Wi-Fi credentials.
  3. Set UDP_IP to your laptop's local IP address (run ip addr to find it).
  4. Upload to ESP32.

2. Start the Python Server

Ensure your ESP32 is powered on and your computer is connected to the same network as the ESP32. Once both are verified, start the server.

pdm run start

Note: The script includes a self-correction block. On the first run, it may restart itself once to apply the NVIDIA library paths.
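
To illustrate why a restart can be needed, here is a rough sketch of how such a self-correction step might work. It is an assumption about the general technique, not a copy of the project's code: the function names are hypothetical, and it relies on the pip-installed NVIDIA packages placing their shared libraries under site-packages/nvidia/<package>/lib. The dynamic linker reads LD_LIBRARY_PATH only when a process starts, so the script has to re-execute itself after changing it.

```python
import os
import sys
from pathlib import Path


def find_nvidia_lib_dirs(site_packages):
    """List lib directories of pip-installed NVIDIA packages (cublas, cudnn, ...)."""
    root = Path(site_packages) / "nvidia"
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.glob("*/lib") if p.is_dir())


def merged_library_path(lib_dirs, current):
    """Prepend missing directories; return the new value and whether it changed."""
    existing = [p for p in current.split(":") if p]
    missing = [d for d in lib_dirs if d not in existing]
    return ":".join(missing + existing), bool(missing)


def ensure_cuda_libs(site_packages):
    """Restart the interpreter once if LD_LIBRARY_PATH lacks the NVIDIA lib dirs."""
    new_path, changed = merged_library_path(
        find_nvidia_lib_dirs(site_packages),
        os.environ.get("LD_LIBRARY_PATH", ""),
    )
    if changed:
        os.environ["LD_LIBRARY_PATH"] = new_path
        # Replace the current process so the linker sees the updated variable.
        os.execv(sys.executable, [sys.executable] + sys.argv)
```

After the restart the directories are already present, so the check passes and the script continues normally. That is consistent with the single restart described in the note above.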


Project Structure

After completing the setup steps above, the project should look like this:

.
├── arduino_code/           # C++ code for ESP32
│   └── mic_stream.ino      # Handles I2S mic & UDP streaming
├── models/                 # AI Model files
│   ├── kokoro-v1.0.onnx    # TTS Model
│   └── voices-v1.0.bin     # TTS Voices
├── microphone.py           # Main Python server (Wake Word + Logic + TTS)
├── pyproject.toml          # PDM dependency file
├── pdm.lock                # Locked versions
└── README.md               # You are here


Troubleshooting

1. "Library libcublas.so.12 not found"

  • Fix: The script auto-fixes this. If it persists, ensure you have run pdm add nvidia-cublas-cu12.

2. "Wake word triggers repeatedly"

  • Fix: Ensure ww_model.reset() is called in the Python script logic.
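
A small wrapper shows one way to combine the reset with a cooldown so that a single utterance does not fire several times. The WakeWordGate class, its threshold, and its cooldown are hypothetical and not part of the project; the only assumption about the model is that it exposes predict(frame) returning a dict of scores and reset(), as the openwakeword model does.

```python
import time


class WakeWordGate:
    """Debounce wake word detections by resetting the model and pausing briefly."""

    def __init__(self, model, threshold=0.5, cooldown_s=2.0, clock=time.monotonic):
        self.model = model            # object with predict(frame) and reset()
        self.threshold = threshold    # assumed score cutoff; tune for your setup
        self.cooldown_s = cooldown_s  # seconds to ignore detections after a hit
        self.clock = clock
        self._muted_until = 0.0

    def feed(self, frame):
        """Return the name of the triggered wake word, or None."""
        scores = self.model.predict(frame)  # keep feeding audio during cooldown
        now = self.clock()
        if now < self._muted_until:
            return None
        for name, score in scores.items():
            if score >= self.threshold:
                self.model.reset()  # clear buffered audio so it cannot re-trigger
                self._muted_until = now + self.cooldown_s
                return name
        return None
```

In the server loop, each audio chunk would go through gate.feed(chunk), and recording for Whisper would start only when it returns a name.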

3. "TTS Error / No Audio"

  • Fix: Ensure you installed espeak-ng via pacman. Also check that ffplay is installed (sudo pacman -S ffmpeg).
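
To check both tools in one go, a short helper can look them up on PATH. This is a convenience sketch and not part of the project; the tool names come from the fix above.

```python
import shutil

# Command-line tools the TTS and playback path depends on, per the fix above.
REQUIRED_TOOLS = ("espeak-ng", "ffplay")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

If print(missing_tools()) shows an empty list, both binaries are installed and the problem lies elsewhere, for example in the audio output device.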

Built with ❤️ by GameoCoder

About

A basic chatbot integrating an INMP441 microphone and an ESP32, and possibly a MAX98357 amplifier to output sound through a 4 Ω, 5 W speaker.
