Push-to-Talk Voice Chat for OpenClaw Channels
Connect to Telegram, Discord, Slack, or any OpenClaw channel and interact using voice or text.
Messages are transcribed via STT, sent to the AI agent, and responses stream back with configurable TTS.
Install with AI · Features · Quick Start · TTS · STT · Config · AI Guide · Korean (한국어)
Just paste this to your AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, etc.):
```
Install claw-voice-chat following the guide at: https://github.com/GreenSheep01201/claw-voice-chat
```

The AI will read this README and handle everything automatically.
- Features
- Architecture
- Requirements
- Quick Start
- TTS Providers
- Usage
- Remote Access (Mobile)
- Environment Variables
- AI Setup Guide
- License
- Push-to-talk voice input with real-time STT (faster-whisper)
- Channel bridge — select any active OpenClaw session and talk to it
- Streaming transcript — agent responses arrive token by token
- Configurable TTS — Browser (Web Speech API), OpenAI, Qwen/DashScope, or Custom endpoint
- STT language selection — language hint for faster-whisper (Korean, English, Japanese, Chinese, etc.)
- Local TTS server — included edge-tts wrapper for high-quality TTS without API keys
- Voice preview — test TTS voices before saving
- Model catalog — browse models from connected providers
- Text input — type messages with `Ctrl+Enter`/`Cmd+Enter`
- Standalone LLM mode — works without channel connection using a local LLM backend
```
Browser (React + Tailwind)
        |
        | port 8888 (HTTP + WebSocket)
        v
Express Server (Node.js)
        |
        |--- /bridge/*    --> OpenClaw Gateway (port 18789)
        |--- /bridge/tts  --> TTS Proxy (OpenAI / Qwen / Custom / Local)
        |--- /api* /ws/*  --> STT/TTS Backend (port 8766) [optional]
        |
        v
OpenClaw Gateway --> Telegram, Discord, Slack, Signal, ...
```
| Mode | Requirements | Description |
|---|---|---|
| Channel Bridge | Node.js + OpenClaw Gateway | Text/voice to channels. Browser or external TTS for responses. |
| Standalone LLM | Node.js + Python STT/TTS backend | Full voice pipeline: push-to-talk, local STT, LLM, audio TTS. |
Both modes can run simultaneously. The Python backend is only needed for push-to-talk STT.
- Node.js 22+ (download)
- OpenClaw Gateway running locally (for channel bridge)
- Python 3.10+ (only for local TTS server or STT backend — optional)
```
git clone https://github.com/GreenSheep01201/claw-voice-chat.git
cd claw-voice-chat
npm install && cd client && npm install && cd ../server && npm install && cd ..
npm run stt:install   # Python STT backend dependencies
```

Set up the OpenClaw gateway:

```
npm install -g openclaw
openclaw setup        # connect channels, create config
openclaw gateway run  # starts on port 18789
```

Create your environment file:

```
cp .env.example .env
```

Edit `.env`:

```
PORT=8888
NODE_ENV=production
OPENCLAW_GATEWAY_URL=http://127.0.0.1:18789
OPENCLAW_GATEWAY_TOKEN=your-token-here

# Model catalog — path to openclaw CLI binary or openclaw.mjs entry point.
# Required for OAuth provider models (GitHub Copilot, Google Antigravity, etc.)
# to appear in the Options model picker.
OPENCLAW_CLI=openclaw
```

Get your token:

- macOS/Linux: `cat ~/.openclaw/openclaw.json | grep token`
- Windows: `type %USERPROFILE%\.openclaw\openclaw.json | findstr token`

Or extract it programmatically:

```
python -c "import json; print(json.load(open('$HOME/.openclaw/openclaw.json'))['gateway']['auth']['token'])"
```

Build and run:

```
npm run build
npm start             # starts Express (8888) + STT backend (8766) concurrently
```

To run only the Express server without STT:

```
npm run start:server
```

Development mode:

```
npm run dev           # Vite (5173) + Express (8888) + STT (8766) concurrently
```

Configure TTS and STT in the Options > TTS / STT tab.
| Provider | Setup | Quality | Latency |
|---|---|---|---|
| Browser | Built-in, no setup | Varies by OS | Instant |
| OpenAI | API key required | Excellent | ~1s |
| Qwen/DashScope | API key required | Good | ~1s |
| Custom | Any OpenAI-compatible endpoint | Varies | Varies |
| Local (edge-tts) | `pip install edge-tts` | Excellent | ~2s |
High-quality TTS without API keys. Works on macOS, Linux, and Windows.
Setup:

```
pip install edge-tts fastapi uvicorn
python tts-local/server.py
```

Connect in the UI:

- Options > TTS / STT tab
- Select Custom
- URL: `http://localhost:5050/v1/audio/speech`
- Leave API Key empty
- Voice: `sunhi` (Korean), `echo` (English), `nanami` (Japanese)
- Click Preview Voice to test
Available voices:
| Language | Voices |
|---|---|
| Korean | sunhi, inwoo, hyunsu |
| English | alloy, nova, echo, onyx, shimmer |
| Japanese | nanami, keita |
| Chinese | xiaoxiao, yunxi, xiaoyi |
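Because the local server exposes an OpenAI-compatible `/v1/audio/speech` route, you can also call it from scripts, not just the UI. A minimal standard-library sketch follows; it assumes the server accepts the usual OpenAI speech body (`model`, `input`, `voice`), and the `"tts-1"` model name is a placeholder the edge-tts wrapper may simply ignore:

```python
import json
import urllib.request

# Assumption: the local edge-tts server from tts-local/server.py is running
# on port 5050 and accepts the standard OpenAI speech request body.
TTS_URL = "http://localhost:5050/v1/audio/speech"

def build_speech_request(text: str, voice: str = "sunhi") -> urllib.request.Request:
    """Build an OpenAI-style text-to-speech POST request."""
    body = {"model": "tts-1", "input": text, "voice": voice}  # "tts-1" is a placeholder
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def synthesize_to_file(text: str, voice: str, path: str = "out.mp3") -> None:
    """Send the request and save the returned audio (server must be running)."""
    with urllib.request.urlopen(build_speech_request(text, voice)) as resp:
        with open(path, "wb") as f:
            f.write(resp.read())

req = build_speech_request("Hello from claw-voice-chat", voice="echo")
print(req.get_method(), json.loads(req.data)["voice"])  # → POST echo
```

With the server running, `synthesize_to_file("안녕하세요", "sunhi")` would write the spoken audio to `out.mp3`.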
Run in background:

```
# macOS/Linux
nohup python tts-local/server.py > /tmp/tts-local.log 2>&1 &

# Windows (PowerShell)
Start-Process -NoNewWindow python -ArgumentList "tts-local/server.py"
```

Verify:

```
curl http://127.0.0.1:5050/health
# {"ok":true,"backend":"edge"}
```

The included `stt-backend/` provides real-time speech-to-text using faster-whisper. It starts automatically with `npm start`.
Manual startup (if running separately):

```
npm run stt:install   # pip install -r stt-backend/requirements.txt
npm run stt:start     # starts on port 8766
```

Configuration: STT model size and language can be set in the Options > TTS / STT tab in the UI. Changes take effect on the next WebSocket connection (reconnect).
| Setting | Options | Default | Description |
|---|---|---|---|
| Model Size | Tiny, Base, Small, Medium, Large v3 | Medium | Accuracy vs speed trade-off |
| Language | Auto-detect, Korean, English, Japanese, + 12 more | Auto (browser locale) | Language hint for recognition |
Environment variables (`.env`) set the server-side defaults:

| Variable | Default | Description |
|---|---|---|
| `STT_MODEL_SIZE` | `medium` | Default model when client doesn't specify |
| `STT_DEVICE` | `auto` | Device: `auto`, `cpu`, `cuda` |
| `STT_COMPUTE_TYPE` | `int8` | Compute type: `int8`, `float16`, `float32` |
Models are cached in memory — switching sizes in the UI loads the new model once and reuses it for subsequent connections.
- Click Connect to establish the WebSocket connection
- Click Enable Audio to unlock browser audio
- Select a channel from the dropdown (e.g., Telegram bot session)
- Hold to Speak — hold the button, speak, release to send
- Or type in the text box and press `Ctrl+Enter`/`Cmd+Enter`
- Toggle TTS On/Off to control voice output
Microphone access requires a secure context (HTTPS or localhost). When accessing from a phone, tablet, or another machine over plain HTTP, the browser blocks microphone input silently.
Recommended: Tailscale HTTPS
Tailscale provides automatic HTTPS certificates for devices on your tailnet.
```
# Expose the voice-chat server (port 8888) over Tailscale HTTPS
tailscale serve --bg 8888
```

Access from mobile: `https://your-machine.tail12345.ts.net/`

Important: Do NOT append `:8888` to the Tailscale URL. Tailscale serves HTTPS on port 443 and proxies internally to 8888. Accessing `http://your-machine:8888` directly is plain HTTP, and the microphone will not work.

Verify HTTPS is active:

```
curl -sk https://your-machine.tail12345.ts.net/healthz
# Expected: {"ok":true,"port":8888,...}
```

Stop Tailscale serve:

```
tailscale serve --https=443 off
```

| Variable | Required | Default | Description |
|---|---|---|---|
| `PORT` | No | `8888` | Express server port |
| `NODE_ENV` | No | `development` | `production` serves built client |
| `VCB_BACKEND_HTTP` | No | `http://127.0.0.1:8766` | STT/TTS backend URL |
| `OPENCLAW_GATEWAY_URL` | For bridge | `http://127.0.0.1:18789` | OpenClaw gateway URL |
| `OPENCLAW_GATEWAY_TOKEN` | For bridge | — | Gateway auth token |
| `OPENCLAW_CLI` | For models | — | Path to openclaw binary or openclaw.mjs. Required for OAuth models (GitHub Copilot, etc.) |
| `STT_MODEL_SIZE` | No | `medium` | faster-whisper model size (tiny/base/small/medium/large-v3) |
| `STT_DEVICE` | No | `auto` | STT device (auto/cpu/cuda) |
| `STT_COMPUTE_TYPE` | No | `int8` | STT compute type (int8/float16/float32) |
All variables are loaded from .env at project root via dotenv.
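Putting the table together, a complete `.env` for a typical local setup looks like this (the token value is a placeholder you must replace with your own):

```
PORT=8888
NODE_ENV=production
VCB_BACKEND_HTTP=http://127.0.0.1:8766
OPENCLAW_GATEWAY_URL=http://127.0.0.1:18789
OPENCLAW_GATEWAY_TOKEN=your-token-here
OPENCLAW_CLI=openclaw
STT_MODEL_SIZE=medium
STT_DEVICE=auto
STT_COMPUTE_TYPE=int8
```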
See launchd/README.md for background service templates.
For AI coding assistants (Claude, Cursor, Copilot, etc.) setting up this project on a user's machine. Run each step sequentially. All commands are copy-paste ready. This guide is designed for one-shot execution — no manual editing required.
Detect the platform first. All subsequent steps use $IS_WIN to branch.
macOS / Linux (bash/zsh):

```
IS_WIN=false
```

Windows (PowerShell):

```
$IS_WIN = $true
```

Check prerequisites:

```
# macOS / Linux
node --version    # Must be 22+. If missing: https://nodejs.org/
python --version  # Must be 3.10+. If missing: https://www.python.org/downloads/
```

```
# Windows PowerShell
node --version    # Must be 22+. If missing: https://nodejs.org/
python --version  # Must be 3.10+. If missing: https://www.python.org/downloads/
```

If `node` or `python` is not found, install them before proceeding. On Windows, install Python from python.org (not the Microsoft Store) to avoid PATH issues.
The channel bridge requires a running OpenClaw gateway. Skip this step if the gateway is already installed and running.
```
# macOS / Linux
npm install -g openclaw
openclaw setup          # interactive — connect channels, configure providers
openclaw gateway run &  # starts gateway on port 18789 in background
sleep 3                 # wait for gateway to initialize
```

```
# Windows PowerShell
npm install -g openclaw
openclaw setup
Start-Process -NoNewWindow -FilePath "openclaw" -ArgumentList "gateway","run"
Start-Sleep -Seconds 3
```

If you already have `openclaw` installed, just ensure the gateway is running: `curl -s http://127.0.0.1:18789/healthz` should return `{"ok":true,...}`.
```
# macOS / Linux
git clone https://github.com/GreenSheep01201/claw-voice-chat.git
cd claw-voice-chat
npm install && cd client && npm install && cd ../server && npm install && cd ..
pip install -r stt-backend/requirements.txt
```

```
# Windows PowerShell
git clone https://github.com/GreenSheep01201/claw-voice-chat.git
cd claw-voice-chat
npm install; cd client; npm install; cd ..\server; npm install; cd ..
pip install -r stt-backend\requirements.txt
```

Configure `.env`:

```
# macOS / Linux — fully automatic, no manual editing needed
cp .env.example .env

# Auto-detect gateway token
TOKEN=$(python -c "import json,os; print(json.load(open(os.path.expanduser('~/.openclaw/openclaw.json')))['gateway']['auth']['token'])" 2>/dev/null || echo "")
if [ -n "$TOKEN" ]; then
  sed -i.bak "s/^OPENCLAW_GATEWAY_TOKEN=.*/OPENCLAW_GATEWAY_TOKEN=$TOKEN/" .env && rm -f .env.bak
  echo "OK: Token configured (${TOKEN:0:8}...)"
else
  echo "WARNING: Gateway token not found. Set OPENCLAW_GATEWAY_TOKEN in .env manually."
  echo "         Hint: cat ~/.openclaw/openclaw.json | grep token"
fi

# Auto-detect openclaw CLI path
CLI_PATH=$(which openclaw 2>/dev/null || echo "")
if [ -z "$CLI_PATH" ]; then
  for p in ../openclaw/openclaw.mjs ../../openclaw/openclaw.mjs /usr/local/lib/node_modules/openclaw/openclaw.mjs; do
    if [ -f "$p" ]; then CLI_PATH=$(cd "$(dirname "$p")" && pwd)/$(basename "$p"); break; fi
  done
fi
if [ -n "$CLI_PATH" ]; then
  sed -i.bak "s|^OPENCLAW_CLI=.*|OPENCLAW_CLI=$CLI_PATH|" .env && rm -f .env.bak
  echo "OK: OPENCLAW_CLI=$CLI_PATH"
else
  echo "NOTE: openclaw CLI not found in PATH. Model catalog will be empty."
fi
```

```
# Windows PowerShell — fully automatic
Copy-Item .env.example .env

# Auto-detect gateway token
try {
  $config = Get-Content "$env:USERPROFILE\.openclaw\openclaw.json" | ConvertFrom-Json
  $token = $config.gateway.auth.token
  if ($token) {
    (Get-Content .env) -replace '^OPENCLAW_GATEWAY_TOKEN=.*', "OPENCLAW_GATEWAY_TOKEN=$token" | Set-Content .env
    Write-Host "OK: Token configured ($($token.Substring(0,8))...)"
  }
} catch {
  Write-Host "WARNING: Gateway token not found. Set OPENCLAW_GATEWAY_TOKEN in .env manually."
}

# Auto-detect openclaw CLI path
$cliPath = (Get-Command openclaw -ErrorAction SilentlyContinue).Source
if (-not $cliPath) {
  foreach ($p in "..\openclaw\openclaw.mjs", "..\..\openclaw\openclaw.mjs") {
    if (Test-Path $p) { $cliPath = (Resolve-Path $p).Path; break }
  }
}
if ($cliPath) {
  (Get-Content .env) -replace '^OPENCLAW_CLI=.*', "OPENCLAW_CLI=$cliPath" | Set-Content .env
  Write-Host "OK: OPENCLAW_CLI=$cliPath"
} else {
  Write-Host "NOTE: openclaw CLI not found. Model catalog will be empty."
}
```

Build and start:

```
# macOS / Linux
npm run build
npm start &   # starts Express (8888) + STT backend (8766) concurrently
sleep 5       # wait for servers to initialize (STT model downloads on first run)
```

```
# Windows PowerShell
npm run build
Start-Process -NoNewWindow npm -ArgumentList "start"
Start-Sleep -Seconds 5
```

First-run note: The STT backend downloads the whisper model on first launch. The `medium` model is ~1.5 GB — download may take 1-3 minutes depending on bandwidth. Subsequent starts are instant (the model is cached locally).

Verify:

```
# Health check — should return {"ok":true,"port":8888,...}
curl -s http://127.0.0.1:8888/healthz

# Channel targets — should return {"ok":true,"targets":[...]}
curl -s 'http://127.0.0.1:8888/bridge/targets?limit=5'
# If token error: check OPENCLAW_GATEWAY_TOKEN in .env

# Model catalog — should show available model count
curl -s http://127.0.0.1:8888/bridge/models | python -c "import json,sys; d=json.load(sys.stdin); print(f'Models: {len(d.get(\"models\",[]))} available')"
```

```
# Windows PowerShell
Invoke-RestMethod http://127.0.0.1:8888/healthz
Invoke-RestMethod 'http://127.0.0.1:8888/bridge/targets?limit=5'
```

If all checks pass, open http://127.0.0.1:8888 in a browser.
Long-running tasks (image generation, video rendering, etc.) can block the voice chat agent from responding.
This step prepends a background task policy to the user's AGENTS.md so the agent delegates heavy work
to subagents via sessions_spawn and stays responsive.
Important: This must prepend (not overwrite) the existing AGENTS.md.
```
# macOS / Linux
AGENTS_FILE="$HOME/.openclaw/workspace/AGENTS.md"
MARKER="<!-- BEGIN claw-voice-chat background task policy -->"

# Only add if not already present
if [ ! -f "$AGENTS_FILE" ] || ! grep -q "$MARKER" "$AGENTS_FILE" 2>/dev/null; then
  mkdir -p "$(dirname "$AGENTS_FILE")"
  POLICY=$(cat <<'POLICY_EOF'
<!-- BEGIN claw-voice-chat background task policy -->
## Background Task Policy (Voice Chat Responsiveness)

> Added by claw-voice-chat setup.
> Ensures the agent stays responsive during long-running tasks.

When a user requests a task estimated to take **more than 10 seconds** (image generation,
video rendering, large file processing, web scraping, etc.), **ALWAYS** delegate it to a subagent
using `sessions_spawn`:

1. **Spawn** the task with a descriptive label and appropriate timeout
2. **Respond immediately** in the user's language (e.g. "Started! I'll let you know when it's done.")
3. **Continue handling** new messages while the subagent works in the background
4. The subagent will **announce results** back to this chat when done

**Always respond in the same language the user is speaking.**
**NEVER block the conversation** with synchronous long-running tool calls.
If unsure whether a task is long-running, default to spawning a subagent.

---
<!-- END claw-voice-chat background task policy -->
POLICY_EOF
)
  if [ -f "$AGENTS_FILE" ]; then
    # Prepend policy to existing file
    EXISTING=$(cat "$AGENTS_FILE")
    printf '%s\n\n%s' "$POLICY" "$EXISTING" > "$AGENTS_FILE"
  else
    echo "$POLICY" > "$AGENTS_FILE"
  fi
  echo "OK: Background task policy added to AGENTS.md"
else
  echo "OK: Background task policy already present in AGENTS.md"
fi
```

```
# Windows PowerShell
$agentsFile = "$env:USERPROFILE\.openclaw\workspace\AGENTS.md"
$marker = "<!-- BEGIN claw-voice-chat background task policy -->"
$exists = $false
if (Test-Path $agentsFile) {
  $exists = (Get-Content $agentsFile -Raw) -match [regex]::Escape($marker)
}
if (-not $exists) {
  $dir = Split-Path $agentsFile
  if (-not (Test-Path $dir)) { New-Item -ItemType Directory -Path $dir -Force | Out-Null }
  $policy = @"
<!-- BEGIN claw-voice-chat background task policy -->
## Background Task Policy (Voice Chat Responsiveness)

> Added by claw-voice-chat setup.
> Ensures the agent stays responsive during long-running tasks.

When a user requests a task estimated to take **more than 10 seconds** (image generation,
video rendering, large file processing, web scraping, etc.), **ALWAYS** delegate it to a subagent
using ``sessions_spawn``:

1. **Spawn** the task with a descriptive label and appropriate timeout
2. **Respond immediately** in the user's language (e.g. "Started! I'll let you know when it's done.")
3. **Continue handling** new messages while the subagent works in the background
4. The subagent will **announce results** back to this chat when done

**Always respond in the same language the user is speaking.**
**NEVER block the conversation** with synchronous long-running tool calls.
If unsure whether a task is long-running, default to spawning a subagent.

---
<!-- END claw-voice-chat background task policy -->
"@
  if (Test-Path $agentsFile) {
    $existing = Get-Content $agentsFile -Raw
    Set-Content $agentsFile -Value "$policy`n`n$existing"
  } else {
    Set-Content $agentsFile -Value $policy
  }
  Write-Host "OK: Background task policy added to AGENTS.md"
} else {
  Write-Host "OK: Background task policy already present in AGENTS.md"
}
```

This ensures the AI agent delegates heavy tasks to background subagents and keeps responding to voice input during long operations.
High-quality TTS without API keys using edge-tts:

```
pip install edge-tts fastapi uvicorn
python tts-local/server.py &
sleep 2
curl -s http://127.0.0.1:5050/health
# Expected: {"ok":true,"backend":"edge"}
```

Then configure in the UI: Options > TTS/STT > Custom > URL: `http://localhost:5050/v1/audio/speech`
Microphone access requires HTTPS. Use Tailscale for automatic HTTPS certificates:

```
tailscale serve --bg 8888
curl -sk https://your-machine.tail12345.ts.net/healthz
```

Access from mobile: `https://your-machine.tail12345.ts.net/`

Do NOT use `http://your-machine:8888` — plain HTTP blocks microphone access.
| Endpoint | Method | Description |
|---|---|---|
| `/healthz` | GET | Server health |
| `/bridge/healthz` | GET | Bridge health |
| `/bridge/targets` | GET | List channel sessions |
| `/bridge/attach` | POST | Attach to session (returns `bridgeId`) |
| `/bridge/stream` | GET | SSE event stream |
| `/bridge/inject` | POST | Send message to session (async 202) |
| `/bridge/models` | GET | List available models |
| `/bridge/tts` | POST | TTS proxy (OpenAI/Qwen/Custom) |
| `/api/*` | * | Proxy to STT/TTS backend |
| `/ws/chat` | WS | Voice chat WebSocket |
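As a small scripting sketch against the documented endpoints: `/bridge/targets` returns `{"ok":true,"targets":[...]}`, so a helper can validate the envelope and extract the session list before attaching. This is an illustrative standard-library sketch; the fields inside each target entry are not documented here, so only the envelope is assumed:

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8888"  # Express server started by `npm start`

def parse_targets(payload: dict) -> list:
    """Extract the session list from a /bridge/targets response.

    The documented response shape is {"ok": true, "targets": [...]}.
    """
    if not payload.get("ok"):
        raise RuntimeError(f"bridge error: {payload}")
    return payload.get("targets", [])

def fetch_targets(limit: int = 5) -> list:
    """Query the running server for attachable channel sessions."""
    with urllib.request.urlopen(f"{BASE}/bridge/targets?limit={limit}") as resp:
        return parse_targets(json.load(resp))

# Demo on a sample payload (fetch_targets() needs the server running):
sample = {"ok": True, "targets": [{"label": "telegram-bot"}]}
print(parse_targets(sample))  # → [{'label': 'telegram-bot'}]
```

With the server up, `fetch_targets()` lists the sessions you could then attach to via `POST /bridge/attach`.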
```
client/src/App.tsx            # React UI (voice, chat, bridge, TTS/STT settings)
client/src/lib/audio.ts       # PCM audio encoding (downsample, base64)
client/src/types.ts           # TypeScript interfaces
server/src/index.ts           # Express server (bridge, TTS proxy, static)
server/src/openclaw.ts        # OpenClaw gateway client
server/src/bridge-inject.ts   # Session resolution + message delivery
stt-backend/                  # Python STT backend (faster-whisper)
stt-backend/app/stt.py        # Whisper transcriber + streaming VAD
stt-backend/app/main.py       # FastAPI + WebSocket entry point
stt-backend/requirements.txt  # Python dependencies
tts-local/server.py           # Local TTS server (edge-tts / CosyVoice)
.env.example                  # Environment template
```
| Symptom | Cause | Fix |
|---|---|---|
| `Cannot POST /bridge/tts` | Server running old build | `npm run build`, then restart server |
| `OPENCLAW_GATEWAY_TOKEN is required` | Missing `.env` or token | Check `.env` file exists with valid token |
| (no channel selected) | Gateway not running or token wrong | Run `openclaw gateway run`, verify token |
| Models empty in Options | `OPENCLAW_CLI` not set in `.env` | Set `OPENCLAW_CLI=openclaw` (or full path to `openclaw.mjs`) |
| Mic not working on mobile | Accessing via HTTP, not HTTPS | Use tailscale serve --bg 8888 and access via the HTTPS URL (no :8888 suffix) |
| Mic not working on localhost | Browser permission denied | Allow microphone in browser settings |
| TTS preview silent | Audio not unlocked | Click "Enable Audio" first |
Apache-2.0