
Claw-Voice-Chat

Push-to-Talk Voice Chat for OpenClaw Channels
Connect to Telegram, Discord, Slack, or any OpenClaw channel and interact using voice or text.
Messages are transcribed via STT, sent to the AI agent, and responses stream back with configurable TTS.


Install with AI · Features · Quick Start · TTS · STT · Config · AI Guide · 한국어


Install with AI

Paste this into your AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, etc.):

Install claw-voice-chat following the guide at:
https://github.com/GreenSheep01201/claw-voice-chat

The AI will read this README and handle everything automatically.




Features

  • Push-to-talk voice input with real-time STT (faster-whisper)
  • Channel bridge — select any active OpenClaw session and talk to it
  • Streaming transcript — agent responses arrive token by token
  • Configurable TTS — Browser (Web Speech API), OpenAI, Qwen/DashScope, or Custom endpoint
  • STT language selection — language hint for faster-whisper (Korean, English, Japanese, Chinese, etc.)
  • Local TTS server — included edge-tts wrapper for high-quality TTS without API keys
  • Voice preview — test TTS voices before saving
  • Model catalog — browse models from connected providers
  • Text input — type messages with Ctrl+Enter / Cmd+Enter
  • Standalone LLM mode — works without channel connection using a local LLM backend

Architecture

Browser (React + Tailwind)
   |
   | port 8888 (HTTP + WebSocket)
   v
Express Server (Node.js)
   |
   |--- /bridge/*      --> OpenClaw Gateway (port 18789)
   |--- /bridge/tts    --> TTS Proxy (OpenAI / Qwen / Custom / Local)
   |--- /api/* /ws/*   --> STT/TTS Backend (port 8766) [optional]
   |
   v
OpenClaw Gateway --> Telegram, Discord, Slack, Signal, ...

Operating Modes

| Mode | Requirements | Description |
|---|---|---|
| Channel Bridge | Node.js + OpenClaw Gateway | Text/voice to channels. Browser or external TTS for responses. |
| Standalone LLM | Node.js + Python STT/TTS backend | Full voice pipeline: push-to-talk, local STT, LLM, audio TTS. |

Both modes can run simultaneously. The Python backend is only needed for push-to-talk STT.

Requirements

  • Node.js 22+ (download)
  • OpenClaw Gateway running locally (for channel bridge)
  • Python 3.10+ (only for local TTS server or STT backend — optional)

Quick Start

1. Clone and install

git clone https://github.com/GreenSheep01201/claw-voice-chat.git
cd claw-voice-chat
npm install && cd client && npm install && cd ../server && npm install && cd ..
npm run stt:install   # Python STT backend dependencies

2. Set up OpenClaw Gateway

npm install -g openclaw
openclaw setup          # connect channels, create config
openclaw gateway run    # starts on port 18789

3. Configure environment

cp .env.example .env

Edit .env:

PORT=8888
NODE_ENV=production
OPENCLAW_GATEWAY_URL=http://127.0.0.1:18789
OPENCLAW_GATEWAY_TOKEN=your-token-here

# Model catalog — path to openclaw CLI binary or openclaw.mjs entry point.
# Required for OAuth provider models (GitHub Copilot, Google Antigravity, etc.)
# to appear in the Options model picker.
OPENCLAW_CLI=openclaw

Get your token:

  • macOS/Linux: cat ~/.openclaw/openclaw.json | grep token
  • Windows: type %USERPROFILE%\.openclaw\openclaw.json | findstr token

Or extract it programmatically:

python -c "import json; print(json.load(open('$HOME/.openclaw/openclaw.json'))['gateway']['auth']['token'])"

4. Build and run

npm run build
npm start       # starts Express (8888) + STT backend (8766) concurrently

Open http://127.0.0.1:8888

To run only the Express server without STT: npm run start:server

5. Development mode

npm run dev    # Vite (5173) + Express (8888) + STT (8766) concurrently

TTS Providers

Configure in Options > TTS / STT tab.

| Provider | Setup | Quality | Latency |
|---|---|---|---|
| Browser | Built-in, no setup | Varies by OS | Instant |
| OpenAI | API key required | Excellent | ~1s |
| Qwen/DashScope | API key required | Good | ~1s |
| Custom | Any OpenAI-compatible endpoint | Varies | Varies |
| Local (edge-tts) | pip install edge-tts | Excellent | ~2s |

Local TTS Server (edge-tts)

High-quality TTS without API keys. Works on macOS, Linux, and Windows.

Setup:

pip install edge-tts fastapi uvicorn
python tts-local/server.py

Connect in UI:

  1. Options > TTS / STT tab
  2. Select Custom
  3. URL: http://localhost:5050/v1/audio/speech
  4. Leave API Key empty
  5. Voice: sunhi (Korean), echo (English), nanami (Japanese)
  6. Click Preview Voice to test

Available voices:

| Language | Voices |
|---|---|
| Korean | sunhi, inwoo, hyunsu |
| English | alloy, nova, echo, onyx, shimmer |
| Japanese | nanami, keita |
| Chinese | xiaoxiao, yunxi, xiaoyi |

Run in background:

# macOS/Linux
nohup python tts-local/server.py > /tmp/tts-local.log 2>&1 &

# Windows (PowerShell)
Start-Process -NoNewWindow python -ArgumentList "tts-local/server.py"

Verify:

curl http://127.0.0.1:5050/health
# {"ok":true,"backend":"edge"}
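Because the endpoint is OpenAI-compatible, any OpenAI-style client body should work. A minimal stdlib sketch, assuming the conventional model/input/voice request fields (edge-tts wrappers generally ignore model):

```python
import json
from urllib import request

def build_speech_request(base_url: str, text: str, voice: str) -> request.Request:
    """Build an OpenAI-style /v1/audio/speech request for the local TTS server.

    Field names follow the OpenAI speech API convention, which the local
    server is assumed to mirror; "model" is likely ignored by edge-tts.
    """
    body = json.dumps({"model": "tts-1", "input": text, "voice": voice}).encode()
    return request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_speech_request("http://127.0.0.1:5050", "안녕하세요", "sunhi")
# To actually fetch audio (requires the server to be running):
# with request.urlopen(req) as resp:
#     open("out.mp3", "wb").write(resp.read())
```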

STT Backend (Push-to-Talk)

The included stt-backend/ provides real-time speech-to-text using faster-whisper. It starts automatically with npm start.

Manual startup (if running separately):

npm run stt:install   # pip install -r stt-backend/requirements.txt
npm run stt:start     # starts on port 8766

Configuration:

STT model size and language can be configured in the Options > TTS / STT tab in the UI. Changes take effect on the next WebSocket connection (reconnect).

| Setting | Options | Default | Description |
|---|---|---|---|
| Model Size | Tiny, Base, Small, Medium, Large v3 | Medium | Accuracy vs speed trade-off |
| Language | Auto-detect, Korean, English, Japanese, + 12 more | Auto (browser locale) | Language hint for recognition |

Environment variables (.env) set the server-side defaults:

| Variable | Default | Description |
|---|---|---|
| STT_MODEL_SIZE | medium | Default model when client doesn't specify |
| STT_DEVICE | auto | Device: auto, cpu, cuda |
| STT_COMPUTE_TYPE | int8 | Compute type: int8, float16, float32 |

Models are cached in memory — switching sizes in the UI loads the new model once and reuses it for subsequent connections.

Usage

  1. Click Connect to establish the WebSocket connection
  2. Click Enable Audio to unlock browser audio
  3. Select a channel from the dropdown (e.g., Telegram bot session)
  4. Hold to Speak — hold the button, speak, release to send
  5. Or type in the text box and press Ctrl+Enter / Cmd+Enter
  6. Toggle TTS On/Off to control voice output

Remote Access (Mobile / Other Devices)

Microphone access requires a secure context (HTTPS or localhost). When accessing from a phone, tablet, or another machine over plain HTTP, the browser silently blocks microphone input.

Recommended: Tailscale HTTPS

Tailscale provides automatic HTTPS certificates for devices on your tailnet.

# Expose the voice-chat server (port 8888) over Tailscale HTTPS
tailscale serve --bg 8888

Access from mobile: https://your-machine.tail12345.ts.net/

Important: Do NOT append :8888 to the Tailscale URL. Tailscale serves HTTPS on port 443 and proxies internally to 8888. Accessing http://your-machine:8888 directly is plain HTTP and microphone will not work.

Verify HTTPS is active:

curl -sk https://your-machine.tail12345.ts.net/healthz
# Expected: {"ok":true,"port":8888,...}

Stop Tailscale serve:

tailscale serve --https=443 off

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 8888 | Express server port |
| NODE_ENV | No | development | production serves built client |
| VCB_BACKEND_HTTP | No | http://127.0.0.1:8766 | STT/TTS backend URL |
| OPENCLAW_GATEWAY_URL | For bridge | http://127.0.0.1:18789 | OpenClaw gateway URL |
| OPENCLAW_GATEWAY_TOKEN | For bridge | — | Gateway auth token |
| OPENCLAW_CLI | For models | — | Path to openclaw binary or openclaw.mjs. Required for OAuth models (GitHub Copilot, etc.) |
| STT_MODEL_SIZE | No | medium | faster-whisper model size (tiny/base/small/medium/large-v3) |
| STT_DEVICE | No | auto | STT device (auto/cpu/cuda) |
| STT_COMPUTE_TYPE | No | int8 | STT compute type (int8/float16/float32) |

All variables are loaded from .env at project root via dotenv.
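Conceptually, dotenv just turns KEY=VALUE lines into environment entries. A simplified sketch of that parsing (the real library also handles quoting, escapes, and multiline values):

```python
def parse_env(text: str) -> dict:
    """Minimal .env parser: KEY=VALUE lines; blanks and '#' comments ignored.

    A simplified illustration of what dotenv does, not its full grammar.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """
PORT=8888
# gateway settings
OPENCLAW_GATEWAY_URL=http://127.0.0.1:18789
"""
print(parse_env(sample)["PORT"])  # -> 8888
```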

macOS LaunchAgent

See launchd/README.md for background service templates.


AI Setup Guide

For AI coding assistants (Claude, Cursor, Copilot, etc.) setting up this project on a user's machine. Run each step sequentially. All commands are copy-paste ready. This guide is designed for one-shot execution — no manual editing required.

Platform Detection

Detect the platform first. Each subsequent step provides separate macOS/Linux and Windows commands; use $IS_WIN to pick the right branch.

macOS / Linux (bash/zsh):

IS_WIN=false

Windows (PowerShell):

$IS_WIN = $true

Step 1: Check Prerequisites

# macOS / Linux
node --version    # Must be 22+. If missing: https://nodejs.org/
python --version  # Must be 3.10+. If missing: https://www.python.org/downloads/
# Windows PowerShell
node --version    # Must be 22+. If missing: https://nodejs.org/
python --version  # Must be 3.10+. If missing: https://www.python.org/downloads/

If node or python is not found, install them before proceeding. On Windows, install Python from python.org (not Microsoft Store) to avoid PATH issues.

Step 2: Install OpenClaw Gateway

The channel bridge requires a running OpenClaw gateway. Skip this step if the gateway is already installed and running.

# macOS / Linux
npm install -g openclaw
openclaw setup          # interactive — connect channels, configure providers
openclaw gateway run &  # starts gateway on port 18789 in background
sleep 3                 # wait for gateway to initialize
# Windows PowerShell
npm install -g openclaw
openclaw setup
Start-Process -NoNewWindow -FilePath "openclaw" -ArgumentList "gateway","run"
Start-Sleep -Seconds 3

If you already have openclaw installed, just ensure the gateway is running: curl -s http://127.0.0.1:18789/healthz should return {"ok":true,...}

Step 3: Clone and Install

# macOS / Linux
git clone https://github.com/GreenSheep01201/claw-voice-chat.git
cd claw-voice-chat
npm install && cd client && npm install && cd ../server && npm install && cd ..
pip install -r stt-backend/requirements.txt
# Windows PowerShell
git clone https://github.com/GreenSheep01201/claw-voice-chat.git
cd claw-voice-chat
npm install; cd client; npm install; cd ..\server; npm install; cd ..
pip install -r stt-backend\requirements.txt

Step 4: Configure .env (Auto)

# macOS / Linux — fully automatic, no manual editing needed
cp .env.example .env

# Auto-detect gateway token
TOKEN=$(python -c "import json,os; print(json.load(open(os.path.expanduser('~/.openclaw/openclaw.json')))['gateway']['auth']['token'])" 2>/dev/null || echo "")
if [ -n "$TOKEN" ]; then
  sed -i.bak "s/^OPENCLAW_GATEWAY_TOKEN=.*/OPENCLAW_GATEWAY_TOKEN=$TOKEN/" .env && rm -f .env.bak
  echo "OK: Token configured (${TOKEN:0:8}...)"
else
  echo "WARNING: Gateway token not found. Set OPENCLAW_GATEWAY_TOKEN in .env manually."
  echo "  Hint: cat ~/.openclaw/openclaw.json | grep token"
fi

# Auto-detect openclaw CLI path
CLI_PATH=$(which openclaw 2>/dev/null || echo "")
if [ -z "$CLI_PATH" ]; then
  for p in ../openclaw/openclaw.mjs ../../openclaw/openclaw.mjs /usr/local/lib/node_modules/openclaw/openclaw.mjs; do
    if [ -f "$p" ]; then CLI_PATH=$(cd "$(dirname "$p")" && pwd)/$(basename "$p"); break; fi
  done
fi
if [ -n "$CLI_PATH" ]; then
  sed -i.bak "s|^OPENCLAW_CLI=.*|OPENCLAW_CLI=$CLI_PATH|" .env && rm -f .env.bak
  echo "OK: OPENCLAW_CLI=$CLI_PATH"
else
  echo "NOTE: openclaw CLI not found in PATH. Model catalog will be empty."
fi
# Windows PowerShell — fully automatic
Copy-Item .env.example .env

# Auto-detect gateway token
try {
  $config = Get-Content "$env:USERPROFILE\.openclaw\openclaw.json" | ConvertFrom-Json
  $token = $config.gateway.auth.token
  if ($token) {
    (Get-Content .env) -replace '^OPENCLAW_GATEWAY_TOKEN=.*', "OPENCLAW_GATEWAY_TOKEN=$token" | Set-Content .env
    Write-Host "OK: Token configured ($($token.Substring(0,8))...)"
  }
} catch {
  Write-Host "WARNING: Gateway token not found. Set OPENCLAW_GATEWAY_TOKEN in .env manually."
}

# Auto-detect openclaw CLI path
$cliPath = (Get-Command openclaw -ErrorAction SilentlyContinue).Source
if (-not $cliPath) {
  foreach ($p in "..\openclaw\openclaw.mjs", "..\..\openclaw\openclaw.mjs") {
    if (Test-Path $p) { $cliPath = (Resolve-Path $p).Path; break }
  }
}
if ($cliPath) {
  (Get-Content .env) -replace '^OPENCLAW_CLI=.*', "OPENCLAW_CLI=$cliPath" | Set-Content .env
  Write-Host "OK: OPENCLAW_CLI=$cliPath"
} else {
  Write-Host "NOTE: openclaw CLI not found. Model catalog will be empty."
}

Step 5: Build and Start

# macOS / Linux
npm run build
npm start &    # starts Express (8888) + STT backend (8766) concurrently
sleep 5        # wait for servers to initialize (STT model downloads on first run)
# Windows PowerShell
npm run build
Start-Process -NoNewWindow npm -ArgumentList "start"
Start-Sleep -Seconds 5

First-run note: The STT backend downloads the whisper model on first launch. The medium model is ~1.5 GB — download may take 1-3 minutes depending on bandwidth. Subsequent starts are instant (model is cached locally).

Step 6: Verify

# Health check — should return {"ok":true,"port":8888,...}
curl -s http://127.0.0.1:8888/healthz

# Channel targets — should return {"ok":true,"targets":[...]}
curl -s 'http://127.0.0.1:8888/bridge/targets?limit=5'
# If token error: check OPENCLAW_GATEWAY_TOKEN in .env

# Model catalog — should show available model count
curl -s http://127.0.0.1:8888/bridge/models | python -c "import json,sys; d=json.load(sys.stdin); print(f'Models: {len(d.get(\"models\",[]))} available')"
# Windows PowerShell
Invoke-RestMethod http://127.0.0.1:8888/healthz
Invoke-RestMethod 'http://127.0.0.1:8888/bridge/targets?limit=5'

If all checks pass, open http://127.0.0.1:8888 in a browser.

Step 7: Configure Agent for Background Tasks (Recommended)

Long-running tasks (image generation, video rendering, etc.) can block the voice chat agent from responding. This step prepends a background task policy to the user's AGENTS.md so the agent delegates heavy work to subagents via sessions_spawn and stays responsive.

Important: This must prepend (not overwrite) the existing AGENTS.md.

# macOS / Linux
AGENTS_FILE="$HOME/.openclaw/workspace/AGENTS.md"
MARKER="<!-- BEGIN claw-voice-chat background task policy -->"

# Only add if not already present
if [ ! -f "$AGENTS_FILE" ] || ! grep -q "$MARKER" "$AGENTS_FILE" 2>/dev/null; then
  mkdir -p "$(dirname "$AGENTS_FILE")"
  POLICY=$(cat <<'POLICY_EOF'
<!-- BEGIN claw-voice-chat background task policy -->
## Background Task Policy (Voice Chat Responsiveness)

> Added by claw-voice-chat setup.
> Ensures the agent stays responsive during long-running tasks.

When a user requests a task estimated to take **more than 10 seconds** (image generation,
video rendering, large file processing, web scraping, etc.), **ALWAYS** delegate it to a subagent
using `sessions_spawn`:

1. **Spawn** the task with a descriptive label and appropriate timeout
2. **Respond immediately** in the user's language (e.g. "Started! I'll let you know when it's done.")
3. **Continue handling** new messages while the subagent works in the background
4. The subagent will **announce results** back to this chat when done

**Always respond in the same language the user is speaking.**

**NEVER block the conversation** with synchronous long-running tool calls.
If unsure whether a task is long-running, default to spawning a subagent.

---

<!-- END claw-voice-chat background task policy -->
POLICY_EOF
)
  if [ -f "$AGENTS_FILE" ]; then
    # Prepend policy to existing file
    EXISTING=$(cat "$AGENTS_FILE")
    printf '%s\n\n%s' "$POLICY" "$EXISTING" > "$AGENTS_FILE"
  else
    echo "$POLICY" > "$AGENTS_FILE"
  fi
  echo "OK: Background task policy added to AGENTS.md"
else
  echo "OK: Background task policy already present in AGENTS.md"
fi
# Windows PowerShell
$agentsFile = "$env:USERPROFILE\.openclaw\workspace\AGENTS.md"
$marker = "<!-- BEGIN claw-voice-chat background task policy -->"

$exists = $false
if (Test-Path $agentsFile) {
  $exists = (Get-Content $agentsFile -Raw) -match [regex]::Escape($marker)
}

if (-not $exists) {
  $dir = Split-Path $agentsFile
  if (-not (Test-Path $dir)) { New-Item -ItemType Directory -Path $dir -Force | Out-Null }

  $policy = @'
<!-- BEGIN claw-voice-chat background task policy -->
## Background Task Policy (Voice Chat Responsiveness)

> Added by claw-voice-chat setup.
> Ensures the agent stays responsive during long-running tasks.

When a user requests a task estimated to take **more than 10 seconds** (image generation,
video rendering, large file processing, web scraping, etc.), **ALWAYS** delegate it to a subagent
using `sessions_spawn`:

1. **Spawn** the task with a descriptive label and appropriate timeout
2. **Respond immediately** in the user's language (e.g. "Started! I'll let you know when it's done.")
3. **Continue handling** new messages while the subagent works in the background
4. The subagent will **announce results** back to this chat when done

**Always respond in the same language the user is speaking.**

**NEVER block the conversation** with synchronous long-running tool calls.
If unsure whether a task is long-running, default to spawning a subagent.

---

<!-- END claw-voice-chat background task policy -->
'@

  if (Test-Path $agentsFile) {
    $existing = Get-Content $agentsFile -Raw
    Set-Content $agentsFile -Value "$policy`n`n$existing"
  } else {
    Set-Content $agentsFile -Value $policy
  }
  Write-Host "OK: Background task policy added to AGENTS.md"
} else {
  Write-Host "OK: Background task policy already present in AGENTS.md"
}

This ensures the AI agent delegates heavy tasks to background subagents and keeps responding to voice input during long operations.

Step 8: Local TTS Server (Optional)

High-quality TTS without API keys using edge-tts:

pip install edge-tts fastapi uvicorn
python tts-local/server.py &
sleep 2
curl -s http://127.0.0.1:5050/health
# Expected: {"ok":true,"backend":"edge"}

Then configure in the UI: Options > TTS/STT > Custom > URL: http://localhost:5050/v1/audio/speech

Step 9: Remote Access for Mobile (Optional)

Microphone requires HTTPS. Use Tailscale for automatic HTTPS certificates:

tailscale serve --bg 8888
curl -sk https://your-machine.tail12345.ts.net/healthz

Access from mobile: https://your-machine.tail12345.ts.net/

Do NOT use http://your-machine:8888 — plain HTTP blocks microphone access.

Server Endpoints Reference

| Endpoint | Method | Description |
|---|---|---|
| /healthz | GET | Server health |
| /bridge/healthz | GET | Bridge health |
| /bridge/targets | GET | List channel sessions |
| /bridge/attach | POST | Attach to session (returns bridgeId) |
| /bridge/stream | GET | SSE event stream |
| /bridge/inject | POST | Send message to session (async 202) |
| /bridge/models | GET | List available models |
| /bridge/tts | POST | TTS proxy (OpenAI / Qwen / Custom) |
| /api/* | * | Proxy to STT/TTS backend |
| /ws/chat | WS | Voice chat WebSocket |
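The bridge flow (attach, then inject, then read replies from the SSE stream) can be sketched with stdlib request builders. The payload field names below ("target", "bridgeId", "text") are illustrative guesses, not the documented schema:

```python
import json
from urllib import request

BASE = "http://127.0.0.1:8888"

def bridge_post(path: str, payload: dict) -> request.Request:
    """Build a JSON POST to a bridge endpoint.

    Payload keys here are hypothetical; check the server source
    (server/src/index.ts) for the actual request shape.
    """
    return request.Request(
        f"{BASE}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical flow: attach to a session, then inject a message.
attach = bridge_post("/bridge/attach", {"target": "telegram:12345"})
inject = bridge_post("/bridge/inject", {"bridgeId": "abc", "text": "hello"})
# /bridge/inject answers 202 Accepted; the agent's reply arrives on
# the /bridge/stream SSE connection rather than in this response.
```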

Key Files

client/src/App.tsx            # React UI (voice, chat, bridge, TTS/STT settings)
client/src/lib/audio.ts       # PCM audio encoding (downsample, base64)
client/src/types.ts           # TypeScript interfaces
server/src/index.ts           # Express server (bridge, TTS proxy, static)
server/src/openclaw.ts        # OpenClaw gateway client
server/src/bridge-inject.ts   # Session resolution + message delivery
stt-backend/                  # Python STT backend (faster-whisper)
stt-backend/app/stt.py        # Whisper transcriber + streaming VAD
stt-backend/app/main.py       # FastAPI + WebSocket entry point
stt-backend/requirements.txt  # Python dependencies
tts-local/server.py           # Local TTS server (edge-tts / CosyVoice)
.env.example                  # Environment template

WebSocket Protocol (/ws/chat)

// Client -> Server
{"type": "audio", "pcm16": "<base64 PCM16 mono 16kHz>"}
{"type": "text", "text": "hello"}
{"type": "flush"}   // end of speech segment
{"type": "reset"}   // clear conversation

// Server -> Client
{"type": "ready", "llm": "model-name", "tts_enabled": true}
{"type": "stt_partial", "text": "hel..."}
{"type": "stt_final", "text": "hello"}
{"type": "user_text", "text": "hello"}
{"type": "assistant_delta", "text": "Hi"}
{"type": "assistant_final", "text": "Hi there!"}
{"type": "tts_audio", "audio": "<base64 WAV>"}
{"type": "info", "message": "..."}
{"type": "error", "message": "..."}
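The audio frame above can be produced from raw samples with a few stdlib calls (little-endian sample order is an assumption; the protocol only says PCM16 mono 16 kHz):

```python
import base64
import json
import struct

def audio_message(samples: list[int]) -> str:
    """Encode signed 16-bit mono 16 kHz samples as the protocol's
    {"type": "audio", "pcm16": ...} frame (assumed little-endian PCM16,
    base64-encoded)."""
    pcm = struct.pack(f"<{len(samples)}h", *samples)
    return json.dumps({"type": "audio", "pcm16": base64.b64encode(pcm).decode()})

frame = json.loads(audio_message([0, 1000, -1000]))
print(frame["type"])  # audio
# A client would send such frames repeatedly while the button is held,
# then {"type": "flush"} on release to close the speech segment.
```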

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Cannot POST /bridge/tts | Server running old build | npm run build, then restart the server |
| OPENCLAW_GATEWAY_TOKEN is required | Missing .env or token | Check that .env exists with a valid token |
| (no channel selected) | Gateway not running or token wrong | Run openclaw gateway run, verify the token |
| Models empty in Options | OPENCLAW_CLI not set in .env | Set OPENCLAW_CLI=openclaw (or full path to openclaw.mjs) |
| Mic not working on mobile | Accessing via HTTP, not HTTPS | Use tailscale serve --bg 8888 and access via the HTTPS URL (no :8888 suffix) |
| Mic not working on localhost | Browser permission denied | Allow microphone in browser settings |
| TTS preview silent | Audio not unlocked | Click "Enable Audio" first |

License

Apache-2.0
