
Claw-Voice-Chat

Push-to-Talk Voice Chat for OpenClaw Channels
Connect to Telegram, Discord, Slack, or any OpenClaw channel and interact using voice or text.
Messages are transcribed via STT, sent to the AI agent, and responses stream back with configurable TTS.


Install with AI · Features · Quick Start · TTS · STT · Config · AI Guide · 한국어


Install with AI

Paste this into your AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, etc.):

Install claw-voice-chat following the guide at:
https://github.com/GreenSheep01201/claw-voice-chat

The AI will read this README and handle everything automatically.




Features

  • Push-to-talk voice input with real-time STT (faster-whisper)
  • Channel bridge — select any active OpenClaw session and talk to it
  • Streaming transcript — agent responses arrive token by token
  • Configurable TTS — Browser (Web Speech API), OpenAI, Qwen/DashScope, or Custom endpoint
  • STT language selection — language hint for faster-whisper (Korean, English, Japanese, Chinese, etc.)
  • Local TTS server — included edge-tts wrapper for high-quality TTS without API keys
  • Voice preview — test TTS voices before saving
  • Model catalog — browse models from connected providers
  • Text input — type messages with Ctrl+Enter / Cmd+Enter
  • Standalone LLM mode — works without channel connection using a local LLM backend

Architecture

Browser (React + Tailwind)
   |
   | port 8888 (HTTP + WebSocket)
   v
Express Server (Node.js)
   |
   |--- /bridge/*      --> OpenClaw Gateway (port 18789)
   |--- /bridge/tts    --> TTS Proxy (OpenAI / Qwen / Custom / Local)
   |--- /api/* /ws/*   --> STT/TTS Backend (port 8766) [optional]
   |
   v
OpenClaw Gateway --> Telegram, Discord, Slack, Signal, ...

Operating Modes

| Mode | Requirements | Description |
|---|---|---|
| Channel Bridge | Node.js + OpenClaw Gateway | Text/voice to channels. Browser or external TTS for responses. |
| Standalone LLM | Node.js + Python STT/TTS backend | Full voice pipeline: push-to-talk, local STT, LLM, audio TTS. |

Both modes can run simultaneously. The Python backend is only needed for push-to-talk STT.

Requirements

  • Node.js 22+ (download)
  • OpenClaw Gateway running locally (for channel bridge)
  • Python 3.10+ (only for local TTS server or STT backend — optional)

Quick Start

1. Clone and install

git clone https://github.com/GreenSheep01201/claw-voice-chat.git
cd claw-voice-chat
npm install && cd client && npm install && cd ../server && npm install && cd ..
npm run stt:install   # Python STT backend dependencies

2. Set up OpenClaw Gateway

npm install -g openclaw
openclaw setup          # connect channels, create config
openclaw gateway run    # starts on port 18789

3. Configure environment

cp .env.example .env

Edit .env:

PORT=8888
NODE_ENV=production
OPENCLAW_GATEWAY_URL=http://127.0.0.1:18789
OPENCLAW_GATEWAY_TOKEN=your-token-here

# Model catalog — path to openclaw CLI binary or openclaw.mjs entry point.
# Required for OAuth provider models (GitHub Copilot, Google Antigravity, etc.)
# to appear in the Options model picker.
OPENCLAW_CLI=openclaw

Get your token:

  • macOS/Linux: cat ~/.openclaw/openclaw.json | grep token
  • Windows: type %USERPROFILE%\.openclaw\openclaw.json | findstr token

Or extract it programmatically:

python -c "import json; print(json.load(open('$HOME/.openclaw/openclaw.json'))['gateway']['auth']['token'])"

4. Build and run

npm run build
npm start       # starts Express (8888) + STT backend (8766) concurrently

Open http://127.0.0.1:8888

To run only the Express server without STT: npm run start:server

5. Development mode

npm run dev    # Vite (5173) + Express (8888) + STT (8766) concurrently

TTS Providers

Configure in Options > TTS / STT tab.

| Provider | Setup | Quality | Latency |
|---|---|---|---|
| Browser | Built-in, no setup | Varies by OS | Instant |
| OpenAI | API key required | Excellent | ~1s |
| Qwen/DashScope | API key required | Good | ~1s |
| Custom | Any OpenAI-compatible endpoint | Varies | Varies |
| Local (edge-tts) | pip install edge-tts | Excellent | ~2s |

Local TTS Server (edge-tts)

High-quality TTS without API keys. Works on macOS, Linux, and Windows.

Setup:

pip install edge-tts fastapi uvicorn
python tts-local/server.py

Connect in UI:

  1. Options > TTS / STT tab
  2. Select Custom
  3. URL: http://localhost:5050/v1/audio/speech
  4. Leave API Key empty
  5. Voice: sunhi (Korean), echo (English), nanami (Japanese)
  6. Click Preview Voice to test

Available voices:

| Language | Voices |
|---|---|
| Korean | sunhi, inwoo, hyunsu |
| English | alloy, nova, echo, onyx, shimmer |
| Japanese | nanami, keita |
| Chinese | xiaoxiao, yunxi, xiaoyi |

Run in background:

# macOS/Linux
nohup python tts-local/server.py > /tmp/tts-local.log 2>&1 &

# Windows (PowerShell)
Start-Process -NoNewWindow python -ArgumentList "tts-local/server.py"

Verify:

curl http://127.0.0.1:5050/health
# {"ok":true,"backend":"edge"}
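Because the endpoint is OpenAI-compatible, any OpenAI-style client body should work. A minimal stdlib sketch, assuming the conventional model/input/voice request fields (edge-tts wrappers generally ignore model):

```python
import json
from urllib import request

def build_speech_request(base_url: str, text: str, voice: str) -> request.Request:
    """Build an OpenAI-style /v1/audio/speech request for the local TTS server.

    Field names follow the OpenAI speech API convention, which the local
    server is assumed to mirror; "model" is likely ignored by edge-tts.
    """
    body = json.dumps({"model": "tts-1", "input": text, "voice": voice}).encode()
    return request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_speech_request("http://127.0.0.1:5050", "안녕하세요", "sunhi")
# To actually fetch audio (requires the server to be running):
# with request.urlopen(req) as resp:
#     open("out.mp3", "wb").write(resp.read())
```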

STT Backend (Push-to-Talk)

The included stt-backend/ provides real-time speech-to-text using faster-whisper. It starts automatically with npm start.

Manual startup (if running separately):

npm run stt:install   # pip install -r stt-backend/requirements.txt
npm run stt:start     # starts on port 8766

Configuration:

STT model size and language can be configured in the Options > TTS / STT tab in the UI. Changes take effect on the next WebSocket connection (reconnect).

| Setting | Options | Default | Description |
|---|---|---|---|
| Model Size | Tiny, Base, Small, Medium, Large v3 | Medium | Accuracy vs speed trade-off |
| Language | Auto-detect, Korean, English, Japanese, + 12 more | Auto (browser locale) | Language hint for recognition |

Environment variables (.env) set the server-side defaults:

| Variable | Default | Description |
|---|---|---|
| STT_MODEL_SIZE | medium | Default model when client doesn't specify |
| STT_DEVICE | auto | Device: auto, cpu, cuda |
| STT_COMPUTE_TYPE | int8 | Compute type: int8, float16, float32 |

Models are cached in memory — switching sizes in the UI loads the new model once and reuses it for subsequent connections.

Usage

  1. Click Connect to establish the WebSocket connection
  2. Click Enable Audio to unlock browser audio
  3. Select a channel from the dropdown (e.g., Telegram bot session)
  4. Hold to Speak — hold the button, speak, release to send
  5. Or type in the text box and press Ctrl+Enter / Cmd+Enter
  6. Toggle TTS On/Off to control voice output

Remote Access (Mobile / Other Devices)

Microphone access requires a secure context (HTTPS or localhost). When accessing from a phone, tablet, or another machine over plain HTTP, the browser silently blocks microphone input.

Recommended: Tailscale HTTPS

Tailscale provides automatic HTTPS certificates for devices on your tailnet.

# Expose the voice-chat server (port 8888) over Tailscale HTTPS
tailscale serve --bg 8888

Access from mobile: https://your-machine.tail12345.ts.net/

Important: Do NOT append :8888 to the Tailscale URL. Tailscale serves HTTPS on port 443 and proxies internally to 8888. Accessing http://your-machine:8888 directly is plain HTTP and microphone will not work.

Verify HTTPS is active:

curl -sk https://your-machine.tail12345.ts.net/healthz
# Expected: {"ok":true,"port":8888,...}

Stop Tailscale serve:

tailscale serve --https=443 off

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 8888 | Express server port |
| NODE_ENV | No | development | production serves built client |
| VCB_BACKEND_HTTP | No | http://127.0.0.1:8766 | STT/TTS backend URL |
| OPENCLAW_GATEWAY_URL | For bridge | http://127.0.0.1:18789 | OpenClaw gateway URL |
| OPENCLAW_GATEWAY_TOKEN | For bridge | — | Gateway auth token |
| OPENCLAW_CLI | For models | — | Path to openclaw binary or openclaw.mjs. Required for OAuth models (GitHub Copilot, etc.) |
| STT_MODEL_SIZE | No | medium | faster-whisper model size (tiny/base/small/medium/large-v3) |
| STT_DEVICE | No | auto | STT device (auto/cpu/cuda) |
| STT_COMPUTE_TYPE | No | int8 | STT compute type (int8/float16/float32) |

All variables are loaded from .env at project root via dotenv.
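Conceptually, dotenv just turns KEY=VALUE lines into environment entries. A simplified sketch of that parsing (the real library also handles quoting, escapes, and multiline values):

```python
def parse_env(text: str) -> dict:
    """Minimal .env parser: KEY=VALUE lines; blanks and '#' comments ignored.

    A simplified illustration of what dotenv does, not its full grammar.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """
PORT=8888
# gateway settings
OPENCLAW_GATEWAY_URL=http://127.0.0.1:18789
"""
print(parse_env(sample)["PORT"])  # -> 8888
```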

macOS LaunchAgent

See launchd/README.md for background service templates.


AI Setup Guide

For AI coding assistants (Claude, Cursor, Copilot, etc.) setting up this project on a user's machine. Run each step sequentially. All commands are copy-paste ready. This guide is designed for one-shot execution — no manual editing required.

Platform Detection

Detect the platform first. Each subsequent step provides separate macOS/Linux and Windows commands; use $IS_WIN to pick the right branch.

macOS / Linux (bash/zsh):

IS_WIN=false

Windows (PowerShell):

$IS_WIN = $true

Step 1: Check Prerequisites

# macOS / Linux
node --version    # Must be 22+. If missing: https://nodejs.org/
python --version  # Must be 3.10+. If missing: https://www.python.org/downloads/
# Windows PowerShell
node --version    # Must be 22+. If missing: https://nodejs.org/
python --version  # Must be 3.10+. If missing: https://www.python.org/downloads/

If node or python is not found, install them before proceeding. On Windows, install Python from python.org (not Microsoft Store) to avoid PATH issues.

Step 2: Install OpenClaw Gateway

The channel bridge requires a running OpenClaw gateway. Skip this step if the gateway is already installed and running.

# macOS / Linux
npm install -g openclaw
openclaw setup          # interactive — connect channels, configure providers
openclaw gateway run &  # starts gateway on port 18789 in background
sleep 3                 # wait for gateway to initialize
# Windows PowerShell
npm install -g openclaw
openclaw setup
Start-Process -NoNewWindow -FilePath "openclaw" -ArgumentList "gateway","run"
Start-Sleep -Seconds 3

If you already have openclaw installed, just ensure the gateway is running: curl -s http://127.0.0.1:18789/healthz should return {"ok":true,...}

Step 3: Clone and Install

# macOS / Linux
git clone https://github.com/GreenSheep01201/claw-voice-chat.git
cd claw-voice-chat
npm install && cd client && npm install && cd ../server && npm install && cd ..
pip install -r stt-backend/requirements.txt
# Windows PowerShell
git clone https://github.com/GreenSheep01201/claw-voice-chat.git
cd claw-voice-chat
npm install; cd client; npm install; cd ..\server; npm install; cd ..
pip install -r stt-backend\requirements.txt

Step 4: Configure .env (Auto)

# macOS / Linux — fully automatic, no manual editing needed
cp .env.example .env

# Auto-detect gateway token
TOKEN=$(python -c "import json,os; print(json.load(open(os.path.expanduser('~/.openclaw/openclaw.json')))['gateway']['auth']['token'])" 2>/dev/null || echo "")
if [ -n "$TOKEN" ]; then
  sed -i.bak "s/^OPENCLAW_GATEWAY_TOKEN=.*/OPENCLAW_GATEWAY_TOKEN=$TOKEN/" .env && rm -f .env.bak
  echo "OK: Token configured (${TOKEN:0:8}...)"
else
  echo "WARNING: Gateway token not found. Set OPENCLAW_GATEWAY_TOKEN in .env manually."
  echo "  Hint: cat ~/.openclaw/openclaw.json | grep token"
fi

# Auto-detect openclaw CLI path
CLI_PATH=$(which openclaw 2>/dev/null || echo "")
if [ -z "$CLI_PATH" ]; then
  for p in ../openclaw/openclaw.mjs ../../openclaw/openclaw.mjs /usr/local/lib/node_modules/openclaw/openclaw.mjs; do
    if [ -f "$p" ]; then CLI_PATH=$(cd "$(dirname "$p")" && pwd)/$(basename "$p"); break; fi
  done
fi
if [ -n "$CLI_PATH" ]; then
  sed -i.bak "s|^OPENCLAW_CLI=.*|OPENCLAW_CLI=$CLI_PATH|" .env && rm -f .env.bak
  echo "OK: OPENCLAW_CLI=$CLI_PATH"
else
  echo "NOTE: openclaw CLI not found in PATH. Model catalog will be empty."
fi
# Windows PowerShell — fully automatic
Copy-Item .env.example .env

# Auto-detect gateway token
try {
  $config = Get-Content "$env:USERPROFILE\.openclaw\openclaw.json" | ConvertFrom-Json
  $token = $config.gateway.auth.token
  if ($token) {
    (Get-Content .env) -replace '^OPENCLAW_GATEWAY_TOKEN=.*', "OPENCLAW_GATEWAY_TOKEN=$token" | Set-Content .env
    Write-Host "OK: Token configured ($($token.Substring(0,8))...)"
  }
} catch {
  Write-Host "WARNING: Gateway token not found. Set OPENCLAW_GATEWAY_TOKEN in .env manually."
}

# Auto-detect openclaw CLI path
$cliPath = (Get-Command openclaw -ErrorAction SilentlyContinue).Source
if (-not $cliPath) {
  foreach ($p in "..\openclaw\openclaw.mjs", "..\..\openclaw\openclaw.mjs") {
    if (Test-Path $p) { $cliPath = (Resolve-Path $p).Path; break }
  }
}
if ($cliPath) {
  (Get-Content .env) -replace '^OPENCLAW_CLI=.*', "OPENCLAW_CLI=$cliPath" | Set-Content .env
  Write-Host "OK: OPENCLAW_CLI=$cliPath"
} else {
  Write-Host "NOTE: openclaw CLI not found. Model catalog will be empty."
}

Step 5: Build and Start

# macOS / Linux
npm run build
npm start &    # starts Express (8888) + STT backend (8766) concurrently
sleep 5        # wait for servers to initialize (STT model downloads on first run)
# Windows PowerShell
npm run build
Start-Process -NoNewWindow npm -ArgumentList "start"
Start-Sleep -Seconds 5

First-run note: The STT backend downloads the whisper model on first launch. The medium model is ~1.5 GB — download may take 1-3 minutes depending on bandwidth. Subsequent starts are instant (model is cached locally).

Step 6: Verify

# Health check — should return {"ok":true,"port":8888,...}
curl -s http://127.0.0.1:8888/healthz

# Channel targets — should return {"ok":true,"targets":[...]}
curl -s 'http://127.0.0.1:8888/bridge/targets?limit=5'
# If token error: check OPENCLAW_GATEWAY_TOKEN in .env

# Model catalog — should show available model count
curl -s http://127.0.0.1:8888/bridge/models | python -c "import json,sys; d=json.load(sys.stdin); print(f'Models: {len(d.get(\"models\",[]))} available')"
# Windows PowerShell
Invoke-RestMethod http://127.0.0.1:8888/healthz
Invoke-RestMethod 'http://127.0.0.1:8888/bridge/targets?limit=5'

If all checks pass, open http://127.0.0.1:8888 in a browser.

Step 7: Configure Agent for Background Tasks (Recommended)

Long-running tasks (image generation, video rendering, etc.) can block the voice chat agent from responding. This step prepends a background task policy to the user's AGENTS.md so the agent delegates heavy work to subagents via sessions_spawn and stays responsive.

Important: This must prepend (not overwrite) the existing AGENTS.md.

# macOS / Linux
AGENTS_FILE="$HOME/.openclaw/workspace/AGENTS.md"
MARKER="<!-- BEGIN claw-voice-chat background task policy -->"

# Only add if not already present
if [ ! -f "$AGENTS_FILE" ] || ! grep -q "$MARKER" "$AGENTS_FILE" 2>/dev/null; then
  mkdir -p "$(dirname "$AGENTS_FILE")"
  POLICY=$(cat <<'POLICY_EOF'
<!-- BEGIN claw-voice-chat background task policy -->
## Background Task Policy (Voice Chat Responsiveness)

> Added by claw-voice-chat setup.
> Ensures the agent stays responsive during long-running tasks.

When a user requests a task estimated to take **more than 10 seconds** (image generation,
video rendering, large file processing, web scraping, etc.), **ALWAYS** delegate it to a subagent
using `sessions_spawn`:

1. **Spawn** the task with a descriptive label and appropriate timeout
2. **Respond immediately** in the user's language (e.g. "Started! I'll let you know when it's done.")
3. **Continue handling** new messages while the subagent works in the background
4. The subagent will **announce results** back to this chat when done

**Always respond in the same language the user is speaking.**

**NEVER block the conversation** with synchronous long-running tool calls.
If unsure whether a task is long-running, default to spawning a subagent.

---

<!-- END claw-voice-chat background task policy -->
POLICY_EOF
)
  if [ -f "$AGENTS_FILE" ]; then
    # Prepend policy to existing file
    EXISTING=$(cat "$AGENTS_FILE")
    printf '%s\n\n%s' "$POLICY" "$EXISTING" > "$AGENTS_FILE"
  else
    echo "$POLICY" > "$AGENTS_FILE"
  fi
  echo "OK: Background task policy added to AGENTS.md"
else
  echo "OK: Background task policy already present in AGENTS.md"
fi
# Windows PowerShell
$agentsFile = "$env:USERPROFILE\.openclaw\workspace\AGENTS.md"
$marker = "<!-- BEGIN claw-voice-chat background task policy -->"

$exists = $false
if (Test-Path $agentsFile) {
  $exists = (Get-Content $agentsFile -Raw) -match [regex]::Escape($marker)
}

if (-not $exists) {
  $dir = Split-Path $agentsFile
  if (-not (Test-Path $dir)) { New-Item -ItemType Directory -Path $dir -Force | Out-Null }

  $policy = @'
<!-- BEGIN claw-voice-chat background task policy -->
## Background Task Policy (Voice Chat Responsiveness)

> Added by claw-voice-chat setup.
> Ensures the agent stays responsive during long-running tasks.

When a user requests a task estimated to take **more than 10 seconds** (image generation,
video rendering, large file processing, web scraping, etc.), **ALWAYS** delegate it to a subagent
using `sessions_spawn`:

1. **Spawn** the task with a descriptive label and appropriate timeout
2. **Respond immediately** in the user's language (e.g. "Started! I'll let you know when it's done.")
3. **Continue handling** new messages while the subagent works in the background
4. The subagent will **announce results** back to this chat when done

**Always respond in the same language the user is speaking.**

**NEVER block the conversation** with synchronous long-running tool calls.
If unsure whether a task is long-running, default to spawning a subagent.

---

<!-- END claw-voice-chat background task policy -->
'@

  if (Test-Path $agentsFile) {
    $existing = Get-Content $agentsFile -Raw
    Set-Content $agentsFile -Value "$policy`n`n$existing"
  } else {
    Set-Content $agentsFile -Value $policy
  }
  Write-Host "OK: Background task policy added to AGENTS.md"
} else {
  Write-Host "OK: Background task policy already present in AGENTS.md"
}

This ensures the AI agent delegates heavy tasks to background subagents and keeps responding to voice input during long operations.

Step 8: Local TTS Server (Optional)

High-quality TTS without API keys using edge-tts:

pip install edge-tts fastapi uvicorn
python tts-local/server.py &
sleep 2
curl -s http://127.0.0.1:5050/health
# Expected: {"ok":true,"backend":"edge"}

Then configure in the UI: Options > TTS/STT > Custom > URL: http://localhost:5050/v1/audio/speech

Step 9: Remote Access for Mobile (Optional)

Microphone requires HTTPS. Use Tailscale for automatic HTTPS certificates:

tailscale serve --bg 8888
curl -sk https://your-machine.tail12345.ts.net/healthz

Access from mobile: https://your-machine.tail12345.ts.net/

Do NOT use http://your-machine:8888 — plain HTTP blocks microphone access.

Server Endpoints Reference

| Endpoint | Method | Description |
|---|---|---|
| /healthz | GET | Server health |
| /bridge/healthz | GET | Bridge health |
| /bridge/targets | GET | List channel sessions |
| /bridge/attach | POST | Attach to session (returns bridgeId) |
| /bridge/stream | GET | SSE event stream |
| /bridge/inject | POST | Send message to session (async 202) |
| /bridge/models | GET | List available models |
| /bridge/tts | POST | TTS proxy (OpenAI / Qwen / Custom) |
| /api/* | * | Proxy to STT/TTS backend |
| /ws/chat | WS | Voice chat WebSocket |
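The bridge flow (attach, then inject, then read replies from the SSE stream) can be sketched with stdlib request builders. The payload field names below ("target", "bridgeId", "text") are illustrative guesses, not the documented schema:

```python
import json
from urllib import request

BASE = "http://127.0.0.1:8888"

def bridge_post(path: str, payload: dict) -> request.Request:
    """Build a JSON POST to a bridge endpoint.

    Payload keys here are hypothetical; check the server source
    (server/src/index.ts) for the actual request shape.
    """
    return request.Request(
        f"{BASE}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical flow: attach to a session, then inject a message.
attach = bridge_post("/bridge/attach", {"target": "telegram:12345"})
inject = bridge_post("/bridge/inject", {"bridgeId": "abc", "text": "hello"})
# /bridge/inject answers 202 Accepted; the agent's reply arrives on
# the /bridge/stream SSE connection rather than in this response.
```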

Key Files

client/src/App.tsx            # React UI (voice, chat, bridge, TTS/STT settings)
client/src/lib/audio.ts       # PCM audio encoding (downsample, base64)
client/src/types.ts           # TypeScript interfaces
server/src/index.ts           # Express server (bridge, TTS proxy, static)
server/src/openclaw.ts        # OpenClaw gateway client
server/src/bridge-inject.ts   # Session resolution + message delivery
stt-backend/                  # Python STT backend (faster-whisper)
stt-backend/app/stt.py        # Whisper transcriber + streaming VAD
stt-backend/app/main.py       # FastAPI + WebSocket entry point
stt-backend/requirements.txt  # Python dependencies
tts-local/server.py           # Local TTS server (edge-tts / CosyVoice)
.env.example                  # Environment template

WebSocket Protocol (/ws/chat)

// Client -> Server
{"type": "audio", "pcm16": "<base64 PCM16 mono 16kHz>"}
{"type": "text", "text": "hello"}
{"type": "flush"}   // end of speech segment
{"type": "reset"}   // clear conversation

// Server -> Client
{"type": "ready", "llm": "model-name", "tts_enabled": true}
{"type": "stt_partial", "text": "hel..."}
{"type": "stt_final", "text": "hello"}
{"type": "user_text", "text": "hello"}
{"type": "assistant_delta", "text": "Hi"}
{"type": "assistant_final", "text": "Hi there!"}
{"type": "tts_audio", "audio": "<base64 WAV>"}
{"type": "info", "message": "..."}
{"type": "error", "message": "..."}
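The audio frame above can be produced from raw samples with a few stdlib calls (little-endian sample order is an assumption; the protocol only says PCM16 mono 16 kHz):

```python
import base64
import json
import struct

def audio_message(samples: list[int]) -> str:
    """Encode signed 16-bit mono 16 kHz samples as the protocol's
    {"type": "audio", "pcm16": ...} frame (assumed little-endian PCM16,
    base64-encoded)."""
    pcm = struct.pack(f"<{len(samples)}h", *samples)
    return json.dumps({"type": "audio", "pcm16": base64.b64encode(pcm).decode()})

frame = json.loads(audio_message([0, 1000, -1000]))
print(frame["type"])  # audio
# A client would send such frames repeatedly while the button is held,
# then {"type": "flush"} on release to close the speech segment.
```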

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Cannot POST /bridge/tts | Server running old build | npm run build, then restart the server |
| OPENCLAW_GATEWAY_TOKEN is required | Missing .env or token | Check that .env exists with a valid token |
| (no channel selected) | Gateway not running or token wrong | Run openclaw gateway run, verify the token |
| Models empty in Options | OPENCLAW_CLI not set in .env | Set OPENCLAW_CLI=openclaw (or full path to openclaw.mjs) |
| Mic not working on mobile | Accessing via HTTP, not HTTPS | Use tailscale serve --bg 8888 and access via the HTTPS URL (no :8888 suffix) |
| Mic not working on localhost | Browser permission denied | Allow microphone in browser settings |
| TTS preview silent | Audio not unlocked | Click "Enable Audio" first |

License

Apache-2.0
