Clank's wake gate is a small openWakeWord ONNX model (models/wakeword/hey_clank.onnx).
The shipped model was trained mostly on synthetic speech, which is why recall
drops on a microphone it never "heard" during training — the symptom you hit on
the laptop: a headset scores 0.7–0.9 but the built-in mic barely clears 0.1.
The durable fix is to retrain with audio from the mic Clank actually uses. This guide covers doing that on the Clank box itself.
Security rule for this project: an .onnx file is executable graph code. Any model that lands on the Clank box must pass scripts/train_wakeword/audit_onnx.py and be SHA256-pinned in _KNOWN_OWW_SHA256 before Clank will load it. Never drop an unaudited model into models/wakeword/.
VRAM is not the bottleneck. openWakeWord's heavy feature extractors (melspectrogram + embedding) are frozen; you only train a tiny classifier head. The real costs are:
- Disk — the augmentation/negative feature sets are several GB to download.
- CPU time — generating synthetic speech (Piper TTS) is the slow step.
- Setup — a separate Python 3.11 environment (the runtime venv is 3.14 and can't run the training stack).
A 4 GB dGPU is plenty; it'll even run CPU-only, just slower. Budget an evening, not a weekend. Bonus of training here: you record and train on the same acoustic path Clank listens through.
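
Before starting, it can help to confirm the two constraints that are easy to miss: free disk space and the interpreter version. The following is a minimal sketch, not part of the Clank repo; the helper names and the 10 GB figure are illustrative assumptions (the guide only says "several GB").

```python
import shutil
import sys


def has_free_space(path, min_free_gb):
    """Return True if the filesystem holding `path` has at least min_free_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= min_free_gb * 1024**3


def is_training_python(version_info=None):
    """Return True if the interpreter is Python 3.11, the version the training env uses."""
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) == (3, 11)


if __name__ == "__main__":
    # 10 GB is an illustrative threshold, not a number from the training docs.
    print("disk ok:", has_free_space(".", min_free_gb=10))
    print("python 3.11:", is_training_python())
```

Run it with the interpreter from the training environment, so the version check reflects the venv rather than the 3.14 runtime.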
| | Full retrain | Custom verifier |
|---|---|---|
| Output | a new hey_clank.onnx | a .pkl verifier on top of a base model |
| Data needed | synthetic + real clips + negatives | ~3 positive + ~10s negative clips |
| Effort | an evening | minutes |
| Fits Clank as-is? | Yes, drop-in replacement | No: Clank's loader expects an .onnx; using a verifier needs a small code change |
For closing the mic-recall gap, do the full retrain — it's the supported, drop-in path. The verifier is a quick personalization trick but doesn't slot into Clank's current model loader, so it's noted at the end only.
python3.11 -m venv ~/oww-training/.venv
source ~/oww-training/.venv/bin/activate.fish
pip install -r scripts/train_wakeword/requirements-training.txt

If a version fails to resolve, the upstream notebook is the source of truth: https://github.com/dscripka/openWakeWord/blob/main/notebooks/automatic_model_training.ipynb. The maintained local/Docker pipeline at https://github.com/CoreWorxLab/openwakeword-training is a good fallback when the Colab notebook breaks (it often does).
This is the step that actually fixes recall. Run it in the Clank runtime venv
(it uses sounddevice/soundfile, which are already there), on the laptop, with
the mic at its sweet spot (~55% gain — too low is quiet, too high clips):
# 50+ positive "hey clank" clips — vary distance, volume, cadence, who's speaking
.venv/bin/python scripts/train_wakeword/record_samples.py --label positive --count 50
# 30 hard negatives — talk, TV, similar words, but NEVER the wake word
.venv/bin/python scripts/train_wakeword/record_samples.py --label negative --count 30 --seconds 3

Clips land in data/wakeword/positive/ and data/wakeword/negative/
(gitignored — they're personal audio, see below). More variety > more clips.
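
Since gain that is too low gives quiet clips and gain that is too high clips the signal, a quick level check over the recorded files can catch a bad session before training. This is a sketch under assumptions: it presumes the clips are 16-bit PCM .wav files (the guide does not state the format), and the function names and the 0.05 / 0.99 cutoffs are illustrative, not values from record_samples.py.

```python
import array
import sys
import wave
from pathlib import Path


def peak_fraction(wav_path):
    """Return the loudest sample as a fraction of full scale (0.0 to 1.0)."""
    with wave.open(str(wav_path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"{wav_path}: expected 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / 32768.0


def classify_clip(wav_path, quiet_below=0.05, clipped_above=0.99):
    """Label a clip 'quiet', 'clipped', or 'ok' using illustrative cutoffs."""
    peak = peak_fraction(wav_path)
    if peak < quiet_below:
        return "quiet"
    if peak >= clipped_above:
        return "clipped"
    return "ok"


if __name__ == "__main__":
    # Assumed location and extension of the recorded positives.
    for clip in sorted(Path("data/wakeword/positive").glob("*.wav")):
        print(f"{clip.name}: {classify_clip(clip)}")
```

A handful of quiet or clipped clips is fine (it adds variety); a whole session flagged the same way suggests the gain needs adjusting before re-recording.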
The bulk of positive training data is synthetic speech across many voices:
git clone https://github.com/rhasspy/piper-sample-generator ~/oww-training/piper-sample-generator
# follow its README to fetch a generator voice, then:
python ~/oww-training/piper-sample-generator/generate_samples.py "hey clank" \
  --max-samples 2000 --output-dir ~/oww-training/synthetic_positive

The trainer mixes in large precomputed negative and room-impulse sets (AudioSet, FMA, RIRs). The training notebook downloads these from Hugging Face; follow its "data" cells. This is the multi-GB disk cost.
Run the openWakeWord trainer (the notebook, or its train.py equivalent),
pointing it at:
- synthetic positives (step 3) + your real positives (data/wakeword/positive/),
- your hard negatives (data/wakeword/negative/) plus the downloaded negatives,
- export format: ONNX (Clank uses onnxruntime; you don't need the tflite export).
Output is a new hey_clank.onnx.
python scripts/train_wakeword/audit_onnx.py ~/oww-training/output/hey_clank.onnx

It hard-fails on custom operator domains, external-data tensors, or embedded URLs/paths, and confirms it loads under stock onnxruntime CPU. On success it prints the exact SHA256 line to pin. If it fails, do not ship the model.
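
To give a feel for what one of those checks involves, here is a rough stdlib-only approximation of the embedded-URL scan. It is not the code in audit_onnx.py and does not replace it: the function name and the pattern are assumptions for illustration, and it covers none of the operator-domain, external-data, or onnxruntime load checks.

```python
import re
from pathlib import Path

# Illustrative pattern only; the real audit script's rules may differ.
# Matches a URL scheme followed by a run of printable ASCII characters.
_URL_PATTERN = re.compile(rb"(?:https?|ftp|file)://[\x21-\x7e]+")


def find_embedded_urls(model_path):
    """Return URL-like strings found anywhere in the raw bytes of a file."""
    data = Path(model_path).read_bytes()
    return [m.group(0).decode("ascii") for m in _URL_PATTERN.finditer(data)]


if __name__ == "__main__":
    import sys

    for url in find_embedded_urls(sys.argv[1]):
        print("suspicious:", url)
```

An empty result from this sketch says nothing about the other checks, so the audit script remains the gate.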
- Copy the line the audit printed into _KNOWN_OWW_SHA256 in src/voicecommand/voice_LED_control.py, replacing the old hey_clank.onnx entry (update the dated audit comment above it).
- Put the model in place:
  cp ~/oww-training/output/hey_clank.onnx models/wakeword/hey_clank.onnx
- Restart Clank. At startup the loader re-hashes the file and refuses to run if it doesn't match the pin, so a typo here fails loudly, by design.
./start_clank.sh --oww-debug

--oww-debug logs the peak wake score each second. Say "hey clank" from across
the room a dozen times; you want consistent peaks well above the
oww_threshold (currently 0.15 in config/default.yaml). If real hits now
sit at 0.5–0.9 on the built-in mic, the retrain worked. Raise the threshold back
toward 0.3–0.5 if you start getting false triggers.
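
If you jot down the peak score for each attempt, a few lines of Python make "consistently well above the threshold" concrete. This is a sketch: the function name is hypothetical, the scores in the usage example are made up, and it takes hand-entered numbers because the debug log format is not documented here.

```python
def summarize_peaks(peaks, threshold):
    """Summarize per-attempt peak wake scores against a detection threshold."""
    if not peaks:
        raise ValueError("need at least one peak score")
    hits = sum(1 for p in peaks if p >= threshold)
    lowest = min(peaks)
    return {
        "attempts": len(peaks),
        "hit_rate": hits / len(peaks),
        "lowest": lowest,
        # Headroom of the weakest attempt; negative means it was missed.
        "margin": lowest - threshold,
    }


if __name__ == "__main__":
    # Made-up scores for a dozen "hey clank" attempts.
    scores = [0.62, 0.81, 0.55, 0.74, 0.90, 0.58, 0.67, 0.71, 0.49, 0.83, 0.60, 0.77]
    print(summarize_peaks(scores, threshold=0.15))
```

A hit rate of 1.0 with a comfortable margin suggests there is room to raise the threshold; a negative margin means at least one attempt was missed at the current setting.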
data/wakeword/ is gitignored — those are recordings of your voice and room and
shouldn't be committed. Only the trained, audited hey_clank.onnx ships in the
repo. (This matches Clank's runtime stance: live audio is RAM-only and never
written to disk; these training clips are the one deliberate exception, kept
local.)
openWakeWord can train a lightweight per-speaker verifier on a base model from
just a handful of clips (openwakeword.train_custom_verifier(...), see the
upstream docs).
It outputs a .pkl that gates an existing wake model's detections. Clank's
loader (WakeWordDetector) currently consumes a single .onnx and has no
verifier hook, so using this would mean a small code change to load and apply the
verifier alongside the base model. Noted for completeness; the full retrain above
is the supported path.