- Local on-device transcription via OpenAI Whisper (Transformers.js, inline blob worker)
- Six models: tiny.en, tiny, base.en, base, small.en, small (~40-244 MB, download on demand)
- Push-to-talk: click to record, click to stop and transcribe
- Full-utterance transcription — whole recording sent as one chunk for best Whisper accuracy
- 0.3s silence pad prepended to prevent Whisper dropping the first word
- PCM chunks now accumulate correctly before transcription (was overwriting on each chunk)
- Model pre-warming on startup when mic is enabled and a model is downloaded
- Two-step model UX: select to highlight, set as default to activate; tick = downloaded, dot = active
- Mic sensitivity slider in settings — adjustable energy gate (grom.voiceSensitivity, default 0.010)
- Persist downloaded model list to localStorage across sessions
- Hide/show mic toggle, privacy badge, ffmpeg lifecycle management
- Update CHANGELOG, README, website, and all docs — remove all Moonshine references
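
The chunk-accumulation fix and the silence pad described above could look roughly like the sketch below. It is not Grom's actual code: `UtteranceBuffer`, `SAMPLE_RATE`, and the other names are hypothetical, 16 kHz mono Float32 PCM is assumed (the rate Whisper models expect), and the 28 s cap from the CHANGELOG entry is assumed to apply to the start of the recording before padding.

```typescript
// Hypothetical names throughout; the commit does not show Grom's implementation.
const SAMPLE_RATE = 16_000; // assumption: 16 kHz mono PCM
const PAD_SECONDS = 0.3;    // silence pad length from the commit message
const MAX_SECONDS = 28;     // cap from the CHANGELOG entry (assumed to exclude the pad)

class UtteranceBuffer {
  private chunks: Float32Array[] = [];

  // Append every chunk. Assigning the chunk instead of pushing it would keep
  // only the most recent one, which is the kind of bug the commit describes.
  push(chunk: Float32Array): void {
    this.chunks.push(chunk);
  }

  // Join all chunks into one array, capped at MAX_SECONDS, with silence in front.
  finish(): Float32Array {
    const total = this.chunks.reduce((n, c) => n + c.length, 0);
    const kept = Math.min(total, MAX_SECONDS * SAMPLE_RATE);
    const pad = Math.round(PAD_SECONDS * SAMPLE_RATE);
    const out = new Float32Array(pad + kept); // zero-filled, so the pad is silence
    let offset = 0;
    for (const chunk of this.chunks) {
      if (offset >= kept) break;
      const part = chunk.subarray(0, Math.min(chunk.length, kept - offset));
      out.set(part, pad + offset);
      offset += part.length;
    }
    this.chunks = []; // ready for the next recording
    return out;
  }
}
```

Calling `push` for each incoming chunk and `finish` when recording stops yields a single array that can be handed to the transcriber as one utterance.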
CHANGELOG.md (+1: 1 addition & 0 deletions)
@@ -13,6 +13,7 @@ All notable changes to Grom are documented here.
 - **Active model indicator** — the current default model is clearly marked in the picker. Selecting a model highlights it; a "Set as default" button promotes it. Downloaded models show a tick; the active model shows a filled dot.
 - **Model pre-warming** — Whisper loads silently in the background when Grom starts (if the mic is enabled and a model is downloaded), so the first utterance transcribes without delay.
 - **Full-utterance transcription** — the entire recording is sent to Whisper as one chunk (capped at 28 s), giving the model full context for accurate transcription. A 0.3 s silence pad is prepended to prevent Whisper from dropping the first word.
+- **Mic sensitivity slider** — Settings → Voice Input exposes the energy gate (RMS threshold) as a slider. Raise it if phantom transcriptions appear from background noise; lower it for quiet microphones. Persisted to VS Code settings as `grom.voiceSensitivity`.
 - **ffmpeg lifecycle management** — Settings → Voice Input lets you remove the downloaded ffmpeg binary for a full cleanup. Re-downloading works seamlessly afterwards.
 - **Hide/show mic toggle** — hide the mic button from the toolbar via Settings → Voice Input; restore it anytime from the same panel. An info badge explains how to get it back if you hide it accidentally.
 - **Privacy badge** — the Voice Input settings section carries a circled-i badge explaining that audio is transcribed entirely on your device and never leaves your machine — part of Grom's accessibility and privacy ethos.
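
For readers unfamiliar with the term, an RMS energy gate of the kind the mic sensitivity entry describes can be sketched as follows. This is an illustration under assumptions, not Grom's code: `rms` and `passesEnergyGate` are hypothetical names, samples are assumed to be Float32 PCM in the range [-1, 1], and whether Grom applies the gate per chunk or to the whole recording is not stated in the diff.

```typescript
// Root-mean-square level of a PCM buffer (samples assumed to lie in [-1, 1]).
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical gate: audio passes only if its RMS level reaches the threshold.
// The default of 0.010 mirrors the grom.voiceSensitivity default.
function passesEnergyGate(samples: Float32Array, threshold = 0.010): boolean {
  return rms(samples) >= threshold;
}
```

Raising the threshold rejects more low-level audio, which matches the advice to raise the slider when background noise produces phantom transcriptions.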
docs/index.html (+5 -5: 5 additions & 5 deletions)
@@ -177,7 +177,7 @@ <h3>Memory</h3>
 <div class="feature-card">
 <div class="icon">🎙️</div>
 <h3>Voice Input</h3>
-<p>Speak your prompts. Audio is transcribed on-device using a local Moonshine ONNX model — nothing is sent to a server. Choose Tiny (~75 MB) or Base (~300 MB). Chunked streaming so text appears as you speak.</p>
+<p>Speak your prompts. Audio is transcribed on-device using Whisper — nothing is sent to a server. Six models from Tiny EN (~40 MB) to Small (~244 MB). Push-to-talk with model pre-warming for instant first use.</p>
-<p>Pick <strong>Tiny</strong> (~75 MB, fast) or <strong>Base</strong> (~300 MB, more accurate) in Settings → Voice Input. Models download on demand and are cached locally. Switch at any time.</p>
+<h3>Choose your Whisper model</h3>
+<p>Pick from six models in Settings → Voice Input — from <strong>Tiny EN</strong> (~40 MB, fast) up to <strong>Small</strong> (~244 MB, best accuracy). English-only <code>.en</code> variants are more accurate for English speakers. Models download on demand and are cached locally. Download multiple and switch at any time.</p>
 </div>
 </div>
 <div class="step">
 <div class="step-num">4</div>
 <div class="step-body">
 <h3>Record</h3>
-<p>Click the mic button or press <code>Ctrl+Shift+M</code> to start. Text appears as you speak. Click again or press the shortcut to stop — the transcript is appended to your prompt.</p>
+<p>Click the mic button or press <code>Ctrl+Shift+M</code> to start recording. Click again to stop — the transcript is appended to your prompt. The model pre-warms on startup so the first utterance transcribes without delay.</p>
 </div>
 </div>
 </div>
-<p style="margin-top:20px;font-size:13px;">Audio is transcribed entirely on your device using <a href="https://github.com/huggingface/transformers.js">Transformers.js</a> and the <a href="https://github.com/usefulsensors/moonshine">Moonshine</a> ONNX model. Nothing leaves your machine. Voice input is optional and designed for those who want or need it as an accessibility tool.</p>
+<p style="margin-top:20px;font-size:13px;">Audio is transcribed entirely on your device using <a href="https://github.com/huggingface/transformers.js">Transformers.js</a> and <a href="https://openai.com/research/whisper">OpenAI Whisper</a>. Nothing leaves your machine. Voice input is optional and designed for those who want or need it as an accessibility tool.</p>
 <p style="font-size:13px;"><strong>Platform support:</strong> Windows (DirectShow), macOS (avfoundation), Linux (PulseAudio / PipeWire / ALSA).</p>
<button id="mic-toggle-btn" class="voice-remove-btn voice-mic-on" onclick="window.toggleMicVisibility()" title="Toggle the mic button in the toolbar">● Mic on</button>
package.json (+7: 7 additions & 0 deletions)
@@ -434,6 +434,13 @@
   "default": "tiny.en",
   "description": "Which Whisper model to use for local voice transcription. English-only (.en) models are more accurate for English speakers. The model is downloaded once and cached in your browser session."
 },
+"grom.voiceSensitivity": {
+  "type": "number",
+  "default": 0.010,
+  "minimum": 0.001,
+  "maximum": 0.100,
+  "description": "Mic energy gate for voice input (RMS threshold). Lower = more sensitive (picks up quiet speech and background noise). Higher = less sensitive (ignores noise but may miss quiet speech). Default 0.010 works for most setups; raise to 0.020–0.030 if phantom transcriptions appear."
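
One way a consumer of the `grom.voiceSensitivity` setting might guard against missing or out-of-range values is sketched below. The bounds and default are taken from the schema in the package.json hunk; `VOICE_SENSITIVITY` and `resolveVoiceSensitivity` are hypothetical names, and the diff does not show how Grom itself reads or validates the value.

```typescript
// Bounds copied from the grom.voiceSensitivity schema in package.json.
const VOICE_SENSITIVITY = { default: 0.010, minimum: 0.001, maximum: 0.100 } as const;

// Hypothetical helper: fall back to the default for non-numeric input and
// clamp numeric input into the schema's [minimum, maximum] range.
function resolveVoiceSensitivity(raw: unknown): number {
  if (typeof raw !== "number" || !Number.isFinite(raw)) {
    return VOICE_SENSITIVITY.default;
  }
  return Math.min(VOICE_SENSITIVITY.maximum, Math.max(VOICE_SENSITIVITY.minimum, raw));
}
```

In a VS Code extension the raw value would typically come from `vscode.workspace.getConfiguration("grom").get("voiceSensitivity")`; passing it through a helper like this keeps the energy gate usable even if the stored value falls outside the schema bounds.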