dictate.sh

Standalone, low-latency speech transcription for Apple Silicon.

dictate.sh uses MLX for fast, local ASR with VAD-based turn detection, plus optional LLM intent analysis. It ships as a single Python script with inline dependencies so you can run it with uv and start talking.

Features

Low-latency, rolling-window ASR (MLX)
Voice activity detection (VAD) for turn boundaries
Optional intent analysis with a local LLM
Live terminal UI (status, transcript, stats) via Rich
Works offline after models are downloaded

Requirements

macOS on Apple Silicon (MLX)
Python >= 3.10
uv installed
Microphone permission granted to your terminal

Quick start

uv run stt.py

With intent analysis:

uv run stt.py --analyze

Choose a different ASR model:

uv run stt.py --model mlx-community/Qwen3-ASR-1.7B-8bit

List audio input devices:

uv run stt.py --list-devices

Use a specific input device:

uv run stt.py --device 3

CLI options

--model: ASR model (default: mlx-community/Qwen3-ASR-0.6B-8bit)
--language: Transcription language (default: English)
--transcribe-interval: Seconds between updates (default: 0.5)
--vad-frame-ms: VAD frame size (10/20/30, default: 30)
--vad-mode: VAD aggressiveness 0-3 (default: 2)
--vad-silence-ms: Silence to finalize a turn (default: 500)
--min-words: Minimum words to finalize a turn (default: 3)
--analyze: Enable LLM intent analysis
--llm-model: LLM model to use for analysis (default: mlx-community/Qwen3-0.6B-4bit)
--no-ui: Disable the Rich live UI
--list-devices: List audio input devices
--device: Audio input device index

Recommended models

ASR (MLX Qwen3-ASR):

mlx-community/Qwen3-ASR-0.6B-4bit: fastest, lowest quality
mlx-community/Qwen3-ASR-0.6B-8bit: good balance (default)
mlx-community/Qwen3-ASR-0.6B-bf16: higher quality, more RAM
mlx-community/Qwen3-ASR-1.7B-8bit: higher quality, slower

LLM (for --analyze):

mlx-community/Qwen3-0.6B-4bit: fastest, lowest RAM (default)
mlx-community/Qwen3-1.7B-4bit: better quality, slower
mlx-community/Mistral-7B-Instruct-v0.2-4bit: heavier
mlx-community/Llama-3.1-8B-Instruct-4bit: heavier

UI and Piping

The Rich live UI renders on stderr to keep stdout clean for scripting.
If stdout is not a TTY (e.g., when piping to another tool), stt.py automatically suppresses the UI elements and prints raw transcript lines to stdout.
Use --no-ui to force-disable the visual interface even in a TTY.

# Pipe raw transcripts into another tool
uv run stt.py | grep "important"

Troubleshooting

Too many short turns: increase --vad-silence-ms or lower --vad-mode.
No audio: check mic permissions or try --list-devices + --device.
Laggy output: reduce --transcribe-interval.

Logging

Set LOG_LEVEL=DEBUG for verbose logs.
Hugging Face HTTP request logs are suppressed by default.

License

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github/workflows		.github/workflows
docs		docs
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
stt.py		stt.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

dictate.sh

Features

Requirements

Quick start

CLI options

Recommended models

UI and Piping

Troubleshooting

Logging

License

About

Uh oh!

Releases

Contributors

Uh oh!

Languages

License

mpuig/dictate.sh

Folders and files

Latest commit

History

Repository files navigation

dictate.sh

Features

Requirements

Quick start

CLI options

Recommended models

UI and Piping

Troubleshooting

Logging

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Contributors

Uh oh!

Languages