🎤 Whisper Dictation

Real-time voice-to-text dictation for Linux using OpenAI Whisper and GPU acceleration.

Press Ctrl+M anywhere to start recording, speak, and have your words automatically transcribed and pasted where your cursor is. Perfect for coding, writing, email, and any text input.

✨ Features

🎯 Global Hotkey: Press Ctrl+M to record from any application
🌍 Auto Language Detection: Detects the spoken language automatically and transcribes it (any language Whisper supports)
⚡ GPU Accelerated: Uses whisper.cpp with CUDA on NVIDIA GPUs (or CPU fallback)
🔄 Auto-paste: Transcribed text automatically pastes at cursor position
📍 System Tray: Visual status indicator (idle/recording/transcribing)
📜 History: Access your last 10 transcriptions from the tray menu
🚀 Boot on Startup: Automatically starts with your system

🎬 Demo

Idle (Blue) → Press Ctrl+M → Recording (Red Mic) → Press Ctrl+M → Transcribing (Orange) → Text auto-pastes!

🖥️ System Requirements

OS: Linux (Ubuntu 20.04+, Debian, Fedora, Arch)
Python: 3.8 or higher
GPU (optional): NVIDIA GPU with CUDA support for faster transcription
RAM: 4GB minimum, 8GB recommended
Disk: ~2GB for models and dependencies
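
The OS and Python requirements can be checked before installing with a short script. This is only a sketch: the function name check_requirements is made up for illustration and is not part of install.sh.

```python
import platform
import sys


def check_requirements(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if platform.system() != "Linux":
        problems.append(f"unsupported OS: {platform.system()}")
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {found} is older than {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("OK" if not issues else "\n".join(issues))
```

It does not check RAM, disk space, or the GPU.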

📦 One-Line Installation

git clone https://github.com/NicolasHuberty/whisper-dictation.git && cd whisper-dictation && chmod +x install.sh && ./install.sh

That's it! The installer will:

✅ Install system dependencies (build tools, CUDA if available)
✅ Clone and build whisper.cpp from source
✅ Download the Whisper model (default: base, ~150MB)
✅ Install Python dependencies
✅ Set up autostart on boot
✅ Launch the application

🎯 Usage

Quick Start

After installation, the app runs automatically with a blue circle icon in your system tray.

1. Start Recording: Press Ctrl+M
   - Icon turns red (microphone)
2. Speak your text
3. Stop Recording: Press Ctrl+M again
   - Icon turns orange (processing)
4. Auto-paste: Text appears at your cursor automatically!
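
The steps above amount to a small three-state cycle. The sketch below models it in isolation. The class and method names are hypothetical, and ignoring a hotkey press while transcribing is an assumption; the real script may behave differently.

```python
from enum import Enum


class State(Enum):
    # Values are the tray icon colors described above.
    IDLE = "blue"
    RECORDING = "red"
    TRANSCRIBING = "orange"


class Dictation:
    def __init__(self):
        self.state = State.IDLE

    def on_hotkey(self):
        """Handle one Ctrl+M press and return the new state."""
        if self.state is State.IDLE:
            self.state = State.RECORDING
        elif self.state is State.RECORDING:
            self.state = State.TRANSCRIBING
        # Assumption: presses during transcription are ignored.
        return self.state

    def on_transcribed(self):
        """Call once the text has been pasted; returns to idle."""
        self.state = State.IDLE
        return self.state
```

Keeping the transitions in one place like this makes it easy to drive the tray icon from a single state value.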

Tray Menu

Right-click the tray icon to:

View current status
See transcription history (last 10)
Quit the application
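
A "last 10" history like the one in the tray menu can be kept with a bounded deque. This is a minimal sketch, not the code from whisper-dictation.py; the History class is a hypothetical name.

```python
from collections import deque


class History:
    """Keep the most recent transcriptions, newest first."""

    def __init__(self, limit=10):
        # A deque with maxlen drops the oldest entry automatically.
        self._items = deque(maxlen=limit)

    def add(self, text):
        text = text.strip()
        if text:  # skip empty transcriptions
            self._items.appendleft(text)

    def items(self):
        return list(self._items)
```

Because new entries go in on the left, the deque discards from the right, so the list returned by items() is always newest first.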

Manual Start

cd whisper-dictation
python3 whisper-dictation.py

⚙️ Configuration

Edit whisper-dictation.py to customize:

Change Hotkey

# Line ~245
hotkeys = keyboard.GlobalHotKeys({
    '<ctrl>+m': toggle_recording  # Change to '<f9>' or '<ctrl>+<alt>+v'
})
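
The hotkey strings in that snippet look like the format used by pynput's GlobalHotKeys: modifiers and named keys go in angle brackets, single characters stay bare. If you prefer to write "Ctrl+Alt+V", a small helper can convert it. The helper name below is hypothetical, and it does not check that a key name is one pynput actually knows.

```python
def to_pynput_hotkey(combo):
    """Convert 'Ctrl+Alt+V' style text to '<ctrl>+<alt>+v'."""
    parts = [p.strip().lower() for p in combo.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty hotkey")
    # Single characters are literal keys; longer names are wrapped in <>.
    return "+".join(p if len(p) == 1 else f"<{p}>" for p in parts)
```

For example, to_pynput_hotkey("F9") gives "<f9>" and to_pynput_hotkey("Ctrl+M") gives "<ctrl>+m".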

Change Model

# Line ~32 - Available models:
# tiny (~75MB, fastest, least accurate)
# base (~150MB, default, fast and decent accuracy)
# small (~500MB, balanced)
# medium (~1.5GB, better accuracy)
# large-v3 (~3GB, best accuracy, slower)

WHISPER_MODEL = "path/to/whisper.cpp/models/ggml-base.bin"
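
The model files follow the ggml-<name>.bin naming shown above, so the path can be built from the model name instead of typed by hand. A sketch, assuming whisper.cpp sits in a directory called whisper.cpp next to the script (adjust whisper_dir to your setup); model_path is a hypothetical helper.

```python
from pathlib import Path

KNOWN_MODELS = ("tiny", "base", "small", "medium", "large-v3")


def model_path(name, whisper_dir="whisper.cpp"):
    """Return the expected path of a downloaded ggml model file."""
    if name not in KNOWN_MODELS:
        raise ValueError(f"unknown model {name!r}, expected one of {KNOWN_MODELS}")
    return Path(whisper_dir) / "models" / f"ggml-{name}.bin"


if __name__ == "__main__":
    path = model_path("base")
    print(path, "(found)" if path.exists() else "(not downloaded yet)")
```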

USB Microphone Selection

The app auto-detects USB microphones. If you have multiple mics, edit line ~20:

mic_device = usb_mics[0]  # Change index to select different mic
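
To see how such a selection might work, here is a sketch that prefers USB microphones and falls back to any input device. It assumes a USB mic can be recognized by "USB" in its device name, which may not match how the script detects it. The devices argument is a list of dicts shaped like the output of sounddevice.query_devices().

```python
def pick_input_device(devices, index=0):
    """Return the index of the chosen input device.

    USB devices are preferred; 'index' selects among the candidates.
    """
    inputs = [i for i, d in enumerate(devices) if d["max_input_channels"] > 0]
    # Assumption: USB microphones have "usb" somewhere in their name.
    usb = [i for i in inputs if "usb" in devices[i]["name"].lower()]
    candidates = usb or inputs
    if not candidates:
        raise RuntimeError("no input device found")
    if not 0 <= index < len(candidates):
        raise IndexError(f"only {len(candidates)} candidate device(s)")
    return candidates[index]
```

With sounddevice installed, pick_input_device(list(sd.query_devices())) would return a device index usable for recording.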

🔧 Advanced Installation

Custom Model Download

cd whisper.cpp/models
./download-ggml-model.sh medium  # Download medium model

GPU Acceleration

The installer automatically detects NVIDIA GPUs and builds with CUDA support. On systems without an NVIDIA GPU (AMD GPUs or CPU-only), rebuild without CUDA:

# CPU-only build
cd whisper.cpp
make clean
make

Manual Build

# Install dependencies
sudo apt update
sudo apt install -y python3-pip git build-essential portaudio19-dev

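# Clone whisper.cpp (run this from inside the whisper-dictation directory,
# since the later steps refer to ../requirements.txt)
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

# Build with GPU support (NVIDIA)
# The flag name depends on the whisper.cpp version: older releases use
# WHISPER_CUBLAS=1 and newer ones GGML_CUDA=1.
make clean
WHISPER_CUDA=1 make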

# Download model
bash ./models/download-ggml-model.sh base

# Install Python packages
pip3 install -r ../requirements.txt

# Run
cd ..
python3 whisper-dictation.py

🐛 Troubleshooting

App doesn't paste text

Problem: Text is copied to clipboard but not pasted.

Solutions:

Install xdotool: sudo apt install xdotool
Check permissions: The app needs permission to simulate keyboard input
Try a different hotkey (some apps block Ctrl+M)
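
A quick diagnostic for the first two points might look like the sketch below. It checks whether xdotool is on PATH and whether the session is Wayland, where simulated X11 key presses often do not reach native Wayland windows. The function name is hypothetical, and the env and which parameters exist only to make it easy to test.

```python
import os
import shutil


def paste_diagnostics(env=None, which=shutil.which):
    """Return a list of hints about why auto-paste might fail."""
    env = os.environ if env is None else env
    hints = []
    if which("xdotool") is None:
        hints.append("xdotool not found on PATH")
    if env.get("XDG_SESSION_TYPE", "").lower() == "wayland":
        hints.append("Wayland session: simulated key presses may be blocked")
    return hints


if __name__ == "__main__":
    for hint in paste_diagnostics() or ["no obvious problem found"]:
        print(hint)
```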

No microphone detected

Problem: "No USB microphone detected" error.

Solutions:

# List audio devices
python3 -c "import sounddevice as sd; print(sd.query_devices())"

# If your mic isn't USB, edit line ~15 to detect all input devices:
mic_device = [i for i, d in enumerate(devices) if d['max_input_channels'] > 0][0]

Hotkey doesn't work in some apps

Problem: Ctrl+M doesn't work in browsers (Chrome, Firefox).

Solutions:

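Some applications bind Ctrl+M themselves (Firefox, for example, uses it to mute or unmute the current tab), so the key press can also trigger their own action
Change the hotkey to F9, Ctrl+Alt+M, or another combination (see Configuration above)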

Slow transcription

Problem: Transcription takes >5 seconds.

Solutions:

Use GPU: Make sure whisper.cpp was built with CUDA, and run nvidia-smi to verify that the GPU is detected
Smaller model: Switch to tiny or base model for faster transcription
Check CPU: Close heavy applications during transcription
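
The nvidia-smi check can also be scripted. The sketch below lists the GPUs the driver reports, or returns an empty list if nvidia-smi is missing or fails. It only tells you the driver sees a GPU, not whether whisper.cpp was actually built with CUDA. The function name gpu_names is hypothetical.

```python
import shutil
import subprocess


def gpu_names(timeout=5):
    """Return GPU names reported by nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # Covers non-zero exit codes, timeouts, and launch failures.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    print(gpu_names() or "no NVIDIA GPU detected")
```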

Icon not showing in system tray

Problem: No tray icon visible.

Solutions:

# For GNOME, install AppIndicator extension
sudo apt install gnome-shell-extension-appindicator

# For KDE/XFCE, restart the panel
killall plasmashell && plasmashell &  # KDE
xfce4-panel -r  # XFCE

📚 How It Works

Audio Capture: Uses sounddevice to capture audio from your microphone
Whisper.cpp: Transcribes audio using the optimized C++ implementation of OpenAI Whisper
Clipboard: Copies transcription to clipboard via pyperclip
Auto-paste: Simulates Ctrl+V using pyautogui to paste text
System Tray: pystray provides the tray icon and menu
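
The hand-off between the first two steps can be sketched with the standard library alone: write the recording as a 16 kHz, mono, 16-bit WAV file (the format whisper.cpp's command-line tool expects) and build the command to transcribe it. The function names and the binary path are assumptions; the whisper.cpp executable is called main in older builds and whisper-cli in newer ones, and the real script may invoke it differently.

```python
import struct
import wave


def write_wav(path, samples, rate=16000):
    """Write mono 16-bit PCM samples (ints in -32768..32767) to a WAV file."""
    frames = struct.pack(f"<{len(samples)}h", *samples)  # little-endian int16
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wf.setframerate(rate)
        wf.writeframes(frames)


def whisper_command(binary, model, wav_path):
    """Build the argument list for a whisper.cpp transcription run."""
    return [
        str(binary),
        "-m", str(model),     # model file, e.g. models/ggml-base.bin
        "-f", str(wav_path),  # input audio
        "-l", "auto",         # auto-detect the language
        "-nt",                # no timestamps in the output
    ]
```

The resulting list can be passed to subprocess.run(..., capture_output=True, text=True), with the transcription read from stdout.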

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

git clone https://github.com/NicolasHuberty/whisper-dictation.git
cd whisper-dictation

# Install in development mode
pip3 install -r requirements.txt

# Make changes to whisper-dictation.py

# Test
python3 whisper-dictation.py

Roadmap

Support for more hotkey customization via config file
Web UI for configuration
Windows and macOS support
Plugin system for custom post-processing
Voice commands (punctuation, formatting)
Multiple language profiles

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

OpenAI Whisper - The amazing speech recognition model
whisper.cpp - High-performance C++ implementation
All the amazing open-source libraries used in this project

⭐ Star History

If you find this useful, please consider giving it a star! ⭐

Made with ❤️ for the Linux community

Have questions? Open an issue on GitHub!
