rodolphoeck/whisper-dictation

🎤 Whisper Dictation

Real-time voice-to-text dictation for Linux using OpenAI Whisper and GPU acceleration.

Press Ctrl+M anywhere to start recording, speak, and have your words automatically transcribed and pasted where your cursor is. Perfect for coding, writing, email, and any text input.

✨ Features

  • 🎯 Global Hotkey: Press Ctrl+M to record from any application
  • 🌍 Auto Language Detection: Automatically detects and transcribes any language
  • GPU Accelerated: Uses whisper.cpp with CUDA for RTX GPUs (or CPU fallback)
  • 🔄 Auto-paste: Transcribed text automatically pastes at cursor position
  • 📍 System Tray: Visual status indicator (idle/recording/transcribing)
  • 📜 History: Access your last 10 transcriptions from the tray menu
  • 🚀 Boot on Startup: Automatically starts with your system

🎬 Demo

Idle (Blue) → Press Ctrl+M → Recording (Red Mic) → Press Ctrl+M → Transcribing (Orange) → Text auto-pastes!

🖥️ System Requirements

  • OS: Linux (Ubuntu 20.04+, Debian, Fedora, Arch)
  • Python: 3.8 or higher
  • GPU (optional): NVIDIA GPU with CUDA support for faster transcription
  • RAM: 4GB minimum, 8GB recommended
  • Disk: ~2GB for models and dependencies

📦 One-Line Installation

git clone https://github.com/NicolasHuberty/whisper-dictation.git && cd whisper-dictation && chmod +x install.sh && ./install.sh

That's it! The installer will:

  1. ✅ Install system dependencies (build tools, CUDA if available)
  2. ✅ Clone and build whisper.cpp from source
  3. ✅ Download the Whisper model (default: base, ~150MB)
  4. ✅ Install Python dependencies
  5. ✅ Set up autostart on boot
  6. ✅ Launch the application

🎯 Usage

Quick Start

After installation, the app runs automatically with a blue circle icon in your system tray.

  1. Start Recording: Press Ctrl+M
    • Icon turns red (microphone)
  2. Speak your text
  3. Stop Recording: Press Ctrl+M again
    • Icon turns orange (processing)
  4. Auto-paste: Text appears at your cursor automatically!
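
The icon colors above follow a simple three-state cycle. A minimal sketch of that cycle, assuming a press during transcription is ignored (the names State, on_hotkey, and on_transcription_done are hypothetical and not taken from whisper-dictation.py):

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"                  # blue tray icon
    RECORDING = "recording"        # red microphone icon
    TRANSCRIBING = "transcribing"  # orange icon


def on_hotkey(state: State) -> State:
    """Return the next state when Ctrl+M is pressed."""
    if state is State.IDLE:
        return State.RECORDING
    if state is State.RECORDING:
        return State.TRANSCRIBING
    # Assumption: a press while transcribing is ignored in this sketch.
    return state


def on_transcription_done(state: State) -> State:
    """Return to idle once the text has been pasted."""
    return State.IDLE if state is State.TRANSCRIBING else state
```

Keeping the transitions in pure functions makes the hotkey logic easy to test without a microphone or a tray icon.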

Tray Menu

Right-click the tray icon to:

  • View current status
  • See transcription history (last 10)
  • Quit the application
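
A "last 10" history like the one in the tray menu can be kept with a bounded deque. This is only a sketch of the idea; remember, menu_labels, and HISTORY_SIZE are hypothetical names, and the real app may store its history differently:

```python
from collections import deque

HISTORY_SIZE = 10  # matches "last 10" in the tray menu

# Newest entry sits at index 0; the deque drops the oldest one when full.
history = deque(maxlen=HISTORY_SIZE)


def remember(text):
    """Store a non-empty transcription, newest first."""
    text = text.strip()
    if text:
        history.appendleft(text)


def menu_labels(max_chars=40):
    """Shorten each entry so it fits on one menu line."""
    return [
        t if len(t) <= max_chars else t[: max_chars - 3] + "..."
        for t in history
    ]
```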

Manual Start

cd whisper-dictation
python3 whisper-dictation.py

⚙️ Configuration

Edit whisper-dictation.py to customize:

Change Hotkey

# Line ~245
hotkeys = keyboard.GlobalHotKeys({
    '<ctrl>+m': toggle_recording  # Change to '<f9>' or '<ctrl>+<alt>+v'
})
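
The hotkey string above appears to use the pynput convention: modifiers and named keys go in angle brackets, single characters stay bare. If you prefer to write combinations as "Ctrl+Alt+M", a small helper can convert them. This is a sketch; to_pynput_hotkey is a hypothetical name and is not part of the app:

```python
def to_pynput_hotkey(combo):
    """Convert 'Ctrl+Alt+M' style text to '<ctrl>+<alt>+m'."""
    # Lowercase each part and drop any brackets the user already typed.
    parts = [p.strip().lower().strip("<>") for p in combo.split("+")]
    parts = [p for p in parts if p]
    if not parts:
        raise ValueError("empty hotkey")
    # Single characters stay as-is; named keys (ctrl, alt, f9) get brackets.
    return "+".join(p if len(p) == 1 else "<" + p + ">" for p in parts)
```

For example, to_pynput_hotkey("F9") gives the '<f9>' form suggested in the comment above.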

Change Model

# Line ~32 - Available models:
# tiny (~75MB, fastest, least accurate)
# base (~150MB, default, fast and decent accuracy)
# small (~500MB, balanced)
# medium (~1.5GB, better accuracy)
# large-v3 (~3GB, best accuracy, slower)

WHISPER_MODEL = "path/to/whisper.cpp/models/ggml-base.bin"
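
To avoid typos in the model path, you could derive it from the model name. A minimal sketch, assuming the models live under whisper.cpp/models and follow the ggml-<name>.bin naming shown above (model_path and MODEL_SIZES are hypothetical names):

```python
from pathlib import Path

# Model names listed in this README.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large-v3")


def model_path(name, whisper_dir="whisper.cpp"):
    """Build the path to a ggml model file for the given model name."""
    if name not in MODEL_SIZES:
        raise ValueError("unknown model: " + repr(name))
    return Path(whisper_dir) / "models" / ("ggml-" + name + ".bin")
```

Calling model_path("medium").exists() before starting the app tells you whether that model still needs to be downloaded.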

USB Microphone Selection

The app auto-detects USB microphones. If you have multiple mics, edit line ~20:

mic_device = usb_mics[0]  # Change index to select different mic
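
The selection logic can be sketched as a pure function over the list of device dicts that sounddevice's query_devices() returns. It assumes USB microphones have "usb" in their device name, which is a heuristic and may not match how the app detects them; pick_input_device is a hypothetical name:

```python
def pick_input_device(devices, prefer="usb", index=0):
    """Return the position in `devices` of the chosen input device."""
    # Keep only devices that can record.
    inputs = [
        i for i, d in enumerate(devices)
        if d.get("max_input_channels", 0) > 0
    ]
    if not inputs:
        raise RuntimeError("No input device found")
    # Assumption: preferred devices contain `prefer` in their name.
    preferred = [
        i for i in inputs
        if prefer.lower() in devices[i]["name"].lower()
    ]
    # Fall back to any input device when nothing matches.
    candidates = preferred or inputs
    return candidates[index]
```

With sounddevice installed, pick_input_device(list(sd.query_devices()), index=1) would select the second matching microphone.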

🔧 Advanced Installation

Custom Model Download

cd whisper.cpp/models
./download-ggml-model.sh medium  # Download medium model

GPU Acceleration

The installer automatically detects NVIDIA GPUs and builds with CUDA support. On machines with an AMD GPU or no GPU, build whisper.cpp for CPU only:

# CPU-only build
cd whisper.cpp
make clean
make

Manual Build

# Install dependencies
sudo apt update
sudo apt install -y python3-pip git build-essential portaudio19-dev

# Clone whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

# Build with GPU support (NVIDIA)
make clean
WHISPER_CUDA=1 make

# Download model
bash ./models/download-ggml-model.sh base

# Install Python packages
pip3 install -r ../requirements.txt

# Run
cd ..
python3 whisper-dictation.py

🐛 Troubleshooting

App doesn't paste text

Problem: Text is copied to clipboard but not pasted.

Solutions:

  1. Install xdotool: sudo apt install xdotool
  2. Check permissions: The app needs permission to simulate keyboard input
  3. Try a different hotkey (some apps block Ctrl+M)
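
A quick way to check the first two points is a small diagnostic script. This is a sketch, not part of the app: paste_diagnostics is a hypothetical name, and the Wayland check is an added assumption based on xdotool being an X11 tool:

```python
import os
import shutil


def paste_diagnostics(env=None):
    """Return a list of likely reasons why auto-paste fails."""
    env = os.environ if env is None else env
    problems = []
    # xdotool must be installed and on PATH.
    if shutil.which("xdotool") is None:
        problems.append("xdotool not found (sudo apt install xdotool)")
    # Assumption: simulated key presses are unreliable under Wayland.
    if env.get("XDG_SESSION_TYPE", "").lower() == "wayland":
        problems.append("Wayland session: simulated input may be blocked")
    return problems


if __name__ == "__main__":
    for problem in paste_diagnostics() or ["no obvious problem found"]:
        print(problem)
```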

No microphone detected

Problem: "No USB microphone detected" error.

Solutions:

# List audio devices
python3 -c "import sounddevice as sd; print(sd.query_devices())"

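# If your mic isn't USB, edit line ~15 of whisper-dictation.py to use the first device that has input channels
# (Python; assumes devices holds the result of sd.query_devices()):
mic_device = [i for i, d in enumerate(devices) if d['max_input_channels'] > 0][0]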

Hotkey doesn't work in some apps

Problem: Ctrl+M doesn't work in browsers (Chrome, Firefox).

Solutions:

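  • Some browsers reserve Ctrl+M for their own shortcuts (Firefox, for example, uses it to mute or unmute the tab), which can conflict with the dictation hotkey
  • Change the hotkey to F9, Ctrl+Alt+M, or another combination (see Configuration above)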

Slow transcription

Problem: Transcription takes >5 seconds.

Solutions:

  1. Use GPU: Run nvidia-smi to verify the GPU is detected, and make sure whisper.cpp was built with CUDA
  2. Smaller model: Switch to tiny or base model for faster transcription
  3. Check CPU: Close heavy applications during transcription
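
The GPU check in step 1 can also be scripted. A minimal sketch that asks nvidia-smi for the GPU names and returns an empty list when the tool is missing or fails (nvidia_gpu_names is a hypothetical name; this only confirms the driver sees a GPU, not that whisper.cpp was built with CUDA):

```python
import shutil
import subprocess


def nvidia_gpu_names(timeout=5.0):
    """Return GPU names reported by nvidia-smi, or [] if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # Driver not loaded, command failed, or it timed out.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    print(nvidia_gpu_names() or "No NVIDIA GPU detected")
```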

Icon not showing in system tray

Problem: No tray icon visible.

Solutions:

# For GNOME, install AppIndicator extension
sudo apt install gnome-shell-extension-appindicator

# For KDE/XFCE, restart the panel
killall plasmashell && plasmashell &  # KDE
xfce4-panel -r  # XFCE

📚 How It Works

  1. Audio Capture: Uses sounddevice to capture audio from your microphone
  2. Whisper.cpp: Transcribes audio using the optimized C++ implementation of OpenAI Whisper
  3. Clipboard: Copies transcription to clipboard via pyperclip
  4. Auto-paste: Simulates Ctrl+V using pyautogui to paste text
  5. System Tray: pystray provides the tray icon and menu
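
Step 2 boils down to running the whisper.cpp command-line tool on a recorded WAV file. The sketch below builds such a command and tidies its output. The binary path is an assumption (the executable name depends on the whisper.cpp version), and build_whisper_command and clean_transcript are hypothetical names rather than functions from whisper-dictation.py:

```python
def build_whisper_command(binary, model, wav, language="auto"):
    """Build the argument list for a whisper.cpp transcription run."""
    return [
        str(binary),
        "-m", str(model),   # path to the ggml model file
        "-f", str(wav),     # audio file to transcribe
        "-l", language,     # "auto" asks whisper.cpp to detect the language
        "-nt",              # print text without timestamps
    ]


def clean_transcript(stdout):
    """Collapse whisper.cpp output lines into one line of text."""
    return " ".join(stdout.split())


if __name__ == "__main__":
    cmd = build_whisper_command(
        "whisper.cpp/main",  # assumption: binary name and location
        "whisper.cpp/models/ggml-base.bin",
        "/tmp/recording.wav",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run, and the cleaned text is what would then go to the clipboard in step 3.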

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

git clone https://github.com/NicolasHuberty/whisper-dictation.git
cd whisper-dictation

# Install Python dependencies
pip3 install -r requirements.txt

# Make changes to whisper-dictation.py

# Test
python3 whisper-dictation.py

Roadmap

  • Support for more hotkey customization via config file
  • Web UI for configuration
  • Windows and macOS support
  • Plugin system for custom post-processing
  • Voice commands (punctuation, formatting)
  • Multiple language profiles

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

  • OpenAI Whisper - The amazing speech recognition model
  • whisper.cpp - High-performance C++ implementation
  • All the amazing open-source libraries used in this project

⭐ Star History

If you find this useful, please consider giving it a star! ⭐


Made with ❤️ for the Linux community

Have questions? Open an issue on GitHub!
