Voice keyboard is a demo application showcasing Deepgram's new turn-taking speech-to-text API: Flux.
A voice-controlled Linux virtual keyboard that converts speech to text and types it into any application.
Because it creates a virtual input device at the kernel level (through /dev/uinput), it works with any Linux application.
- Voice-to-Text: Real-time speech recognition using Deepgram's Flux API service (turn-taking STT)
- Virtual Keyboard: Creates a virtual input device that works with all applications
- Incremental Typing: Smart transcript updates with minimal backspacing for real-time corrections
The application solves a common Linux privilege problem:
- Virtual keyboard creation requires root access to /dev/uinput
- Audio input requires user-space access to PipeWire/PulseAudio
Solution: The application starts with root privileges, creates the virtual keyboard, then drops privileges to access the user's audio session.
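Before it can drop privileges, the process needs to know which user started it. The following is a minimal sketch of that first step, assuming the program is launched through sudo, which exports the invoking user's numeric IDs in the SUDO_UID and SUDO_GID environment variables. The `OriginalIds` type and both function names are hypothetical and are not taken from this project's source.

```rust
/// Numeric identity of the user who invoked `sudo` (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct OriginalIds {
    pub uid: u32,
    pub gid: u32,
}

/// Parse the raw values of SUDO_UID and SUDO_GID.
/// Returns None if either value is missing or is not a valid number.
pub fn parse_original_ids(sudo_uid: Option<&str>, sudo_gid: Option<&str>) -> Option<OriginalIds> {
    let uid = sudo_uid?.trim().parse::<u32>().ok()?;
    let gid = sudo_gid?.trim().parse::<u32>().ok()?;
    Some(OriginalIds { uid, gid })
}

/// Read the IDs from the process environment.
/// Assumption: the process was started with sudo, which sets both variables.
pub fn original_ids_from_env() -> Option<OriginalIds> {
    let uid = std::env::var("SUDO_UID").ok();
    let gid = std::env::var("SUDO_GID").ok();
    parse_original_ids(uid.as_deref(), gid.as_deref())
}
```

With these IDs in hand, a privilege drop has to change the group ID before the user ID: once the process is no longer root, it can no longer change its group.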
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://rustup.rs | sh
# Install required system packages (Fedora/RHEL)
sudo dnf install alsa-lib-devel
# Install required system packages (Ubuntu/Debian)
sudo apt install libasound2-dev
git clone <repository-url>
cd voice-keyboard
cargo build
You’ll need a Deepgram API key to authenticate with Flux.
- Create or manage keys in the Deepgram console (see "Create additional API keys" in the Deepgram documentation).
- Export the key so the app can pick it up (recommended):
export DEEPGRAM_API_KEY="dg_your_api_key_here"
- The client sends the header Authorization: Token <DEEPGRAM_API_KEY>.
- For CI or systemd services, set DEEPGRAM_API_KEY in the environment for the service user.
- Security tip: treat API keys like passwords. Prefer env vars over committing keys to files.
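As an illustration of the header format described above, here is a small sketch that builds the Authorization header from the DEEPGRAM_API_KEY environment variable. The function names are hypothetical, and the sketch only produces the header name and value; attaching it to a WebSocket request depends on the client library in use.

```rust
/// Build the (name, value) pair for the Authorization header.
/// Returns None for an empty or whitespace-only key so a missing key fails early.
pub fn authorization_header(api_key: &str) -> Option<(String, String)> {
    let key = api_key.trim();
    if key.is_empty() {
        return None;
    }
    Some(("Authorization".to_string(), format!("Token {key}")))
}

/// Read the key from the DEEPGRAM_API_KEY environment variable.
pub fn authorization_header_from_env() -> Option<(String, String)> {
    let key = std::env::var("DEEPGRAM_API_KEY").ok()?;
    authorization_header(&key)
}
```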
Use the provided runner script:
./run.sh
# Build and run with proper privilege handling
cargo build
sudo -E ./target/debug/voice-keyboard --test-stt
Important: Always use sudo -E to preserve environment variables needed for audio access.
This application uses Deepgram Flux, the company's new turn‑taking STT API. The default WebSocket URL is wss://api.deepgram.com/v2/listen.
voice-keyboard [OPTIONS]
OPTIONS:
--test-audio Test audio input and show levels
--test-stt Test speech-to-text functionality (default if no other mode specified)
--debug-stt Debug speech-to-text (print transcripts without typing)
--stt-url <URL> Custom STT service URL (default: wss://api.deepgram.com/v2/listen)
-h, --help Print help information
-V, --version Print version information
Note: If no mode is specified, the application defaults to --test-stt behavior.
- Initialization: Application starts with root privileges
- Virtual Keyboard: Creates the virtual keyboard device through /dev/uinput as root
- Privilege Drop: Drops to original user privileges
- Audio Access: Accesses PipeWire/PulseAudio in user space
- Speech Recognition: Streams audio to Deepgram Flux STT service
- Incremental Typing: Updates text in real-time with smart backspacing
- Turn Finalization: Clears tracking on "EndOfTurn" events (user presses Enter manually)
The application provides sophisticated real-time transcript updates:
- Incremental Updates: As speech is recognized, the application updates the typed text by finding the common prefix between the current and new transcript, backspacing only the changed portion, and typing the new ending
- Smart Backspacing: Minimizes cursor movement by only removing characters that actually changed
- Turn Management: On "EndOfTurn" events, the application clears its internal tracking but doesn't automatically press Enter, allowing users to review before submitting
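The common-prefix approach described above can be sketched as follows. This is an illustrative version, not the project's actual code: `TypingEdit`, `diff_transcripts`, and `TranscriptTracker` are hypothetical names, and the sketch assumes one backspace removes one Unicode character.

```rust
/// Keystrokes needed to turn the already-typed text into the new transcript.
#[derive(Debug, PartialEq, Eq)]
pub struct TypingEdit {
    pub backspaces: usize,
    pub to_type: String,
}

/// Compare by characters (not bytes) so multi-byte UTF-8 text is handled.
pub fn diff_transcripts(current: &str, new: &str) -> TypingEdit {
    // Length of the shared leading run of characters.
    let common = current
        .chars()
        .zip(new.chars())
        .take_while(|(a, b)| a == b)
        .count();
    TypingEdit {
        // Remove only what differs after the common prefix.
        backspaces: current.chars().count() - common,
        // Type the new ending.
        to_type: new.chars().skip(common).collect(),
    }
}

/// Remembers what has been typed during the current turn.
#[derive(Default)]
pub struct TranscriptTracker {
    typed: String,
}

impl TranscriptTracker {
    /// Apply a new transcript and return the edit to send to the keyboard.
    pub fn update(&mut self, transcript: &str) -> TypingEdit {
        let edit = diff_transcripts(&self.typed, transcript);
        self.typed = transcript.to_string();
        edit
    }

    /// On an end-of-turn event, forget the tracked text without pressing Enter.
    pub fn end_of_turn(&mut self) {
        self.typed.clear();
    }
}
```

For example, going from "hello word" to "hello world" needs one backspace followed by typing "ld", rather than retyping the whole line.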
- Endpoint: wss://api.deepgram.com/v2/listen
- What it is: Flux is Deepgram's turn‑taking, low‑latency STT API designed for conversational experiences.
- Authentication: Send an Authorization header. Common forms:
  - Token <DEEPGRAM_API_KEY> (what this app uses)
  - token <DEEPGRAM_API_KEY> or Bearer <JWT> are also accepted by the platform
- Message types (each server message includes a JSON type field):
  - Connected: initial connection confirmation
  - TurnInfo: streaming transcription updates with fields: event (Update, StartOfTurn, Preflight, SpeechResumed, EndOfTurn), turn_index, audio_window_start, audio_window_end, transcript, words[] { word, confidence }, end_of_turn_confidence
  - Error: fatal error with fields: code, description (may also include a close code)
  - Configuration: echoes/acknowledges configuration (e.g., thresholds) when provided
- Client close protocol: After sending your final audio, send a control message:
  { "type": "CloseStream" }
  The server will flush any remaining responses and then close the WebSocket.
- Update cadence: Flux produces updates about every 240 ms with a typical worst‑case latency of ~500 ms.
- Common query parameters (as supported by the preview spec): model, encoding, sample_rate, preflight_threshold, eot_threshold, eot_timeout_ms, keyterm, mip_opt_out, tag
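To make the endpoint and query parameters concrete, here is a minimal sketch that assembles a connection URL and holds the close message as a constant. `flux_url` and `CLOSE_STREAM` are hypothetical names, the sketch does no percent-encoding (it assumes every value is already URL-safe), and the parameter values in the usage note are illustrative placeholders to check against the Flux documentation.

```rust
/// Control message sent after the final audio, as given in the list above.
pub const CLOSE_STREAM: &str = r#"{ "type": "CloseStream" }"#;

/// Append query parameters to the base WebSocket URL.
/// Assumption: keys and values are already URL-safe, so nothing is escaped.
pub fn flux_url(base: &str, params: &[(&str, &str)]) -> String {
    if params.is_empty() {
        return base.to_string();
    }
    let query: Vec<String> = params
        .iter()
        .map(|(key, value)| format!("{key}={value}"))
        .collect();
    format!("{base}?{}", query.join("&"))
}
```

Calling `flux_url("wss://api.deepgram.com/v2/listen", &[("encoding", "linear16"), ("sample_rate", "16000")])` yields the endpoint followed by `?encoding=linear16&sample_rate=16000`.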
- Minimal Root Time: Runs as root only while creating the virtual keyboard
- Environment Preservation: Maintains user's audio session access
- Clean Privilege Drop: Properly drops both user and group privileges
- No System Changes: No permanent system configuration required
If you get "Host is down" or "I/O error" when testing audio:
- Use sudo -E: Always preserve environment variables
- Check PipeWire: Ensure PipeWire is running: systemctl --user status pipewire
- Test without sudo: Try ./target/debug/voice-keyboard --test-audio (will fail on keyboard creation but audio should work)
If you get "Permission denied" for /dev/uinput:
- Check uinput module: sudo modprobe uinput
- Verify device exists: ls -la /dev/uinput
- Use sudo: The application is designed to run with sudo -E
src/
├── main.rs # Main application and privilege dropping
├── virtual_keyboard.rs # Virtual keyboard device management
├── audio_input.rs # Audio capture and processing
├── stt_client.rs # WebSocket STT client
└── input_event.rs # Linux input event constants
- OriginalUser: Captures and restores user context
- VirtualKeyboard: Manages uinput device lifecycle with smart transcript updates
- AudioInput: Cross-platform audio capture
- SttClient: WebSocket-based speech-to-text client
- AudioBuffer: Manages audio chunking for STT streaming
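As a rough illustration of what chunking audio for streaming can look like, the sketch below accumulates 16-bit samples and emits fixed-size chunks as little-endian bytes. It is not the project's AudioBuffer: `ChunkBuffer` is a hypothetical type, and the sample format and chunk size are assumptions chosen for the example.

```rust
/// Accumulates PCM samples and hands them out in fixed-size chunks (hypothetical type).
pub struct ChunkBuffer {
    pending: Vec<i16>,
    chunk_samples: usize,
}

impl ChunkBuffer {
    /// `chunk_samples` is the number of samples per emitted chunk.
    pub fn new(chunk_samples: usize) -> Self {
        assert!(chunk_samples > 0, "chunk size must be non-zero");
        Self { pending: Vec::new(), chunk_samples }
    }

    /// Append samples and return every complete chunk as little-endian bytes.
    /// Leftover samples stay buffered until the next call.
    pub fn push(&mut self, samples: &[i16]) -> Vec<Vec<u8>> {
        self.pending.extend_from_slice(samples);
        let mut chunks = Vec::new();
        while self.pending.len() >= self.chunk_samples {
            let chunk: Vec<u8> = self
                .pending
                .drain(..self.chunk_samples)
                .flat_map(|sample| sample.to_le_bytes())
                .collect();
            chunks.push(chunk);
        }
        chunks
    }

    /// Number of samples waiting for the next full chunk.
    pub fn buffered(&self) -> usize {
        self.pending.len()
    }
}
```

Each returned chunk could then be sent as one binary WebSocket message, so the capture callback's buffer size does not dictate the size of the messages sent to the STT service.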
ISC License. See LICENSE.txt