
grcamauer/WhisperWin


Development Setup

Prerequisites

  • Windows 11, or Windows 10 version 2004 (build 10.0.19041.0) or later
  • .NET 8 SDK
  • Visual Studio 2022, or VS Code with the C# extension
  • Windows App SDK 1.7

Quick Start

  1. Clone the repository and run the remaining commands from its root.
  2. Bootstrap dependencies by running tools/bootstrap-whisper.ps1 from PowerShell.
  3. Download a model (tiny.en in this example): external\whisper.cpp\models\download-ggml-model.cmd tiny.en
  4. Build the solution: dotnet build
  5. Run the tests: dotnet test
  6. Run the UI app: dotnet run --project src/WhisperKey.UI
  7. Run the Helper: dotnet run --project src/WhisperKey.Helper
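
Before building, it can help to confirm that step 3 actually produced a model file. The Python sketch below assumes the whisper.cpp download script saves each model as ggml-<name>.bin in external/whisper.cpp/models, which is whisper.cpp's usual naming; the helper functions are hypothetical and not part of this repository.

```python
from pathlib import Path

# Assumed model folder, relative to the repository root.
MODELS_DIR = Path("external") / "whisper.cpp" / "models"


def model_path(repo_root: Path, name: str) -> Path:
    """Expected file for a model name, e.g. 'tiny.en' -> ggml-tiny.en.bin (assumed naming)."""
    return Path(repo_root) / MODELS_DIR / f"ggml-{name}.bin"


def missing_models(repo_root: Path, names: list[str]) -> list[str]:
    """Return the model names whose .bin file is not present yet."""
    return [name for name in names if not model_path(repo_root, name).is_file()]


if __name__ == "__main__":
    # Run from the repository root after step 3.
    missing = missing_models(Path.cwd(), ["tiny.en"])
    if missing:
        print("Missing models: " + ", ".join(missing))
    else:
        print("All models present.")
```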

Build Commands

```shell
# Build all projects
dotnet build

# Run tests
dotnet test

# Clean solution
dotnet clean
```

Technical Stack

Framework & Runtime

  • .NET 8: Long-term support (LTS) release of .NET, with nullable reference types enabled
  • Windows App SDK 1.7: Platform that ships WinUI 3 and the modern Windows app APIs
  • WinUI 3: Native Windows UI framework
  • WinForms: Used by the Helper process, which must run on an STA thread (required, for example, by the clipboard APIs)

Audio Processing

  • NAudio 2.2.1: WASAPI audio capture and processing
  • WASAPI: Low-latency Windows audio API
  • Silero VAD: Voice Activity Detection model
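
Silero VAD yields a speech probability for each short audio frame, so the pipeline still needs a rule that turns those probabilities into utterances. The Python sketch below shows one common rule, hysteresis with a minimum silence length. It is illustrative only: the VadGate name, the 0.5 threshold, the 0.15 hysteresis margin and the three-frame silence window are assumptions, not WhisperKey's actual settings.

```python
from dataclasses import dataclass


@dataclass
class VadGate:
    """Turn per-frame speech probabilities into [start, end) frame segments.

    Assumed rule: a segment opens when a probability reaches `threshold` and
    closes after `min_silence_frames` consecutive frames below
    `threshold - hysteresis`.
    """

    threshold: float = 0.5
    hysteresis: float = 0.15
    min_silence_frames: int = 3

    def segments(self, probs: list[float]) -> list[tuple[int, int]]:
        result = []
        start = None  # first frame of the open segment, or None when idle
        silent = 0    # consecutive low-probability frames inside a segment
        off = self.threshold - self.hysteresis
        for i, p in enumerate(probs):
            if start is None:
                if p >= self.threshold:
                    start, silent = i, 0
            elif p < off:
                silent += 1
                if silent >= self.min_silence_frames:
                    # Close the segment where the silence run began.
                    result.append((start, i - silent + 1))
                    start = None
            else:
                silent = 0  # still speech (or inside the hysteresis band)
        if start is not None:
            # Flush a segment that is still open when the input ends.
            result.append((start, len(probs) - silent))
        return result
```

The hysteresis band keeps a segment open through brief dips (a frame at 0.4 does not end speech), which avoids cutting words in half.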

Speech-to-Text

  • whisper.cpp: Native C++ implementation of Whisper
  • P/Invoke: Calls the native whisper.cpp library from .NET
  • Hugging Face: Model repository and download management
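
whisper.cpp consumes 16 kHz mono audio as 32-bit floats, while WASAPI capture usually delivers the device's own mix format (often 48 kHz stereo). The Python sketch below shows that conversion step on interleaved 16-bit PCM for simplicity. It is deliberately crude and hypothetical: it averages channels and downsamples only by integer factors with a box filter, where a production pipeline would use a proper resampler.

```python
WHISPER_SAMPLE_RATE = 16_000  # whisper.cpp expects 16 kHz mono float samples


def to_whisper_input(pcm: list[int], channels: int, sample_rate: int) -> list[float]:
    """Convert interleaved int16 PCM into 16 kHz mono floats in [-1.0, 1.0)."""
    if sample_rate % WHISPER_SAMPLE_RATE != 0:
        raise ValueError("this sketch only handles integer multiples of 16 kHz")
    if len(pcm) % channels != 0:
        raise ValueError("interleaved buffer length must be a multiple of channels")
    # Downmix: average the channels of each frame and scale int16 to [-1.0, 1.0).
    mono = [
        sum(pcm[i:i + channels]) / channels / 32768.0
        for i in range(0, len(pcm), channels)
    ]
    # Downsample by averaging each full group of `factor` samples (box filter);
    # an incomplete group at the end is dropped.
    factor = sample_rate // WHISPER_SAMPLE_RATE
    return [
        sum(mono[i:i + factor]) / factor
        for i in range(0, len(mono) - factor + 1, factor)
    ]
```

For example, to_whisper_input(pcm, channels=2, sample_rate=48_000) averages both channels and then every group of three samples.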

Text Insertion

  • Windows API: SendInput, clipboard, and UI Automation (UIA) integration
  • TSF: Text Services Framework for IME integration
  • PowerShell: Text insertion into console applications
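
One common way to type arbitrary text with SendInput is to send each UTF-16 code unit as a KEYEVENTF_UNICODE key-down/key-up pair. The Python sketch below only builds that event list (the wScan and dwFlags values a C# caller would copy into INPUT structures); it does not call SendInput, the function name is hypothetical, and it is not necessarily how the Helper inserts text. The flag values are the winuser.h constants.

```python
KEYEVENTF_KEYUP = 0x0002    # winuser.h
KEYEVENTF_UNICODE = 0x0004  # winuser.h


def unicode_key_events(text: str) -> list[tuple[int, int]]:
    """Return (wScan, dwFlags) pairs for keyboard INPUTs, with wVk left at 0.

    With KEYEVENTF_UNICODE, wScan carries one UTF-16 code unit, so characters
    outside the Basic Multilingual Plane become two surrogate code units.
    """
    data = text.encode("utf-16-le")
    units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
    events = []
    for unit in units:
        events.append((unit, KEYEVENTF_UNICODE))                    # key down
        events.append((unit, KEYEVENTF_UNICODE | KEYEVENTF_KEYUP))  # key up
    return events
```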

Inter-Process Communication

  • Named Pipes: Local IPC between the UI and Helper processes
  • JSON-RPC: Structured request/response protocol carried over the pipes
  • Timeout Handling: Timeouts on IPC calls, with error handling and recovery
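
JSON-RPC 2.0 fixes the message shape (jsonrpc, method, params and id in requests; result or error in replies), but a byte stream such as a named pipe also needs message framing. The Python sketch below assumes a 4-byte little-endian length prefix; that framing, the insertText method name and the params layout are hypothetical, not the project's documented wire format.

```python
import json
import struct


def encode_request(method: str, params: dict, request_id: int) -> bytes:
    """Serialize a JSON-RPC 2.0 request behind an assumed 4-byte length prefix."""
    body = json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    ).encode("utf-8")
    return struct.pack("<I", len(body)) + body


def decode_message(frame: bytes) -> dict:
    """Read one length-prefixed frame and return the parsed JSON-RPC message."""
    (length,) = struct.unpack_from("<I", frame, 0)
    body = frame[4:4 + length]
    if len(body) != length:
        raise ValueError("incomplete frame")
    return json.loads(body.decode("utf-8"))


def make_response(request: dict, result=None, error=None) -> dict:
    """Build a JSON-RPC 2.0 response carrying either a result or an error object."""
    response = {"jsonrpc": "2.0", "id": request.get("id")}
    if error is not None:
        response["error"] = error  # e.g. {"code": -32601, "message": "Method not found"}
    else:
        response["result"] = result
    return response
```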

Testing & Quality

  • MSTest: Primary testing framework
  • Moq: Mocking framework for dependencies
  • FluentAssertions: Readable test assertions
  • Structured Logging: Microsoft.Extensions.Logging

Configuration & Storage

  • JSON Configuration: Hierarchical settings with schema validation
  • Local Storage: Privacy-first local data storage
  • Resource Management: Proper disposal patterns and async handling
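
Hierarchical settings with user overrides usually mean a deep merge: nested sections combine key by key, and the user's value wins wherever both define the same key (similar in spirit to how later Microsoft.Extensions.Configuration sources override earlier ones). The Python sketch below shows that merge only; schema validation is omitted, and the setting names in the usage note are hypothetical.

```python
import copy


def merge_settings(defaults: dict, overrides: dict) -> dict:
    """Return `defaults` with `overrides` applied; neither input is modified."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides have a section here: merge it recursively.
            merged[key] = merge_settings(merged[key], value)
        else:
            # Scalars, lists, or new keys: the override replaces the default.
            merged[key] = copy.deepcopy(value)
    return merged
```

For example, merging {"audio": {"device": "USB Mic"}} into defaults that also define audio.vad keeps the default VAD section and changes only the device.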

Architecture Details

Multi-Process Design

The application uses a two-process architecture for reliability and separation of concerns:

  • UI Process: Owns audio capture and speech-to-text (STT) processing
  • Helper Process: Owns text insertion and global hotkeys
  • IPC Protocol: JSON-RPC messages exchanged over named pipes

Service Architecture

  • Dependency Injection: Microsoft.Extensions.DependencyInjection throughout
  • Interface Segregation: Small, focused interfaces with single responsibilities
  • Event-Driven: Components communicate asynchronously through events
  • Configuration: Hierarchical configuration with user overrides
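
With constructor injection, each service receives its dependencies instead of creating them, and the container decides how long instances live. The Python sketch below is a toy container that mimics the singleton and transient lifetimes of Microsoft.Extensions.DependencyInjection; it is not that library's API, and AudioCapture and Transcriber are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, Tuple


class Container:
    """Toy service container with singleton and transient lifetimes (illustrative only)."""

    def __init__(self) -> None:
        # service type -> (factory, is_singleton)
        self._registrations: Dict[type, Tuple[Callable[["Container"], Any], bool]] = {}
        self._singletons: Dict[type, Any] = {}

    def add_singleton(self, service: type, factory: Callable[["Container"], Any]) -> None:
        self._registrations[service] = (factory, True)

    def add_transient(self, service: type, factory: Callable[["Container"], Any]) -> None:
        self._registrations[service] = (factory, False)

    def resolve(self, service: type) -> Any:
        factory, is_singleton = self._registrations[service]
        if not is_singleton:
            return factory(self)  # a new instance on every resolve
        if service not in self._singletons:
            self._singletons[service] = factory(self)  # created once, then reused
        return self._singletons[service]


# Hypothetical services: the transcriber depends on the capture service.
class AudioCapture:
    pass


class Transcriber:
    def __init__(self, capture: AudioCapture) -> None:
        self.capture = capture


services = Container()
services.add_singleton(AudioCapture, lambda sp: AudioCapture())
services.add_transient(Transcriber, lambda sp: Transcriber(sp.resolve(AudioCapture)))
```

Each resolve of Transcriber builds a new object, but every instance shares the single AudioCapture, which is the behavior a singleton registration is meant to guarantee.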

Error Handling

  • Structured Logging: Comprehensive logging with ILogger
  • Error Taxonomy: Errors are categorized for handling and reporting
  • Crash Recovery: Recovers gracefully from failures
  • Timeout Protection: Prevents operations from hanging indefinitely
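
Timeout protection typically wraps each potentially blocking call (a pipe request, for instance) so that a stalled peer turns into a handled result instead of a hang. The Python sketch below uses asyncio.wait_for; the fallback parameter and fake_pipe_call are hypothetical. In C#, the usual equivalent is a CancellationTokenSource with a timeout (or CancelAfter) passed to the awaited call.

```python
import asyncio


async def call_with_timeout(operation, timeout_s: float, fallback=None):
    """Await `operation`; if it takes longer than `timeout_s` seconds, return `fallback`."""
    try:
        return await asyncio.wait_for(operation, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the operation, so nothing is left hanging.
        return fallback


async def fake_pipe_call(delay_s: float, value: str) -> str:
    """Hypothetical stand-in for an IPC request that takes `delay_s` seconds."""
    await asyncio.sleep(delay_s)
    return value
```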

Privacy & Security

Privacy-First Design

  • Local Processing: All audio processing happens locally
  • No Remote Telemetry: Telemetry is disabled by default and stays on the local machine when enabled
  • Minimal Data: Audio data is not stored persistently
  • User Control: Full control over data and settings

Security Considerations

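  • Elevation Detection: Detects elevated (administrator) target applications, which a non-elevated process cannot send input to because of User Interface Privilege Isolation (UIPI)
  • Secure Desktop: Detects the secure desktop (for example, UAC prompts) and handles it appropriately
  • Password Field Detection: Skips insertion into password and other sensitive fields
  • Self-Protection: Prevents inserting text into the application's own windows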
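
These checks can be combined into a single guard that runs before any insertion and reports the first reason to skip. The Python sketch below shows that decision logic only. Target, InsertDecision and check_target are hypothetical names, the check order is an assumption, and gathering the inputs (process integrity level, the desktop in use, or UI Automation's IsPassword property) is left to platform code.

```python
from dataclasses import dataclass
from enum import Enum


class InsertDecision(Enum):
    ALLOW = "allow"
    BLOCK_SECURE_DESKTOP = "secure desktop is active"
    BLOCK_OWN_WINDOW = "target belongs to this application"
    BLOCK_ELEVATED_TARGET = "target is elevated; UIPI would block the input"
    BLOCK_PASSWORD_FIELD = "focused control is a password field"


@dataclass
class Target:
    process_id: int
    is_elevated: bool
    is_password_field: bool


def check_target(target: Target, own_pids: set[int], self_elevated: bool,
                 secure_desktop: bool) -> InsertDecision:
    """Apply the checks in order and return the first reason to skip insertion."""
    if secure_desktop:
        return InsertDecision.BLOCK_SECURE_DESKTOP
    if target.process_id in own_pids:  # UI and Helper processes
        return InsertDecision.BLOCK_OWN_WINDOW
    if target.is_elevated and not self_elevated:
        return InsertDecision.BLOCK_ELEVATED_TARGET
    if target.is_password_field:
        return InsertDecision.BLOCK_PASSWORD_FIELD
    return InsertDecision.ALLOW
```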

Documentation

  • ARCHITECTURE_OVERVIEW.md - System architecture and design decisions
  • DEVELOPMENT_PLAN.md - Step-by-step implementation roadmap
  • SECURITY_AND_PRIVACY.md - Privacy and security considerations
  • QA_TEST_PLAN.md - Testing strategy and validation

License

TBD: will be determined based on the final distribution model.

About

Privacy-first on-device speech-to-text for Windows 11 using whisper.cpp and WinUI 3. Global hotkeys, named-pipe IPC, and system-wide text insertion.
