- Windows 10 version 2004 (build 10.0.19041.0) or later, including Windows 11
- .NET 8 SDK
- Visual Studio 2022 or VS Code with C# extension
- Windows App SDK 1.7
- Clone the repository
- Bootstrap dependencies: `tools/bootstrap-whisper.ps1`
- Download models: `external\whisper.cpp\models\download-ggml-model.cmd tiny.en`
- Build solution: `dotnet build`
- Run tests: `dotnet test`
- Run UI app: `dotnet run --project src/WhisperKey.UI`
- Run Helper: `dotnet run --project src/WhisperKey.Helper`
# Build all projects
dotnet build
# Run tests
dotnet test
# Clean solution
dotnet clean

- .NET 8: Latest .NET runtime with nullable reference types
- Windows App SDK 1.7: Modern Windows application development
- WinUI 3: Native Windows UI framework
- WinForms: Legacy UI for Helper process (STA thread requirement)
- NAudio 2.2.1: WASAPI audio capture and processing
- WASAPI: Low-latency Windows audio API
- Silero VAD: Voice Activity Detection model
- whisper.cpp: Native C++ implementation of Whisper
- P/Invoke: Native library integration
- HuggingFace: Model repository and download management
- Windows API: SendInput, Clipboard, UIA integration
- TSF: Text Services Framework for IME integration
- PowerShell: Console application text insertion
- Named Pipes: High-performance IPC between processes
- JSON-RPC: Structured communication protocol
- Timeout Handling: Robust error handling and recovery
- MSTest: Primary testing framework
- Moq: Mocking framework for dependencies
- FluentAssertions: Readable test assertions
- Structured Logging: Microsoft.Extensions.Logging
- JSON Configuration: Hierarchical settings with schema validation
- Local Storage: Privacy-first local data storage
- Resource Management: Proper disposal patterns and async handling
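
To make the Named Pipes, JSON-RPC, and Timeout Handling entries above more concrete, here is a minimal sketch of one request/response round trip with a client-side timeout. It is not the project's actual protocol: the pipe name `whisperkey-demo`, the method name `insertText`, the newline-delimited framing, and the in-process stand-in for the Helper are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.IO.Pipes;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical pipe name, used only for this sketch.
const string PipeName = "whisperkey-demo";

// Stand-in for the Helper: answers newline-delimited JSON-RPC requests
// until the client disconnects or the token is cancelled.
async Task ServeAsync(CancellationToken ct)
{
    await using var server = new NamedPipeServerStream(
        PipeName, PipeDirection.InOut, 1,
        PipeTransmissionMode.Byte, PipeOptions.Asynchronous);
    await server.WaitForConnectionAsync(ct);

    using var reader = new StreamReader(server, leaveOpen: true);
    // AutoFlush pushes every line out immediately, so the writer is not
    // disposed separately; disposing the pipe itself is enough here.
    var writer = new StreamWriter(server, leaveOpen: true) { AutoFlush = true };

    string? line;
    while ((line = await reader.ReadLineAsync(ct)) is not null)
    {
        using var doc = JsonDocument.Parse(line);
        int id = doc.RootElement.GetProperty("id").GetInt32();
        // A real handler would dispatch on the "method" property here.
        await writer.WriteLineAsync(
            JsonSerializer.Serialize(new { jsonrpc = "2.0", id, result = "ok" }));
    }
}

// Client side: sends one request and waits for the reply. Connecting and
// reading share one timeout, so a stalled peer surfaces as an
// OperationCanceledException instead of a hang.
async Task<string> CallAsync(string method, object parameters, TimeSpan timeout)
{
    using var cts = new CancellationTokenSource(timeout);
    await using var client = new NamedPipeClientStream(
        ".", PipeName, PipeDirection.InOut, PipeOptions.Asynchronous);
    await client.ConnectAsync(cts.Token);

    using var reader = new StreamReader(client, leaveOpen: true);
    var writer = new StreamWriter(client, leaveOpen: true) { AutoFlush = true };

    string request = JsonSerializer.Serialize(
        new { jsonrpc = "2.0", id = 1, method, @params = parameters });
    await writer.WriteLineAsync(request);

    return await reader.ReadLineAsync(cts.Token)
        ?? throw new IOException("Pipe closed before a response arrived.");
}

using var serverCts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
Task serverTask = ServeAsync(serverCts.Token);
string reply = await CallAsync("insertText", new { text = "hello" }, TimeSpan.FromSeconds(2));
await serverTask; // completes once the client has disconnected
Console.WriteLine(reply);
```

Both ends run in one process here only to keep the sketch self-contained; in a two-process setup the server half would live in the Helper. `ReadLineAsync(CancellationToken)` requires .NET 7 or later, which fits the .NET 8 requirement listed above.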
The application uses a two-process architecture for reliability and separation of concerns:
- UI Process: Owns audio capture and STT processing
- Helper Process: Owns text insertion and global hotkeys
- IPC Protocol: Clean communication via named pipes with JSON-RPC
- Dependency Injection: Microsoft.Extensions.DependencyInjection throughout
- Interface Segregation: Small, focused interfaces with single responsibilities
- Event-Driven: Heavy use of events for async communication
- Configuration: Hierarchical configuration with user overrides
- Structured Logging: Comprehensive logging with ILogger
- Error Taxonomy: Categorized error handling and reporting
- Crash Recovery: Graceful handling of failures and recovery
- Timeout Protection: Prevents hanging operations
- Local Processing: All audio processing happens locally
- Telemetry Off by Default: Local-only when enabled
- Minimal Data: No persistent audio data storage
- User Control: Full control over data and settings
- Elevation Detection: Proper handling of elevated applications
- Secure Desktop: Detection and appropriate handling
- Password Field Detection: Skips insertion in sensitive fields
- Self-Protection: Prevents targeting own application windows
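
The four checks listed above (elevation, secure desktop, password fields, own windows) could be combined into a single gate that runs before any text insertion. The sketch below shows only the decision logic and is an assumption about how such a gate might look, not the project's implementation. The function name `GetSkipReason` and all of its boolean inputs are hypothetical; a real Helper would have to derive them from the foreground window, for example through UI Automation.

```csharp
using System;

// Returns null when insertion may proceed, otherwise a short reason for
// skipping it. Every parameter is a hypothetical, pre-computed input.
string? GetSkipReason(
    bool targetIsElevated,
    bool helperIsElevated,
    bool onSecureDesktop,
    bool focusIsPasswordField,
    bool targetIsOwnWindow)
{
    // Assumed policy: never type while a secure desktop is showing.
    if (onSecureDesktop) return "secure desktop is active";

    // Self-protection: never insert into the application's own windows.
    if (targetIsOwnWindow) return "target is the application's own window";

    // Sensitive fields are skipped rather than filled.
    if (focusIsPasswordField) return "focused control is a password field";

    // Windows blocks injected input from a lower-integrity process into an
    // elevated one, so attempting it from a non-elevated helper is pointless.
    if (targetIsElevated && !helperIsElevated)
        return "target is elevated but the helper is not";

    return null;
}

// Example: a password field in an ordinary, non-elevated application.
string? reason = GetSkipReason(
    targetIsElevated: false,
    helperIsElevated: false,
    onSecureDesktop: false,
    focusIsPasswordField: true,
    targetIsOwnWindow: false);

Console.WriteLine(reason is null ? "insert" : $"skip: {reason}");
```

Returning a reason string rather than a bare boolean keeps the outcome easy to log. The order of the checks is a design choice in this sketch, not something the list above prescribes.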
- ARCHITECTURE_OVERVIEW.md - System architecture and design decisions
- DEVELOPMENT_PLAN.md - Step-by-step implementation roadmap
- SECURITY_AND_PRIVACY.md - Privacy and security considerations
- QA_TEST_PLAN.md - Testing strategy and validation
TBD - Will be determined based on final distribution model