A command-line tool for recursively analyzing WAV audio files in a directory (including subfolders) to detect human speech using advanced Voice Activity Detection (VAD) based on WebRTC algorithms. Files are separated into speech and non_speech subfolders within the output directory, preserving the original folder structure.
- Recursive Processing: Scans the input directory and all subdirectories for .wav files.
- Advanced Speech Detection: Uses WebRTC VAD (via the earshot crate) with energy-based statistical analysis on 20ms frames at 16kHz for robust detection in noisy environments.
- Structure Preservation: Copies files to the output directory while maintaining relative paths.
- Error Handling: Graceful handling of unsupported formats/channels with informative messages.
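To make the "energy-based statistical analysis on 20ms frames at 16kHz" feature more concrete, here is a minimal std-only sketch. It is not the tool's implementation and does not use the earshot API; the names `FRAME_LEN`, `frame_rms`, and `active_frame_ratio`, as well as the threshold value, are hypothetical and chosen only for illustration. It splits 16-bit samples into 320-sample frames (20 ms at 16 kHz) and reports the fraction of frames whose RMS energy exceeds a threshold.

```rust
/// Samples per 20 ms frame at 16 kHz: 16_000 * 0.020 = 320.
const FRAME_LEN: usize = 320;

/// Root-mean-square energy of one frame of 16-bit samples.
fn frame_rms(frame: &[i16]) -> f64 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = frame.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / frame.len() as f64).sqrt()
}

/// Fraction of complete 20 ms frames whose RMS exceeds `threshold`.
/// A trailing partial frame is ignored.
fn active_frame_ratio(samples: &[i16], threshold: f64) -> f64 {
    let mut total = 0usize;
    let mut active = 0usize;
    for frame in samples.chunks_exact(FRAME_LEN) {
        total += 1;
        if frame_rms(frame) > threshold {
            active += 1;
        }
    }
    if total == 0 {
        0.0
    } else {
        active as f64 / total as f64
    }
}

fn main() {
    // Hypothetical test signal: two silent frames followed by two loud frames.
    let mut samples = vec![0i16; FRAME_LEN * 2];
    samples.extend(std::iter::repeat(8000i16).take(FRAME_LEN * 2));
    // The threshold of 500.0 is an arbitrary illustrative value.
    let ratio = active_frame_ratio(&samples, 500.0);
    println!("active frame ratio: {ratio}");
}
```

A real detector would likely combine a statistic like this with the per-frame VAD decision rather than rely on energy alone, since loud non-speech noise also produces high-energy frames.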
- Ensure you have Rust installed (version 1.75+ recommended).
- Clone the repository:
git clone https://github.com/RustedBytes/wav-files-ss
cd wav-files-ss
- Build the project:
cargo build --release
The binary will be available at target/release/wav-files-ss.
Run the tool with the input directory (required) and optional output directory:
wav-files-ss [OPTIONS] <INPUT_DIR>
Args:
<INPUT_DIR> Input directory containing WAV files (processed recursively)
Options:
-o, --output-dir <OUTPUT_DIR> Output directory for separated files (creates 'speech' and 'non_speech' subfolders). Defaults to 'output' in the current directory
-h, --help Print help
-V, --version Print version
# Process ./audio_samples/ and output to ./results/
wav-files-ss ./audio_samples/ -o ./results/
This will:
- Create ./results/speech/ and ./results/non_speech/.
- Copy detected files while preserving subfolder structure (e.g., ./audio_samples/sub/dir/file.wav → ./results/speech/sub/dir/file.wav).
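The path mapping in the example above can be sketched with the standard library alone. This is an illustration under assumptions, not the tool's actual code: `destination_path` is a hypothetical helper that strips the input root from a file's path and re-attaches the remainder under the output directory's speech or non_speech subfolder.

```rust
use std::path::{Path, PathBuf};

/// Map an input file to its destination under `<output_root>/<label>/`,
/// keeping the file's path relative to `input_root`.
/// Returns `None` if `file` is not located under `input_root`.
fn destination_path(
    input_root: &Path,
    output_root: &Path,
    label: &str,
    file: &Path,
) -> Option<PathBuf> {
    let relative = file.strip_prefix(input_root).ok()?;
    Some(output_root.join(label).join(relative))
}

fn main() {
    let dest = destination_path(
        Path::new("./audio_samples"),
        Path::new("./results"),
        "speech",
        Path::new("./audio_samples/sub/dir/file.wav"),
    );
    println!("{:?}", dest);
}
```

Before copying, the destination's parent directories would also need to exist; `std::fs::create_dir_all` on `dest.parent()` is one way to ensure that.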
- Dependencies: Managed via Cargo.toml. Key crates:
  - clap: CLI argument parsing.
  - hound: WAV file I/O.
  - walkdir: Recursive directory traversal.
  - anyhow: Error handling.
  - earshot: WebRTC VAD implementation.
- Run Tests:
cargo test
Includes unit tests for VAD analysis (silence, speech simulation, edge cases).
- Formatting and Linting:
cargo fmt
cargo clippy
- Supports only 16-bit integer WAV files (PCM format assumed).
- Stereo files are downmixed to mono; files with more than two channels are not supported.
- VAD tuned for English-like speech; may need adjustment for other languages/noise profiles.
- Offline processing only; no real-time mode.
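The stereo limitation above mentions downmixing to mono without saying how it is done. One common approach, shown here purely as an assumption and not as the tool's confirmed method, is to average each left/right pair of interleaved 16-bit samples. The function name `downmix_stereo` is hypothetical.

```rust
/// Average interleaved stereo samples (L, R, L, R, ...) into mono.
/// The sum is computed in i32 so that adding two i16 values cannot overflow.
/// A trailing unpaired sample is dropped.
fn downmix_stereo(interleaved: &[i16]) -> Vec<i16> {
    interleaved
        .chunks_exact(2)
        .map(|pair| ((pair[0] as i32 + pair[1] as i32) / 2) as i16)
        .collect()
}

fn main() {
    // Hypothetical interleaved stereo data: three left/right pairs.
    let stereo = [100i16, 200, -50, 50, i16::MAX, i16::MAX];
    let mono = downmix_stereo(&stereo);
    println!("{:?}", mono);
}
```

Averaging keeps the result within the 16-bit range, which matters given that only 16-bit integer WAV files are supported.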