RustedBytes/wav-files-ss

wav-files-ss

A command-line tool that recursively scans a directory (including subfolders) for WAV audio files and detects human speech using Voice Activity Detection (VAD) based on WebRTC algorithms. Files are copied into speech and non_speech subfolders within the output directory, preserving the original folder structure.

Features

  • Recursive Processing: Scans input directory and all subdirectories for .wav files.
  • Advanced Speech Detection: Uses WebRTC VAD (via the earshot crate) with energy-based statistical analysis on 20 ms frames at 16 kHz for robust detection in noisy environments.
  • Structure Preservation: Copies files to output while maintaining relative paths.
  • Error Handling: Graceful handling of unsupported formats/channels with informative messages.
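
To make the framing concrete: 20 ms at 16 kHz is 320 samples per frame. The sketch below uses only the standard library and does not reproduce the earshot crate or this tool's actual detection logic. The names `frame_energy` and `active_frame_ratio`, and the idea of a single fixed energy threshold, are assumptions made for illustration.

```rust
/// Sample rate and frame duration as stated in the feature list.
const SAMPLE_RATE_HZ: usize = 16_000;
const FRAME_MS: usize = 20;
/// 16_000 * 20 / 1000 = 320 samples per frame.
const FRAME_LEN: usize = SAMPLE_RATE_HZ * FRAME_MS / 1000;

/// Mean squared amplitude of one frame (hypothetical helper).
fn frame_energy(frame: &[i16]) -> f64 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum: f64 = frame
        .iter()
        .map(|&s| {
            let v = f64::from(s);
            v * v
        })
        .sum();
    sum / frame.len() as f64
}

/// Fraction of complete 20 ms frames whose energy exceeds `threshold`.
/// A trailing partial frame is ignored. The threshold is an assumed,
/// illustrative parameter, not a value taken from the tool.
fn active_frame_ratio(samples: &[i16], threshold: f64) -> f64 {
    let mut total = 0usize;
    let mut active = 0usize;
    for frame in samples.chunks_exact(FRAME_LEN) {
        total += 1;
        if frame_energy(frame) > threshold {
            active += 1;
        }
    }
    if total == 0 {
        0.0
    } else {
        active as f64 / total as f64
    }
}
```

A real VAD combines more than raw energy, so this only shows how samples are grouped into frames and summarized per frame.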

Installation

  1. Ensure you have Rust installed (version 1.75+ recommended).
  2. Clone the repository:
    git clone https://github.com/RustedBytes/wav-files-ss
    cd wav-files-ss
    
  3. Build the project:
    cargo build --release
    
    The binary will be available at target/release/wav-files-ss.

Usage

Run the tool with the required input directory and an optional output directory:

wav-files-ss [OPTIONS] <INPUT_DIR>

Args:
  <INPUT_DIR>    Input directory containing WAV files (processed recursively)

Options:
  -o, --output-dir <OUTPUT_DIR>    Output directory for separated files (creates 'speech' and 'non_speech' subfolders). Defaults to 'output' in the current directory
  -h, --help                       Print help
  -V, --version                    Print version

Example

# Process ./audio_samples/ and output to ./results/
wav-files-ss ./audio_samples/ -o ./results/

This will:

  • Create ./results/speech/ and ./results/non_speech/.
  • Copy each analyzed file into the matching folder while preserving subfolder structure (e.g., ./audio_samples/sub/dir/file.wav → ./results/speech/sub/dir/file.wav).
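
The path mapping in the example above can be sketched with the standard library alone. `destination_path` is a hypothetical helper, not a function from this repository. It assumes the destination is built by stripping the input directory prefix and joining the remainder under `<output_dir>/<category>/`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: maps an input file to its destination under
/// `<output_dir>/<category>/`, keeping the path relative to `input_dir`.
/// Returns `None` if `file` is not located inside `input_dir`.
fn destination_path(
    input_dir: &Path,
    output_dir: &Path,
    category: &str, // "speech" or "non_speech"
    file: &Path,
) -> Option<PathBuf> {
    // e.g. "./audio_samples/sub/dir/file.wav" -> "sub/dir/file.wav"
    let relative = file.strip_prefix(input_dir).ok()?;
    Some(output_dir.join(category).join(relative))
}
```

Before copying, the parent directory of the returned path would still need to be created, for example with `std::fs::create_dir_all`.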

Building and Development

  • Dependencies: Managed via Cargo.toml. Key crates:
    • clap: CLI argument parsing.
    • hound: WAV file I/O.
    • walkdir: Recursive directory traversal.
    • anyhow: Error handling.
    • earshot: WebRTC VAD implementation.
  • Run Tests:
    cargo test
    
    Includes unit tests for VAD analysis (silence, speech simulation, edge cases).
  • Formatting and Linting:
    cargo fmt
    cargo clippy
    

Limitations

  • Supports only 16-bit integer WAV files (PCM format assumed).
  • Stereo files are downmixed to mono; files with more than two channels are not supported.
  • VAD tuned for English-like speech; may need adjustment for other languages/noise profiles.
  • Offline processing only; no real-time mode.
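
The format and channel limits above can be illustrated with a short sketch. How the tool actually downmixes is not documented here, so averaging the left and right samples is an assumption. Both `is_supported` and `downmix_stereo_to_mono` are hypothetical names.

```rust
/// Hypothetical check mirroring the stated limits:
/// 16-bit samples only, mono or stereo only.
fn is_supported(channels: u16, bits_per_sample: u16) -> bool {
    bits_per_sample == 16 && (channels == 1 || channels == 2)
}

/// Averages interleaved stereo samples (L, R, L, R, ...) into mono.
/// The sum is computed in i32 so that two large i16 values cannot overflow.
/// A trailing unpaired sample is dropped.
fn downmix_stereo_to_mono(interleaved: &[i16]) -> Vec<i16> {
    interleaved
        .chunks_exact(2)
        .map(|pair| ((i32::from(pair[0]) + i32::from(pair[1])) / 2) as i16)
        .collect()
}
```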

About

Speech Separation for WAV files
