wav-files-ss

A command-line tool for recursively analyzing WAV audio files in a directory (including subfolders) to detect human speech using advanced Voice Activity Detection (VAD) based on WebRTC algorithms. Files are separated into speech and non_speech subfolders within the output directory, preserving the original folder structure.

Features

- Recursive Processing: Scans input directory and all subdirectories for .wav files.
- Advanced Speech Detection: Uses WebRTC VAD (via earshot crate) with energy-based statistical analysis on 20ms frames at 16kHz for robust detection in noisy environments.
- Structure Preservation: Copies files to output while maintaining relative paths.
- Error Handling: Graceful handling of unsupported formats/channels with informative messages.
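
To make the framing figures concrete: a 20 ms frame at 16 kHz holds 320 samples. The sketch below splits a mono 16-bit signal into such frames and computes a simple RMS energy per frame. It is only an illustration of the framing and energy idea, not the tool's actual detector and not the earshot API; the function names and the `0.1` threshold are hypothetical.

```rust
/// Frame geometry taken from the README: 20 ms frames at 16 kHz.
const SAMPLE_RATE_HZ: usize = 16_000;
const FRAME_MS: usize = 20;
const FRAME_LEN: usize = SAMPLE_RATE_HZ * FRAME_MS / 1000; // 320 samples

/// Root-mean-square energy of one frame, with samples scaled to -1.0..1.0.
fn frame_rms(frame: &[i16]) -> f64 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = frame
        .iter()
        .map(|&s| {
            let x = s as f64 / 32768.0;
            x * x
        })
        .sum();
    (sum_sq / frame.len() as f64).sqrt()
}

/// Fraction of complete frames whose RMS exceeds `threshold`.
/// A trailing partial frame is ignored.
fn active_frame_ratio(samples: &[i16], threshold: f64) -> f64 {
    let mut total = 0usize;
    let mut active = 0usize;
    for frame in samples.chunks_exact(FRAME_LEN) {
        total += 1;
        if frame_rms(frame) > threshold {
            active += 1;
        }
    }
    if total == 0 {
        0.0
    } else {
        active as f64 / total as f64
    }
}

fn main() {
    // One silent frame followed by one loud frame (hypothetical test signal).
    let mut samples = vec![0i16; FRAME_LEN];
    samples.extend(vec![16384i16; FRAME_LEN]);
    let ratio = active_frame_ratio(&samples, 0.1); // threshold is an assumption
    println!("active frame ratio: {ratio:.2}");
}
```

A real VAD weighs spectral and statistical features in addition to raw energy, so a plain threshold like this would misclassify loud non-speech noise.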

Installation

Ensure you have Rust installed (version 1.75+ recommended).

Clone the repository:

```
git clone https://github.com/RustedBytes/wav-files-ss
cd wav-files-ss
```

Build the project:
```
cargo build --release
```
The binary will be available at target/release/wav-files-ss.

Usage

Run the tool with the input directory (required) and an optional output directory:

```
wav-files-ss [OPTIONS] <INPUT_DIR>

Args:
  <INPUT_DIR>    Input directory containing WAV files (processed recursively)

Options:
  -o, --output-dir <OUTPUT_DIR>    Output directory for separated files (creates 'speech' and 'non_speech' subfolders). Defaults to 'output' in the current directory
  -h, --help                       Print help
  -V, --version                    Print version
```

Example

```
# Process ./audio_samples/ and output to ./results/
wav-files-ss ./audio_samples/ -o ./results/
```

This will:

- Create ./results/speech/ and ./results/non_speech/.
- Copy each file into the subfolder matching its classification while preserving subfolder structure (e.g., a file detected as speech: ./audio_samples/sub/dir/file.wav → ./results/speech/sub/dir/file.wav).
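
The path mapping in the example can be sketched with the standard library alone: strip the input root from the file path, then join the remainder under the `speech` or `non_speech` folder. The function name `destination_path` and its signature are hypothetical and only illustrate the idea; the tool's own code may be organized differently.

```rust
use std::path::{Path, PathBuf};

/// Map an input file to its destination under `<output_root>/speech` or
/// `<output_root>/non_speech`, keeping the path relative to `input_root`.
/// Returns `None` if `file` is not located under `input_root`.
fn destination_path(
    input_root: &Path,
    output_root: &Path,
    file: &Path,
    is_speech: bool,
) -> Option<PathBuf> {
    let relative = file.strip_prefix(input_root).ok()?;
    let class_dir = if is_speech { "speech" } else { "non_speech" };
    Some(output_root.join(class_dir).join(relative))
}

fn main() {
    let dest = destination_path(
        Path::new("./audio_samples"),
        Path::new("./results"),
        Path::new("./audio_samples/sub/dir/file.wav"),
        true,
    );
    println!("{dest:?}");
}
```

Before copying, the parent directory of the destination would need to exist, for example via `std::fs::create_dir_all`.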

Building and Development

Dependencies: Managed via Cargo.toml. Key crates:
- clap: CLI argument parsing.
- hound: WAV file I/O.
- walkdir: Recursive directory traversal.
- anyhow: Error handling.
- earshot: WebRTC VAD implementation.

Run Tests:
```
cargo test
```
Includes unit tests for VAD analysis (silence, speech simulation, edge cases).

Formatting and Linting:
```
cargo fmt
cargo clippy
```

Limitations

- Supports only 16-bit integer WAV files (PCM format assumed).
- Stereo files are downmixed to mono; files with more than two channels are not supported.
- VAD tuned for English-like speech; may need adjustment for other languages/noise profiles.
- Offline processing only; no real-time mode.
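
As an illustration of the channel limitation, the sketch below downmixes interleaved 16-bit stereo samples to mono and rejects other channel counts. Averaging the left and right samples is an assumption made for this example; the README does not state which downmix method the tool uses, and `downmix_to_mono` is a hypothetical name.

```rust
/// Convert interleaved 16-bit samples to mono.
/// Mono input is returned unchanged, stereo is averaged per sample pair
/// (an assumed method), and any other channel count is an error.
fn downmix_to_mono(interleaved: &[i16], channels: u16) -> Result<Vec<i16>, String> {
    match channels {
        1 => Ok(interleaved.to_vec()),
        2 => Ok(interleaved
            .chunks_exact(2)
            // Widen to i32 so the sum cannot overflow before halving.
            .map(|pair| ((pair[0] as i32 + pair[1] as i32) / 2) as i16)
            .collect()),
        n => Err(format!("unsupported channel count: {n}")),
    }
}

fn main() {
    // Two stereo frames: (L=100, R=300) and (L=-200, R=200).
    match downmix_to_mono(&[100, 300, -200, 200], 2) {
        Ok(mono) => println!("{mono:?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```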
