Audio Processing Pipeline

Overview

This project provides a pipeline for audio processing: loading and conversion, feature extraction, noise reduction, and data augmentation. It targets machine learning and deep learning workflows and can use GPU acceleration for spectrogram computation.

Features

1. Audio Loading & Conversion

Supports multiple audio formats: WAV, MP3, FLAC, OGG
Converts stereo audio to mono
Resamples audio (e.g., 44.1 kHz → 16 kHz)
Normalizes audio amplitude (peak or RMS normalization)
Trims leading/trailing silence
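
The conversion steps above can be sketched with NumPy alone. This is a minimal illustration, not the project's actual API: the function names, the (channels, samples) layout, and the amplitude threshold for silence are all assumptions. Decoding and resampling are left out because they depend on the audio backend in use.

```python
import numpy as np

def to_mono(audio):
    """Average channels. Assumes shape (channels, samples) or (samples,)."""
    audio = np.asarray(audio, dtype=np.float64)
    return audio if audio.ndim == 1 else audio.mean(axis=0)

def peak_normalize(audio, peak=1.0, eps=1e-12):
    """Scale so the largest absolute sample equals `peak`."""
    return audio * (peak / max(np.max(np.abs(audio)), eps))

def rms_normalize(audio, target_rms=0.1, eps=1e-12):
    """Scale so the root-mean-square level equals `target_rms`."""
    rms = np.sqrt(np.mean(audio ** 2))
    return audio * (target_rms / max(rms, eps))

def trim_silence(audio, threshold=1e-3):
    """Drop leading/trailing samples whose magnitude is below `threshold`."""
    idx = np.flatnonzero(np.abs(audio) > threshold)
    if idx.size == 0:
        return audio[:0]  # everything is silence
    return audio[idx[0]: idx[-1] + 1]
```

A typical order would be to_mono, then trim_silence, then one of the two normalizers, since trimming first keeps silent padding from lowering the measured RMS.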

2. Spectrogram Representations

(a) Time-Frequency Representations

Short-Time Fourier Transform (STFT)
Inverse STFT (reconstruct waveform)
Constant-Q Transform (CQT)
Inverse CQT
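
To make the STFT / inverse STFT pair concrete, here is a small NumPy sketch using a Hann window and weighted overlap-add. The names stft and istft and the default n_fft and hop values are illustrative assumptions; the project itself may rely on librosa or torchaudio for this step. The CQT is not covered here.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Return a complex spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft + 1)[:-1]           # periodic Hann window
    x = np.pad(x, n_fft // 2, mode="reflect")     # center the first frame
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(S, n_fft=512, hop=128, length=None):
    """Reconstruct a waveform by windowed overlap-add."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(S.T, n=n_fft, axis=1) * window
    n_frames = frames.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        out[i * hop: i * hop + n_fft] += frames[i]
        norm[i * hop: i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)                 # undo the squared window
    out = out[n_fft // 2:]                        # remove the centering pad
    return out if length is None else out[:length]
```

Passing length=len(x) to istft trims the trailing padding, so an unmodified spectrogram round-trips back to the input up to floating-point error.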

(b) Mel-based Representations

Mel-Spectrogram (frequency axis mapped to the perceptual mel scale)
MFCC (Mel-Frequency Cepstral Coefficients)
Inverse Mel-Spectrogram
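
The mel-based representations can be illustrated as follows. This sketch assumes the HTK-style mel formula, triangular filters, and an unnormalized DCT-II for the MFCC step; libraries such as librosa use different defaults, so the numbers will not match them exactly. All function names are hypothetical. The inverse shown is only a least-squares approximation, because the mel projection discards information.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40, fmin=0.0, fmax=None):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, center, hi = edges[m:m + 3]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc_from_power(power_spec, sr, n_fft, n_mels=40, n_mfcc=13):
    """MFCCs from a power spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    mel = mel_filterbank(sr, n_fft, n_mels) @ power_spec
    log_mel = 10.0 * np.log10(np.maximum(mel, 1e-10))
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))  # unnormalized DCT-II
    return dct @ log_mel

def approx_inverse_mel(mel_spec, fb):
    """Least-squares estimate of the linear-frequency power spectrogram."""
    return np.maximum(0.0, np.linalg.pinv(fb) @ mel_spec)
```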

(c) Other Spectral Features

Chroma Feature Extraction (musical pitch representation)
Spectral Centroid (brightness of a sound)
Spectral Bandwidth & Spectral Contrast
Spectral Rolloff (frequency below which a set fraction of the spectral energy lies, e.g., 85%)
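
As a rough illustration of three of these descriptors, the sketch below computes centroid, bandwidth, and rolloff per frame from a magnitude spectrogram of shape (n_freq, n_frames). The function names and the 0.85 rolloff fraction are assumptions chosen for the example, and the rolloff here accumulates magnitudes rather than squared magnitudes.

```python
import numpy as np

def spectral_centroid(mag, freqs):
    """Magnitude-weighted mean frequency of each frame."""
    total = np.maximum(mag.sum(axis=0), 1e-12)
    return (freqs[:, None] * mag).sum(axis=0) / total

def spectral_bandwidth(mag, freqs):
    """Magnitude-weighted standard deviation around the centroid."""
    centroid = spectral_centroid(mag, freqs)
    deviation = (freqs[:, None] - centroid[None, :]) ** 2
    total = np.maximum(mag.sum(axis=0), 1e-12)
    return np.sqrt((deviation * mag).sum(axis=0) / total)

def spectral_rolloff(mag, freqs, roll_percent=0.85):
    """Lowest frequency at which the cumulative magnitude reaches roll_percent."""
    cumulative = np.cumsum(mag, axis=0)
    threshold = roll_percent * cumulative[-1]
    idx = np.argmax(cumulative >= threshold, axis=0)
    return freqs[idx]
```

The freqs array holds the center frequency of each bin; for an n_fft-point FFT at sample rate sr it can be obtained with np.fft.rfftfreq(n_fft, 1 / sr).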

3. Audio Augmentation (Data Augmentation Techniques)

(a) Time-domain Augmentations

Time Shifting (randomly shift audio forward/backward)
Volume Perturbation (increase/decrease amplitude)
Add Background Noise (white noise, pink noise)
Reverberation (simulate room reflections and echo)
Time Stretching (speed up or slow down audio without pitch shift)
Pitch Shifting (change pitch while keeping speed constant)
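
Three of the simpler time-domain augmentations can be sketched in a few lines of NumPy. The names below are hypothetical, the shift is circular (samples wrap around), and noise is scaled to hit a requested signal-to-noise ratio; reverberation, time stretching, and pitch shifting need more machinery and are not shown.

```python
import numpy as np

def time_shift(x, shift):
    """Circularly shift by `shift` samples (positive moves audio later)."""
    return np.roll(x, shift)

def apply_gain_db(x, gain_db):
    """Scale amplitude by a gain given in decibels."""
    return x * 10.0 ** (gain_db / 20.0)

def add_white_noise(x, snr_db, rng=None):
    """Add Gaussian white noise scaled to the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(len(x))
    signal_power = np.mean(x ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return x + scale * noise
```

For a random shift, the offset can be drawn with rng.integers(-max_shift, max_shift + 1) and passed to time_shift.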

(b) Frequency-domain Augmentations

SpecAugment (Time Masking & Frequency Masking)
Equalization (boost/cut specific frequency ranges)
Bandpass & Lowpass Filtering
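
The masking part of SpecAugment can be illustrated as below: one band of frequency bins and one band of time frames are set to zero. This is a simplified sketch with a single mask per axis; the function name, the mask widths, and the fill value of zero are assumptions. Equalization and filtering are not covered.

```python
import numpy as np

def spec_augment(spec, max_freq_mask=8, max_time_mask=16, rng=None):
    """Zero one random frequency band and one random time band.

    `spec` has shape (n_freq, n_frames); the input is not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=float, copy=True)
    n_freq, n_time = out.shape

    # Frequency mask: width f in [0, max_freq_mask], start chosen to fit.
    f = int(rng.integers(0, min(max_freq_mask, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: width t in [0, max_time_mask], start chosen to fit.
    t = int(rng.integers(0, min(max_time_mask, n_time) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out
```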

4. Feature Engineering for ML/DL Models

Zero-Crossing Rate (ZCR)
Root Mean Square Energy (RMSE)
Harmonics-to-Noise Ratio (HNR)
Chromagram Features
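
Zero-crossing rate and RMS energy are both computed per frame, which the following sketch shows. The framing helper, the function names, and the default frame and hop sizes are illustrative assumptions; the input must be at least one frame long.

```python
import numpy as np

def frame_signal(x, frame_length=1024, hop=512):
    """Slice a 1-D signal into overlapping frames, shape (n_frames, frame_length)."""
    n_frames = 1 + (len(x) - frame_length) // hop
    idx = np.arange(frame_length)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame that change sign."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def rms_energy(frames):
    """Root-mean-square amplitude of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))
```

Noisy or unvoiced segments tend to have a high zero-crossing rate and low RMS, which is why the two are often used together.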

5. Noise Reduction & Enhancement

Spectral Subtraction (removes stationary noise)
Wiener Filtering (adaptive noise filtering)
Adaptive Noise Reduction (deep learning-based methods; planned, not yet implemented)
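
Spectral subtraction and a basic Wiener-style gain can both be expressed as operations on a complex spectrogram. The sketch below assumes the first few frames contain only noise and uses them as the noise estimate; that assumption, the function names, and the spectral floor value are illustrative choices, not the project's confirmed method.

```python
import numpy as np

def spectral_subtraction(S, noise_frames=10, floor=0.02):
    """Subtract an average noise magnitude from every frame, keeping phase."""
    mag, phase = np.abs(S), np.angle(S)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    cleaned = np.maximum(mag - noise_mag, floor * mag)  # floor limits artifacts
    return cleaned * np.exp(1j * phase)

def wiener_filter(S, noise_frames=10, floor=0.02):
    """Apply the gain snr / (1 + snr) per bin, with snr estimated from power."""
    power = np.abs(S) ** 2
    noise_power = power[:, :noise_frames].mean(axis=1, keepdims=True)
    snr = np.maximum(power / np.maximum(noise_power, 1e-12) - 1.0, 0.0)
    gain = np.maximum(snr / (1.0 + snr), floor)
    return gain * S
```

Both functions return a spectrogram of the same shape as the input, so the result can be passed to an inverse STFT to recover a waveform.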

6. Voice Activity Detection (VAD) & Speaker Separation [In Progress]

Detect & Remove Silence Segments (Energy-based or Deep Learning-based)
Speaker Diarization (label which speaker is active when in a multi-speaker recording)
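
Since this part of the project is still in progress, the following is only a sketch of what an energy-based detector could look like. It marks a frame as active when its RMS level is within threshold_db of the loudest frame. The function name, the frame sizes (25 ms and 10 ms at 16 kHz), and the -40 dB threshold are all assumptions.

```python
import numpy as np

def energy_vad(x, frame_length=400, hop=160, threshold_db=-40.0):
    """Return a boolean mask with one entry per frame (True = active).

    Assumes len(x) >= frame_length.
    """
    n_frames = 1 + (len(x) - frame_length) // hop
    idx = np.arange(frame_length)[None, :] + hop * np.arange(n_frames)[:, None]
    rms = np.sqrt(np.mean(x[idx] ** 2, axis=1))
    reference = max(rms.max(), 1e-10)             # loudest frame
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-10) / reference)
    return level_db > threshold_db
```

A relative threshold adapts to the recording level, but it also means a recording that is silent throughout will still have its loudest frame marked active.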

7. Batch Processing & Parallelism

Efficient Processing of Large Datasets (parallelized audio processing)
GPU Acceleration for spectrogram computation (using torchaudio/TensorFlow)
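
One way to parallelize per-clip processing is with the standard library's concurrent.futures, sketched below. The helper names are hypothetical. A thread pool is used here for simplicity; for CPU-bound pure-Python work, ProcessPoolExecutor offers the same map interface. GPU execution is not shown.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def rms(clip):
    """Example per-clip function: root-mean-square level."""
    return float(np.sqrt(np.mean(np.asarray(clip, dtype=float) ** 2)))

def process_batch(clips, fn, max_workers=4):
    """Apply `fn` to every clip concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, clips))
```

Because pool.map yields results in input order, the output list lines up with the input clips even when workers finish out of order.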

Dependencies

librosa
torchaudio
numpy
scipy
pydub (for audio format conversion)
