A recreation of Scrambled Hacks using modern ML-based audio embeddings.
Scrambled Hacks is an audio manipulation system that takes an input audio file and reconstructs it using segments from a database of audio files. Unlike the original MFCC+DTW approach, this version uses:
- CLAP embeddings for semantic audio understanding
- FAISS vector search for efficient similarity matching
- Pluggable reconstruction modules for flexible output generation
- 🎵 Works with music, vocals, speech, and any audio input
- ⚡ Fast similarity search using FAISS
- 🧠 Semantic matching using pre-trained CLAP models
- 🔌 Modular architecture with pluggable components
- 📦 Easy installation and CLI interface
- Python 3.10 or 3.11
- macOS, Linux, or Windows
# Clone the repository
git clone <repo-url>
cd scrambled
# Create environment and install dependencies
make env
# Activate the virtual environment
source venv/bin/activate
# Verify installation
scrambled-hacks hello
# Test CLAP model loading (downloads ~1GB model on first run)
make verify-clap

If you prefer manual setup:
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install torch torchvision torchaudio
pip install -e .

The Scrambled Hacks CLI provides three main commands:
Build a searchable database from a collection of audio files:
scrambled-hacks build-db -i ./test_audio -o ./databases/my_database

Options:
- `-i, --input-dir`: Directory containing audio files (required)
- `-o, --output`: Output directory for the database (required)
- `--index-type`: Index type - `flat`, `hnsw`, or `auto` (default: `auto`)
- `--pattern`: File pattern to match (default: `*.wav`)
- `--device`: Device to use - `cpu`, `cuda`, or `mps` (default: `cpu`)
- `-v, --verbose`: Enable detailed logging
What happens:
- Scans input directory for audio files
- Segments each file by beat onsets
- Extracts CLAP embeddings (512-dimensional) for each segment
- Builds a FAISS index for fast similarity search
- Saves database with metadata and statistics
Output structure:
databases/my_database/
├── index.faiss # FAISS index file
├── metadata.json # Segment metadata
├── config.json # Database configuration
└── statistics.json # Build statistics
Display information about an existing database:
scrambled-hacks info ./databases/my_database

Shows segment count, index type, embedding dimensions, source files, and disk usage.
Generate scrambled audio by matching input segments to database segments:
scrambled-hacks generate \
-i ./input.wav \
-d ./databases/my_database \
-o ./output.wav

Options:

- `-i, --input`: Input audio file to scramble (required)
- `-d, --database`: Database directory to use (required)
- `-o, --output`: Output audio file path (required)
- `--k`: Number of nearest neighbors per segment (default: 1)
- `-r, --reconstructor`: Reconstruction strategy - `simple` or `direct` (default: `simple`)
  - `simple`: Concatenation with 10ms crossfading for smooth transitions
  - `direct`: Direct concatenation without crossfading
- `--device`: Device to use - `cpu`, `cuda`, or `mps` (default: `cpu`)
- `-v, --verbose`: Enable detailed logging
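The two reconstructors differ only in how consecutive segments are joined. A minimal numpy sketch of the 10ms linear crossfade that `simple` performs (the function name and the exact fade shape are assumptions, not the project's implementation):

```python
import numpy as np

def crossfade_concat(segments: list[np.ndarray], sr: int, fade_ms: float = 10.0) -> np.ndarray:
    """Concatenate segments, overlapping each joint with a linear crossfade."""
    fade = int(sr * fade_ms / 1000)
    out = segments[0].astype(np.float64)
    for seg in segments[1:]:
        seg = seg.astype(np.float64)
        n = min(fade, len(out), len(seg))
        if n > 0:
            ramp = np.linspace(0.0, 1.0, n)
            # Fade out the tail of the running output, fade in the new segment
            out[-n:] = out[-n:] * (1.0 - ramp) + seg[:n] * ramp
            out = np.concatenate([out, seg[n:]])
        else:
            out = np.concatenate([out, seg])
    return out

sr = 48000
a = np.ones(sr // 10)    # 100 ms of +1
b = -np.ones(sr // 10)   # 100 ms of -1
mixed = crossfade_concat([a, b], sr)
```

`direct` would simply be `np.concatenate(segments)`, which is faster but can click audibly at segment boundaries.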
What happens:
- Loads the database
- Segments input audio by beats
- For each input segment, finds k most similar database segments
- Reconstructs output audio from matched segments
- Saves result to output file
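The per-segment matching amounts to a nearest-neighbor search over normalized embeddings. A numpy sketch of what FAISS accelerates (all names and data here are illustrative; 512 matches the CLAP embedding dimension stated above):

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize rows so inner product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
db = normalize(rng.standard_normal((100, 512)))       # database segment embeddings
queries = normalize(rng.standard_normal((5, 512)))    # input segment embeddings

k = 1
scores = queries @ db.T                        # cosine similarity matrix, shape (5, 100)
matches = np.argsort(-scores, axis=1)[:, :k]   # top-k database indices per input segment
```

The audio for each matched index is then looked up in the database metadata and handed to the reconstructor.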
# 1. Activate environment
source venv/bin/activate
# 2. Build database from audio collection
scrambled-hacks build-db -i ./test_audio -o ./databases/production_db
# 3. View database info
scrambled-hacks info ./databases/production_db
# 4. Generate scrambled audio
scrambled-hacks generate \
-i ./test_audio/input.wav \
-d ./databases/production_db \
-o ./scrambled_output.wav
# 5. Try different reconstruction strategies
scrambled-hacks generate \
-i ./test_audio/input.wav \
-d ./databases/production_db \
-o ./output_direct.wav \
-r direct

Performance notes:

- CLAP embedding extraction is slow (~15-20 seconds per file on CPU)
- Building a database from 26 files takes ~10 minutes
- Database auto-selects HNSW index for better search performance on larger collections
- Once built, databases load in <5 seconds
- Generation is much faster than building
No audio files found:

- Check `--pattern` matches your files (default: `*.wav`)
- Try patterns like `*.mp3`, `*.flac`, etc.

Memory issues:

- Use the HNSW index for large databases
- The flat index loads all vectors into memory
Device errors:

- Use `--device cpu` for maximum compatibility
- `mps` works on Apple Silicon but may have issues
- `cuda` requires NVIDIA GPU with CUDA installed
- Environment setup with src/ layout
- Project structure with pyproject.toml
- Basic CLI framework
- CLAP embedding extraction
- Beat-based segmentation
- FAISS database implementation
- Simple reconstruction module
- End-to-end pipeline
See docs/DEVELOPMENT_PLAN.md for full roadmap.
make help # Show all available commands
make env # Create virtual environment and install dependencies
make test # Run tests with pytest
make format # Format code with black and isort
make lint # Run linters (flake8, mypy)
make clean # Remove virtual environment and cache files
make verify-clap # Test CLAP model loading

make test

scrambled/
├── src/
│ └── scrambled_hacks/
│ ├── embeddings/ # CLAP embedding extraction
│ ├── segmentation/ # Beat-based audio segmentation
│ ├── database/ # FAISS vector database
│ ├── reconstruction/ # Audio reconstruction modules
│ ├── cli.py # Command-line interface
│ └── pipeline.py # End-to-end pipeline (coming soon)
├── tests/ # Test suite
├── docs/ # Documentation
├── examples/ # Example workflows
├── pyproject.toml # Project configuration
├── Makefile # Development commands
└── README.md
- PLAN.md - Comprehensive architecture and research
- DEVELOPMENT_PLAN.md - Phased implementation plan
Original Example of Scrambled Hacks
MIT
- CLAP: Contrastive Language-Audio Pretraining
- FAISS: Facebook AI Similarity Search
- librosa: Audio analysis library