MimicScribe Diarization Benchmark

Measures speaker diarization accuracy of the MimicScribe pipeline: Parakeet TDT 0.6B (on-device ASR) + Pyannote (on-device diarization) + Gemini 3 Flash (LLM speaker attribution).

Corpora

| Corpus | Scenario | Sessions | Hours | Speakers |
|---|---|---|---|---|
| AMI IHM-mix | Headset mix (simulates multi-mic remote call) | 16 eval | ~8h | 4 per session |
| Earnings-21 | Real corporate conference calls | 11 eval | ~10h | 2-15 |

Results

See results/RESULTS.md after running the benchmark.

Prerequisites

  • macOS 15.0+ with Xcode and Swift 6.2+
  • MimicScribe built and run at least once (to download CoreML ASR + diarization models)
  • Gemini API key configured in the project root .env file (the pipeline calls Gemini for speaker attribution)
  • Python 3.10+ for the benchmark harness
  • Hugging Face token (optional): needed only if Earnings-21 becomes gated. Set HF_TOKEN or HUGGINGFACE_PAT_READ in .env.

Quick Start

cd benchmark
python -m venv .venv && source .venv/bin/activate
pip install -e .

# Download eval data (~10 GB for both corpora)
bench-download

# Run MimicScribe pipeline on all audio files
bench-run

# Score and generate results
bench-score

Or run everything at once:

bench-all

Run a single corpus

bench-download --corpus ami
bench-run --corpus ami
bench-score --corpus ami

Metrics

  • DER (Diarization Error Rate): (missed speech + false alarm + speaker confusion) / total reference speech duration. Scored with the standard 0.25s collar.
  • Missed: speech in reference but not in hypothesis
  • False Alarm: speech in hypothesis but not in reference
  • Confusion: speech attributed to the wrong speaker
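To make the three error components concrete, here is a minimal frame-level sketch. It is not the harness's scorer: the function name der_from_frames is hypothetical, each frame is assumed to carry at most one speaker label (None means silence), hypothesis labels are assumed to be already mapped to reference speakers, and no collar is applied. A real scorer also searches for the best speaker mapping and handles overlapping speech.

```python
def der_from_frames(reference, hypothesis):
    """Return (der, missed, false_alarm, confusion) from per-frame labels.

    Each list holds one label per fixed-length frame; None means no speech.
    Assumes hypothesis labels are already mapped to reference speaker names.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same frames")

    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            total += 1               # every reference speech frame counts in the denominator
            if hyp is None:
                missed += 1          # speech in reference but not in hypothesis
            elif hyp != ref:
                confusion += 1       # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1         # speech in hypothesis but not in reference

    if total == 0:
        raise ValueError("reference contains no speech")
    der = (missed + false_alarm + confusion) / total
    return der, missed, false_alarm, confusion


# Illustrative data only: 10 frames, 8 of them reference speech.
ref = ["A", "A", "A", "A", None, "B", "B", "B", "B", None]
hyp = ["A", "A", None, "A", "A", "B", "A", "B", "B", None]
print(der_from_frames(ref, hyp))
```

In this toy input there is one missed frame, one false-alarm frame, and one confused frame against eight frames of reference speech, so the three components contribute equally to the rate.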

How It Works

  1. Download: Fetches audio + ground-truth RTTM files from AMI and Earnings-21
  2. Run: Runs each audio file through swift run mimicscribe --process-file (ASR + diarization + Gemini speaker attribution), then exports segments from SQLite as hypothesis RTTM files
  3. Score: Compares hypothesis RTTM against reference RTTM using pyannote.metrics, renders results to results/RESULTS.md
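Both the reference and hypothesis files in the steps above are RTTM, where each speaker turn is one space-separated SPEAKER record. The sketch below shows how a segment could be written to and read back from that format. The helper names to_rttm_line and parse_rttm_line, the file ID meeting01, and the speaker label are hypothetical; the actual export code and the SQLite schema are not shown in this README.

```python
def to_rttm_line(file_id, start, end, speaker, channel=1):
    """Format one speaker turn as an RTTM SPEAKER record (times in seconds)."""
    duration = end - start
    if duration <= 0:
        raise ValueError("segment must have positive duration")
    # Fields: type, file ID, channel, onset, duration, then unused <NA>
    # placeholders around the speaker name.
    return (
        f"SPEAKER {file_id} {channel} {start:.3f} {duration:.3f} "
        f"<NA> <NA> {speaker} <NA> <NA>"
    )


def parse_rttm_line(line):
    """Return (file_id, start, end, speaker) from an RTTM SPEAKER record."""
    fields = line.split()
    if len(fields) < 9 or fields[0] != "SPEAKER":
        raise ValueError(f"not an RTTM SPEAKER record: {line!r}")
    onset = float(fields[3])
    duration = float(fields[4])
    return fields[1], onset, onset + duration, fields[7]


# Hypothetical segment: one turn from 1.5s to 4.0s.
line = to_rttm_line("meeting01", 1.5, 4.0, "SPEAKER_00")
print(line)
print(parse_rttm_line(line))
```

RTTM stores onset and duration rather than start and end, so the export step has to convert whichever representation the segments use.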

Attribution

  • AMI Meeting Corpus: Carletta, J. et al. (2005). The AMI Meeting Corpus: A Pre-announcement. CC BY 4.0
  • AMI RTTM ground truth: BUTSpeechFIT/AMI-diarization-setup
  • Earnings-21: Del Rio, M. et al. (2021). Earnings-21: A Practical Benchmark for ASR in the Wild. Interspeech 2021. speech-datasets