MimicScribe Diarization Benchmark

Measures speaker diarization accuracy of the MimicScribe pipeline: Parakeet TDT 0.6B (on-device ASR) + Pyannote (on-device diarization) + Gemini 3 Flash (LLM speaker attribution).

Corpora

Corpus	Scenario	Sessions	Hours	Speakers
AMI IHM-mix	Headset mix (simulates multi-mic remote call)	16 eval	~8h	4 per session
Earnings-21	Real corporate conference calls	11 eval	~10h	2-15

Results

See results/RESULTS.md after running the benchmark.

Prerequisites

macOS 15.0+ with Xcode and Swift 6.2+
MimicScribe built and run at least once (to download CoreML ASR + diarization models)
Gemini API key configured in the project root .env file (the pipeline calls Gemini for speaker attribution)
Python 3.10+ for the benchmark harness
HuggingFace token (optional): only needed if Earnings-21 becomes gated. Set HF_TOKEN or HUGGINGFACE_PAT_READ in .env.

Quick Start

cd benchmark
python -m venv .venv && source .venv/bin/activate
pip install -e .

# Download eval data (~10 GB for both corpora)
bench-download

# Run MimicScribe pipeline on all audio files
bench-run

# Score and generate results
bench-score

Or run everything at once:

bench-all

Run a single corpus

bench-download --corpus ami
bench-run --corpus ami
bench-score --corpus ami

Metrics

DER (Diarization Error Rate): (missed speech + false alarm + speaker confusion) divided by total reference speech time; lower is better. Scored with the standard 0.25s collar.
Missed: speech in reference but not in hypothesis
False Alarm: speech in hypothesis but not in reference
Confusion: speech attributed to the wrong speaker
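
To make the arithmetic behind the three components concrete, here is a minimal sketch of how they combine into DER. It is not the benchmark's scoring code (scoring is done with pyannote.metrics); the DerComponents class, the function name, and the example durations are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DerComponents:
    """Error durations in seconds (hypothetical container, not part of the harness)."""
    missed: float           # reference speech with no hypothesis speech
    false_alarm: float      # hypothesis speech with no reference speech
    confusion: float        # speech attributed to the wrong speaker
    total_reference: float  # total scored reference speech


def diarization_error_rate(c: DerComponents) -> float:
    """Return DER as a fraction of total reference speech time."""
    if c.total_reference <= 0:
        raise ValueError("total reference speech must be positive")
    return (c.missed + c.false_alarm + c.confusion) / c.total_reference


# Made-up numbers: 60 s of errors over 600 s of reference speech.
example = DerComponents(missed=30.0, false_alarm=12.0, confusion=18.0,
                        total_reference=600.0)
print(f"DER = {diarization_error_rate(example):.1%}")  # DER = 10.0%
```

Because false alarms are counted against reference speech time, DER can exceed 100% when the hypothesis contains much more speech than the reference.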

How It Works

Download: Fetches audio + ground-truth RTTM files from AMI and Earnings-21
Run: Runs each audio file through swift run mimicscribe --process-file (ASR + diarization + Gemini speaker attribution), then exports segments from SQLite as hypothesis RTTM files
Score: Compares hypothesis RTTM against reference RTTM using pyannote.metrics, renders results to results/RESULTS.md
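
The Run step's export to RTTM can be sketched as follows. This is an illustration under stated assumptions, not the harness's actual exporter: the Segment type, the function names, and the sample values are hypothetical, and the SQLite query that would produce the segments is omitted because the schema is not described here. Only the RTTM SPEAKER line layout (file ID, channel, onset, duration, speaker label, with <NA> for unused fields) follows the standard format.

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """One diarized speech segment (hypothetical shape of an exported row)."""
    start: float   # onset in seconds
    end: float     # offset in seconds
    speaker: str   # speaker label assigned by the pipeline


def to_rttm_line(file_id: str, seg: Segment) -> str:
    """Format a single segment as an RTTM SPEAKER record."""
    duration = seg.end - seg.start
    if duration <= 0:
        raise ValueError(f"segment has non-positive duration: {seg}")
    return (f"SPEAKER {file_id} 1 {seg.start:.3f} {duration:.3f} "
            f"<NA> <NA> {seg.speaker} <NA> <NA>")


def to_rttm(file_id: str, segments: Iterable[Segment]) -> str:
    """Format all segments for one recording, ordered by onset time."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(to_rttm_line(file_id, s) for s in ordered) + "\n"


# Illustrative values only.
segments = [Segment(4.0, 6.25, "spk1"), Segment(1.5, 4.0, "spk0")]
print(to_rttm("session01", segments), end="")
```

Reference and hypothesis files share this layout, which is what lets the Score step compare them directly.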

Attribution

AMI Meeting Corpus: Carletta, J. et al. (2005). The AMI Meeting Corpus: A Pre-announcement. CC BY 4.0
AMI RTTM ground truth: BUTSpeechFIT/AMI-diarization-setup
Earnings-21: Del Rio, M. et al. (2021). Earnings-21: A Practical Benchmark for ASR in the Wild. Interspeech 2021. speech-datasets
