Whisper Encoder Interpretability via Activation Patching

Understanding how Whisper's encoder represents acoustic and linguistic information through counterfactual activation patching experiments.

Overview

This repository contains two interpretability experiments on OpenAI's Whisper model, inspired by techniques from mechanistic interpretability research like PatchScope. We use activation patching to probe how Whisper's encoder layers process and represent speech.

Key Findings:

  • Layer-specific representations: Same-layer patching achieves ~95% override accuracy vs ~54% for cross-layer patching
  • Language-agnostic encoding: 66.6% cross-lingual transfer rate suggests encoder representations are largely language-independent
  • Asymmetric layer influence: states from later layers (2, 3) override more strongly when patched into the adjacent earlier layer than vice versa

Experiments

Experiment A: Bidirectional Patching (Monolingual)

Measures how encoder representations from one word can override the processing of a different word.

Methodology (sketched in code after the steps):

  1. Record encoder hidden states from Word A (e.g., "bat")
  2. Patch these states into encoder processing of Word B (e.g., "cat")
  3. Measure if output changes toward Word A
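
A minimal sketch of this loop, assuming openai-whisper; the model size, file names ("bat.wav", "cat.wav"), and helper names are illustrative rather than the notebooks' actual identifiers:

import whisper

model = whisper.load_model("tiny")

def record_states(path, layer):
    # Run a normal forward pass and capture the hidden state at `layer`
    captured = {}
    def hook(module, inputs, output):
        captured["state"] = output.detach().clone()
    handle = model.encoder.blocks[layer].register_forward_hook(hook)
    model.transcribe(path)
    handle.remove()
    return captured["state"]

def transcribe_patched(path, layer, state):
    # Transcribe `path` while overriding encoder block `layer` with `state`
    def hook(module, inputs, output):
        return state.to(output.device).to(output.dtype)
    handle = model.encoder.blocks[layer].register_forward_hook(hook)
    text = model.transcribe(path)["text"]
    handle.remove()
    return text

# Record Word A's layer-2 state, patch it into Word B's layer-2 processing
state_a = record_states("bat.wav", layer=2)
patched = transcribe_patched("cat.wav", layer=2, state=state_a)
override = "bat" in patched.lower()  # did the output move toward Word A?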

Dataset: 869 English word pairs including:

  • Minimal pairs (consonant contrasts): bat/pat, big/pig, etc.
  • Vowel contrasts: bit/bat, ship/sheep, etc.
  • Semantic pairs: apple/orange, hot/cold, etc.

Results:

Metric                       Value
Diagonal (same-layer)        0.951
Off-diagonal (cross-layer)   0.536
Max override                 Layer 3→3 (0.954)

Override Accuracy Matrix (rows = source layer, columns = target layer):

[[0.95  0.50  0.50  0.50]
 [0.50  0.95  0.50  0.51]
 [0.50  0.64  0.95  0.52]
 [0.50  0.52  0.74  0.95]]

Interpretation: The strong diagonal pattern indicates layer-specific representations—each layer encodes information in a format most compatible with the same layer position.


Experiment B: Cross-Lingual Transfer

Tests whether English encoder representations can influence Spanish audio processing.

Methodology (a short code sketch follows the steps):

  1. Process English word (e.g., "telephone") and capture encoder states
  2. Patch English states into Spanish cognate processing (e.g., "teléfono")
  3. Check if English word appears in output
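
Reusing the record_states / transcribe_patched helpers sketched under Experiment A (file names again illustrative; the layer pair shown is the best-performing one from the results below):

state_en = record_states("telephone.wav", layer=1)
patched = transcribe_patched("telefono.wav", layer=0, state=state_en)
transfer = "telephone" in patched.lower()  # did the English word surface?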

Dataset: 162 English-Spanish pairs including:

  • Cognates: telephone/teléfono, hospital/hospital, chocolate/chocolate
  • Non-cognates: cat/gato, house/casa, water/agua

Results:

Metric                  Value
Overall transfer rate   0.666
Same-layer transfer     0.708
Cross-layer transfer    0.652
Best transfer           Layer 1→0 (1.000)
Worst transfer          Layer 1→3 (0.019)

Transfer Success Matrix (rows = source layer, columns = target layer):

[[0.72  0.93  0.55  0.12]
 [1.00  0.71  0.87  0.02]
 [1.00  0.33  0.70  0.16]
 [0.99  0.99  0.87  0.70]]

Interpretation: High transfer rates suggest Whisper's encoder learns largely language-agnostic acoustic representations, with patches into early and mid target layers succeeding far more often than patches into the final layer.

Installation

pip install openai-whisper torch torchaudio matplotlib numpy gTTS pydub seaborn

Usage

Run in Google Colab (recommended for GPU access):

# Experiment A - Monolingual patching (~5 hours on T4 GPU)
# Open experiment_a_bidirectional_patching.ipynb

# Experiment B - Cross-lingual transfer (~1 hour on T4 GPU)  
# Open experiment_b_crosslingual_transfer.ipynb

Repository Structure

├── experiment_a_bidirectional_patching.ipynb  # Monolingual patching experiment
├── experiment_b_crosslingual_transfer.ipynb   # Cross-lingual transfer experiment
├── README.md
└── figures/
    ├── experiment_a_heatmap.png
    └── experiment_b_figure.png

Technical Details

Model: Whisper Tiny (37.2M parameters, 4 encoder layers, 4 decoder layers)

Patching Mechanism:

# Returning a tensor from a forward hook replaces the block's output
def patch_hook(module, inputs, output):
    return source_state.to(output.device).to(output.dtype)

hook = model.encoder.blocks[target_layer].register_forward_hook(patch_hook)
# ... run the patched forward pass on the target audio, then detach the hook
hook.remove()
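
Two details make this whole-output replacement workable: a tensor returned from a PyTorch forward hook overrides the block's output for all downstream computation, and because Whisper pads or trims every input to 30 seconds before computing the mel spectrogram, encoder hidden states always share the same sequence length (1500 frames), so a state recorded from one clip is shape-compatible with any other.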

Audio Generation: Google Text-to-Speech (gTTS) was used to synthesize all stimuli, giving consistent audio across word pairs.
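
As a sketch of that generation step (the word and file names are illustrative), gTTS writes MP3, which pydub converts to the 16 kHz mono WAV that Whisper expects:

from gtts import gTTS
from pydub import AudioSegment

# Synthesize one word, then convert to 16 kHz mono WAV
gTTS("bat", lang="en").save("bat.mp3")  # Spanish stimuli would use lang="es"
clip = AudioSegment.from_mp3("bat.mp3")
clip.set_frame_rate(16000).set_channels(1).export("bat.wav", format="wav")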

Key Observations

  1. Layer Specialization: Each encoder layer develops specialized representations incompatible with other layer positions (diagonal dominance in Exp A)

  2. Asymmetric Transfer: Later source layers (2, 3) show elevated transfer to adjacent target layers, suggesting hierarchical processing

  3. Language Independence: The encoder appears to build language-agnostic acoustic representations that transfer across languages

  4. Early Layer Universality: Layers 0-1 show the highest cross-lingual transfer, possibly encoding more universal acoustic features

Limitations

  • Uses Whisper Tiny; larger models may show different patterns
  • Synthetic TTS audio may not reflect natural speech characteristics
  • Limited to single-word stimuli
  • Transfer metric is binary (word presence) rather than graded

Future Directions

  • Extend to larger Whisper models (Base, Small, Medium, Large)
  • Test with natural speech recordings
  • Analyze decoder layer interactions
  • Probe specific phonetic feature representations
  • Compare with other multilingual ASR models

Citation

If you use this code in your research, please cite:

@software{whisper_patching_2024,
  title={Whisper Encoder Interpretability via Activation Patching},
  year={2024},
  url={https://github.com/YOUR_USERNAME/whisper-activation-patching}
}

License

MIT License

Acknowledgments

  • OpenAI for the Whisper model
  • Inspired by PatchScope and mechanistic interpretability research
