
Transformer Time Series Interpretability Toolkit

This repository provides an end-to-end workflow for analysing Transformer-based time series classification (TSC) models through mechanistic interpretability methods. It contains ready-to-run notebooks, a modular training script, and a collection of pre-trained models.

Author: Matiss Kalnare
Supervisor: Niki van Stein

Repository Structure

Notebooks/             - Interactive notebooks demonstrating the two analysis pipelines
  Patching.ipynb       - Activation patching/causal tracing walkthrough
  SAE.ipynb            - Sparse Autoencoder exploration
  IPYNB_to_PY/         - Python script versions of the notebooks

Utilities/             - Helper code
  TST_trainer.py       - Training/evaluation script and model definition
  utils.py             - Patching and plotting utilities

TST_models/            - Pre-trained models for several datasets
SAE_models/            - Example sparse autoencoder weights
Results/               - Example results (plots, patched predictions, ...)
requirements.txt       - Python package requirements

Installation

  1. Clone the repository and install the dependencies:
    git clone https://github.com/mathiisk/TSTpatching.git
    cd TSTpatching
    pip install -r requirements.txt

A GPU with CUDA is recommended, but the code also runs on CPU.

Quick Start

Pre-trained weights for common datasets are provided in TST_models/. You can immediately run the notebooks to reproduce the experiments.

Open the activation patching notebook:

jupyter notebook Notebooks/Patching.ipynb

or the sparse autoencoder notebook:

jupyter notebook Notebooks/SAE.ipynb

Step through the cells to load a model, run the analysis and display plots. The notebooks assume the working directory is the repository root.
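
For orientation, here is a minimal sketch of the activation patching idea the Patching notebook walks through, written with plain PyTorch forward hooks. The function and variable names are illustrative, not the notebook's actual API: cache a chosen layer's activation on a clean input, then splice it into a run on a corrupted input and observe how the prediction moves.

import torch

def patch_activation(model, layer, clean_x, corrupt_x):
    """Run corrupt_x while splicing in layer's activation from clean_x."""
    cache = {}

    def save_hook(module, inputs, output):
        cache["act"] = output.detach()

    def patch_hook(module, inputs, output):
        return cache["act"]  # returning a tensor replaces the layer's output

    # 1) Clean run: record the activation at the chosen layer.
    handle = layer.register_forward_hook(save_hook)
    with torch.no_grad():
        model(clean_x)
    handle.remove()

    # 2) Corrupted run: patch the cached clean activation back in.
    handle = layer.register_forward_hook(patch_hook)
    with torch.no_grad():
        patched_logits = model(corrupt_x)
    handle.remove()
    return patched_logits

Sweeping such a patch over layers and time positions is what produces causal-tracing style maps of where the model stores decision-relevant information.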

Training a New Model

Utilities/TST_trainer.py can train a fresh Transformer on any dataset from timeseriesclassification.com.

python Utilities/TST_trainer.py --dataset DATASET_NAME --epochs NUM_EPOCHS --batch_size BATCH_SIZE
  • DATASET_NAME should match one of the names on the website, e.g. JapaneseVowels.
  • NUM_EPOCHS defaults to 100 if not provided.
  • BATCH_SIZE defaults to 32.
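
For example, with the defaults spelled out explicitly:

python Utilities/TST_trainer.py --dataset JapaneseVowels --epochs 100 --batch_size 32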

The resulting weights are stored as TST_<dataset>.pth under TST_models/.
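
Outside the training script, the saved weights can be reloaded with PyTorch. A minimal sketch, assuming the .pth file holds a state dict; the class name below is hypothetical, and the real model definition lives in Utilities/TST_trainer.py:

import torch

# Load the checkpoint produced by TST_trainer.py.
state_dict = torch.load("TST_models/TST_JapaneseVowels.pth", map_location="cpu")

# Rebuild the model with the same hyperparameters used at training time,
# then restore the weights, e.g.:
#   model = TSTClassifier(**hyperparams)  # hypothetical class name
#   model.load_state_dict(state_dict)
#   model.eval()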

Sparse Autoencoders

The notebook Notebooks/SAE.ipynb trains a sparse autoencoder on intermediate activations of the Transformer and uses the learned features to highlight interpretable concepts the model relies on. Pre-trained SAE weights are stored in SAE_models/ and can be loaded by the notebook.
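
As a rough illustration of the technique (not the notebook's exact architecture or hyperparameters), a sparse autoencoder reconstructs cached activations through a wider hidden layer, with an L1 penalty that pushes most hidden units to zero on any given input:

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_act, d_hidden):
        super().__init__()
        self.encoder = nn.Linear(d_act, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_act)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # sparse code
        return self.decoder(z), z

sae = SparseAutoencoder(d_act=128, d_hidden=512)  # dimensions are placeholders
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_weight = 1e-3  # sparsity strength, also a placeholder

def train_step(acts):
    """One optimisation step on a batch of cached activations (batch, d_act)."""
    recon, z = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_weight * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

acts = torch.randn(64, 128)  # stand-in for activations cached from the TST
print(train_step(acts))

The sparsity term is what makes individual hidden units candidate "concepts": each input should activate only a handful of them, so the ones that do fire can be inspected in isolation.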

Output & Results

All figures and intermediate outputs generated by the notebooks are stored under Results/ by default. Separate folders exist for each dataset so you can keep experiments organised.

BSc Thesis Context

This code base accompanies a Bachelor thesis exploring whether interpretability techniques from NLP, namely activation patching and sparse autoencoders, can reveal causal mechanisms inside Transformer-based time series classifiers. The provided scripts and notebooks allow anyone to reproduce and extend the experiments.
