LLM-Trace is an interpretability and tooling library designed to treat language models as observable systems. It exposes the internal computation paths of transformer-based models, focusing on attention-based circuits and token-level influence.
Unlike standard evaluation tools that analyze inputs and outputs, LLM-Trace intercepts intermediate activations to explain how a model reaches a conclusion, bridging the gap between academic mechanistic interpretability and engineering diagnostics.
## Table of Contents

- Project Philosophy
- Core Features
- System Architecture
- Scope & Non-Goals
- Installation
- Roadmap
- Contributing
- License
## Project Philosophy

LLM-Trace is built on the premise that modern language models require observability, not just evaluation. The project adheres to three core tenets:
- Measurement > Aesthetics: While visualization is supported, the primary goal is quantifiable metrics of influence (e.g., attention scores, activation magnitudes).
- Causality > Correlation: We aim to validate findings through ablation and counterfactual testing, ensuring that detected circuits actually drive model behavior.
- Systems Thinking: The model is treated as a computational graph where specific nodes (heads, layers) perform distinct sub-tasks.
## Core Features

- Attention Hooking: Direct extraction of attention matrices ($A$), Queries ($Q$), and Keys ($K$) from intermediate transformer layers without altering model weights (see the sketch after this list).
- Activation Tracing: Capture residual stream states at critical junctures.
- Token Influence Mapping: Identify which source tokens (e.g., a specific instruction) contribute most heavily to a target token's generation.
- Dependency Graphs: Visualize recurring patterns of information flow across layers.
- Ablation Testing: Tools to mask or zero-out specific attention heads to verify their contribution to the output.
- Prompt-Set Evaluation: Batch processing capabilities to test if a circuit holds true across varied inputs.
- Visualization Dashboard: A dedicated React-based interface for navigating token-to-token relationships, heatmaps, and layer-wise comparisons.
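To make the hooking approach concrete, here is a minimal sketch of the general pattern using plain PyTorch forward hooks on a Hugging Face GPT-2 model. This is not LLM-Trace's actual API; the model choice, the `model.transformer.h` module path, and the `output_attentions=True` flag are assumptions specific to the GPT-2 family.

```python
# A minimal attention-hooking sketch (illustrative, not LLM-Trace's API).
# Assumes `pip install torch transformers`; module names are GPT-2 specific.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

residual_streams = {}  # layer index -> residual stream entering that block


def make_hook(layer_idx):
    def hook(module, inputs, output):
        # inputs[0] is the hidden-state tensor fed into this transformer block
        residual_streams[layer_idx] = inputs[0].detach()
    return hook


# Register a forward hook on every block; model weights are never modified.
handles = [
    block.register_forward_hook(make_hook(i))
    for i, block in enumerate(model.transformer.h)
]

enc = tokenizer("The keys to the cabinet are on the table", return_tensors="pt")
with torch.no_grad():
    out = model(**enc, output_attentions=True)

# out.attentions is a per-layer tuple of [batch, heads, seq, seq] attention matrices.
print(out.attentions[0].shape)    # e.g. torch.Size([1, 12, 9, 9])
print(residual_streams[0].shape)  # e.g. torch.Size([1, 9, 768])

for h in handles:
    h.remove()  # always detach hooks once the trace is captured
```

Queries and keys can be captured the same way by hooking the fused `attn.c_attn` projection inside each GPT-2 block and splitting its output into the $Q$, $K$, and $V$ slices.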
## System Architecture

LLM-Trace operates via a split architecture designed for performance and rigor:
- The Backend (Python/PyTorch): Handles model loading, hook injection, tensor manipulation, and metric calculation. It produces structured JSON/Tensor data representing the model's internal state.
- The Frontend (React/Node.js): Consumes structured data to render high-fidelity inspection interfaces. It acts as the "DevTools" for the model.
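As a rough illustration of that contract, the helper below flattens captured tensors into a JSON document that a browser client could consume. The field names (`model`, `tokens`, `layers`, `attention`, `residual_norm`) are hypothetical and do not reflect LLM-Trace's actual schema.

```python
# Hypothetical trace payload; field names are illustrative, not LLM-Trace's schema.
import json


def to_trace_json(model_name, tokens, attentions, residual_streams):
    """Downcast per-layer tensors to nested lists so a JSON frontend can read them."""
    return json.dumps({
        "model": model_name,
        "tokens": tokens,  # list[str], one entry per prompt token
        "layers": [
            {
                "index": i,
                # [heads, seq, seq] attention matrix for this layer
                "attention": attn.squeeze(0).tolist(),
                # per-token L2 norm of the residual stream entering this layer
                "residual_norm": residual_streams[i].norm(dim=-1).squeeze(0).tolist(),
            }
            for i, attn in enumerate(attentions)
        ],
    })
```

Serializing full-precision tensors as JSON is wasteful at scale, so a real pipeline would likely downsample, quantize, or binary-encode the payload before handing it to the dashboard.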
Data flows through the system in four stages:

- Input: A prompt is fed into a Transformer model.
- Interception: Custom hooks capture activations during the forward pass.
- Analysis: Raw tensors are processed into influence metrics, optionally applying interventions such as head ablation (see the sketch after this list).
- Output: Structured trace data is returned for visualization or programmatic analysis.
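To make the interception and intervention steps concrete, here is a minimal, hypothetical head-ablation sketch, again written against Hugging Face GPT-2 rather than LLM-Trace's own API; the `attn.c_proj` hook point, the chosen layer/head, and the prompt are illustrative assumptions. It relies on the fact that GPT-2 concatenates per-head outputs channel-wise before the attention output projection, so zeroing one slice of that projection's input removes exactly one head.

```python
# Hypothetical head-ablation sketch (not LLM-Trace's API): zero out one GPT-2
# attention head and observe how the next-token logits change.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

LAYER, HEAD = 5, 1  # arbitrary head chosen for illustration
head_dim = model.config.n_embd // model.config.n_head


def ablate_head(module, args):
    # args[0]: [batch, seq, n_embd] with per-head outputs concatenated channel-wise
    hidden = args[0].clone()
    hidden[..., HEAD * head_dim:(HEAD + 1) * head_dim] = 0.0
    return (hidden,) + args[1:]


enc = tokenizer("The capital of France is", return_tensors="pt")

with torch.no_grad():
    baseline = model(**enc).logits[0, -1]

# Intervene only for the ablated run, then remove the hook again.
handle = model.transformer.h[LAYER].attn.c_proj.register_forward_pre_hook(ablate_head)
with torch.no_grad():
    ablated = model(**enc).logits[0, -1]
handle.remove()

print("L2 shift in next-token logits:", torch.norm(baseline - ablated).item())
print("baseline top token:", tokenizer.decode(baseline.argmax().item()))
print("ablated top token: ", tokenizer.decode(ablated.argmax().item()))
```

Zero-ablation is the simplest intervention; mean-ablation or patching in activations from a counterfactual prompt follow the same hook pattern.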
## Scope & Non-Goals

In scope:

- Internal behavior inspection during inference.
- Mechanistic interpretability workflows.
- Safety audits via internal state monitoring.
- Small-to-Medium sized Transformer models (e.g., GPT-2, TinyLlama).

Out of scope:

- Training/Fine-tuning: This is an inference-only tool.
- Black-box Evaluation: We do not provide benchmarks like MMLU.
- Prompt Engineering: We analyze prompts, we do not optimize them.
## Installation

Prerequisites:

- Python 3.10+
- Node.js 18+ (for Frontend dashboard)
- PyTorch (CUDA/MPS recommended)
```bash
# Clone the repository
git clone https://github.com/your-org/LLM-Trace.git
cd LLM-Trace

# Install Backend Dependencies
pip install -r requirements.txt
```

## Contributing

We welcome contributions from researchers and engineers. However, because this is a scientific tool, strict standards apply to code correctness, testing, and commit hygiene. Please read the Contribution Guide before submitting a Pull Request.
## License

This project is licensed under the MIT License. See the LICENSE file for details.

LLM-Trace is an open-source initiative to demystify machine intelligence.