Welcome to the `analysis/` directory! This folder contains various analysis implementations for LLM360 models. Each subfolder is an independent, self-contained module with its own setup instructions, relying solely on the code within that subfolder.
- Data memorization (`memorization/`) evaluates model memorization of the training data.
- LLM Unlearning (`unlearn/`) implements machine unlearning methods to remove an LLM's hazardous knowledge.
- Safety360 (`safety360/`) contains modules to measure model safety:
  - `bold/` provides sentiment analysis with the BOLD dataset.
  - `toxic_detection/` measures the model's capability to identify toxic text.
  - `toxigen/` evaluates the model's toxicity in text generation.
  - `wmdp/` evaluates the model's hazardous knowledge.
- Mechanistic Interpretability (`mechinterp/`) contains packages for visualizing algorithms executed by LLMs during inference.
- Evaluation metrics (`metrics/`) contains modules for model evaluation:
  - `harness/` provides instructions to evaluate models following the Open LLM Leaderboard.
  - `ppl/` evaluates model per-token perplexity.
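To give a flavor of the memorization analysis, here is a minimal sketch of the prefix-probe idea: prompt the model with a prefix taken from the training data, greedily generate a continuation, and score the fraction of tokens that exactly match the true continuation. The function name, token IDs, and scoring details below are illustrative assumptions, not the exact implementation in `memorization/`; a real run would use the model's tokenizer and generation API.

```python
def memorization_score(generated, reference):
    """Fraction of continuation tokens that exactly match the source.

    `generated` and `reference` are equal-length sequences of token IDs:
    the model's greedy continuation of a training-data prefix, and the
    true continuation from the corpus. (Token IDs here are placeholders;
    this is an illustrative sketch, not the module's actual code.)
    """
    matches = sum(g == r for g, r in zip(generated, reference))
    return matches / len(reference)

# 3 of the 4 continuation tokens match the source text.
print(memorization_score([5, 9, 2, 7], [5, 9, 3, 7]))  # → 0.75
```

A score of 1.0 would indicate the model reproduced the training continuation verbatim, which is the strongest signal of memorization.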
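For reference, per-token perplexity (as measured by `ppl/`) is the exponential of the average negative log-likelihood the model assigns to each token. A minimal sketch, assuming you already have per-token log-probabilities from a causal LM (extracting them from a real model is omitted here):

```python
import math

def per_token_perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token.

    `token_logprobs` is a list of natural-log probabilities, one per
    token, as a causal LM would assign to the next token at each step.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns probability 1/4 to every token has perplexity 4:
# it is as uncertain as a uniform choice among four tokens.
print(per_token_perplexity([math.log(0.25)] * 3))  # ≈ 4.0
```

Lower perplexity means the model finds the text more predictable; it is the standard intrinsic metric for language-model quality.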