Welcome to the `analysis/` directory! This folder contains various analysis implementations for LLM360 models. Each subfolder is an independent, self-contained module with its own setup instructions, relying solely on the code within that subfolder.
- Data memorization (`memorization/`) evaluates model memorization of the training data.
- LLM Unlearning (`unlearn/`) implements machine unlearning methods to remove an LLM's hazardous knowledge.
- Safety360 (`safety360/`) contains modules to measure model safety:
  - `bold/` provides sentiment analysis with the BOLD dataset.
  - `toxic_detection/` measures the model's capability to identify toxic text.
  - `toxigen/` evaluates the model's toxicity in text generation.
  - `wmdp/` evaluates the model's hazardous knowledge.
- Mechanistic Interpretability (`mechinterp/`) contains packages for visualizing algorithms executed by LLMs during inference.
- Evaluation metrics (`metrics/`) contains modules for model evaluation:
  - `harness/` provides instructions to evaluate models following the Open LLM Leaderboard.
  - `ppl/` evaluates model per-token perplexity.
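To give a flavor of the memorization analysis, here is a minimal sketch of the prefix-probe idea: prompt the model with a prefix taken from the training data, greedily generate a continuation, and score the fraction of tokens that exactly match the true continuation. The function name, token IDs, and scoring details below are illustrative assumptions, not the exact implementation in `memorization/`; a real run would use the model's tokenizer and generation API.

```python
def memorization_score(generated, reference):
    """Fraction of continuation tokens that exactly match the source.

    `generated` and `reference` are equal-length sequences of token IDs:
    the model's greedy continuation of a training-data prefix, and the
    true continuation from the corpus. (Token IDs here are placeholders;
    this is an illustrative sketch, not the module's actual code.)
    """
    matches = sum(g == r for g, r in zip(generated, reference))
    return matches / len(reference)

# 3 of the 4 continuation tokens match the source text.
print(memorization_score([5, 9, 2, 7], [5, 9, 3, 7]))  # → 0.75
```

A score of 1.0 would indicate the model reproduced the training continuation verbatim, which is the strongest signal of memorization.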
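For reference, per-token perplexity (as measured by `ppl/`) is the exponential of the average negative log-likelihood the model assigns to each token. A minimal sketch, assuming you already have per-token log-probabilities from a causal LM (extracting them from a real model is omitted here):

```python
import math

def per_token_perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token.

    `token_logprobs` is a list of natural-log probabilities, one per
    token, as a causal LM would assign to the next token at each step.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns probability 1/4 to every token has perplexity 4:
# it is as uncertain as a uniform choice among four tokens.
print(per_token_perplexity([math.log(0.25)] * 3))  # ≈ 4.0
```

Lower perplexity means the model finds the text more predictable; it is the standard intrinsic metric for language-model quality.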