This repository contains code for the paper "Implicit Representations of Grammaticality in Language Models".
Install dependencies:

```shell
pip install -r requirements.txt
```

Repository structure:

```
ling_comp/
├── baseline/              # Inference and few-shot evaluation
│   ├── inference.py       # Extract hidden states and log-probs
│   ├── fewshot.py         # Few-shot prompting evaluation
│   └── helpers.py         # Sharding utilities
├── data/                  # Dataset loading and processing
│   ├── format_data.py     # Dataset class definitions
│   └── helpers.py         # Tokenization and perturbation functions
├── models/                # Model wrappers
│   ├── all_models.py      # HuggingFace model interface
│   └── helpers.py         # Model configs and layer extraction
├── probe/                 # Probing classifiers
│   ├── l2_classifier.py   # L2 probe training
│   ├── l1_classifier.py   # LASSO probe training
│   ├── surprisal_probe.py # Surprisal probe training + testing
│   ├── test_classifier.py # Test L2/LASSO probes
│   └── helpers.py         # Helper functions
└── results/               # Output directory for inference and probe results
```
| Module | Description | Details |
|---|---|---|
| baseline | Inference and few-shot evaluation | Extract hidden states and log-probs from LLMs |
| data | Dataset loading and preprocessing | Load acceptability judgment benchmarks and generate perturbations |
| models | Model wrappers | HuggingFace model interface for causal LMs |
| probe | Probing classifiers | L1/L2 logistic regression and surprisal probes |
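As an illustration of the core quantity the baseline module extracts, a sentence's log-probability under a causal LM is the sum of the gold next-token log-probs at each position. The following is a minimal, self-contained sketch of that computation (hypothetical code, not the repo's implementation):

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def sentence_logprob(per_position_logits, token_ids):
    """Sum the log-prob of each gold next token.

    per_position_logits[i]: vocabulary logits at position i;
    token_ids[i]: gold next-token id at position i.
    """
    return sum(log_softmax(row)[t] for row, t in zip(per_position_logits, token_ids))

# Toy example: 3 positions over a vocabulary of 4 tokens.
logits = [[0.1, 2.0, -1.0, 0.5],
          [1.5, 0.0, 0.3, -0.2],
          [0.0, 0.0, 3.0, 0.0]]
score = sentence_logprob(logits, [1, 0, 2])
```

In the actual pipeline these logits would come from a HuggingFace model forward pass, with hidden states saved alongside the log-probs.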
Extract hidden states and compute log-probabilities:
```shell
# Run inference on BLiMP
python -m baseline.inference --data blimp --model olmo2-7B

# Run inference on synthetic data with all perturbations
python -m baseline.inference --data synthetic --model olmo2-7B --perturb all

# Run incremental inference (per-token)
python -m baseline.inference --data blimp --model olmo2-7B --incremental
```

Train and evaluate layer-wise L2-regularized probes:
```shell
# Train
python -m probe.l2_classifier --data synthetic --model olmo2-7B --start_exp -2 --end_exp 5

# Test
python -m probe.test_classifier --model olmo2-7B --eval_data BLiMP --train_data synthetic --start_exp -2 --end_exp 5

# Train / Eval with logprob as an extra feature
python -m ... --add_prob
```

Train and evaluate LASSO probes with feature selection:
```shell
# Train
python -m probe.l1_classifier --data synthetic --model olmo2-7B

# Test: choose target_ratio from [0.01, 0.05, 0.1, 0.5]
python -m probe.test_classifier --model olmo2-7B --eval_data BLiMP --train_data synthetic --target_ratio 0.01

# Train and Test with 30 randomly selected subsets of neurons
python -m ... --random
```

Train and evaluate surprisal prediction:
```shell
# Train on last token only
python -m probe.surprisal_probe --data synthetic --model olmo2-7B

# Evaluate on BLiMP on last token only
python -m probe.surprisal_probe --data synthetic --model olmo2-7B --eval_data blimp

# Train / evaluate on all token positions
python -m ... --incremental
```

| Models |
|---|
| olmo2-7B |
| olmo3-7B |
| gemma2-2B |
| gemma2-9B |
| llama3-1-8B |
| llama-3-2-1B |
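The `--start_exp`/`--end_exp` flags in the probe commands above suggest a sweep over L2 penalty strengths 10^start_exp through 10^end_exp. A minimal numpy sketch of such a sweep over a logistic-regression probe (hypothetical code with toy data, not the repo's implementation):

```python
import numpy as np

def train_l2_probe(X, y, lam, steps=300):
    """Fit w for sigmoid(X @ w) with L2 penalty lam via gradient descent."""
    w = np.zeros(X.shape[1])
    lr = 0.1 / (1.0 + lam)  # shrink the step for large penalties to stay stable
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / len(y) + lam * w
        w -= lr * grad
    return w

def sweep(X, y, start_exp=-2, end_exp=5):
    """Return (lam, accuracy) for the best lam in 10**start_exp .. 10**end_exp."""
    best = None
    for k in range(start_exp, end_exp + 1):
        lam = 10.0 ** k
        w = train_l2_probe(X, y, lam)
        acc = np.mean((X @ w > 0) == y.astype(bool))
        if best is None or acc > best[1]:
            best = (lam, acc)
    return best

# Toy "hidden states": feature 0 alone determines the acceptability label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] > 0).astype(float)
lam, acc = sweep(X, y)
```

In practice the penalty would be chosen on held-out data rather than training accuracy, and the sweep would be repeated per layer.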
| Dataset | Language | Type | Paired |
|---|---|---|---|
| blimp | English | Grammaticality | Yes |
| cola | English | Grammaticality | No |
| syntaxgym | English | Grammaticality | No |
| plausibility | English | Plausibility | Yes |
| blimp-nl | Dutch | Grammaticality | Yes |
| scala | Swedish | Grammaticality | No |
| itacola | Italian | Grammaticality | No |
| rucola | Russian | Grammaticality | No |
| jcola | Japanese | Grammaticality | No |
| sling | Chinese | Grammaticality | Yes |
| synthetic (ptb, gutenberg-dpo) | English | Perturbation-based | Yes |
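For the datasets marked "Paired: Yes", each item is a (grammatical, ungrammatical) sentence pair, and the standard zero-shot metric counts the model as correct when it assigns the grammatical variant the higher sentence log-probability. A minimal sketch of that evaluation (hypothetical helper, not the repo's code):

```python
def paired_accuracy(pairs):
    """pairs: list of (logprob_good, logprob_bad) sentence-level log-probs."""
    correct = sum(good > bad for good, bad in pairs)
    return correct / len(pairs)

# Toy scores: the model prefers the grammatical sentence in 2 of 3 pairs.
acc = paired_accuracy([(-10.0, -12.5), (-8.1, -7.9), (-20.0, -25.0)])
```

Unpaired datasets (e.g. cola, rucola) instead provide per-sentence acceptability labels, so they are scored as ordinary binary classification.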
See LICENSE for details.