Paper: arxiv.org/abs/2312.15230
Authors: Max Zimmer, Megi Andoni, Christoph Spiegel, Sebastian Pokutta
This repository contains the official implementation of PERP, a framework for pruning and retraining/reconstructing Large Language Models (LLMs). The code is built on PyTorch and uses Weights & Biases for experiment tracking.
- Support for multiple LLM architectures (OPT, LLaMA-2, Mistral, Mixtral)
- Various pruning methods (magnitude, random, WANDA, SparseGPT)
- Sparsity-preserving reconstruction/retraining approaches such as MaskLoRA and ScaleLoRA
- Semi-structured and unstructured sparsity patterns
- Parameter-Efficient Fine-Tuning (PEFT) integration
- Comprehensive experiment tracking with W&B
The main entry point is `main.py`. If passed the `--debug` flag, the code runs in debug mode, executing the default configuration as specified in the `defaults` dictionary. Otherwise, the code expects to be started by a WandB sweep agent. Hence, to run an experiment, either configure the parameters in `main.py` or use Weights & Biases sweeps (a sketch of a sweep launch follows the parameter list below). Key parameters include:
- `training_mode`: Whether to retrain or reconstruct the model; must be one of `retrain`, `reconstruct`, or `None` (in which case the model is neither retrained nor reconstructed). Retraining is a full retraining of the model using the overall loss. Reconstruction operates layerwise by minimizing the per-layer $L_2$-deviation loss (see the sketch after this list).
- `peft_strategy`: Whether to use SelectivePEFT or FullFT. FullFT retrains all parameters and consequently requires more memory. SelectivePEFT allows for more parameter-efficient fine-tuning, which can be further specified by the `peft_use_*` parameters.
- `lora_type`: When `SelectivePEFT` is used and `peft_use_lora` is `True`, this parameter specifies the LoRA variant to use. Options are `lora`, `lora_prune`, `scale_lora`, and `mask_lora`, corresponding to the variants proposed in the paper.
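To illustrate the sweep-based workflow, here is a minimal sketch of launching `main.py` through a Weights & Biases sweep. The parameter names are taken from the list above; the project name and the exact sweep keys are assumptions and may differ from the repository's actual sweep configurations.

```python
# Minimal sketch of a W&B sweep launch for PERP (assumed project name
# "perp"; parameter names taken from the list above, exact keys may differ).
import wandb

sweep_config = {
    "program": "main.py",  # entry point started by the sweep agent
    "method": "grid",
    "parameters": {
        "training_mode": {"values": ["retrain", "reconstruct"]},
        "peft_strategy": {"values": ["SelectivePEFT"]},
        "lora_type": {"values": ["mask_lora", "scale_lora"]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="perp")
# Start an agent from the shell, which runs main.py once per configuration:
#   wandb agent <entity>/perp/<sweep_id>
```

For a quick local run without a sweep, `python main.py --debug` executes the default configuration instead.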
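For intuition on the `reconstruct` mode, the layerwise objective matches the pruned layer's output to the dense layer's output on calibration inputs. The following is a minimal sketch of that $L_2$-deviation loss, not code from this repository; the layer and input names are illustrative.

```python
# Sketch of the per-layer reconstruction objective: minimize the squared
# L2 deviation || W X - W_hat X ||^2 between the frozen dense layer and the
# pruned layer on calibration inputs (names are illustrative, not from PERP).
import torch
import torch.nn.functional as F

def layerwise_reconstruction_loss(dense_layer: torch.nn.Linear,
                                  pruned_layer: torch.nn.Linear,
                                  calibration_inputs: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        target = dense_layer(calibration_inputs)  # frozen dense reference output
    # mse_loss is the (mean-normalized) squared L2 deviation
    return F.mse_loss(pruned_layer(calibration_inputs), target)
```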
If you find this work useful for your research, please consider citing:
```bibtex
@article{zimmer2023perp,
  title={{PERP}: Rethinking the Prune-Retrain Paradigm in the Era of {LLMs}},
  author={Zimmer, Max and Andoni, Megi and Spiegel, Christoph and Pokutta, Sebastian},
  journal={arXiv preprint arXiv:2312.15230},
  year={2023}
}
```