This repository contains the code for the paper . QNI-SCFT updates only the salient columns, while injecting Gaussian noise scaled by the quantization step size into the non-salient columns.
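The noise-injection idea described above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation. It assumes a per-row asymmetric uniform quantizer with `bits` bits, so the step size is Δ = (max − min) / (2^bits − 1), and it assumes the noise standard deviation equals Δ. The exact quantizer and scaling used by QNI-SCFT may differ, and `quant_step` and `inject_quant_noise` are hypothetical helper names.

```python
import random


def quant_step(row, bits=4):
    """Step size of an asymmetric uniform quantizer spanning [min(row), max(row)]."""
    return (max(row) - min(row)) / (2 ** bits - 1)


def inject_quant_noise(weight, salient_cols, bits=4, rng=None):
    """Return a noisy copy of `weight` (a list of rows; columns = input features).

    Salient columns are copied unchanged. Every other entry receives additive
    Gaussian noise whose standard deviation equals its row's quantization step
    (an assumed scaling, used here for illustration only).
    """
    if rng is None:
        rng = random.Random()
    salient = set(salient_cols)
    noisy = []
    for row in weight:
        delta = quant_step(row, bits)
        noisy.append([
            w if j in salient else w + rng.gauss(0.0, delta)
            for j, w in enumerate(row)
        ])
    return noisy


if __name__ == "__main__":
    W = [[0.5, -1.0, 0.25, 2.0],
         [1.5, 0.0, -0.75, 0.5]]
    # Column 1 is treated as salient, so it stays exactly as it was.
    for row in inject_quant_noise(W, salient_cols=[1], rng=random.Random(0)):
        print([round(v, 3) for v in row])
```

Running it prints both rows with column 1 untouched, while the other entries are perturbed on the order of one quantization step (0.2 for the first row, 0.15 for the second).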
We highly recommend using a Docker image with CUDA support. For our experiments, we used the following image:
```bash
# pull image
docker pull pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel
```
Run the container and install Git:
```bash
# run container
docker run -it --gpus all --ipc=host -v {local_storage}:{docker_container_storage} pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel
# install git
apt update && apt install git -y
```
- Clone the QNI-SCFT repository:
```bash
git clone
cd qni_scft
```
- Install the QNI-SCFT integration into Hugging Face's Transformers library, along with additional packages:
```bash
# transformers
cd transformers_modified
pip install .
pip install sentencepiece
pip install protobuf
# configs
pip install ml_collections
pip install wandb
```
- Install lm-evaluation-harness for evaluation:
```bash
pip install lm-eval
```
- Python 3.10.13
- PyTorch 2.1.0
- CUDA 12.1
- cuDNN 8
Experiments were conducted on an NVIDIA A100 GPU with 80 GB of memory.
To find the salient columns for fine-tuning, you first need to compute sensitivity metrics. The metrics for a full-precision LLM can be estimated with the following script:
```bash
bash sensitivity_metrics/salientcolumns/scripts/
```
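To make the selection step concrete, here is a sketch of turning per-column sensitivity scores into a set of salient columns. The metric below, s_j = ‖W[:, j]‖² · E[x_j²] (squared column norm times the mean squared input activation from calibration data), is an assumed example chosen for illustration and is not necessarily the metric computed by the script above. `column_sensitivity` and `select_salient_columns` are hypothetical names.

```python
def column_sensitivity(weight, act_sq_mean):
    """Hypothetical metric: s_j = sum_i W[i][j]**2 * E[x_j**2].

    `weight` is a list of rows (out_features x in_features); `act_sq_mean[j]`
    is the mean squared value of input feature j over calibration data.
    """
    n_cols = len(weight[0])
    return [
        sum(row[j] ** 2 for row in weight) * act_sq_mean[j]
        for j in range(n_cols)
    ]


def select_salient_columns(scores, k):
    """Indices of the k highest-scoring columns, returned in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    W = [[1.0, 0.1, 3.0],
         [2.0, 0.2, 1.0]]
    scores = column_sensitivity(W, act_sq_mean=[1.0, 10.0, 1.0])
    print(select_salient_columns(scores, k=2))  # [0, 2]
```

Column 1 has the largest activations but tiny weights, so under this assumed metric it ends up least sensitive; whichever metric is used, only the ranking matters for choosing the top-k columns.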
Note: only LLaMA models can be fine-tuned.

To fine-tune the salient columns of a full-precision LLM with quantization noise injection, run the following command:
```bash
python llm_tune/ --config_path=llm_tune/configs/
```
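The fine-tuning rule can be illustrated on a single linear layer y = W x with loss ½‖y − t‖². This is a simplified sketch under stated assumptions, not the code in `llm_tune/`. It uses plain SGD, resamples the noise at every step, assumes the noise standard deviation equals a per-row uniform quantization step, and applies the gradient computed at the noisy weights directly to the clean salient weights. `qni_scft_step` is a hypothetical name.

```python
import random


def qni_scft_step(weight, x, target, salient_cols, lr=0.1, bits=4, rng=None):
    """One hypothetical SGD step on y = W x with loss 0.5 * ||y - t||^2.

    Forward: non-salient columns get fresh Gaussian noise whose std equals the
    row's quantization step (assumed scaling). Backward: the gradient w.r.t.
    the noisy weights updates only the salient columns of the clean weights;
    all other columns stay frozen. Returns (new_weight, loss).
    """
    if rng is None:
        rng = random.Random()
    salient = set(salient_cols)
    levels = 2 ** bits - 1
    new_weight, loss = [], 0.0
    for row, t in zip(weight, target):
        delta = (max(row) - min(row)) / levels
        noisy = [w if j in salient else w + rng.gauss(0.0, delta)
                 for j, w in enumerate(row)]
        err = sum(w * xj for w, xj in zip(noisy, x)) - t  # y_i - t_i
        loss += 0.5 * err ** 2
        # dL/dW_ij = err * x_j, applied to salient columns only.
        new_weight.append([w - lr * err * xj if j in salient else w
                           for j, (w, xj) in enumerate(zip(row, x))])
    return new_weight, loss


if __name__ == "__main__":
    W = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
    W2, loss = qni_scft_step(W, x=[1.0, 2.0, -1.0], target=[0.0, 1.0],
                             salient_cols=[1], rng=random.Random(0))
    print(round(loss, 4), W2)
```

Keeping the non-salient weights frozen while exposing them to quantization-scale noise is meant to make the trained salient columns robust to later quantization of the rest of the layer.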
Finally, you can evaluate the fine-tuned model on zero-shot benchmarks with lm-evaluation-harness using the following command:
```bash
lm_eval --model hf \
    --model_args "pretrained=<path to the directory with the fine-tuned model>" \
    --tasks winogrande,hellaswag,swag,boolq,xwinograd_en \
    --batch_size 16 \
    --num_fewshot 0 \
    --device cuda
```
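The same evaluation can be driven from Python, which makes it easier to aggregate scores. This sketch assumes lm-eval 0.4.x, where `lm_eval.simple_evaluate` is exported at the package top level and per-task metrics use keys such as `"acc,none"`. Check these details against your installed version. `eval_kwargs`, `mean_accuracy`, and `run` are hypothetical helper names.

```python
TASKS = ["winogrande", "hellaswag", "swag", "boolq", "xwinograd_en"]


def eval_kwargs(model_dir, batch_size=16):
    """Keyword arguments mirroring the lm_eval CLI call above."""
    return dict(
        model="hf",
        model_args=f"pretrained={model_dir}",
        tasks=TASKS,
        num_fewshot=0,
        batch_size=batch_size,
        device="cuda",
    )


def mean_accuracy(results, metric="acc,none"):
    """Average `metric` over TASKS in an lm-eval results dict (assumed 0.4.x key format)."""
    per_task = results["results"]
    values = [per_task[t][metric] for t in TASKS if metric in per_task.get(t, {})]
    return sum(values) / len(values) if values else float("nan")


def run(model_dir):
    import lm_eval  # assumed: lm-eval >= 0.4, which exposes simple_evaluate
    results = lm_eval.simple_evaluate(**eval_kwargs(model_dir))
    print(f"mean zero-shot accuracy: {mean_accuracy(results):.4f}")
    return results
```

Usage: `run("<path to the directory with the fine-tuned model>")`. Tasks missing from the results are skipped when averaging, so the mean is taken only over tasks that actually report the metric.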
If you plan to use our work in your projects, please consider citing our paper: