This project provides tools and scripts for fine-tuning TinyLlama-1.1B with SwitchBack layers, evaluating its performance, and analyzing benchmarks. It also includes a script for building a custom dataset, as well as fine-tuning and evaluation logs with an analysis of the statistical significance of the training speedup and of the benchmark metrics.
Our experimental results and the research paper can be found at the following link: SwitchBack Tiny Llama
| Layer | OASST1 | Longform | Custom RU |
|---|---|---|---|
| nn.Linear (minutes) | 72.57 | 173.19 | 175.46 |
| SwitchBackLinear (minutes) | 65.50 | 155.73 | 158.85 |
| Acceleration (%) | 9.76 | 10.08 | 9.46 |
To reproduce all results, you can use the provided Docker image, which ensures that all dependencies for the fine-tuning and benchmarking workflows are installed. The base image includes Python, CUDA, Triton, `transformers`, `lm_eval`, and `torch`.
```bash
docker pull dmitryredkosk/bitsandbytes_transformer:cuda12.5
```
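To work inside the image with GPU access and the repository mounted, a typical `docker run` invocation looks like the sketch below. The mount point `/workspace` and the interactive shell are assumptions, not part of the project setup; GPU passthrough requires the NVIDIA Container Toolkit on the host.

```bash
# Sketch: start an interactive container with all GPUs visible and the current
# repository mounted at /workspace (the mount path is an assumption).
docker run --gpus all -it --rm \
  -v "$(pwd)":/workspace \
  -w /workspace \
  dmitryredkosk/bitsandbytes_transformer:cuda12.5 \
  bash
```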
The benchmarking section contains reproductions of the results from the original paper, our own results, results reported in GitHub issues, and the script from the official GitHub repository bitsandbytes-repo.
The script `sft/script.sh` performs fine-tuning and evaluation iterations for a language model. It runs the fine-tuning process using `sft/script.py` and evaluates the model on the specified tasks. The following variables control its behavior:
- `BS`: Batch size for training. Default value is 64.
- `USE_SWICHBACK`: Boolean flag indicating whether to use switchback mode. Default value is false.
- `CUDA_VISIBLE_DEVICES`: Specifies which GPU devices to use. Default value is 0.
- `NUM_FTUNES`: Number of fine-tuning iterations to perform. Default value is 1.
- `MODEL_NAME_OR_PATH`: Path to the pre-trained model. Default value is `../models/TinyLlama-1.1B-intermediate-step-240k-503b`.
- `OUTPUT_MODEL_NAME`: Name of the output model. Default value is `503b`.
- `OUTPUT_DIR`: Directory where fine-tuned models will be saved. Constructed as `../models_iter_ft_swichback_${USE_SWICHBACK}`.
- `LOG_DIR`: Directory for logging fine-tuning and evaluation outputs. Constructed as `./logs_use_switchback_${USE_SWICHBACK}`.
- `EVAL_TASKS`: Comma-separated list of evaluation tasks. Default value is `hellaswag,boolq,swag,winogrande,xwinograd_en`.
- `EVAL_OUTPUT_DIR`: Directory for evaluation results. Constructed as `./lmeval_res_use_switchback_${USE_SWICHBACK}`.
- `FT_DATASET`: Dataset to use for fine-tuning. Default value is `oasst1`.
The script creates the following directories if they do not exist:
- `${LOG_DIR}/ft`: for fine-tuning logs.
- `${LOG_DIR}/eval`: for evaluation logs.
- `${EVAL_OUTPUT_DIR}`: for evaluation results.
Ensure that all dependencies are installed and that the paths in `script.sh` are correct. To execute the script, run:

```bash
bash script.sh
```
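To launch a run with non-default settings, you can pass the variables described above on the command line. The sketch below is a hypothetical invocation that assumes `script.sh` reads these settings from environment variables; if your copy hard-codes them, edit the corresponding lines in the script instead.

```bash
# Hypothetical invocation: assumes script.sh picks up its settings from the environment.
BS=32 \
USE_SWICHBACK=true \
CUDA_VISIBLE_DEVICES=0 \
NUM_FTUNES=1 \
FT_DATASET=oasst1 \
bash script.sh
```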
In the `./switchback_layer` directory, you can find the original `SwitchBackLinear` layer and the additional kernel functions from bitsandbytes that are used for fine-tuning.
To assess the statistical significance of the training speedup, use `./stats/stats.py` and provide the paths to the `nn.Linear` and `SwitchBackLinear` training logs. You also need to specify the significance level via the `alpha` parameter.
Example usage:
```bash
python stats.py --file1 "Linear_logs_path" --file2 "SwichBackLinear_logs_path" --output_dir "path_for_results" --alpha 0.05
```
- GPU Model: NVIDIA A40
- Number of GPUs: 1
- CUDA Version: 12.0
- NVIDIA Driver Version: 525.147.05
- Operating System: Ubuntu 22.04