
Prefix Finetuning


Prefix Finetuning is a Parameter-Efficient Fine-Tuning (PEFT) method for adapting large, pre-trained Transformer-based models to specific downstream tasks, introduced by Li and Liang (2021) in "Prefix-Tuning: Optimizing Continuous Prompts for Generation". Instead of updating all of the model's weights (which is computationally expensive), Prefix Finetuning freezes the original model's parameters and trains only a small, continuous, task-specific vector known as a "prefix."

This prefix is prepended to the input of each layer of the transformer model. The model then attends to this prefix as if it were a set of "virtual tokens," allowing it to learn task-specific behaviour without altering its core knowledge.
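
In attention terms (a sketch of the common implementation, not the paper's exact notation), each layer treats the trained prefix as extra key and value rows that every real token can attend to:

$$
\mathrm{head} = \mathrm{Attn}\big(X W_q,\ [P_k ; X W_k],\ [P_v ; X W_v]\big)
$$

where $X$ is the layer input, $W_q$, $W_k$, $W_v$ are the frozen projection matrices, and $P_k$, $P_v$ are the trainable prefix matrices (one pair per layer).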

How it Works

  1. Freeze Transformer Parameters: The vast majority of the pre-trained model's weights are not changed.
  2. Introduce a Prefix: A small, trainable matrix of parameters (the prefix) is added.
  3. Prepend to Transformer Layers: For each layer in the transformer, this prefix is prepended to the sequence of key and value vectors for the multi-head attention mechanism.
  4. Train Only the Prefix: During finetuning, only the parameters of the prefix are updated. The model learns to condition its output on this prefix to solve the target task (sketched in code below).
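
The steps above can be sketched in a few lines of PyTorch. Everything here (class name, dimensions, initialisation) is illustrative rather than taken from the paper or any specific library; the point is simply that the trainable prefix is concatenated in front of the frozen key/value projections inside attention:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F  # scaled_dot_product_attention requires PyTorch 2.x

class PrefixAttention(nn.Module):
    """One attention layer with a trainable key/value prefix (illustrative sketch)."""
    def __init__(self, d_model=64, n_heads=4, prefix_len=8):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        # Stand-ins for the pre-trained projections; frozen, as in step 1.
        self.q_proj, self.k_proj, self.v_proj = (nn.Linear(d_model, d_model) for _ in range(3))
        for proj in (self.q_proj, self.k_proj, self.v_proj):
            for p in proj.parameters():
                p.requires_grad = False
        # Steps 2-3: a trainable key and value vector per head for each virtual token.
        self.prefix_k = nn.Parameter(torch.randn(n_heads, prefix_len, self.d_head) * 0.02)
        self.prefix_v = nn.Parameter(torch.randn(n_heads, prefix_len, self.d_head) * 0.02)

    def forward(self, x):  # x: (batch, seq, d_model)
        B, T, _ = x.shape
        split = lambda t: t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        # Prepend the prefix to the keys and values of every head.
        pk = self.prefix_k.unsqueeze(0).expand(B, -1, -1, -1)
        pv = self.prefix_v.unsqueeze(0).expand(B, -1, -1, -1)
        k, v = torch.cat([pk, k], dim=2), torch.cat([pv, v], dim=2)
        out = F.scaled_dot_product_attention(q, k, v)  # (batch, heads, seq, d_head)
        return out.transpose(1, 2).reshape(B, T, -1)

layer = PrefixAttention()
# Step 4: only the prefix receives gradients.
print([n for n, p in layer.named_parameters() if p.requires_grad])  # ['prefix_k', 'prefix_v']
```

In practice you would rarely hand-roll this; libraries such as Hugging Face PEFT implement the same idea by feeding the trained prefix into the model through its key/value cache.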


Key Advantages

  • Parameter Efficiency: It requires training only a tiny fraction of the parameters (e.g., ~0.1% of the total), drastically reducing memory and storage costs (see the estimate after this list).
  • No Full Model Copies: Since the base model is frozen and shared across tasks, you don't need to store a new copy of the model for each task; only the small prefix is saved.
  • Modularity: You can train multiple prefixes for different tasks and easily swap them out as needed without affecting the base model.
  • Comparable Performance: It has been shown to achieve performance comparable to full finetuning on many tasks, especially with limited data.
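
To make the "tiny fraction" concrete, here is a back-of-the-envelope count. The model dimensions (GPT-2-large-like: 36 layers, hidden size 1280, ~774M parameters) and the prefix length of 10 virtual tokens are assumptions for illustration; real setups also often use a temporary reparameterisation MLP during training, which is discarded afterwards:

```python
# Rough parameter count for a key/value prefix (illustrative numbers).
n_layers, d_model, prefix_len = 36, 1280, 10   # GPT-2-large-like dimensions (assumed)
base_params = 774_000_000                      # approximate size of the frozen model

# Each layer stores one key vector and one value vector per virtual token.
prefix_params = n_layers * 2 * prefix_len * d_model
print(f"{prefix_params:,}")                    # 921,600 trainable parameters
print(f"{prefix_params / base_params:.2%}")    # ~0.12% of the base model
```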

Prefix Finetuning vs. Other Methods

  • Full Finetuning: Modifies all model weights. Computationally expensive and requires storing a new model for each task.
  • Prompt-Tuning: A simpler variant in which a soft prompt (a continuous vector) is prepended only at the input embedding layer. Prefix Finetuning is more expressive because it adds trainable parameters to every layer (see the sketch after this list).
  • Adapter Tuning: Inserts small, trainable "adapter" modules between the layers of the pre-trained model.
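
The contrast with prompt-tuning is easiest to see in code: a soft prompt is a single trainable block of embeddings prepended once, before the first layer, whereas the prefix above is injected into every layer's attention. The snippet below is illustrative only; the names are not from any library:

```python
import torch
import torch.nn as nn

d_model, prompt_len = 64, 8
# Prompt-tuning: one trainable "soft prompt", used only at the embedding layer.
soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

def prepend_soft_prompt(token_embeddings):               # (batch, seq, d_model)
    batch = token_embeddings.size(0)
    prompt = soft_prompt.unsqueeze(0).expand(batch, -1, -1)
    return torch.cat([prompt, token_embeddings], dim=1)  # (batch, prompt_len + seq, d_model)

x = torch.randn(2, 10, d_model)
print(prepend_soft_prompt(x).shape)                      # torch.Size([2, 18, 64])
```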
