Feature request
We propose adding a new parameter-efficient fine-tuning method based on adaptive singular value decomposition (SVD) for continual learning in LLMs. The core idea is to decompose each weight matrix into high-rank and low-rank subspaces and to constrain updates to the low-rank subspace while freezing the high-rank directions, mitigating catastrophic forgetting across tasks.
Method details
This method performs an SVD decomposition of each weight matrix into orthogonal components. We freeze the top-$r$ singular directions in both $U$ and $V$ and train only the remaining directions (a minimal sketch follows the list below).

Formally, for a matrix $W \in \mathbb{R}^{n \times n}$ with SVD $W = U \Sigma V^\top$:
- The high-rank (frozen) subspace has size $r$: $U_{\text{high}} \in \mathbb{R}^{n \times r}$, $V_{\text{high}} \in \mathbb{R}^{n \times r}$
- The low-rank (trainable) subspace has size $n - r$: $U_{\text{low}} \in \mathbb{R}^{n \times (n - r)}$, $V_{\text{low}} \in \mathbb{R}^{n \times (n - r)}$
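To make the split concrete, here is a minimal PyTorch sketch that decomposes one square weight matrix and marks only the low-rank factors as trainable. The helper name `split_weight_svd` and the exact parameterization are our own illustrative assumptions, not the API in the PR:

```python
import torch

def split_weight_svd(W: torch.Tensor, r: int):
    """Illustrative only: split an n x n weight into a frozen high-rank part
    and trainable low-rank factors via SVD (not the PR's actual API)."""
    # Full SVD: W = U @ diag(S) @ Vh
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)

    # Top-r singular directions form the frozen high-rank subspace.
    W_high = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]
    frozen = torch.nn.Parameter(W_high, requires_grad=False)

    # Remaining n - r directions form the trainable low-rank subspace.
    trainable = torch.nn.ParameterDict({
        "U_low": torch.nn.Parameter(U[:, r:].contiguous()),
        "S_low": torch.nn.Parameter(S[r:].contiguous()),
        "V_low": torch.nn.Parameter(Vh[r:, :].T.contiguous()),
    })
    return frozen, trainable

# Effective weight during fine-tuning:
#   W_eff = W_high + U_low @ diag(S_low) @ V_low.T
# Only U_low, S_low, V_low receive gradients; W_high stays fixed.
```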
Total compute (time complexity):
- $2nr + 2n(n-r) = 2n^2$ multiplications (twice that of a dense linear layer)
Memory complexity:
- $2n^2$ (matrix size after SVD)
- $2n(n - r)$ (gradients of trainable parameters)
- $4n(n - r)$ (optimizer state for trainable parameters)
Compared to full fine-tuning (which uses $n^2$ for gradients and $2n^2$ for Adam optimizer state), the gradient and optimizer memory shrink in proportion to $n - r$; a back-of-the-envelope comparison is sketched below.
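As a sanity check on these counts, here is a small cost-model sketch (the helper names and the example values of $n$ and $r$ are our own, purely illustrative) comparing the per-layer terms above against full fine-tuning:

```python
def svd_ft_costs(n: int, r: int):
    """Per-layer cost model for one n x n weight, using the counts above."""
    return {
        "mults": 2 * n * r + 2 * n * (n - r),  # forward multiplications = 2 * n**2
        "weights": 2 * n * n,                  # stored factors after SVD
        "grads": 2 * n * (n - r),              # gradients of trainable factors
        "opt_state": 4 * n * (n - r),          # Adam moments for trainable factors
    }

def full_ft_costs(n: int):
    """Reference: dense full fine-tuning of the same n x n weight."""
    return {"mults": n * n, "weights": n * n, "grads": n * n, "opt_state": 2 * n * n}

# Example: n = 4096, r = 3072 -> gradient memory is 2n(n - r) / n^2 = 2(n - r)/n
# = 50% of full fine-tuning, and optimizer state shrinks by the same factor.
print(svd_ft_costs(4096, 3072))
print(full_ft_costs(4096))
```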
Paper: Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning
Code: https://github.com/NikhilNayak-debug/mini_trainer
Why this fits well in PEFT:
- It fine-tunes only part of the SVD-decomposed matrix (low-rank subspace), keeping high-rank components frozen.
- It avoids task-specific parameter growth and preserves memory efficiency.
- It provides strong continual learning performance with fixed model size.
Your contribution
We have implemented the method and opened a PR with:
- Core logic under `svd_utils.py`
- Integration into PEFT via `wrap_model_with_svd` (a hypothetical call sketch is shown below)
- A usage example in `examples/orthogonal_subspace_learning`
- Initial tests in `tests/test_svd_utils.py`
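For reviewers, a hypothetical usage sketch follows. `wrap_model_with_svd` is the entry point named above, but its keyword arguments, the example model, and the target-module names are assumptions for illustration only; the actual signature is defined in the PR:

```python
import torch
from transformers import AutoModelForCausalLM

# Entry point named in the PR; the arguments used below are assumed, not actual.
from svd_utils import wrap_model_with_svd

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical call: decompose the attention projections and freeze the top
# singular directions, leaving only the low-rank subspace trainable.
model = wrap_model_with_svd(
    model,
    target_modules=["c_attn", "c_proj"],  # assumed argument name and values
    top_rank=512,                         # assumed: number of frozen directions
)

# Regardless of the exact API, only the low-rank factors should end up with
# requires_grad=True, so a standard optimizer trains just that subspace.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
print(f"trainable params: {sum(p.numel() for p in trainable):,}")
```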
Looking forward to your feedback on incorporating this into PEFT.