
Feature Request: Add Adaptive Singular Value Decomposition-based Orthogonal Subspace Fine-Tuning #2648

@NikhilNayak-debug


Feature request

We propose adding a new parameter-efficient fine-tuning method based on adaptive singular value decomposition (SVD) for continual learning in LLMs. The core idea is to decompose each weight matrix into a high-rank and a low-rank subspace and to constrain updates to the low-rank subspace while freezing the high-rank directions, effectively preventing catastrophic forgetting.


Method details

The method decomposes each weight matrix into orthogonal components via SVD:
$\mathbf{W} = \mathbf{U} \Sigma \mathbf{V}^\top$

We freeze the top-$r$ singular directions in both $\mathbf{U}$ and $\mathbf{V}$, which correspond to the subspaces encoding knowledge from previously learned tasks, and fine-tune only the remaining low-rank directions. This lets us repurpose unused capacity in the weight matrix without interfering with critical past representations.

Formally, for a matrix $\mathbf{W} \in \mathbb{R}^{n \times n}$:

  • The high-rank (frozen) subspace has rank $r$: $\mathbf{U}_{\text{high}} \in \mathbb{R}^{n \times r}$, $\mathbf{V}_{\text{high}} \in \mathbb{R}^{n \times r}$
  • The low-rank (trainable) subspace has rank $n - r$: $\mathbf{U}_{\text{low}} \in \mathbb{R}^{n \times (n-r)}$, $\mathbf{V}_{\text{low}} \in \mathbb{R}^{n \times (n-r)}$
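
A minimal PyTorch sketch of this split (illustrative only; the function and variable names below are not the ones used in the PR):

```python
import torch

def split_weight_by_svd(W: torch.Tensor, r: int):
    """Split a square weight matrix into a frozen high-rank subspace and a
    trainable low-rank subspace via SVD. Illustrative sketch, not the PR code."""
    # Thin SVD: W = U @ diag(S) @ Vh
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)

    # Top-r singular directions: frozen (encode knowledge from previous tasks).
    frozen = (U[:, :r].detach(), S[:r].detach(), Vh[:r, :].detach())

    # Remaining n - r directions: the only trainable parameters.
    trainable = tuple(torch.nn.Parameter(t.clone())
                      for t in (U[:, r:], S[r:], Vh[r:, :]))
    return frozen, trainable

def reconstruct(frozen, trainable):
    """Recombine both subspaces into a single n x n matrix with W's original shape."""
    U_hi, S_hi, Vh_hi = frozen
    U_lo, S_lo, Vh_lo = trainable
    return U_hi @ torch.diag(S_hi) @ Vh_hi + U_lo @ torch.diag(S_lo) @ Vh_lo
```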

Total compute (time complexity):

  • $2nr + 2n(n-r) = 2n^2$ multiplications (twice that of a dense linear layer, since each subspace applies its own $\mathbf{V}^\top$ and $\mathbf{U}$ factors)

Memory complexity:

  • $2n^2$ (stored SVD factors, twice the size of the original matrix)
  • $2n(n - r)$ (gradients of trainable parameters)
  • $4n(n - r)$ (optimizer state for trainable parameters)

Compared to full fine-tuning (which uses $4n^2$), our method is memory-efficient as long as we freeze at least $\frac{2}{3}$ of the singular directions. Unlike methods such as LoRA, it introduces no additional parameters after training: the original matrix is reconstructed from the two subspaces, preserving the exact architecture and parameter count. This makes it a practical alternative for continual learning and multi-task adaptation without extra memory or parameter overhead.
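
The $\frac{2}{3}$ threshold follows directly from the memory terms listed above (counting the optimizer state as two tensors per trainable parameter, as above):

$$
\underbrace{2n^2}_{\text{SVD factors}} + \underbrace{2n(n-r)}_{\text{gradients}} + \underbrace{4n(n-r)}_{\text{optimizer state}} \;\le\; \underbrace{4n^2}_{\text{full fine-tuning}}
\;\iff\; 6n(n-r) \le 2n^2
\;\iff\; r \ge \tfrac{2}{3}\,n.
$$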


Paper: Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning
Code: https://github.com/NikhilNayak-debug/mini_trainer

Why this fits well in PEFT:

  • It fine-tunes only part of the SVD-decomposed matrix (low-rank subspace), keeping high-rank components frozen.
  • It avoids task-specific parameter growth and preserves memory efficiency.
  • It provides strong continual learning performance with fixed model size.

Your contribution

We have implemented the method and opened a PR with:

  • Core logic under svd_utils.py
  • Integration into PEFT via wrap_model_with_svd (usage sketch after this list)
  • A usage example in examples/orthogonal_subspace_learning
  • Initial tests in tests/test_svd_utils.py
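
An illustrative usage sketch (the exact arguments of wrap_model_with_svd may differ; see examples/orthogonal_subspace_learning in the PR for the actual API):

```python
# Illustrative usage; the call signature below is a sketch, not the PR's exact API.
from transformers import AutoModelForCausalLM
from svd_utils import wrap_model_with_svd  # module name from the PR description

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# Decompose the targeted weight matrices, freeze the top singular directions,
# and leave only the low-rank subspace trainable.
model = wrap_model_with_svd(model)

# Train as usual: only the low-rank SVD factors receive gradients, and the
# original dense matrices can be reconstructed after training.
```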

Looking forward to your feedback on incorporating this into PEFT.
