
Feature Request: Add Adaptive Singular Value Decomposition-based Orthogonal Subspace Fine-Tuning #2648

@NikhilNayak-debug


Feature request

We propose adding a new parameter-efficient fine-tuning method based on adaptive singular value decomposition (SVD) for continual learning in LLMs. The core idea is to decompose each weight matrix into a high-rank and a low-rank subspace and to constrain updates to the low-rank subspace while freezing the high-rank directions, effectively preventing catastrophic forgetting.


Method details

The method decomposes each weight matrix into orthogonal components via SVD:
$\mathbf{W} = \mathbf{U} \Sigma \mathbf{V}^\top$

We freeze the top-$r$ singular directions in both $\mathbf{U}$ and $\mathbf{V}$, which correspond to the subspaces encoding knowledge from previously learned tasks, and fine-tune only the remaining low-rank directions. This lets us repurpose unused capacity in the weight matrix without interfering with critical past representations.

Formally, for a matrix $\mathbf{W} \in \mathbb{R}^{n \times n}$:

  • The high-rank (frozen) subspace has rank $r$: $\mathbf{U}_{\text{high}} \in \mathbb{R}^{n \times r}$, $\mathbf{V}_{\text{high}} \in \mathbb{R}^{n \times r}$
  • The low-rank (trainable) subspace has rank $n - r$: $\mathbf{U}_{\text{low}} \in \mathbb{R}^{n \times (n-r)}$, $\mathbf{V}_{\text{low}} \in \mathbb{R}^{n \times (n-r)}$
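
A minimal PyTorch sketch of this split (illustrative only; the function and variable names below are not the ones used in the PR):

```python
import torch

def split_weight_by_svd(W: torch.Tensor, r: int):
    """Split a square weight matrix into a frozen high-rank subspace and a
    trainable low-rank subspace via SVD. Illustrative sketch, not the PR code."""
    # Thin SVD: W = U @ diag(S) @ Vh
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)

    # Top-r singular directions: frozen (encode knowledge from previous tasks).
    frozen = (U[:, :r].detach(), S[:r].detach(), Vh[:r, :].detach())

    # Remaining n - r directions: the only trainable parameters.
    trainable = tuple(torch.nn.Parameter(t.clone())
                      for t in (U[:, r:], S[r:], Vh[r:, :]))
    return frozen, trainable

def reconstruct(frozen, trainable):
    """Recombine both subspaces into a single n x n matrix with W's original shape."""
    U_hi, S_hi, Vh_hi = frozen
    U_lo, S_lo, Vh_lo = trainable
    return U_hi @ torch.diag(S_hi) @ Vh_hi + U_lo @ torch.diag(S_lo) @ Vh_lo
```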

Total compute (time complexity):

  • $2nr + 2n(n-r) = 2n^2$ multiplications (twice that of a dense linear layer, since each subspace applies its own $\mathbf{V}^\top$ and $\mathbf{U}$ factors)

Memory complexity:

  • $2n^2$ (stored SVD factors, twice the size of the original matrix)
  • $2n(n - r)$ (gradients of trainable parameters)
  • $4n(n - r)$ (optimizer state for trainable parameters)

Compared to full fine-tuning (which uses $4n^2$), our method is memory-efficient as long as we freeze at least $\frac{2}{3}$ of the singular directions. Unlike methods such as LoRA, it introduces no additional parameters after training: the original matrix is reconstructed from the two subspaces, preserving the exact architecture and parameter count. This makes it a practical alternative for continual learning and multi-task adaptation without extra memory or parameter overhead.
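
The $\frac{2}{3}$ threshold follows directly from the memory terms listed above (counting the optimizer state as two tensors per trainable parameter, as above):

$$
\underbrace{2n^2}_{\text{SVD factors}} + \underbrace{2n(n-r)}_{\text{gradients}} + \underbrace{4n(n-r)}_{\text{optimizer state}} \;\le\; \underbrace{4n^2}_{\text{full fine-tuning}}
\;\iff\; 6n(n-r) \le 2n^2
\;\iff\; r \ge \tfrac{2}{3}\,n.
$$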


Paper: Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning
Code: https://github.com/NikhilNayak-debug/mini_trainer

Why this fits well in PEFT:

  • It fine-tunes only part of the SVD-decomposed matrix (low-rank subspace), keeping high-rank components frozen.
  • It avoids task-specific parameter growth and preserves memory efficiency.
  • It provides strong continual learning performance with fixed model size.

Your contribution

We have implemented the method and opened a PR with:

  • Core logic under svd_utils.py
  • Integration into PEFT via wrap_model_with_svd (usage sketch after this list)
  • A usage example in examples/orthogonal_subspace_learning
  • Initial tests in tests/test_svd_utils.py
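
An illustrative usage sketch (the exact arguments of wrap_model_with_svd may differ; see examples/orthogonal_subspace_learning in the PR for the actual API):

```python
# Illustrative usage; the call signature below is a sketch, not the PR's exact API.
from transformers import AutoModelForCausalLM
from svd_utils import wrap_model_with_svd  # module name from the PR description

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# Decompose the targeted weight matrices, freeze the top singular directions,
# and leave only the low-rank subspace trainable.
model = wrap_model_with_svd(model)

# Train as usual: only the low-rank SVD factors receive gradients, and the
# original dense matrices can be reconstructed after training.
```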

Looking forward to your feedback on incorporating this into PEFT.
