Conversation

BenjaminBossan (Member)

Resolves #2651

This is an experimental branch that caches the DoRA weight norm for faster inference. Since the weights don't change during inference, there is no need to recalculate the weight norm of a DoRA module on every forward pass.

During training, recalculation is needed, so there is no caching while the module has training=True.
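
To make the caching idea concrete, here is a minimal, self-contained sketch. It is not the actual PEFT code; CachedNormLinear and its attribute names are made up for illustration. The layer recomputes the weight norm in training mode and lazily caches it in eval mode.

import torch
import torch.nn as nn

class CachedNormLinear(nn.Module):
    """DoRA-style layer that caches its weight norm while in eval mode."""

    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.lora_A = nn.Linear(in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, out_features, bias=False)
        self.magnitude = nn.Parameter(torch.ones(out_features))
        self._cached_weight_norm = None  # filled lazily, only in eval mode

    def _weight_norm(self):
        # Column-wise L2 norm of the merged weight, as used by DoRA for rescaling.
        merged = self.base.weight + self.lora_B.weight @ self.lora_A.weight
        return merged.norm(p=2, dim=1)

    def forward(self, x):
        if self.training:
            # The weights change every optimizer step, so always recompute.
            weight_norm = self._weight_norm()
        else:
            # During inference the weights are frozen: compute once, then reuse.
            if self._cached_weight_norm is None:
                self._cached_weight_norm = self._weight_norm().detach()
            weight_norm = self._cached_weight_norm
        scale = self.magnitude / weight_norm
        return scale * (self.base(x) + self.lora_B(self.lora_A(x)))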

The cache does not prevent every possible duplicate calculation. For instance, the weight norm is calculated during module initialization and then again during the first forward pass when performing inference. Only from the second forward pass onward is the cached weight norm reused.

The reason why this is still a draft PR is that, before finishing it, I want to ensure that adding a cache is worth it. Caches can be a tricky business and lead to subtle bugs. As an example: we detect when users put the model into training mode by calling model.train() and clear the cache, but we would not detect it if the user directly sets module.training=True.
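
For illustration, a hypothetical sketch of such an invalidation hook (not the PR's code): overriding train() catches model.train() and model.eval(), because nn.Module.train() recurses into all submodules, but a direct assignment to module.training bypasses it.

import torch.nn as nn

class CacheClearingDoraLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self._cached_weight_norm = None

    def train(self, mode: bool = True):
        # Called for every submodule when model.train()/model.eval() is invoked
        # on a parent, so the cache is reliably cleared on a mode switch ...
        self._cached_weight_norm = None
        return super().train(mode)

# ... but this assignment flips the flag without going through train(), so a
# previously cached weight norm would silently be reused:
# layer.training = True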

Some preliminary testing with a small model, meta-llama/Llama-3.2-1B, and some dummy data didn't show huge improvements; the average over 10 runs was:

  • with caching: 1.3555 sec
  • w/o caching: 1.4071 sec

Thus, before continuing, I'd like to see some more real-world measurements. If the change is considered worth it, tests should be added to the PR before it's ready.
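
For reference, a rough sketch of the kind of measurement meant here: run the same timing loop once on this branch and once on main, then compare the averages. The dtype, device handling, and dummy prompt below are placeholders.

import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
base = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM", use_dora=True))
model.eval()

inputs = tokenizer("some dummy text " * 64, return_tensors="pt").to(model.device)

n_runs = 10
times = []
with torch.no_grad():
    model(**inputs)  # warm-up pass, excluded from the timing
    for _ in range(n_runs):
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        model(**inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        times.append(time.perf_counter() - start)

print(f"average over {n_runs} runs: {sum(times) / len(times):.4f} sec")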

BenjaminBossan marked this pull request as ready for review on August 18, 2025, 18:06
BenjaminBossan changed the title from "[WIP] ENH Cache DoRA weight norm for inference" to "ENH Cache DoRA weight norm for inference" on Aug 18, 2025
BenjaminBossan (Member, Author)

@phemw The PR is ready from my side; if you want to give it a try, let me know what you find. Note that the memory overhead of caching is quite significant (41% in one test), so it's turned off by default and users need to opt in.

from peft.helpers import DoraCaching

model.eval()
with DoraCaching():
    output = model(inputs)
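
This is not the actual DoraCaching implementation, just a sketch of the general pattern such an opt-in context manager tends to follow: flip a process-wide flag on entry, restore it on exit, and let the DoRA forward pass consult the flag before caching. Whether the cache is also dropped on exit is up to the real implementation.

_CACHING_ENABLED = False  # consulted by the layer's forward pass

class CachingContext:
    """Scoped opt-in: caching is only active inside the with-block."""

    def __enter__(self):
        global _CACHING_ENABLED
        self._previous = _CACHING_ENABLED
        _CACHING_ENABLED = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        global _CACHING_ENABLED
        _CACHING_ENABLED = self._previous
        return False  # do not swallow exceptions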

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Successfully merging this pull request may close these issues: DoRA slow forward inference (#2651)