Conversation


@chaojun-zhang (Contributor) commented on Sep 5, 2025

Purpose

This PR migrates the LoRA-related ops (bgmv_shrink, bgmv_expand, etc.) from the original IPEX library and adds minor optimizations, such as eliminating unnecessary unrolling overhead and replacing macros with templates.
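
As a rough illustration of the macro-to-template change, the launcher shape is roughly as follows (a hypothetical sketch; the names and signatures below are illustrative, not the PR's actual code):

```cpp
#include <cstdint>
#include <c10/util/Half.h>  // c10::Half (a.k.a. at::Half)

// Before (macro style), each dtype combination was stamped out by a macro, e.g.
//   #define LAUNCH_BGMV_SHRINK(T) bgmv_shrink_kernel_##T(...)
// After (template style), a single templated launcher is instantiated per dtype:
template <typename scalar_t, typename acc_t>
void bgmv_shrink_launch(const scalar_t* input, const scalar_t* lora_a,
                        acc_t* output, const int64_t* lora_indices,
                        int64_t batches, int64_t hidden, int64_t rank,
                        float scale) {
  // Placeholder body: enqueue the device kernel for this
  // (scalar_t, acc_t) combination here.
}

// Explicit instantiation matching the benchmarked f16 x f16 => f32 shrink path:
template void bgmv_shrink_launch<c10::Half, float>(
    const c10::Half*, const c10::Half*, float*, const int64_t*, int64_t,
    int64_t, int64_t, float);
```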

Test Plan

pytest tests/test_lora_ops.py

Test Result

Benchmark:

list_bench example:
python3 benchmark/benchmark_lora.py list_bench --arg-pool-size 32 --batch-sizes 1 16 32 --dtype torch.float16 --hidden-sizes 2048 --lora-ranks 16 --num-loras 1 4 --op-types bgmv_shrink bgmv_expand bgmv_expand_slice --seq-lengths 1 --sort-by-lora-id 1

LoRA Benchmark Performance Comparison (float16) - Optimized Operations

| Configuration Parameters | Operation Type | ipex (μs) | xpu_kernel (μs) | Performance Improvement |
|---|---|---|---|---|
| bs=1, sl=1, m=1, k=2048, n=16, num_loras=1 | BGMV_SHRINK (f16xf16=>f32) | 9.9 | 7.0 | +29.3% |
| bs=1, sl=1, m=1, k=16, n=2048, num_loras=1 | BGMV_EXPAND (f32xf16=>f16) | 11.2 | 8.1 | +27.7% |
| bs=1, sl=1, m=1, k=16, n=2048, num_loras=1 | BGMV_EXPAND_SLICE (2 slices) | 22.8 | 16.6 | +27.2% |
| bs=1, sl=1, m=1, k=16, n=2048, num_loras=1 | BGMV_EXPAND_SLICE (3 slices) | 34.0 | 24.2 | +28.8% |
| bs=1, sl=1, m=1, k=2048, n=16, num_loras=4 | BGMV_SHRINK (f16xf16=>f32) | 10.0 | 7.0 | +30.0% |
| bs=1, sl=1, m=1, k=16, n=2048, num_loras=4 | BGMV_EXPAND (f32xf16=>f16) | 11.3 | 8.1 | +28.3% |
| bs=1, sl=1, m=1, k=16, n=2048, num_loras=4 | BGMV_EXPAND_SLICE (2 slices) | 23.0 | 16.8 | +27.0% |
| bs=1, sl=1, m=1, k=16, n=2048, num_loras=4 | BGMV_EXPAND_SLICE (3 slices) | 34.0 | 24.5 | +27.9% |
| bs=16, sl=1, m=16, k=2048, n=16, num_loras=1 | BGMV_SHRINK (f16xf16=>f32) | 10.0 | 7.0 | +30.0% |
| bs=16, sl=1, m=16, k=16, n=2048, num_loras=1 | BGMV_EXPAND (f32xf16=>f16) | 11.4 | 8.2 | +28.1% |
| bs=16, sl=1, m=16, k=16, n=2048, num_loras=1 | BGMV_EXPAND_SLICE (2 slices) | 22.9 | 16.6 | +27.5% |
| bs=16, sl=1, m=16, k=16, n=2048, num_loras=1 | BGMV_EXPAND_SLICE (3 slices) | 34.0 | 24.5 | +27.9% |
| bs=32, sl=1, m=32, k=2048, n=16, num_loras=1 | BGMV_SHRINK (f16xf16=>f32) | 9.9 | 7.0 | +29.3% |
| bs=32, sl=1, m=32, k=16, n=2048, num_loras=1 | BGMV_EXPAND (f32xf16=>f16) | 11.4 | 8.9 | +21.9% |
| bs=32, sl=1, m=32, k=16, n=2048, num_loras=1 | BGMV_EXPAND_SLICE (2 slices) | 22.9 | 18.4 | +19.7% |
| bs=32, sl=1, m=32, k=16, n=2048, num_loras=1 | BGMV_EXPAND_SLICE (3 slices) | 34.0 | 27.6 | +18.8% |

@chaojun-zhang force-pushed the lora_ops branch 2 times, most recently from 8c797ad to c8f3166 on September 5, 2025 02:45
@chaojun-zhang marked this pull request as draft on September 5, 2025 02:51
@chaojun-zhang marked this pull request as ready for review on September 5, 2025 06:34
@chaojun-zhang changed the title from "feat(lora) Add lora bgmv & expand kernels" to "feat(lora) Add lora bgmv_shrink & bgmv_expand kernels" on Sep 5, 2025
@rogerxfeng8 (Collaborator):

@chaojun-zhang can you please update the PR description?

@chaojun-zhang force-pushed the lora_ops branch 2 times, most recently from 7773354 to 311827b on September 5, 2025 14:37
@jikunshang (Collaborator):

can you rebase?

@chaojun-zhang force-pushed the lora_ops branch 2 times, most recently from 58db25f to 430a7ea on September 11, 2025 08:34
@chaojun-zhang (Author):

> can you rebase?

Updated.

csrc/utils.h (Outdated)

```cpp
template <typename T>
struct AccumulateType {
 private:
  static constexpr bool is_half = std::is_same_v<T, at::Half> ||
```
Collaborator:

Why is it named is_half here? I understand it's copied from IPEX.

@chaojun-zhang (Author), Sep 15, 2025:

rename to is_lowp?

Collaborator:

maybe is_narrow_float or is_low_bit_float

@chaojun-zhang (Author):

is_narrow_float
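
For context, the resolved trait presumably ends up along these lines: a minimal sketch using the agreed-upon is_narrow_float name and the common fp16/bf16-accumulate-in-float pattern; the merged code may differ.

```cpp
#include <type_traits>
#include <c10/util/Half.h>
#include <c10/util/BFloat16.h>

template <typename T>
struct AccumulateType {
 private:
  // "Narrow" floats (fp16/bf16) accumulate in float to limit precision loss.
  static constexpr bool is_narrow_float =
      std::is_same_v<T, c10::Half> || std::is_same_v<T, c10::BFloat16>;

 public:
  using type = std::conditional_t<is_narrow_float, float, T>;
};

template <typename T>
using acc_t = typename AccumulateType<T>::type;  // acc_t<c10::Half> == float
```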

@jikunshang (Collaborator) left a review:

Overall LGTM.

@jikunshang merged commit d66181a into vllm-project:main on Sep 16, 2025
3 checks passed