Conversation


@chaojun-zhang (Contributor) commented on Sep 5, 2025

Purpose

This PR migrates the LoRA-related ops (bgmv_shrink, bgmv_expand, etc.) from the original IPEX library and adds minor optimizations, such as eliminating unnecessary unrolling overhead and replacing macros with templates.
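
As a rough illustration of the macro-to-template change, the launcher shape is roughly as follows (a hypothetical sketch; the names and signatures below are illustrative, not the PR's actual code):

```cpp
#include <cstdint>
#include <c10/util/Half.h>  // c10::Half (a.k.a. at::Half)

// Before (macro style), each dtype combination was stamped out by a macro, e.g.
//   #define LAUNCH_BGMV_SHRINK(T) bgmv_shrink_kernel_##T(...)
// After (template style), a single templated launcher is instantiated per dtype:
template <typename scalar_t, typename acc_t>
void bgmv_shrink_launch(const scalar_t* input, const scalar_t* lora_a,
                        acc_t* output, const int64_t* lora_indices,
                        int64_t batches, int64_t hidden, int64_t rank,
                        float scale) {
  // Placeholder body: enqueue the device kernel for this
  // (scalar_t, acc_t) combination here.
}

// Explicit instantiation matching the benchmarked f16 x f16 => f32 shrink path:
template void bgmv_shrink_launch<c10::Half, float>(
    const c10::Half*, const c10::Half*, float*, const int64_t*, int64_t,
    int64_t, int64_t, float);
```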

Test Plan

pytest tests/test_lora_ops.py

Test Result

Benchmark:

list_bench example:
python3 benchmark/benchmark_lora.py list_bench --arg-pool-size 32 --batch-sizes 1 16 32 --dtype torch.float16 --hidden-sizes 2048 --lora-ranks 16 --num-loras 1 4 --op-types bgmv_shrink bgmv_expand bgmv_expand_slice --seq-lengths 1 --sort-by-lora-id 1

LoRA Benchmark Performance Comparison (float16) - Optimized Operations

| Configuration Parameters | Operation Type | ipex (μs) | xpu_kernel (μs) | Performance Improvement |
|---|---|---|---|---|
| bs=1, sl=1, m=1, k=2048, n=16, num_loras=1 | BGMV_SHRINK (f16xf16=>f32) | 9.9 | 7.0 | +29.3% |
| bs=1, sl=1, m=1, k=16, n=2048, num_loras=1 | BGMV_EXPAND (f32xf16=>f16) | 11.2 | 8.1 | +27.7% |
| bs=1, sl=1, m=1, k=16, n=2048, num_loras=1 | BGMV_EXPAND_SLICE (2 slices) | 22.8 | 16.6 | +27.2% |
| bs=1, sl=1, m=1, k=16, n=2048, num_loras=1 | BGMV_EXPAND_SLICE (3 slices) | 34.0 | 24.2 | +28.8% |
| bs=1, sl=1, m=1, k=2048, n=16, num_loras=4 | BGMV_SHRINK (f16xf16=>f32) | 10.0 | 7.0 | +30.0% |
| bs=1, sl=1, m=1, k=16, n=2048, num_loras=4 | BGMV_EXPAND (f32xf16=>f16) | 11.3 | 8.1 | +28.3% |
| bs=1, sl=1, m=1, k=16, n=2048, num_loras=4 | BGMV_EXPAND_SLICE (2 slices) | 23.0 | 16.8 | +27.0% |
| bs=1, sl=1, m=1, k=16, n=2048, num_loras=4 | BGMV_EXPAND_SLICE (3 slices) | 34.0 | 24.5 | +27.9% |
| bs=16, sl=1, m=16, k=2048, n=16, num_loras=1 | BGMV_SHRINK (f16xf16=>f32) | 10.0 | 7.0 | +30.0% |
| bs=16, sl=1, m=16, k=16, n=2048, num_loras=1 | BGMV_EXPAND (f32xf16=>f16) | 11.4 | 8.2 | +28.1% |
| bs=16, sl=1, m=16, k=16, n=2048, num_loras=1 | BGMV_EXPAND_SLICE (2 slices) | 22.9 | 16.6 | +27.5% |
| bs=16, sl=1, m=16, k=16, n=2048, num_loras=1 | BGMV_EXPAND_SLICE (3 slices) | 34.0 | 24.5 | +27.9% |
| bs=32, sl=1, m=32, k=2048, n=16, num_loras=1 | BGMV_SHRINK (f16xf16=>f32) | 9.9 | 7.0 | +29.3% |
| bs=32, sl=1, m=32, k=16, n=2048, num_loras=1 | BGMV_EXPAND (f32xf16=>f16) | 11.4 | 8.9 | +21.9% |
| bs=32, sl=1, m=32, k=16, n=2048, num_loras=1 | BGMV_EXPAND_SLICE (2 slices) | 22.9 | 18.4 | +19.7% |
| bs=32, sl=1, m=32, k=16, n=2048, num_loras=1 | BGMV_EXPAND_SLICE (3 slices) | 34.0 | 27.6 | +18.8% |

@chaojun-zhang force-pushed the lora_ops branch 2 times, most recently from 8c797ad to c8f3166 on September 5, 2025 02:45
@chaojun-zhang marked this pull request as draft on September 5, 2025 02:51
@chaojun-zhang marked this pull request as ready for review on September 5, 2025 06:34
@chaojun-zhang changed the title from "feat(lora) Add lora bgmv & expand kernels" to "feat(lora) Add lora bgmv_shrink & bgmv_expand kernels" on Sep 5, 2025
@rogerxfeng8 (Collaborator):

@chaojun-zhang can you please update the PR description?

@chaojun-zhang force-pushed the lora_ops branch 2 times, most recently from 7773354 to 311827b on September 5, 2025 14:37
@jikunshang (Collaborator):

can you rebase?

@chaojun-zhang force-pushed the lora_ops branch 2 times, most recently from 58db25f to 430a7ea on September 11, 2025 08:34
@chaojun-zhang (Author):

> can you rebase?

Updated.

csrc/utils.h (Outdated)

```cpp
template <typename T>
struct AccumulateType {
 private:
  static constexpr bool is_half = std::is_same_v<T, at::Half> ||
```
Collaborator:

Why is it named is_half here? I understand it's copied from IPEX.

@chaojun-zhang (Author), Sep 15, 2025:

rename to is_lowp?

Collaborator:

maybe is_narrow_float or is_low_bit_float

@chaojun-zhang (Author):

is_narrow_float
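
For context, the resolved trait presumably ends up along these lines: a minimal sketch using the agreed-upon is_narrow_float name and the common fp16/bf16-accumulate-in-float pattern; the merged code may differ.

```cpp
#include <type_traits>
#include <c10/util/Half.h>
#include <c10/util/BFloat16.h>

template <typename T>
struct AccumulateType {
 private:
  // "Narrow" floats (fp16/bf16) accumulate in float to limit precision loss.
  static constexpr bool is_narrow_float =
      std::is_same_v<T, c10::Half> || std::is_same_v<T, c10::BFloat16>;

 public:
  using type = std::conditional_t<is_narrow_float, float, T>;
};

template <typename T>
using acc_t = typename AccumulateType<T>::type;  // acc_t<c10::Half> == float
```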

@jikunshang (Collaborator) left a review:

Overall LGTM.

@jikunshang merged commit d66181a into vllm-project:main on Sep 16, 2025
3 checks passed