feat(lora) Add lora bgmv_shrink & bgmv_expand kernels #31
Conversation
(force-pushed 8c797ad to c8f3166)
(force-pushed c8f3166 to 363896a)
@chaojun-zhang can you please update the PR description?
(force-pushed 7773354 to 311827b)
Can you rebase?
(force-pushed 58db25f to 430a7ea)
Signed-off-by: chzhang <[email protected]>
(force-pushed c42b87e to 89d629b)
Updated.
csrc/utils.h (outdated)
template <typename T>
struct AccumulateType {
 private:
  static constexpr bool is_half = std::is_same_v<T, at::Half> ||
Why is it named `is_half` here? I understand it's copied from IPEX.
Rename to `is_lowp`?
Maybe `is_narrow_float` or `is_low_bit_float`?
is_narrow_float
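For context, the trait under discussion could read as follows after the rename. This is a hedged, self-contained sketch: `Half` and `BFloat16` are local stand-ins for `at::Half`/`at::BFloat16`, and completing the truncated condition with the bfloat16 check is an assumption, not the PR's actual code.

```cpp
#include <cstdint>
#include <type_traits>

// Local stand-ins for at::Half / at::BFloat16 so the sketch compiles alone.
struct Half { uint16_t bits; };
struct BFloat16 { uint16_t bits; };

template <typename T>
struct AccumulateType {
 private:
  // 16-bit floating-point inputs accumulate in float to limit rounding error.
  static constexpr bool is_narrow_float =
      std::is_same_v<T, Half> || std::is_same_v<T, BFloat16>;

 public:
  using type = std::conditional_t<is_narrow_float, float, T>;
};

// Convenience alias: the type a kernel should accumulate in for inputs of T.
template <typename T>
using accum_t = typename AccumulateType<T>::type;
```

A kernel would then declare its accumulator as `accum_t<scalar_t> acc = 0;`, getting `float` for half/bfloat16 inputs and the input type otherwise.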
Overall LGTM.
Purpose
This PR primarily migrates the LoRA-related ops (bgmv_shrink, bgmv_expand, etc.) from the original IPEX library, with minor optimizations such as eliminating unnecessary unrolling overhead and replacing macros with templates.
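For readers unfamiliar with the op, bgmv ("batched grouped matrix-vector multiply") applies a per-token LoRA weight selected by an index array. A minimal CPU reference sketch of the shrink direction, assuming a `[num_loras, rank, hidden]` weight layout and flat row-major buffers (all names, shapes, and the scaling convention here are illustrative, not the kernel's actual API):

```cpp
#include <cstddef>
#include <vector>

// y[b][r] = scale * sum_h x[b][h] * w[indices[b]][r][h]
// x: [batch, hidden], w: [num_loras, rank, hidden], y: [batch, rank]
void bgmv_shrink_ref(std::vector<float>& y,
                     const std::vector<float>& x,
                     const std::vector<float>& w,
                     const std::vector<int>& indices,  // LoRA id per token
                     std::size_t rank, std::size_t hidden, float scale) {
  const std::size_t batch = indices.size();
  for (std::size_t b = 0; b < batch; ++b) {
    // Select this token's LoRA A matrix by index (the "grouped" part).
    const float* wb = &w[static_cast<std::size_t>(indices[b]) * rank * hidden];
    for (std::size_t r = 0; r < rank; ++r) {
      float acc = 0.f;  // accumulate in float (cf. AccumulateType)
      for (std::size_t h = 0; h < hidden; ++h)
        acc += x[b * hidden + h] * wb[r * hidden + h];
      y[b * rank + r] = scale * acc;
    }
  }
}
```

The expand direction is the transpose of this pattern: it maps the rank-sized intermediate back up to the hidden size, typically accumulating into the output in place.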
Test Plan
pytest tests/test_lora_ops.py
Test Result
Benchmark:
list_bench example:
python3 benchmark/benchmark_lora.py list_bench --arg-pool-size 32 --batch-sizes 1 16 32 --dtype torch.float16 --hidden-sizes 2048 --lora-ranks 16 --num-loras 1 4 --op-types bgmv_shrink bgmv_expand bgmv_expand_slice --seq-lengths 1 --sort-by-lora-id 1
LoRA Benchmark Performance Comparison (float16) - Optimized Operations