CVS-175734- [OVEP GPU] add GQA in support list for GPU backend #830
5708bda to c4f7cdd
"beam_idx",
"past_key_values",
"present",
"total_seq_len",
@Kotomi-Du Does the stateful model, after translation into OVIR, always include a total_seq_len input? Is this now the general case for all LLMs, and if so, since which OV toolkit version?
It is an input name from the Microsoft generic model (specifically the Phi Silica model), not from the Epctx OVIR model generated by the OV toolkit.
LGTM
Removed CPU support and added Jira ticket.
5234a6c to d73ef49
LGTM!
Description:
OV has supported GQA since 2025.1. This PR aligns the support list with OV. The change will go to the new ABI as well.
Does the feature go to the new ABI?
Yes
Jira ticket:
https://jira.devtools.intel.com/browse/CVS-175734