@Kotomi-Du commented Oct 10, 2025

Description:

OV has supported GQA (GroupQueryAttention) since release 2025.1. This PR aligns OVEP with that support by adding GQA to the supported-op list for the GPU backend. The change will go to the new ABI as well.

Does the feature go to the new ABI?

Yes

Jira Ticket:

https://jira.devtools.intel.com/browse/CVS-175734

"beam_idx",
"past_key_values",
"present",
"total_seq_len",

@Kotomi-Du Does the stateful model, after translation into OV IR, always include the total_seq_len input? Is this now the general case for all LLMs, and if so, in which OV toolkit version was it added?

Reply from the PR author:

It is an input name from the Microsoft generic model (specifically the Phi Silica model), not from the EPCtx OVIR model that the OV toolkit generates.
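For readers following the thread, here is a minimal sketch of how a name list like the one in the reviewed diff could be used to recognize special inputs by name. The diff context shows only the four string literals; how OVEP actually consumes them is not visible in this conversation. The constant `kSpecialInputPatterns`, the helper `MatchesSpecialInput`, and the choice of substring matching are all assumptions made for illustration, not the actual OVEP code.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Hypothetical pattern list, mirroring the names visible in the reviewed diff.
constexpr std::array<std::string_view, 4> kSpecialInputPatterns = {
    "beam_idx", "past_key_values", "present", "total_seq_len"};

// Hypothetical helper (not an OVEP function): returns true if `name`
// contains any of the patterns above as a substring.
// Substring matching is an assumption; the real code may match differently.
inline bool MatchesSpecialInput(std::string_view name) {
  return std::any_of(kSpecialInputPatterns.begin(), kSpecialInputPatterns.end(),
                     [name](std::string_view pattern) {
                       return name.find(pattern) != std::string_view::npos;
                     });
}
```

Under these assumptions, a name such as `total_seq_len` coming from the Microsoft generic model would match, while an ordinary input such as `input_ids` would not.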

@ankitm3k left a comment

LGTM

@Kotomi-Du (Author) commented:

Removed CPU support and added the Jira ticket.

@MayureshV1 left a comment

LGTM!

@MayureshV1 changed the title from "[OVEP GPU] add GQA in support list for GPU backend" to "CVS-175734- [OVEP GPU] add GQA in support list for GPU backend" on Oct 27, 2025
@MayureshV1 merged commit eff6cac into intel:ovep-develop on Oct 27, 2025
3 of 5 checks passed