Add SimpLayerNorm, GQA to supported_ops #625

wine99 · 2025-03-21T06:29:42Z

Description

Added GroupQueryAttention and SimplifiedLayerNormalization to supported ops.

Motivation and Context

With these two PRs (openvinotoolkit/openvino#28963 and openvinotoolkit/openvino#28163) merged into OpenVINO, ONNX models generated by onnxruntime-genai that contain these two ops will work.

Example models: Phi3 and Llama3, generated with the onnxruntime-genai model builder (command shown for Llama 3.1):

python -m onnxruntime_genai.models.builder -m meta-llama/Llama-3.1-8B-Instruct -o E:\download\onnx\llama3.1-8B-instruct-onnx -p int4 -e cpu -i E:\download\huggingface\llama3.1-8B-instruct
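A quick way to confirm that a generated model actually contains the two newly listed ops is to count node op types in its graph. This is a minimal sketch, not part of this PR: the helper `count_target_ops` is hypothetical, it assumes the model is loaded with the `onnx` Python package (whose `ModelProto.graph.node` entries carry an `op_type` string), and it scans only the top-level graph.

```python
from collections import Counter
from types import SimpleNamespace

# Op types this PR adds to the OpenVINO EP supported-ops list.
TARGET_OPS = ("GroupQueryAttention", "SimplifiedLayerNormalization")


def count_target_ops(model, targets=TARGET_OPS):
    """Return {op_type: count} for each target op in model.graph.node.

    Works with an onnx.ModelProto or any object exposing the same
    graph.node[*].op_type attributes. Subgraphs nested inside
    If/Loop/Scan attributes are not visited.
    """
    counts = Counter(node.op_type for node in model.graph.node)
    return {op: counts.get(op, 0) for op in targets}


# Stand-in for a loaded ModelProto, so the sketch runs without onnx installed.
fake = SimpleNamespace(graph=SimpleNamespace(node=[
    SimpleNamespace(op_type=t)
    for t in ["SimplifiedLayerNormalization", "GroupQueryAttention",
              "MatMul", "SimplifiedLayerNormalization"]
]))
demo = count_target_ops(fake)
```

With a real model, `onnx.load("model.onnx", load_external_data=False)` loads the graph without pulling in external weight files; the file name here is only illustrative.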

wine99 · 2025-03-25T01:59:24Z

@ankitm3k Please review

ankitm3k · 2025-03-25T07:45:38Z

@wine99 Can you please confirm that the changes work with both benchmark_app and onnxruntime_perf_test, i.e. that the model is fully supported without causing any subgraph partitions?

For which OV devices (CPU, GPU, NPU, or NPUW) is each of these ops enabled? Please also state the OpenVINO toolkit version in which this support was introduced; I see you have mentioned 2025.1 here.


sfatimar · 2025-03-26T06:20:40Z

@MayureshV1, can you please take a look at this? With this merge, the Phi Silica model will fail.

wine99 · 2025-03-26T12:13:52Z

@ankitm3k

> Can you please confirm the changes are functional with both benchmark_app & onnxruntime_perf_test app i.e. the model is fully supported without causing any subgraph partitions.

The models mentioned above (Phi3 and Llama3) are fully supported: the whole graph runs on the OV EP (CPU and iGPU).

> May I know these Ops are enable individually for which OV devices viz. CPU, GPU , NPU or NPUW? Kindly also state the OV toolkit version since which this support was introduced of course you have mentioned 2025.1 here.

The ops are enabled for CPU and iGPU; support for NPU/NPUW is a work in progress. The OV master branch is currently at version 2025.1.0. We built the master branch, and with the changes in this PR the models are fully supported.

sfatimar · 2025-03-26T12:17:49Z

@wine99 @ankitm3k We can only support GQA on GPU. For CPU and NPU, we must fall back to MLAS to support the Phi Silica implementation.


sgbihu · 2025-03-27T02:25:06Z

> @wine99 @ankitm3k we can only support GQA for GPU, For CPU, NPU we must fallback to MLAS to support Phi Silica Implementation

I think we have an NPU GQA implementation; you can try Phi Silica once we merge it.

MayureshV1 · 2025-03-28T00:29:15Z

> @MayureshV1 can you please take a look at this. With this merge, Phi Silica model will fail.

@sfatimar, I think you are right. Once we merge this PR, ORT will no longer split subgraphs between MLAS and OV NPU. The entire model would try to execute on the NPU, NPU compilation would fail due to the lack of GQA support, and the entire model would then run on OV CPU.

@ankitm3k, @preetha-intel Do you have a suggestion on how we can test a graph split between OV NPU and OV CPU (GQA) now, and later run the entire graph on OV NPU once GQA is supported?
@sfatimar, we would likely need to merge this into a different branch for intermediate testing.
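The split-versus-fallback behavior described in this comment can be illustrated with a toy model of capability-based partitioning. This is a simplified sketch under stated assumptions, not ORT's actual algorithm: the helpers `partition` and `place` and the per-device op set `npu_ops` are hypothetical, nodes are treated as a linear chain, and a run the device cannot compile is simply moved to OV CPU, mirroring the outcome described above.

```python
from itertools import groupby

FALLBACK = "CPU EP (MLAS)"


def partition(ops, claimed):
    """Split a linear chain of op types into contiguous runs.

    Returns a list of (owner, [ops]); owner is "OpenVINO" for runs whose
    ops are all in `claimed` (the EP's supported-ops list) and FALLBACK
    for the rest.
    """
    return [
        ("OpenVINO" if on_ov else FALLBACK, list(group))
        for on_ov, group in groupby(ops, key=lambda op: op in claimed)
    ]


def place(runs, device_ops, device, fallback_device="CPU"):
    """Assign each OpenVINO run to `device`, or to `fallback_device` when the
    device cannot compile every op in the run (toy stand-in for a failed
    compile on the requested device)."""
    placed = []
    for owner, ops in runs:
        if owner == "OpenVINO":
            target = device if set(ops) <= device_ops else fallback_device
            placed.append((f"OV {target}", ops))
        else:
            placed.append((owner, ops))
    return placed


chain = ["SimplifiedLayerNormalization", "MatMul", "GroupQueryAttention",
         "MatMul", "SimplifiedLayerNormalization"]
claimed_without_gqa = {"SimplifiedLayerNormalization", "MatMul"}
claimed_with_gqa = claimed_without_gqa | {"GroupQueryAttention"}
# Hypothetical: the NPU plugin cannot compile GQA yet.
npu_ops = {"SimplifiedLayerNormalization", "MatMul"}

before = place(partition(chain, claimed_without_gqa), npu_ops, "NPU")
after = place(partition(chain, claimed_with_gqa), npu_ops, "NPU")
```

In `before`, GQA is left to MLAS and the surrounding runs stay on OV NPU; in `after`, the EP claims the whole chain as one subgraph, the NPU cannot compile it, and everything lands on OV CPU.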

Commit b759f99: Add SimpLayerNorm, GQA to supported_ops

ankitm3k reviewed Mar 25, 2025


sfatimar requested a review from MayureshV1 March 26, 2025 06:20

sfatimar requested changes Mar 26, 2025


wine99 marked this pull request as draft September 12, 2025 02:52


5 participants
