OpenVINO EP Weights Sharing Feature #23553

Merged: 9 commits merged into microsoft:main on Feb 6, 2025


ankitm3k
Contributor

Description

These changes ensure that weight sharing happens between two models using the session context option ep_weight_sharing. The key changes introduced in this feature are:

* Creating a shared context between two models
* Extracting external constant initializers and relabelling them as model inputs, so that weights can be loaded into the direct blob
* Creating EP Context nodes when the model is subgraph-partitioned
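As a rough illustration of the second item, here is a minimal Python sketch. It assumes simplified, hypothetical stand-ins (`Initializer`, `Graph`, `WeightInfo`, `SharedContext`, `relabel_external_initializers`) rather than the onnx package or OVEP's C++ types. Initializers whose data lives in an external file are moved to the graph's inputs, and their location, offset and length are recorded once in a context shared by both models.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical, simplified stand-ins for ONNX graph objects (not the onnx package API).
@dataclass
class Initializer:
    name: str
    dims: tuple
    external_location: Optional[str] = None  # file holding the raw bytes; None if stored inline
    offset: int = 0
    length: int = 0


@dataclass
class Graph:
    inputs: list = field(default_factory=list)  # graph input names
    initializers: list = field(default_factory=list)


@dataclass
class WeightInfo:
    location: str
    offset: int
    length: int
    dims: tuple


@dataclass
class SharedContext:
    # Keyed by initializer name, so the second model reuses entries recorded by the first.
    weights: dict = field(default_factory=dict)


def relabel_external_initializers(graph: Graph, ctx: SharedContext) -> list:
    """Turn initializers backed by external data into graph inputs and record their metadata."""
    moved, kept = [], []
    for init in graph.initializers:
        if init.external_location is None:
            kept.append(init)  # inline constants stay baked into the model
            continue
        # setdefault: only the first model to see a weight records it in the shared context
        ctx.weights.setdefault(
            init.name,
            WeightInfo(init.external_location, init.offset, init.length, init.dims),
        )
        graph.inputs.append(init.name)  # the weight is now bound as an input at run time
        moved.append(init.name)
    graph.initializers = kept
    return moved


# Usage: a prefill model and a KV-cache model that reference the same external weight.
ctx = SharedContext()
prefill = Graph(["input_ids"], [Initializer("w0", (4, 4), "weights.bin", 0, 64), Initializer("bias", (4,))])
kvcache = Graph(["input_ids", "past"], [Initializer("w0", (4, 4), "weights.bin", 0, 64)])
moved_prefill = relabel_external_initializers(prefill, ctx)
moved_kvcache = relabel_external_initializers(kvcache, ctx)
```

Both models end up with `w0` as an input, while the shared context holds a single metadata entry for it, which is what lets the two compiled models bind the same weight data.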

Motivation and Context

This change was required so that LLMs with separate prefill and KV-cache models can use the same shared weights.
The change was also required so that EP Context nodes can be formed even when the model is subgraph-partitioned.

@ankitm3k
Contributor Author

ankitm3k commented Feb 1, 2025

@jywu-msft @adrianlizarraga @HectorSVC Kindly review and merge.

@ankitm3k ankitm3k changed the title Ovep weight sharing msft OpenVINO EP Feature Updates Feb 1, 2025
@jywu-msft
Member

/azp run Linux OpenVINO CI Pipeline


Azure Pipelines successfully started running 1 pipeline(s).

@yihonglyu
Contributor

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline


Azure Pipelines successfully started running 8 pipeline(s).


@yihonglyu yihonglyu left a comment


Could you update the PR title to better describe the changes you've made?

@ankitm3k ankitm3k changed the title OpenVINO EP Feature Updates OpenVINO EP Weights Sharing Feature Feb 4, 2025
@ankitm3k
Contributor Author

ankitm3k commented Feb 4, 2025


Changed the title as requested

@HectorSVC
Contributor

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline


Pull request contains merge conflicts.

@yihonglyu
Contributor


Could you resolve the merge conflict?

@ankitm3k ankitm3k force-pushed the ovep-weight-sharing-msft branch from d8857de to 6371811 Compare February 5, 2025 07:54
@ankitm3k
Contributor Author

ankitm3k commented Feb 5, 2025


Fixed the conflicts; kindly review and merge.

@jywu-msft
Member

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline

@jywu-msft
Member

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@jywu-msft
Member

/azp run Linux OpenVINO CI Pipeline


Azure Pipelines successfully started running 10 pipeline(s).


Azure Pipelines successfully started running 1 pipeline(s).


Azure Pipelines successfully started running 8 pipeline(s).

@yihonglyu
Contributor

It seems there is no unit test for this feature for OpenVINO EP. Could you please add a unit test for it?

@HectorSVC
Contributor

Yes. It would be great to have something that demonstrates how this feature gets used, from model generation to model inference.
Here's an example from QNN EP:
https://github.com/microsoft/onnxruntime/blob/e1e3f623f61816008e79dddc91a51ffe7f0ff5cf/onnxruntime/test/providers/qnn/qnn_ep_context_test.cc#L1048C47-L1048C58

@javier-intel
Contributor

Agreed, the OVEP unit tests are, well, absent. We have started working on adding OVEP unit tests, but those will come later and will not be part of this PR.

@ankitm3k ankitm3k force-pushed the ovep-weight-sharing-msft branch from bf4dc5b to 568a64d Compare February 6, 2025 13:12
ankitm3k and others added 7 commits February 6, 2025 18:45
* Rename EP instance context as session_context

* Add support for GetEpContextNodes

* enable config option for ovep weight sharing

* add config option for ovep weight sharing

* Refactor the conditional blocks in OVEP for compilation

* Convert initializers with external data to graph inputs

* create, store and export metadata for ovep weight sharing

* fix error handling in weight sharing

* fix crash issue while setting up inputs for wai model

* pass weight sharing option to OVEP qdq stripping pass

* Aligning OVEP variable names to match the session option value they hold

* Add plumbing for context sharing plus refactoring around option handling

* Store metadata in shared context

* fix: fix provider options

* create ov tensor from meta data and external data

* create ov tensor

* Add support for binding weight as input tensors

* Fix for mapping subgraph to ov compiled network arguments

* Fix for using so_share_ep_contexts without ep.context* flags

* Add remote tensor support for NPU weight sharing

* Use a single ov::Core copy across OVEP

* Decouple provider option cache_dir from session option ep.context_file_path

* Add support for serialization and deserialization of metadata to disk

* Load blobs from relative path stored in ep_cache_context

* Use remote L0 tensors for shared weights

* fix linux ci issues

* fix ci issues

* Fix Windows build failure

* Use ifstream to load weights instead of mmaped file

* Fix for epctx models made up entirely of OVEP epctx nodes

* Limit ov::Core lifetime to that of provider object

* Enforce shared tensors cleanup on shutdown

* Add support for default device type based on project configuration

* fix: Fixed concrete_backend_ pointer double free issue on Linux

* Preetha/weight sharing fix (#545)

* Move variables from subgraph to session context for model specific properties

* Fix for redundant subgraph creation

* Remove unused variable

---------

Co-authored-by: Javier E. Martinez
Co-authored-by: saurabhkale117
Co-authored-by: Preetha Veeramalai
Co-authored-by: ankitm3k
Co-authored-by: Eric Crawford

* Fix blob generation with AUTO:GPU,CPU

* Remove unused variable

* Use ep.context_file_path to get base path when creating session from memory

* Fixed lint issues

---------

Co-authored-by: Javier E. Martinez
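Several commits above describe storing weight metadata in the shared context, loading data from a relative path, and reading weights with ifstream instead of a memory-mapped file. The following is a loose Python analogue under stated assumptions: `SharedWeights` and `WeightInfo` are hypothetical names, not OVEP's C++ types, and caching raw bytes stands in for whatever tensor objects OVEP actually shares. Each weight is read once by offset and length, and a later session reuses the same buffer.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WeightInfo:
    location: str  # path relative to the model directory (assumption)
    offset: int    # byte offset of the weight inside the external data file
    length: int    # number of bytes to read


@dataclass
class SharedWeights:
    metadata: dict = field(default_factory=dict)  # name -> WeightInfo
    buffers: dict = field(default_factory=dict)   # name -> bytes, loaded at most once

    def get(self, name: str, base_dir: str) -> bytes:
        """Return the weight's bytes, reading them from disk only on first use."""
        if name not in self.buffers:
            info = self.metadata[name]
            path = os.path.join(base_dir, info.location)
            # Plain buffered stream read, analogous to std::ifstream seekg/read.
            with open(path, "rb") as f:
                f.seek(info.offset)
                data = f.read(info.length)
            if len(data) != info.length:
                raise ValueError(f"short read for {name}: expected {info.length} bytes, got {len(data)}")
            self.buffers[name] = data
        return self.buffers[name]


# Usage: two "sessions" ask for the same weight; only the first touches the file.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "weights.bin"), "wb") as f:
        f.write(b"\x00" * 16 + bytes(range(8)))
    shared = SharedWeights({"w0": WeightInfo("weights.bin", 16, 8)})
    a = shared.get("w0", d)  # e.g. the prefill model triggers the read
    b = shared.get("w0", d)  # e.g. the KV-cache model reuses the cached buffer
```

Raising on a short read, rather than returning a truncated buffer, keeps a mismatched or corrupted external data file from silently producing wrong weights.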
@jywu-msft
Member

/azp run Linux OpenVINO CI Pipeline


Azure Pipelines successfully started running 1 pipeline(s).

@HectorSVC
Contributor

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@HectorSVC
Contributor

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 8 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).

@HectorSVC HectorSVC dismissed yihonglyu’s stale review February 6, 2025 22:55

It was updated according to the comments.

@HectorSVC HectorSVC merged commit a6ea57b into microsoft:main Feb 6, 2025
76 checks passed

9 participants