@Kotomi-Du Kotomi-Du commented Oct 10, 2025

Description

The Phi-Silica app does not rely on ORT-GenAI, so it needs an API to remove KV history. kvcache_rewind is the OVEP function that does this, but no Python API exposes it. This PR adds that Python API.

Does this feature go into the new ABI?

Yes

Jira Ticket :

https://jira.devtools.intel.com/browse/CVS-175737

ankitm3k commented Oct 13, 2025

Please attach a JIRA for this feature request in the PR description.

@Kotomi-Du (Author)

> Please attach a JIRA for this feature request in the PR description.

done

@MayureshV1 MayureshV1 changed the title from "[OVEP] Expose kvcache_rewind python api" to "CVS-175737-[OVEP] Expose kvcache_rewind python api" Oct 27, 2025
@MayureshV1 MayureshV1 requested a review from Copilot October 27, 2025 23:24

Copilot AI left a comment

Pull Request Overview

This PR exposes a Python API for the kvcache_rewind functionality from OVEP (OpenVINO Execution Provider), enabling applications like Phi-Silica to manage KV cache history without relying on ORT-GenAI. The implementation adds a generic set_ep_dynamic_options method that passes dynamic configuration options to execution providers at runtime.

Key Changes:

  • Added set_ep_dynamic_options method to enable runtime configuration of execution providers
  • Implemented Python bindings in both the C++ pybind layer and Python wrapper class
  • Provided comprehensive documentation with usage examples
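A minimal sketch of how a caller might use the new method to rewind the KV cache. Only the method name set_ep_dynamic_options comes from this PR. The dict argument shape, the "kvcache_rewind" key spelling, and the string-encoded token count are assumptions here, and FakeSession is a hypothetical stand-in so the snippet runs without an OpenVINO-enabled build.

```python
class FakeSession:
    """Hypothetical stand-in for an inference session.

    It only records the options it receives; a real session would
    forward them to the execution provider.
    """

    def __init__(self):
        self.received = []

    def set_ep_dynamic_options(self, options):
        # Method name taken from the PR; the dict parameter is an assumption.
        self.received.append(dict(options))


def rewind_kv_cache(session, keep_tokens):
    """Ask the EP to keep only the first `keep_tokens` KV entries.

    The meaning of the value (number of tokens to keep) is an assumption.
    """
    if keep_tokens < 0:
        raise ValueError("keep_tokens must be non-negative")
    # Assumption: option values are passed as strings.
    session.set_ep_dynamic_options({"kvcache_rewind": str(keep_tokens)})


session = FakeSession()
rewind_kv_cache(session, 0)  # drop the whole KV history
```

With a real OVEP-backed session, the same helper would be called between generation turns, for example when a conversation is reset.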

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
onnxruntime/python/onnxruntime_pybind_state.cc | Implements C++ pybind11 binding for set_ep_dynamic_options with dict-to-C-array conversion and error handling
onnxruntime/python/onnxruntime_inference_collection.py | Adds Python wrapper method with type hints and documentation
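The "dict-to-C-array conversion" mentioned above can be pictured in Python as follows. This is an illustrative sketch, not the PR's C++ code: the helper name to_parallel_arrays is hypothetical, and the assumption is that the binding splits the dict into parallel key and value arrays and rejects non-string entries before calling into the execution provider.

```python
def to_parallel_arrays(options):
    """Split a dict of string options into parallel key and value lists.

    Hypothetical helper mirroring what a binding layer might do before
    passing keys, values, and a count to a C API.
    """
    if not isinstance(options, dict):
        raise TypeError("options must be a dict")
    keys, values = [], []
    for key, value in options.items():
        # Assumption: only string keys and string values are accepted.
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("option keys and values must be strings")
        keys.append(key)
        values.append(value)
    return keys, values


keys, values = to_parallel_arrays({"kvcache_rewind": "0"})
```

Keeping the two lists index-aligned means entry i of keys always pairs with entry i of values, which is what a count-plus-two-arrays interface relies on.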

@MayureshV1

@preetha-intel - Can you please review this from the perspective of setting the workload type through this Python interface?
@ankitm3k, @RyanMetcalfeInt8 - Please review whether the interface can be used to trigger kvcache_rewind from Python.

@MayureshV1 MayureshV1 requested a review from mklimenk October 28, 2025 00:15
@MayureshV1 MayureshV1 left a comment

LGTM !

@preetha-intel preetha-intel left a comment

LGTM

@RyanMetcalfeInt8 RyanMetcalfeInt8 left a comment

LGTM!

@MayureshV1 MayureshV1 merged commit b1f7750 into intel:ovep-develop Oct 28, 2025
3 of 5 checks passed