CVS-175737-[OVEP] Expose kvcache_rewind python api #831
998fda7 to 476d46e
Please attach a JIRA for this feature request in the PR description.
Done.
476d46e to 5f35d37
Pull Request Overview
This PR exposes a Python API for the kvcache_rewind functionality from OVEP (OpenVINO Execution Provider), enabling applications like Phi-Silica to manage KV cache history without relying on ORT-GenAI. The implementation adds a generic set_ep_dynamic_options method that passes dynamic configuration options to execution providers at runtime.
Key Changes:
- Added a set_ep_dynamic_options method to enable runtime configuration of execution providers
- Implemented Python bindings in both the C++ pybind layer and Python wrapper class
- Provided comprehensive documentation with usage examples
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description | 
|---|---|
| onnxruntime/python/onnxruntime_pybind_state.cc | Implements C++ pybind11 binding for set_ep_dynamic_options with dict-to-C-array conversion and error handling | 
| onnxruntime/python/onnxruntime_inference_collection.py | Adds Python wrapper method with type hints and documentation | 
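
The call pattern this overview describes might look like the minimal sketch below. It assumes the new method accepts a dict of string keys to string values (suggested by the dict-to-C-array conversion noted above), and that the rewind option is keyed `"kvcache_rewind"` with the target position passed as a decimal string. The key name and value format are assumptions, not confirmed in this thread. `rewind_kv_cache` and `FakeSession` are hypothetical names used only for illustration; `FakeSession` stands in for an `onnxruntime.InferenceSession` so the snippet runs without OVEP installed.

```python
from __future__ import annotations


class FakeSession:
    """Hypothetical stand-in for onnxruntime.InferenceSession; it only records calls."""

    def __init__(self) -> None:
        self.calls: list[dict[str, str]] = []

    def set_ep_dynamic_options(self, options: dict[str, str]) -> None:
        # Assumed signature: a plain dict of string keys and string values.
        self.calls.append(dict(options))


def rewind_kv_cache(session, position: int) -> dict[str, str]:
    """Ask the execution provider to rewind its KV cache to `position`.

    The "kvcache_rewind" key and the stringified integer value are assumptions.
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    options = {"kvcache_rewind": str(position)}
    session.set_ep_dynamic_options(options)
    return options


# Example: drop all KV history before starting a new conversation.
session = FakeSession()
rewind_kv_cache(session, 0)
```

With a real session created on the OpenVINO execution provider, the same helper would be called with that session in place of `FakeSession`, provided the method behaves as assumed here.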
@preetha-intel - Can you please review this from the perspective of setting the workload type using this Python interface?
LGTM !
LGTM
LGTM!
Description
In the Phi-Silica app, which does not rely on ORT-GenAI, we need an API to remove KV history. kvcache_rewind is an OVEP function that does this; however, no Python API is exposed for it. This PR exposes one.
Does the feature go to the new ABI?
Yes
Jira Ticket:
https://jira.devtools.intel.com/browse/CVS-175737