@yuslepukhin yuslepukhin commented Sep 9, 2025

Description

This pull request introduces several enhancements to ONNX Runtime's Python and C++ APIs, focusing on improved device and memory information handling, synchronization stream support, and tensor copy functionality. It adds new Python bindings for device/memory types, exposes more detailed session input/output metadata, and provides a Python-accessible tensor copy API. The changes also refactor and extend the C++ API for better stream and memory info management.

Key changes include:

Device and Memory Information Enhancements

  • Added Python bindings for OrtMemoryInfoDeviceType and OrtDeviceMemoryType, and expanded OrtDevice to expose the memory type via a new mem_type method. The OrtMemoryInfo Python class now supports both legacy and new V2 constructors and exposes additional properties such as device memory type and vendor ID.
  • Extended the Python InferenceSession object to provide access to input/output OrtMemoryInfo and OrtEpDevice objects through new properties and methods.

Synchronization Stream and Execution Provider Device Support

  • Introduced Python bindings for OrtSyncStream, including creation via OrtEpDevice.create_sync_stream() and retrieval of device-specific OrtMemoryInfo via OrtEpDevice.memory_info().
  • Refactored the C++ API to generalize SyncStream handling, allowing for unowned streams and improved type safety.

Tensor Copy Functionality

  • Added a new Python-level copy_tensors function and corresponding C++ binding, enabling efficient copying of tensor data between OrtValue objects, optionally using a synchronization stream.

Miscellaneous Improvements and Fixes

  • Changed the return type of the OrtValue.data_ptr method in the Python binding from int64_t to uintptr_t for better cross-platform compatibility.
  • Made minor improvements to error messages and device type handling in the Python API (e.g., for OrtDevice).
  • Added the C++ includes required for plugin stream support.
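
As a rough illustration of why an unsigned pointer-sized return type is safer for data_ptr, the sketch below uses only the Python standard library. It does not call ONNX Runtime. The address is a hypothetical value chosen to have its top bit set, and c_uint64 stands in for uintptr_t under the assumption of a 64-bit platform. This is one plausible reading of the "cross-platform compatibility" note, not a statement of the author's rationale.

```python
import ctypes


def as_signed_64(address: int) -> int:
    """Reinterpret a 64-bit pattern as a signed integer, as an int64_t return would."""
    return ctypes.c_int64(address).value


def as_unsigned_64(address: int) -> int:
    """Reinterpret a 64-bit pattern as unsigned (stand-in for uintptr_t on 64-bit platforms)."""
    return ctypes.c_uint64(address).value


# Hypothetical address with the most significant bit set (assumption for illustration).
high_address = 0xFFFF_8000_0000_1000

signed_view = as_signed_64(high_address)      # wraps to a negative Python int
unsigned_view = as_unsigned_64(high_address)  # keeps the original address value

print(f"signed:   {signed_view}")
print(f"unsigned: {unsigned_view:#x}")
```

With a signed return type, an address in the upper half of the address space would surface in Python as a negative integer, which callers would then have to correct before using it as a pointer. An unsigned pointer-sized type avoids that step.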

These changes collectively improve the flexibility and introspection capabilities of ONNX Runtime's device, memory, and execution provider interfaces, and make advanced features available to Python users.

Motivation and Context

Depends on: #26021

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces significant enhancements to ONNX Runtime's Python and C++ APIs, focusing on device and memory management, synchronization streams, and tensor operations. The changes provide better introspection capabilities and make advanced features previously only available in C++ accessible from Python.

  • Adds comprehensive Python bindings for device/memory types and execution provider device handling
  • Introduces synchronization stream support with Python-accessible APIs
  • Implements new tensor copy functionality for efficient data transfer between OrtValue objects

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| onnxruntime/test/python/onnxruntime_test_python_autoep.py | Adds tests for new EP device memory info, sync stream creation, and tensor copy functionality |
| onnxruntime/test/python/onnxruntime_test_python.py | Adds tests for new session memory info properties and device/memory info APIs |
| onnxruntime/python/onnxruntime_pybind_state.cc | Core implementation of new Python bindings for device types, memory info, sync streams, and tensor copy |
| onnxruntime/python/onnxruntime_pybind_ortvalue.cc | Changes OrtValue data_ptr return type from int64_t to uintptr_t for better cross-platform compatibility |
| onnxruntime/python/onnxruntime_inference_collection.py | Adds Python wrapper methods for accessing session memory info and EP devices, plus tensor copy function |
| onnxruntime/__init__.py | Exports new public APIs for device/memory types and tensor copy functionality |
| include/onnxruntime/core/session/onnxruntime_cxx_inline.h | Refactors sync stream implementation to support templated approach |
| include/onnxruntime/core/session/onnxruntime_cxx_api.h | Generalizes SyncStream handling with template-based implementation for better type safety |

@yuslepukhin yuslepukhin merged commit abc63e8 into main Sep 17, 2025
87 of 92 checks passed
@yuslepukhin yuslepukhin deleted the yuslepukhin/cs_python branch September 17, 2025 18:44