@nazanin-beheshti
Disable QDQ scaling and stripping on OV GPU for testing Copilot model accuracy and functionality.

jatinwadhwa921 and others added 30 commits February 24, 2025 18:49
 Changes to ensure the SessionOptions API contract is honored
* Fix flash attention for GQA (Phi4) (microsoft#23850)

### Description
This change fixes GQA for Flash Attention on Nvidia GPUs. The root cause
appears to be the
`k_start + capped_sg_id < seq_causal_length`
check. This is either because
(a) seq_causal_length varies per lane, so the check becomes non-uniform
control flow, which interacts badly with subgroupShuffle,
or
(b) the check itself is incorrect and wipes out values of v based on the
source lane's seq_causal_length, when in fact values of v need to be
causal with respect to the lane that will multiply them with qkt.

qkt is already causal because earlier values of qk for out-of-bounds k
are set to min_value, and exp of such large negative values underflows
to 0.

This fix works by removing that causal check and relying on qk being
wiped out earlier. The documentation of GQA's causality behavior is too
sparse to determine which of these is the true reason.

Prior to this fix, prompts with sequence length greater than 16 and less
than 32, or around 1k, would break with Phi-4, but smaller prompts would
work.
Tested on Intel Alderlake and Nvidia 4070.
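
As a rough numpy illustration of why wiping qk before the softmax is
sufficient (a sketch of the idea, not the actual WGSL kernel):

```py
import numpy as np

seq, d = 8, 4
rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, seq, d), dtype=np.float32)

qk = (q @ k.T) / np.sqrt(np.float32(d))
causal = np.tril(np.ones((seq, seq), dtype=bool))
qk = np.where(causal, qk, np.float32(-1e9))   # wipe out-of-bounds qk early

w = np.exp(qk - qk.max(axis=-1, keepdims=True))
w /= w.sum(axis=-1, keepdims=True)
assert np.all(w[~causal] == 0)                # exp of -huge underflows to 0

out = w @ v   # v needs no separate causal check: its weights are already 0
```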

* Model Builder API (microsoft#23223)

### Description
Supports creating a model programmatically using the ORT C or C++ API. 
Supports augmenting an existing model to add nodes.


* Fix typo: change `Upample` to `Upsample`. (microsoft#23838)

### Description
Fixed a typo in function names related to the Upsample CUDA kernel.
Changed the incorrect spelling `Upample` to `Upsample` across the
relevant functions.


### Motivation and Context
This change is necessary to maintain consistency and prevent potential
confusion caused by incorrect function names.

* [doc] Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ (microsoft#23848)

### Description
Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/



* Quant tool: Consistent `get_qdq_config` and `get_qnn_qdq_config` behavior (microsoft#23856)

* Change the logic to generate the default ep context file name (microsoft#23788)

Change the logic to generate the default ep context file name

### Description
Applies to all EPs: replace the .onnx extension with _ctx.onnx instead of directly appending the extra string _ctx.onnx to the existing model path. In the QNN EP, also make the context binary .bin file name shorter by removing QNNExecutionProvider_ from the file name.
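
For illustration, a small Python sketch of the new naming rule (a
hypothetical helper, not the actual C++ implementation):

```py
from pathlib import Path

def default_ep_ctx_path(model_path: str) -> str:
    # New behavior: replace the .onnx extension with _ctx.onnx.
    p = Path(model_path)
    return str(p.with_name(p.stem + "_ctx.onnx"))

print(default_ep_ctx_path("models/phi4.onnx"))  # e.g. models/phi4_ctx.onnx
# The old behavior appended instead: models/phi4.onnx_ctx.onnx
```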

* Make Nuget QNN package pipeline 1ES compliant (microsoft#23805)

### Description
Make
[QNN_Nuget_Windows](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1234)
1ES compliant.




* [js/common] allows using Uint16Array as data for float16 tensor (microsoft#23827)

### Description

Resolve microsoft#23817



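The underlying idea, sketched in numpy (an analog of backing a float16
tensor with a Uint16Array, not the JS code itself): float16 values are
2 bytes, so their raw bits fit a uint16 buffer exactly.

```py
import numpy as np

f16 = np.array([1.0, -2.5, 65504.0], dtype=np.float16)
bits = f16.view(np.uint16)                         # same memory, reinterpreted
assert np.array_equal(bits.view(np.float16), f16)  # lossless round-trip
```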

* [js/webgpu] Reland the optimization of ConvTranspose (microsoft#23858)

This PR fixes the errors in the ConvTranspose optimization and adds
tests to ensure the correctness of the implementation.

* [OpenVINO] Fix a build warning (microsoft#23877)

### Description
Fix a warning with std::move usage



### Motivation and Context
Possibly allows building without the --compile_no_warning_as_error flag.

* Change gsl::byte to std::byte (microsoft#23872)

To be compatible with the latest GSL library. Without this fix we will
get:

```
onnxruntime\core\providers\cpu\controlflow\loop.cc(247): error C4996: 'gsl::byte': Use std::byte instead.
```

* Allow using extended minimal build for several EPs (microsoft#23834)

### Description

#### Background

From code search, the following EPs use
`onnxruntime::GetCpuPreferredNodes()` in their `GetCapability()`
methods:
- CANN
- CUDA
- DML
- JS
- ROCM
- WebGPU

However, the source file that implements
`onnxruntime::GetCpuPreferredNodes()` is excluded when minimal build is
ON:
https://github.com/microsoft/onnxruntime/blob/6df0973e58ba5399fcaa98686f70ed9a9e59aaef/cmake/onnxruntime_framework.cmake#L38-L42

This means that none of the EPs mentioned above can compile in a
minimal build.

#### Solution

The excluded file `core/framework/fallback_cpu_capability.cc` cannot
build in minimal build because some of its dependencies are not included
in the minimal build. However, in extended minimal build mode, all
dependencies are available.

This PR loosens the restriction and allows this file to be compiled in
an extended minimal build. After this change, those EPs can compile in
an extended minimal build.

* Add dawn to ThirdPartyNotices (microsoft#23876)

### Description

Add `dawn` to ThirdPartyNotices.

* Enable QNN EP weight sharing generation using public API (microsoft#23702)

### Description
Enable QNN EP weight sharing generation using the public API instead of internal interfaces, so that users can integrate it into their own toolchains. The change shares the QnnBackendManager across ORT sessions if ep.share_ep_contexts is enabled, and adds an extra option to end the sharing so that we know when to remove the shared QnnBackendManager from the singleton.

The tool is renamed from onnxruntime_qnn_ctx_gen to ep_weight_sharing_ctx_gen, so that it can be shared with other EPs.
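
A hedged Python sketch of what session-level usage could look like,
based on the option described above (the `ep.share_ep_contexts` key is
from the PR text; the `ep.stop_share_ep_contexts` key for ending the
share is an assumption to be checked against the actual ORT config
keys):

```py
import onnxruntime as ort

def make_session(path, last=False):
    so = ort.SessionOptions()
    so.add_session_config_entry("ep.share_ep_contexts", "1")
    if last:
        # Assumed key to end the sharing; see the PR for the exact name.
        so.add_session_config_entry("ep.stop_share_ep_contexts", "1")
    return ort.InferenceSession(path, so, providers=["QNNExecutionProvider"])

s1 = make_session("model1_ctx.onnx")
s2 = make_session("model2_ctx.onnx", last=True)  # releases the shared backend
```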

* [QNN-EP]: Fix inference failures while running with htp_shared_memory (microsoft#23892)

### Description
When using the enable_htp_shared_memory feature, we see that the address
of the buffer passed to rpcmem_free is incorrect, so the RPC buffers are
not freed, leading to memory exhaustion.

### Motivation and Context
When using the enable_htp_shared_memory_allocator feature for QNN in
GenAI extensions, it leads to inference failures during the second
prompt. Since GenAI memory demands are higher, the issue surfaces sooner
in GenAI use cases.

Co-authored-by: Ashish Garg <[email protected]>

* Fix enable_pix_capture build for WebGPU (microsoft#23857)

The build option --enable_pix_capture is broken. This fixes the problem.

---------

Co-authored-by: wp <[email protected]>

* [WebGPU-EP Native] Add ReduceMean (microsoft#23860)


* [WebGPU EP] introduce BiasAdd contrib op (microsoft#23861)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Dynamo export and improve benchmark script for SAM2 encoder (microsoft#23887)

### Description
* Add dynamo export for the SAM2 image encoder
* Verify the fp32 ONNX model with the CPU EP (to avoid error messages
from the TRT EP)
* Update the benchmark script:
  - output ORT profiling
  - output torch compiled code and a unique kernel name for each
compiled kernel
  - add an option for nightly package installation
  - uninstall existing ort packages before installing

The node metadata of the dynamo-exported model can help map nodes in the
ONNX model back to the PyTorch modeling script. Graph optimization is
not yet done on dynamo-exported models, so this is experimental for now.
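
For context, a minimal sketch of a dynamo-based export in recent PyTorch
(placeholder model, not the SAM2 encoder; the actual benchmark script
has many more options):

```py
import torch

model = torch.nn.Linear(8, 8).eval()
example = torch.randn(1, 8)

# dynamo=True selects the dynamo/torch.export-based exporter, which
# records node metadata mapping back to the modeling script.
onnx_program = torch.onnx.export(model, (example,), dynamo=True)
onnx_program.save("model.onnx")
```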

### Motivation and Context

To support profiling of torch compiled CUDA kernel.

* [js/web] improve workaround for bundlers (microsoft#23902)

### Description
This PR improves the workaround for bundlers in onnxruntime-web.
Specifically, the following changes have been made:

- Use [this
workaround](xenova@9c50aa2)
as suggested by @xenova in
huggingface/transformers.js#1161 (comment)

- Use `url > "file:" && url < "file;"` instead of
`url.startsWith("file:")` to allow minifiers to remove dead code
correctly.

This change makes it possible to remove unnecessary dependencies on the
file parsed from `new URL("ort.bundle.min.js", import.meta.url)` in
Vite, and to optimize code like `if("file://filepath.js".startsWith("file:"))
{do_sth1(); } else {do_sth2();}` into `do_sth1()` for webpack/terser
usage.
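
Why the range comparison works: ";" is the code point immediately after
":", so any string longer than "file:" that starts with "file:" sorts
strictly between "file:" and "file;". A quick Python sanity check of the
equivalence (JS string comparison behaves the same way here; a sketch,
not the shipped JS):

```py
assert ord(";") == ord(":") + 1

# Equivalent for any url longer than "file:"; the only divergence is the
# bare string "file:" itself, which never occurs as a real URL.
for url in ["file:///a.js", "file:x", "https://x.js", "file"]:
    assert (("file:" < url < "file;") == url.startswith("file:")), url
```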

Resolves huggingface/transformers.js#1161

* [webgpu] Restore MatMulNBits workgroup size for Phi-3.5 (microsoft#23349)

### Description
This change restores the MatMulNBits workgroup size from (8, 8, 1) back
to (16, 8, 1) to resolve a performance regression observed on Intel
iGPUs during token generation (M=1).

### Motivation and Context
As above.

Signed-off-by: Jianhui Dai <[email protected]>

* [webgpu] support Pad operator (microsoft#23141)


* [WebNN] Accept Float16Array for float16 data type if it is available (microsoft#23894)

Float16Array is now shipping, and the WebNN Chromium implementation has
accepted it. We should allow it in the WebNN EP as well.

* Ensure that the 'cmake_minimum_required' is version 3.5 or greater (microsoft#23888)

### Description
CMake 4.0 release candidate 2 is available, and it cannot compile all
of OnnxRuntime out of the box. There are portions of the OnnxRuntime
codebase that specify a `cmake_minimum_required` version of 3.0, and
CMake 4.0 has removed support for compatibility with CMake < 3.5; the
following error is reported:

```
CMake Error at winml_sdk_helpers.cmake:4 (cmake_minimum_required):
  Compatibility with CMake < 3.5 has been removed from CMake.

  Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
  to tell CMake that the project requires at least <min> but has been updated
  to work with policies introduced by <max> or earlier.

  Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway.
```

Since CMake 3.5 appears to have shipped in 2016, it seems reasonable to
set that as the minimum version to fix the error. The root CMakeLists.txt
asks for a minimum version of 3.28, so we could snap to that, but I'm
still ramping up on the build, so I wanted to propose a minimally
sufficient fix.

### Motivation and Context
Being able to build with the latest CMake, when it ships, reduces the
barrier to entry for building OnnxRuntime and allows OnnxRuntime to
leverage the latest and greatest tooling.

* WebGPU: Remove deprecated subgroups-f16 from WebGPU native and JS EP (microsoft#23898)

This PR removes the deprecated subgroups-f16 from the WebGPU native and
JS EPs, and also removes the unused deviceInfo in the WebGPU JS EP.

* [JSEP/WebGPU] Fixed error in softmax dispatch. (microsoft#23906)

### Description
Fixed an error in the softmax dispatch.



### Motivation and Context
Produces expected results for the LLaMA model.

* enable WebGPU EP in WebAssembly build (microsoft#23913)

### Description

This PR is the first step for migrating the webgpu backend of
onnxruntime-web from JSEP based to WebGPU EP based.

In this change, we enable building the WebGPU EP in a wasm build (i.e.
`--build_wasm` `--use_webgpu` `--use_jsep`). The old build flags still
keep their previous behavior.

* Adding OpenVINO Windows CI Pipeline (microsoft#23919)

### Description

Enable an OpenVINO Windows CI pipeline. This includes:
- Downloading the OpenVINO toolkit for Windows from an external source.
- Setting up OpenVINO environment variables.
- Building the ONNX Runtime OpenVINO Execution Provider.
- Running unit tests.

### Motivation and Context

This change is required to run checks on precommit and commit in the
ONNX Runtime project. It ensures that the code is tested with the
OpenVINO toolkit on Windows, improving the reliability and compatibility
of the project.

* [WebGPU EP] SoftMax Implementation (microsoft#23538)

Increases op coverage for the WebGPU EP.

* Exclude MAUI projects from GPU C# packaging builds (microsoft#23923)

### Description
Use 'desktop only' solution in GPU C# packaging builds. We don't need to
include any MAUI support for those builds.


* Support all block sizes that are multiples of 32 for DP4A (microsoft#23907)

### Description
Simple change:
1. The DP4A shader actually supports all block sizes that are multiples
of 32; this relaxes the restriction and makes a small tweak to support
sizes other than 32.
2. Moved the shader to a separate file for maintainability.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Example custom op with output type inferencing (microsoft#23916)

### Description
Adds an example of a custom op that must do type inference for its
output type for model load to work.
Also acts as an example of how to override an ONNX op with a custom
implementation.

### Motivation and Context
microsoft#23891

* Enabling L2+ Optimizations for EPs (microsoft#23517)

There are some requirements to modify the graph which are specific to
the EP/hardware.
ORT has a hardcoded EP list for optimizations, but that can't scale and
is hard to extend to enable EP-custom optimizations.

Here is the prototype to enable L2+ optimizations for EPs (the original
overview was provided by @skottmckay), as well as the TRT EP
implementation of the ConstantFoldingDQ optimization.

Signatures for selection and optimization functions:
````
  - Selection: std::function<std::vector<std::unique_ptr<ComputeCapability>>(const GraphViewer&, const KeyValueConfig&)>
  - Optimization: std::function<Status(const Graph&, const ComputeCapability& this_optimization, ComputeCapability& cc_to_update)>
````
GetCapability

- Call the (new) provider bridge API to look up a pre-defined optimizer
by name and get its selection function.
- ComputeCapability.optimize_func, i.e. the optimization function, would
be set by the optimizer to the function that does the optimization.
- The EP has to update the returned ComputeCapability to include the
optimization ComputeCapability in nodes_to_optimize, so that ORT can
later perform the optimization/transformation accordingly.

GraphPartitioner

- After assigning the ComputeCapability to the EP and prior to Compile,
if the ComputeCapability has nodes_to_optimize, iterate that list.
  - The optimization function needs to be called with:
    - a mutable Graph instance
    - the ComputeCapability for the individual optimization
    - the overall ComputeCapability so it can be updated
* fix binplace file in web pipeline (microsoft#23930)

* Updated run_CIs_for_external_pr.py to support the Windows OpenVINO CI pipeline (microsoft#23931)

* Fix ConvInteger handling of optional inputs. (microsoft#23935)

### Description
Fix ConvInteger handling of optional inputs. We need to check Exists()
and not just the number of inputs.



### Motivation and Context
microsoft#23927
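
The ONNX convention behind the fix: an omitted optional input can still
occupy a slot in node.input as an empty string, so counting inputs is
not enough; each slot must be checked for existence (the C++ side uses
Exists()). A Python sketch of the same check:

```py
import onnx

def input_exists(node: onnx.NodeProto, i: int) -> bool:
    # An omitted optional input may be present as "" in node.input.
    return i < len(node.input) and node.input[i] != ""

# x_zero_point (input 2) omitted via "", w_zero_point (input 3) provided.
node = onnx.helper.make_node(
    "ConvInteger", inputs=["x", "w", "", "w_zero_point"], outputs=["y"])

print(len(node.input))        # 4 -- counting inputs alone is misleading
print(input_exists(node, 2))  # False
print(input_exists(node, 3))  # True
```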

* Updated ov version in pipeline (intel#595) (microsoft#23882)

### Description
This PR updates the OpenVINO version used in the pipeline from 2024.5.0
to 2025.0.0

Co-authored-by: jatinwadhwa921 <[email protected]>

* [AIX] External data handling (microsoft#23859)

### Description
On big-endian (BE) systems, model tensor data coming from an external
file is not handled properly.
This was found during the debugging of
microsoft/onnxruntime-genai#1104.

This PR performs the endianness conversion of data loaded from an
external file on BE systems.
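
The conversion, sketched in Python (ONNX raw/external tensor data is
little-endian on disk, so a big-endian host must byte-swap after
loading; the file name is hypothetical and the actual fix is in C++):

```py
import numpy as np

raw = open("weights.bin", "rb").read()  # hypothetical external-data file
le = np.frombuffer(raw, dtype="<f4")    # bytes interpreted as little-endian
native = le.astype(np.float32)          # byte-swaps only on a BE host
```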

* Create a packaging pipeline for a custom nuget package (microsoft#23918)

* Fix license in example test code. (microsoft#23936)

* replace usage of gsl::narrow and gsl::narrow_cast in WebGPU EP (microsoft#23926)

### Description

`gsl::narrow` does not work in a no-exceptions build.
- Use `onnxruntime::narrow` if necessary;
- or change to `static_cast` if it's obviously safe.

The changes are also applied to usages of `gsl::narrow_cast`, which does
not perform checks.

* VCPKG improvement: set  VCPKG_OSX_DEPLOYMENT_TARGET (microsoft#23933)

### Description
1. Set VCPKG_OSX_DEPLOYMENT_TARGET for macOS targets
2. Enable VCPKG in more pipelines.

* Allow using a different version of flatbuffers when building with vcpkg (microsoft#23946)

### Description
Allow using a different version of flatbuffers when building with vcpkg,
so that users do not need to pin flatbuffer's version, which provides
more flexibility in the build process.

Delete utf8_range from the dependencies, because it is an indirect
dependency of protobuf, which is already included in the build process.

* Make python package pipeline 1ES compliant (microsoft#23800)

### Description
Make [Python packaging
pipeline](https://aiinfra.visualstudio.com/530acbc4-21bc-487d-8cd8-348ff451d2ff/_build?definitionId=841)
1ES compliant




### Checklist

- [x] Make Onnxruntime-QNNEP-Windows-2022-CPU stateless

* Delete ROCM Nuget Publishing Pipeline (microsoft#23948)

* Bump SixLabors.ImageSharp from 2.1.9 to 2.1.10 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (microsoft#23924)

Bumps [SixLabors.ImageSharp](https://github.com/SixLabors/ImageSharp)
from 2.1.9 to 2.1.10.
Release notes: v2.1.10 backports SixLabors/ImageSharp#2859 (via
SixLabors/ImageSharp#2890) and SixLabors/ImageSharp#2701 (via
SixLabors/ImageSharp#2891) to the 2.1.x branch. Full changelog:
https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10


Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: Jianhui Dai <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Sushanth Rajasankar <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Seungtaek Kim <[email protected]>
Co-authored-by: co63oc <[email protected]>
Co-authored-by: Jambay Kinley <[email protected]>
Co-authored-by: Hector Li <[email protected]>
Co-authored-by: Jian Chen <[email protected]>
Co-authored-by: Yulong Wang <[email protected]>
Co-authored-by: Jiajia Qin <[email protected]>
Co-authored-by: Alessio Soldano <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Ashish Garg <[email protected]>
Co-authored-by: Ashish Garg <[email protected]>
Co-authored-by: Jie Chen <[email protected]>
Co-authored-by: wp <[email protected]>
Co-authored-by: Satya Kumar Jandhyala <[email protected]>
Co-authored-by: Prathik Rao <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: Jianhui Dai <[email protected]>
Co-authored-by: xhcao <[email protected]>
Co-authored-by: Wanming Lin <[email protected]>
Co-authored-by: Mark Schofield <[email protected]>
Co-authored-by: jiangzhaoming <[email protected]>
Co-authored-by: Yi-Hong Lyu <[email protected]>
Co-authored-by: vraspar <[email protected]>
Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: saurabh <[email protected]>
Co-authored-by: Ranjit Ranjan <[email protected]>
Co-authored-by: Baiju Meswani <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Revert "Rebasing with msft commits"
This reverts commit 920ed58, reversing
changes made to a6cdf62.
…#613)

This change allows allocations made by the OV allocator to be imported
into other APIs that require base addresses of the original device
allocation.
ankitm3k and others added 30 commits November 3, 2025 11:25
Sync with Microsoft ONNX Runtime - 03/11/2025
CVS-175736-[OVEP] Enable stateful mode for Phi-silica models
…#844)

* openvino_provider_factory: Add nested map support to load_config parsing

* ParseInnerMap: Add warning that unsupported json types will become fatal in the future

* ParseInnerMap: address review comments

* load_config: Throw error for unsupported JSON types
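
A hedged sketch of how a nested load_config JSON might be passed to the
OpenVINO EP after this change (the key names and values are illustrative
placeholders, not a tested configuration):

```py
import json
import onnxruntime as ort

cfg = {"NPU": {"NPU_TURBO": "YES", "NPUW_LLM": {"ENABLE": "YES"}}}
with open("ov_config.json", "w") as f:
    json.dump(cfg, f)  # nested maps are now parsed, per the change above

sess = ort.InferenceSession(
    "model.onnx",
    providers=[("OpenVINOExecutionProvider", {"load_config": "ov_config.json"})],
)
```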

---------

Co-authored-by: MayureshV1 <[email protected]>
intel#841)

* disable bfloat16 conversion when single cast node to bfloat16, unit test case

* Insert a Cast(To:BFloat16) before the output node (bfloat16) so users can keep using the original bf16 output tensor

* revert changes to add Cast Node, add statement to disable bfloat16 transform for OV CPU

* remove silent bfloat16 conversion

* remove bf16 testing and cpu support for openvino

---------

Co-authored-by: MayureshV1 <[email protected]>
* Implement single bin

* Fix size mismatch for larger blobs

* disallow embed mode + sharing

* Tweak main context usage

* Remove redundant stop share setting

Co-authored-by: Copilot <[email protected]>

* Only reject if share context generated

* Fix up bin manager lifetimes

* Fix ep context node path

* Remove unnecessary initialized flag from BinManager

* Refactor BackendManager and EPCtxHandler to use SharedContextManager, removing SharedBinManager references

* tweak lock ordering

* Tweak when we use the active shared context

* Ensure all blobs are available at epctx export

* Update onnxruntime/core/providers/openvino/openvino_execution_provider.cc

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Copilot <[email protected]>
Co-authored-by: MayureshV1 <[email protected]>
Sync with Microsoft ONNX Runtime - 14/11/2025
…egy to get the pairs of KV name (intel#845)

* use output-to-input strategy to get the pairs of KV name

* minor change

* remove regex for extracting pattern

* Address review

* Design strict KV patterns: only two, one each for key and value; patterns have to be followed by _%d

* simplify code structure

* address review

* remove useless comment

* add brief example to explain the functionalities

---------

Co-authored-by: MayureshV1 <[email protected]>
* Catch model import failure and report the appropriate error

* Address review comments

---------

Co-authored-by: ankitm3k <[email protected]>
Co-authored-by: MayureshV1 <[email protected]>
…ctional compatibility (intel#851)

* Modify shared context lifetime

* Provide more helpful error message when failing to deserialize bin

* Remove unused clear functions

* Remove unused variable
…E_OUT is disabled (intel#850)

* ovep stateful: Enable explicit slice of prefill logits when NPUW_SLICE_OUT is disabled

* Update onnxruntime/core/providers/openvino/ov_interface.cc

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Copilot <[email protected]>
Co-authored-by: MayureshV1 <[email protected]>
…ubgraph partitioning (intel#838)

* added a line to add initializers to be a part of meta_def -> inputs

* fixed a possible array index out-of-bounds problem which caused some models to fail rather than getting subgraph partitioned

* changed loop logic

* reverting to the previous logic to ensure j value is retained and not incremented if append_node == true

* updated loop logic

---------

Co-authored-by: Preetha Veeramalai <[email protected]>
Sync with Microsoft ONNX Runtime - 19/11/2025
Sync with Microsoft ONNX Runtime - 25/11/2025
* Fix npm audit vulnerabilities in /js directory (microsoft#26632)

### Description

Resolved all security vulnerabilities in JavaScript packages under `/js`
by running `npm audit fix`. All updates are non-breaking patch/minor
version bumps.

**Fixed vulnerabilities:**

- `/js` root: 1 high severity
  - `glob` 10.4.5 → 10.5.0 (command injection - GHSA-5j98-mcp5-4vw2)

- `/js/react_native`: 7 vulnerabilities (1 high, 3 moderate, 3 low)
- `image-size` → 1.2.1 (high: DoS via infinite loop -
GHSA-m5qc-5hw7-8vg7)
- `@babel/helpers` 7.25.6 → 7.28.4 (moderate: RegExp complexity -
GHSA-968p-4wvh-cqc8)
- `@babel/runtime` 7.25.6 → 7.28.4 (moderate: RegExp complexity -
GHSA-968p-4wvh-cqc8)
- `js-yaml` → fixed (moderate: prototype pollution -
GHSA-mh29-5h37-fv8m)
  - `brace-expansion` 2.0.1 → 2.0.2 (low: ReDoS - GHSA-v6h2-p8h4-qcjw)
- `on-headers` → fixed (low: header manipulation - GHSA-76c9-3jph-rj3q)

**Files modified:**
- `js/package-lock.json`
- `js/react_native/package-lock.json`

**Result:** All JS packages (`/js`, `/js/common`, `/js/web`, `/js/node`,
`/js/react_native`) now report 0 vulnerabilities.

### Motivation and Context

Security maintenance to address dependency vulnerabilities identified by
`npm audit`. No breaking changes or code modifications required.

<details>

<summary>Original prompt</summary>

> Please create a pull request that runs `npm audit fix` for the
JavaScript/TypeScript portion of the repository under the `/js`
directory of
[microsoft/onnxruntime](https://github.com/microsoft/onnxruntime).
> 
> Requirements:
> 
> 1. **Scope**
> - Work only within the `/js` folder and its subpackages (e.g.,
`js/web`, `js/node`, `js/common`, etc.).
>    - Do not modify files outside `/js`.
> 
> 2. **Dependency updates**
> - Run `npm audit fix` (and, if necessary to fully resolve
high/critical issues while staying non-breaking, `npm audit fix --force`
on specific subpackages) to address security vulnerabilities.
> - Prefer minimal, non-breaking version bumps (patch and minor) that
satisfy `npm audit` while keeping semver ranges sensible.
> - If any **major** upgrades are required to clear vulnerabilities,
handle them cautiously:
> - Apply the upgrade only if tests still pass and typings/build setup
remain compatible.
> - If a major bump would require code changes or creates breaking
behavior, **do not** apply it; instead, leave a TODO comment in the PR
description summarizing which packages remain vulnerable and why.
> 
> 3. **Validation**
> - Run the existing JS-related checks that the repo supports from
`/js`, such as:
>      - `npm test` or package-specific test scripts.
> - Any documented lint/build/test commands for JS packages (e.g., `npm
run build`, `npm run lint`) where applicable.
> - Ensure the updated lockfiles (if present) are consistent, and the
project installs cleanly with `npm ci` (or the repo's documented install
command) in the `/js` area.
> 
> 4. **Files to update**
> - Update `package.json` and lockfiles under `/js` (e.g.,
`package-lock.json`, `npm-shrinkwrap.json`, or workspace-specific lock
files) to reflect the audited dependency tree.
> - Do not manually edit `node_modules`; rely on `npm` to manage
dependencies and only commit manifest/lockfile changes.
> 
> 5. **Repository conventions**
> - Follow this repo's existing conventions for formatting, commit
messages, and JS tooling.
> - Keep the diff focused on the dependency and lockfile updates plus
any absolutely necessary code tweaks to maintain compatibility.
> 
> 6. **Pull request description**
>    - In the PR body, include:
> - A short summary: that `npm audit fix` was run in `/js` to address
dependency vulnerabilities.
> - A bullet list of notable dependency changes (especially any major
version bumps), with packages and old/new versions.
>      - A brief testing summary (commands run and their results).
> - A note about any remaining vulnerabilities that could not be fixed
without breaking changes (if applicable), including the affected
packages and advisories if available.
> 
> The goal is a clean, minimal PR that improves the security posture of
the JS packages under `/js` in `microsoft/onnxruntime` without
introducing breaking changes.


</details>

*This pull request was created as a result of the original prompt above, from Copilot chat.*


---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: fs-eire <[email protected]>

* [webgpu] Optimize InstanceNormalization by removing redundant transpose (microsoft#26626)

### Description

This PR optimizes `InstanceNormalization` by removing redundant
transpose.

Given that the implementation of `InstanceNormalization` for `NCHW` is
more efficient, we don't need to add a wrapper `Transpose` to make it
run in `NHWC`; this lets us elide the redundant transposes and improve
performance.

Testing on Lunar Lake shows about `~60%` performance improvement in
`InstanceNormalization` operations.

#### `InstanceNormalization` OP benchmark
The input tensor shape: `(1,32,1048576)`
The scale tensor shape: `(32)`
The B tensor shape: `(32)`

| time cost (ms) | baseline | opt | diff |
| ---------------- | -------- | ---- | ---- |
| Lunar Lake | 82.6 | 34.2 | 58% |

#### Model benchmark
| time cost (ms) | baseline | opt | diff |
| ---------------- | -------- | ---- | ---- |
| sd-turbo-vae-decoder-fp16-demo | 2437.6 | 1835.9 | 25% |
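
To see why the transpose wrapper is redundant in principle, a small
numpy sketch: instance normalization reduces over the spatial axes,
(2, 3) in NCHW versus (1, 2) in NHWC, and the results agree either way,
so running natively in NCHW simply skips two transposes.

```py
import numpy as np

x = np.random.rand(1, 32, 16, 16).astype(np.float32)  # NCHW
eps = 1e-5

def inorm_nchw(x):
    mu = x.mean(axis=(2, 3), keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)

nhwc = np.transpose(x, (0, 2, 3, 1))                  # wrapper transpose in
mu = nhwc.mean(axis=(1, 2), keepdims=True)
y = (nhwc - mu) / np.sqrt(nhwc.var(axis=(1, 2), keepdims=True) + eps)
ref = np.transpose(y, (0, 3, 1, 2))                   # wrapper transpose out

assert np.allclose(inorm_nchw(x), ref, atol=1e-5)
```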

### Motivation and Context
Please see above

* [webgpu] refactor a few "context" classes (microsoft#26602)

### Description

This PR refactors a few "context" classes to make it clearer and support
new features.

---------

Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>

* Bump actions/checkout from 5 to 6 (microsoft#26641)

Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to
6.
Release notes: the headline change in v6.0.0 is that persisted
credentials are now stored under `$RUNNER_TEMP` instead of directly in
the local git config (actions/checkout#2286). This requires a minimum
Actions Runner version of v2.329.0 to access the persisted credentials
in Docker container action scenarios. Full changelog:
https://github.com/actions/checkout/compare/v5...v6


Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add LogEvaluationStart for ReplayGraph (microsoft#26645)

### Description

add LogEvaluationStart for ReplayGraph to match LogEvaluationStop

### Motivation and Context

So that, by using ETW, the run time can be captured correctly.

Co-authored-by: hualxie <[email protected]>

* add LogCompileModel to mark the session usage (microsoft#26646)

### Description
Add LogCompileModel to mark the session usage as Compile, because that
session will not be used for inference.
We could also use it to log compile-model parameters if needed.

### Motivation and Context

We are building a profiling tool for WinML and we want to differentiate
Compile sessions from inference sessions.

There are two ways to do it (I don't know which is better):

microsoft#26646
microsoft#26647

---------

Co-authored-by: hualxie <[email protected]>

* [webgpu] Fix bug introduced by RoE (microsoft#26661)

Fix a bug introduced by microsoft#26563, which used the wrong condition
by accident and produced incorrect results in graph capture mode.

* [QNN-EP] Enable verbose and artifacts saving in onnxruntime_provider_test.exe (microsoft#26396)

### Description
- The change allows users to better debug unit tests by adding the
following environment variables:
    - `QNN_DUMP_ONNX`: Dump input onnx model
- `QNN_DUMP_JSON`: Dump json qnn graph with provider_option
`dump_json_qnn_graph`
- `QNN_DUMP_DLC`: Dump dlc with provider_option `qnn_ir_backend_path`
    - `QNN_VERBOSE`: Use the log level `ORT_LOGGING_LEVEL_VERBOSE`
- Developers can use the environment variables above to save the
artifacts of QNN-EP test cases to a directory named
`<TestSuite>_<TestName>`:
    ```
    .
    ├── QnnCPUBackendTests_BatchNorm2D_fp32      # RunQnnModelTest
    │   ├── dumped_f32_model.onnx                # float32 ONNX model
    │   ├── QNNExecutionProvider_QNN_XXXX_X_X.dlc
    │   └── QNNExecutionProvider_QNN_XXXX_X_X.json
    ├── QnnHTPBackendTests_BatchNorm_FP16        # TestFp16ModelAccuracy
    │   ├── dumped_f16_model.onnx                # float16 ONNX model
    │   ├── dumped_f32_model.onnx                # float32 ONNX model
    │   ├── QNNExecutionProvider_QNN_XXXX_X_X.dlc
    │   └── QNNExecutionProvider_QNN_XXXX_X_X.json
    └── QnnHTPBackendTests_BatchNorm2D_U8U8S32   # TestQDQModelAccuracy
        ├── dumped_f32_model.onnx                # float32 ONNX model
        ├── dumped_qdq_model.onnx                # QDQ ONNX model
        ├── QNNExecutionProvider_QNN_XXXX_X_X.dlc
        └── QNNExecutionProvider_QNN_XXXX_X_X.json

    # All artifact files are placed under the current working directory
    # from which the test binary is invoked.
    ```
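
For example, a hedged sketch of driving the test binary with these
variables set (the gtest filter value is illustrative):

```py
import os
import subprocess

env = dict(os.environ, QNN_DUMP_ONNX="1", QNN_DUMP_JSON="1",
           QNN_DUMP_DLC="1", QNN_VERBOSE="1")
subprocess.run(
    ["onnxruntime_provider_test.exe",
     "--gtest_filter=QnnHTPBackendTests.BatchNorm2D*"],  # illustrative filter
    env=env, check=True)
```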

### Motivation and Context
- The JSON QNN graph/DLC artifacts are helpful for the backend to debug
performance/accuracy issues.
- By comparing the ONNX model and the JSON QNN graph/DLC, we can locate
issues in graph manipulation.

* [webgpu] Use multiplication instead of pow if exponent is 2 (microsoft#26667)

### Description
More accurately compute Pow(2.0) on WebGPU EP.

Reproduction script:
```py
from onnx import helper, TensorProto
import onnxruntime as ort
import numpy as np

# 1. Create the ONNX model
# Define input and output
input_info = helper.make_tensor_value_info('X', TensorProto.FLOAT, [1, 1])
output_info = helper.make_tensor_value_info('Y', TensorProto.FLOAT, [1, 1])

# Create a constant tensor for the exponent (2.0)
exponent_tensor = helper.make_tensor('exponent', TensorProto.FLOAT, [], [2.0])
exponent_node = helper.make_node('Constant', [], ['exponent_out'], value=exponent_tensor)

# Create the Pow node
# Pow takes two inputs: Base (X) and Power (exponent_out)
pow_node = helper.make_node(
    'Pow',
    inputs=['X', 'exponent_out'],
    outputs=['Y'],
    name='PowNode'
)

# Create the graph
graph_def = helper.make_graph(
    [exponent_node, pow_node],
    'test-model',
    [input_info],
    [output_info]
)

# Create the model
model_def = helper.make_model(graph_def, producer_name='onnx-example')
opset = model_def.opset_import[0]
opset.version = 13 # Ensure opset version supports the operations

# 2. Convert model to string (bytes)
model_str = model_def.SerializeToString()

# 3. Prepare input data
np.random.seed(0)
input_data = np.array([[-2e3]], dtype=np.float32)

# 4. Run on CPUExecutionProvider
sess_cpu = ort.InferenceSession(model_str, providers=['CPUExecutionProvider'])
res_cpu = sess_cpu.run(['Y'], {'X': input_data})[0]
print("CPU Result:", res_cpu)

# 5. Run on WebGpuExecutionProvider
sess_webgpu = ort.InferenceSession(model_str, providers=['WebGpuExecutionProvider'])
res_webgpu = sess_webgpu.run(['Y'], {'X': input_data})[0]
print("WebGPU Result:", res_webgpu)

# Compare results
diff = np.abs(res_cpu - res_webgpu)
max_diff = diff.max().item()
assert max_diff < 1e-5, f"Results do not match within tolerance! Max diff: {max_diff}"
print("Results match!")
```

currently produces
```
CPU Result: [[4.e+06]]
WebGPU Result: [[3.999999e+06]]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[1], line 56
     54 diff = np.abs(res_cpu - res_webgpu)
     55 max_diff = diff.max().item()
---> 56 assert max_diff < 1e-5, f"Results do not match within tolerance! Max diff: {max_diff}"
     57 print("Results match!")

AssertionError: Results do not match within tolerance! Max diff: 1.0
```

but with this PR:
```
CPU Result: [[4.e+06]]
WebGPU Result: [[4.e+06]]
Results match!
```

### Motivation and Context

Leads to downstream issues/inaccuracies for certain models, especially
those which have larger values to compute pow(x,2) for.

cc @guschmue
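
The likely mechanism behind the inaccuracy, sketched in numpy: GPU
backends commonly lower pow(x, y) to something like exp2(y * log2(|x|)),
which rounds twice in float32, whereas x * x rounds once. This is a
rough model of the behavior, not the actual driver code:

```py
import numpy as np

x = np.float32(-2e3)
exact = x * x                # 4e6, exactly representable in float32

two = np.float32(2.0)
approx = np.exp2(two * np.log2(np.abs(x)), dtype=np.float32)
print(exact, approx)         # the exp2/log2 result may be off by ~1
```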

* Avoid creation of temporary protobuf object (microsoft#26681)

### Description
While profiling session creation time for large graphs (large in number
of nodes, not size of tensors), we noticed that the creation and
subsequent destruction of temporary protobuf objects was a major
hotspot. This PR avoids that creation.

Signed-off-by: Christian Bourjau <[email protected]>

* Use `std::string_view` directly as key to `absl::flat_hash_map::find` (microsoft#26682)

### Description
Use `std::string_view` directly as the key in the `find` method of
`flat_hash_map`, avoiding the construction of a temporary `std::string` key.
This section of the absl documentation on heterogeneous lookup provides
further details:
https://abseil.io/docs/cpp/guides/container#heterogeneous-lookup


### Motivation and Context
We noticed this when profiling the session creation of large models (in
terms of the number of nodes).

Signed-off-by: Christian Bourjau <[email protected]>

* [webgpu] Convert i32 to u32 in uniforms (microsoft#26676)

In debug mode, the following validation error is raised: `webgpu_context.cc:257 Run Uniform variable[5]
(head_size) data type mismatch in program
"SplitPackedQKVWithRotaryEmbeddingAndCopyKV", Expected: u32, Actual:
i32`. The issue does not occur in release mode.

Convert the i32 uniform values to u32 to avoid this mismatch.

* [webgpu] Fix BatchNormalization ShapeInferenceError for 2D inputs (microsoft#26659)

### Description

Test model (happens with any 2D inputs):
[2191__visual_projection_visual_projection.1_BatchNormalization.onnx.zip](https://github.com/user-attachments/files/23758390/2191__visual_projection_visual_projection.1_BatchNormalization.onnx.zip)


Command:
```
python -c "import onnxruntime as ort; ort.InferenceSession('2191__visual_projection_visual_projection.1_BatchNormalization.onnx', providers=['WebGpuExecutionProvider'])"
```

Before (failure):
```
Op (BatchNormalization) [ShapeInferenceError] Tensor must have at least 3 dimensions to convert between channels first and channels last.
```

After (success):
```
(nothing, meaning success)
```

### Motivation and Context

This fixes BatchNormalization for 2D inputs on WebGPU, matching the CPU implementation.
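
A minimal repro in the style of the scripts above, building a 2D-input BatchNormalization model programmatically instead of using the attached test model (names and shapes are illustrative):
```py
from onnx import helper, numpy_helper, TensorProto
import onnxruntime as ort
import numpy as np

C = 4
X = helper.make_tensor_value_info('X', TensorProto.FLOAT, [1, C])
Y = helper.make_tensor_value_info('Y', TensorProto.FLOAT, [1, C])

# Per-channel scale/bias/mean/variance initializers
inits = [
    numpy_helper.from_array(np.ones(C, dtype=np.float32), 'scale'),
    numpy_helper.from_array(np.zeros(C, dtype=np.float32), 'bias'),
    numpy_helper.from_array(np.zeros(C, dtype=np.float32), 'mean'),
    numpy_helper.from_array(np.ones(C, dtype=np.float32), 'var'),
]
bn_node = helper.make_node(
    'BatchNormalization', ['X', 'scale', 'bias', 'mean', 'var'], ['Y'])
graph_def = helper.make_graph([bn_node], 'bn-2d', [X], [Y], initializer=inits)
model_def = helper.make_model(graph_def)
model_def.opset_import[0].version = 15

# Previously raised [ShapeInferenceError]; now loads and runs.
sess = ort.InferenceSession(model_def.SerializeToString(),
                            providers=['WebGpuExecutionProvider'])
print(sess.run(['Y'], {'X': np.random.rand(1, C).astype(np.float32)})[0])
```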

cc @guschmue

* Clear cuda error on unsupported CudaMemPool test (microsoft#26629)

### Description
<!-- Describe your changes. -->
The CudaMemPool test checks whether CUDA memory pools are supported in a given environment.
We need to clear the resulting CUDA error so that it does not affect subsequent tests.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Potential test failure.

* [QNN-EP] Include detailed error message in the returned status (microsoft#26546)

### Description
<!-- Describe your changes. -->
The original error message only shows: "Failed to setup QNN input
tensors for graph: <graph_name>"
This change adds more detailed error information by logging the failure
reason from
[SetupTensors](https://github.com/microsoft/onnxruntime/blob/ea55c160a36d658eae61a4c7aeda6cb55dd54dec/onnxruntime/core/providers/qnn/builder/qnn_model.cc#L386),
making it easier to debug issues.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Users need detailed error logging for ORT online context binary
generation.

* add support for int32_t in webgpu / slice (microsoft#26693)

fix for microsoft#26690
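
A minimal check in the style of the other repro scripts here (hypothetical, constructed from the fix description rather than taken from the linked issue): a Slice over int32 data on the WebGPU EP.
```py
from onnx import helper, TensorProto
import onnxruntime as ort
import numpy as np

X = helper.make_tensor_value_info('X', TensorProto.INT32, [4])
Y = helper.make_tensor_value_info('Y', TensorProto.INT32, [2])
starts = helper.make_tensor('starts', TensorProto.INT64, [1], [1])
ends = helper.make_tensor('ends', TensorProto.INT64, [1], [3])

slice_node = helper.make_node('Slice', ['X', 'starts', 'ends'], ['Y'])
graph_def = helper.make_graph([slice_node], 'slice-i32', [X], [Y],
                              initializer=[starts, ends])
model_def = helper.make_model(graph_def)
model_def.opset_import[0].version = 13

sess = ort.InferenceSession(model_def.SerializeToString(),
                            providers=['WebGpuExecutionProvider'])
print(sess.run(['Y'], {'X': np.array([0, 1, 2, 3], dtype=np.int32)})[0])  # [1 2]
```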

* [webgpu] Remove `global_id` and `workgroup_id` in gemm_utils.cc (microsoft#26662)

### Description
This patch replaces `global_id` and `workgroup_id` with
`logical_global_id` and `logical_workgroup_id`, which are computed from
`workgroup_idx` and the dispatch workgroup sizes set in
`ProgramBase::SetDispatchGroupSize()`.



### Motivation and Context
We shouldn't use `global_id` or `workgroup_id` directly because the
dispatch workgroup sizes may be normalized in
`ProgramManager::NormalizeDispatchGroupSize()`.
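
For intuition, recovering logical coordinates from the flat index looks roughly like the following (a sketch assuming a row-major linearization over the dispatch dimensions; the actual WGSL helpers may order the components differently):
```py
def logical_workgroup_id(workgroup_idx: int,
                         dispatch_x: int, dispatch_y: int, dispatch_z: int):
    # Invert idx = x + dispatch_x * (y + dispatch_y * z)
    x = workgroup_idx % dispatch_x
    y = (workgroup_idx // dispatch_x) % dispatch_y
    z = workgroup_idx // (dispatch_x * dispatch_y)
    return x, y, z

# With normalized dispatch sizes, workgroup_id no longer matches the logical
# grid, but workgroup_idx still enumerates every workgroup exactly once.
print(logical_workgroup_id(7, 4, 2, 2))  # (3, 1, 0)
```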

* [webgpu] Correct definition of large numbers, fixes softmax(max_negative_number) in float32 (microsoft#26670)

### Description

The most negative finite float32 value is
`-3.40282346638528e+38` according to IEEE 754, but it was being
incorrectly registered inline as the truncated literal `-3.402823e+38f`.

```py
>>> import numpy as np
>>> np.finfo(np.float32).min
np.float32(-3.4028235e+38)
>>> np.finfo(np.float32).min.item()
-3.4028234663852886e+38
```

Because of this, values below the truncated threshold were handled
incorrectly. While this may seem like a small, irrelevant detail, it is
essential in attention masking, where we do in fact use this value,
leading to large numerical errors down the line.
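
To make the consequence concrete: the truncated literal sits strictly above the true float32 minimum, so genuinely minimal inputs fall below the shader's notion of 'minimum' (a quick numpy check):
```py
import numpy as np

truncated = np.float32(-3.402823e+38)  # literal previously emitted in the shader
true_min = np.finfo(np.float32).min    # -3.4028235e+38

print(truncated > true_min)  # True: true-minimum inputs are below the shader's "min"
```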


Reproduction:
```py
from onnx import helper, TensorProto
import onnxruntime as ort
import numpy as np

# 1. Create the ONNX model
# Define input and output
input_shape = [1, 2]
input_info = helper.make_tensor_value_info('X', TensorProto.FLOAT, input_shape)
output_info = helper.make_tensor_value_info('Y', TensorProto.FLOAT, input_shape)

# Create the Softmax node
# Softmax takes one input: X
softmax_node = helper.make_node(
    'Softmax',
    inputs=['X'],
    outputs=['Y'],
    name='SoftmaxNode',
    axis=-1 # Default axis is -1, usually applied to the last dimension
)

# Create the graph
graph_def = helper.make_graph(
    [softmax_node],
    'test-model',
    [input_info],
    [output_info]
)

# Create the model
model_def = helper.make_model(graph_def, producer_name='onnx-example')
opset = model_def.opset_import[0]
opset.version = 13 # Ensure opset version supports the operations

# 2. Convert model to string (bytes)
model_str = model_def.SerializeToString()

# 3. Prepare input data
np.random.seed(0)
input_data = np.array(
[[-3.40282346638528e+38, -3.40282346638528e+38]]
# [[-3.4028234663852886e+38, -3.4028234663852886e+38]]
).astype(np.float32)
print(input_data.tolist())

# 4. Run on CPUExecutionProvider
sess_cpu = ort.InferenceSession(model_str, providers=['CPUExecutionProvider'])
res_cpu = sess_cpu.run(['Y'], {'X': input_data})[0]
print("CPU Result:", res_cpu)

# 5. Run on WebGpuExecutionProvider
sess_webgpu = ort.InferenceSession(model_str, providers=['WebGpuExecutionProvider'])
res_webgpu = sess_webgpu.run(['Y'], {'X': input_data})[0]
print("WebGPU Result:", res_webgpu)

# Compare results
diff = np.abs(res_cpu - res_webgpu)
max_diff = diff.max().item()
print(diff)
print(f"Max diff: {max_diff}")
assert max_diff < 1e-5, f"Results do not match within tolerance! Max diff: {max_diff}"
print("Results match!")
```

Before:
```
[[-3.4028234663852886e+38, -3.4028234663852886e+38]]
CPU Result: [[0.5 0.5]]
WebGPU Result: [[0. 0.]]
[[0.5 0.5]]
Max diff: 0.5
AssertionError: Results do not match within tolerance! Max diff: 0.5
```

After:
```
[[-3.4028234663852886e+38, -3.4028234663852886e+38]]
CPU Result: [[0.5 0.5]]
WebGPU Result: [[0.5 0.5]]
[[0. 0.]]
Max diff: 0.0
Results match!
```

cc @guschmue

* [TRT/TRT RTX EP] Fix bug for missing outputs in the returning ComputeCapability/IndexedSubGraph (microsoft#26444)

### Description
For the TRT EP's `GetCapability()`, in some cases `GetSubGraph()` does not
add the graph's outputs to the `ComputeCapability`/`IndexedSubGraph`
returned to ORT.

The issue stems from the following code:
````c++
...
if (node->GetOutputEdgesCount() > node->OutputDefs().size()) {
  ...  // this branch executes for the failing models...
} else {
  ...
  if (graph_output_names.find(output->Name()) != graph_output_names.end()) {
    // ...but never performs this bookkeeping, so graph outputs go missing
    graph_outputs_to_add[output] = output_order;
  }
}
````

Update TRT RTX EP as well.

### Motivation and Context
microsoft#25373

* [ROCM] Remove docker, contrib ops, ci scripts related to ROCM EP (microsoft#26697)

### Description

This is a follow-up to microsoft#25181,
removing the ROCm EP related files to avoid confusion.

Documents will be updated later.

### Motivation and Context

microsoft#26692

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Christian Bourjau <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: fs-eire <[email protected]>
Co-authored-by: Wenqin Yang <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: xieofxie <[email protected]>
Co-authored-by: hualxie <[email protected]>
Co-authored-by: Jiajia Qin <[email protected]>
Co-authored-by: qti-hungjuiw <[email protected]>
Co-authored-by: Joshua Lochner <[email protected]>
Co-authored-by: Christian Bourjau <[email protected]>
Co-authored-by: Xiaofei Han <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: chunghow-qti <[email protected]>
Co-authored-by: Guenther Schmuelling <[email protected]>
Co-authored-by: Jiawei Shao <[email protected]>
Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Revert "Sync with Microsoft ONNX Runtime - 03/12/2025 (intel#867)"