Update win-ort-main to tip main 250123 #23473

ashrit-ms · 2025-01-23T17:08:09Z

Description

This PR updates the win-ort-main branch to the tip of the main branch as of 2025-01-23.

PR List

ddf0d37 [QNN EP] Add LoggingManager::HasDefaultLogger() to provider bridge API (#23467)
05fbbdf [QNN EP] Make QNN EP a shared library (#23120)
1336566 Add custom vcpkg ports (#23456)
2e1173c Update the compile flags for vcpkg packages (#23455)
1f628a9 [Mobile] Add BrowserStack Android MAUI Test (#23383)
009cae0 [js/webgpu] Optimize ConvTranspose (Continue) (#23429)
04a4a69 Use onnx_protobuf.h to suppress some GCC warnings (#23453)
2e3b62b Suppress some strict-aliasing related warnings in WebGPU EP (#23454)
b708f9b Bump ruff from 0.9.1 to 0.9.2 (#23427)
c0afc66 [WebNN] Remove workarounds for TFLite backend (#23406)
8a821ff Bump vite from 6.0.7 to 6.0.11 in /js/web/test/e2e/exports/testcases/vite-default (#23446)
220c1a2 Make ORT and Dawn use the same protobuf/abseil source code (#23447)
b7b5792 Change MacOS-13 to ubuntu on for android-java-api-aar-test.yml. (#23444)
19d0d2a WIP: Dp4MatMulNBits accuracy level 4 matmul for WebGPU EP (#23365)
95b8eff [QNN EP]: Clean up QNN logging resources if an error occurs during initialization (#23435)
626134c Bump clang-format from 19.1.6 to 19.1.7 (#23428)
0cf9753 Fix eigen external deps (#23439)
f9440ae Moving RN_CI Android Testing to Linux (#23422)
1aa5902 [QNN EP] workaround for QNN validation bug for Tanh with uint16 quantized output (#23432)
7f5582a Seperate RN andriod and IOS into 2 separated Stages. (#23400)
73deac2 Implement some missing element wise Add/Sub/Mul/Div/Neg operations for CPU and CUDA EPs (#23090)
949fe42 Upgrade Java version from react-native/android to Java 17 (#23066)
0892c23 Update Qnn SDK default version to 2.30 (#23411)
94c099b Fix type cast build error (#23423)
d633e57 [WebNN EP] Fix AddInitializersToSkip issues (#23354)
e988ef0 [QNN EP] Fix regression for MatMul with two quantized/dynamic uint16 inputs (#23419)
7538795 Update onnxruntime binary size checks ci pipeline's docker image (#23405)
6c5ea41 Revert "[QNN EP] Clean up correctly from a partial setup (#23320)" (#23420)
e866804 Enable comprehension simplification in ruff rules (#23414)
0a5f1f3 bugfix: string_view of invalid memory (#23417)
4cc38e0 fix crash when first input of BatchNormalization is 1-D (#23387)
0334414 Target py310 and modernize codebase with ruff (#23401)
87341ac [QNN EP] Fix segfault when unregistering HTP shared memory handles (#23402)

Motivation and Context

This update includes the change to make QNN-EP a shared library.

…23402)

### Description

- Fixes segfault when the function that cleans up HTP memory handles uses an invalid Logger.
- Fixes unit test that compares output from QNN EP with exact float values. QNN HTP runs float32 models with float16 precision, so the comparison needs a tolerance.

### Motivation and Context

Fixes issues with using QNN HTP memory sharing on Windows ARM64. This is also needed to test HTP shared memory with #23120.
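The tolerance-based comparison described above can be sketched as follows. This is only an illustration, not the test utility used in ORT: the helper name `AllClose` and the default tolerances are assumptions, chosen loosely around float16's relative precision of about 2^-10.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an ORT API): returns true when every element of
// `actual` lies within abs_tol + rel_tol * |expected| of `expected`.
bool AllClose(const std::vector<float>& expected,
              const std::vector<float>& actual,
              float abs_tol = 1e-3f, float rel_tol = 1e-3f) {
  if (expected.size() != actual.size()) return false;
  for (std::size_t i = 0; i < expected.size(); ++i) {
    const float diff = std::fabs(expected[i] - actual[i]);
    // Written as !(a <= b) so that a NaN difference counts as a mismatch.
    if (!(diff <= abs_tol + rel_tol * std::fabs(expected[i]))) return false;
  }
  return true;
}
```

An exact `==` comparison would reject outputs that differ only by float16 rounding, which is the failure mode the unit-test fix addresses.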

Change `target-version = "py310"` and modernize the code base with ruff.

### Description

Fix crash when the first input of BatchNormalization is 1-D.

### Description

The `std::unordered_map` uses a `std::string_view` as its key, but the string view may refer to invalid memory: the function `IdentityBuilder` returns a `std::string` temporary that is destroyed at the end of the full-expression.

```c++
unordered_map<string_view, std::vector<NodeIndex>> identical_children_map;
for (auto i = node->OutputEdgesBegin(); i != node->OutputEdgesEnd(); ++i) {
  if (i->GetNode().OpType() == op) {
    identical_children_map[IdentityBuilder(graph, i->GetNode())].push_back(i->GetNode().Index());
  }
}
```

This code triggers a warning that is treated as an error in EMSDK v4.0.1:

```
C:/code/o2/onnxruntime/core/optimizer/identical_children_consolidation.cc:51:30: error: object whose reference is captured by 'identical_children_map' will be destroyed at the end of the full-expression [-Werror,-Wdangling-capture]
   51 |       identical_children_map[IdentityBuilder(graph, i->GetNode())].push_back(i->GetNode().Index());
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
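One common way to avoid this kind of dangling key is to let the map own its keys. The sketch below is a simplified illustration and not necessarily the change that was merged: `MakeIdentity` is a hypothetical stand-in for `IdentityBuilder`, and plain `int` indices stand in for `NodeIndex`.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for IdentityBuilder: returns a temporary std::string.
std::string MakeIdentity(int node_index) {
  return "op_group_" + std::to_string(node_index % 2);
}

// Groups node indices by identity. The map owns its keys (std::string), so
// nothing refers to the temporary once the full-expression has ended.
std::unordered_map<std::string, std::vector<int>> GroupByIdentity(
    const std::vector<int>& node_indices) {
  std::unordered_map<std::string, std::vector<int>> groups;
  for (int index : node_indices) {
    groups[MakeIdentity(index)].push_back(index);
  }
  return groups;
}
```

The cost is one string copy or move per insertion, which is usually acceptable for a graph-optimizer pass that runs once per model.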

Enable comprehension simplification rules (C4) for ruff and apply autofix.

…23420)

### Description

This reverts commit 5d215ff.

### Motivation and Context

The reverted change causes a packaging pipeline to fail due to a crash in one of the E2E Android tests. Reverting this first to fix the pipeline. We should come up with an alternative way to properly do the necessary cleanup.

)

1. Update onnxruntime binary size checks ci pipeline's docker image. Use a different docker image that is not manylinux based. The new one is smaller.
2. Add flatbuffers to tools/ci_build/requirements/pybind/requirements.txt.
3. Delete tools/ci_build/github/azure-pipelines/py-package-build-pipeline.yml. The pipeline was for generating packages for Olive, but it went unused, and its content largely duplicates our official Python packaging pipeline.
4. Many YAML files reference the pypa/manylinux git repo but do not use it. This PR removes the references.

…inputs (#23419)

### Description

- Fixes regression for MatMul with two quantized/dynamic uint16 inputs. We need to convert input[1] to uint8 to pass QNN validation.
- Separates translation of `ONNX MatMul -> QNN MatMul` and `ONNX MatMul -> QNN FullyConnected` into separate functions to make the code more readable.

### Motivation and Context

The following PR updated the handling of MatMul. The logic to handle MatMul with two non-const uint16 inputs was not ported from [simple_op_builder.cc](https://github.com/microsoft/onnxruntime/blob/c64fa18834f0651b7d62507a34d802874b099c29/onnxruntime/core/providers/qnn/builder/opbuilder/simple_op_builder.cc#L107) to the new [matmul_op_builder.cc](https://github.com/microsoft/onnxruntime/blob/c64fa18834f0651b7d62507a34d802874b099c29/onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc#L57).

#22639

### Description

When an ONNX model reuses an initializer in more than one op, and one of those ops adds the initializer to the skipped list while the other ops still need it, the process crashes. Therefore, like other EPs, we keep `initializer_usage_`, a count of how many times each initializer is used across all ops, and modify `AddInitializersToSkip` to decrement the corresponding count by one when handling the specific operators.

### Motivation and Context
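The usage-counting idea can be sketched roughly as below. This is a simplified model under stated assumptions, not the WebNN EP's actual code: the class `InitializerTracker` and its method names are hypothetical, and ops are reduced to lists of initializer names.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified model of the bookkeeping (not the real WebNN EP class).
class InitializerTracker {
 public:
  // `ops` lists, for each op, the names of the initializers it consumes.
  explicit InitializerTracker(const std::vector<std::vector<std::string>>& ops) {
    for (const auto& inputs : ops) {
      for (const auto& name : inputs) ++usage_[name];
    }
  }

  // An op declares that it no longer needs `name`. The initializer is
  // actually skipped only once no remaining op uses it.
  void AddInitializerToSkip(const std::string& name) {
    auto it = usage_.find(name);
    if (it == usage_.end() || it->second == 0) return;
    if (--it->second == 0) skipped_.insert(name);
  }

  bool IsSkipped(const std::string& name) const { return skipped_.count(name) > 0; }

 private:
  std::unordered_map<std::string, int> usage_;  // remaining users per initializer
  std::unordered_set<std::string> skipped_;     // initializers safe to omit
};
```

With two ops sharing an initializer, the first skip request only lowers the count; the initializer is dropped after the second request, so no op is left reading a missing tensor.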

### Description

- Fix a type cast in #23363.
- Include some headers which are suggested by code scanning in that PR.

### Motivation and Context

PostMerge has a build error:

```
onnxruntime\core\framework\print_tensor_statistics_utils.h(92,55): error C2220: the following warning is treated as an error [D:\a\_work\1\b\Debug\onnxruntime_framework.vcxproj]
```

### Description

Update Qnn SDK default version to 2.30

### Description

Upgrade Java version from react-native/android to Java 17. This PR does not update the e2e Java 17 version.

…r CPU and CUDA EPs (#23090)

* [CPU EP] Implement Add/Sub/Mul/Div element wise operations for (u)int8, (u)int16, uint32 and uint64.
* [CPU EP] Implement Neg unary operation for int16
* [CUDA EP] Implement Add/Sub/Mul/Div element wise operations for (u)int8 and (u)int16

### Motivation and Context

This solves #23051

### Description

Separate RN Android and iOS into 2 separate stages.

### Motivation and Context

Speed up the PR process.

…ized output (#23432)

### Description

- Skip QNN validation for Tanh with uint16 quantized output (workaround for QNN validation bug).
- Re-enables unit test for Tanh with uint16 quantized output.

The [QNN documentation](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/HtpOpDefSupplement.html#tanh) states that the output scale and offset for `ufixed_point_16` should be (1/32768) and -32768, respectively. However, the QNN validator incorrectly rejects these values. So, we skip validation for this configuration of Tanh. Building an actual QNN graph with the correct scale/offset still works.

### Motivation and Context

This QNN validation bug appeared in QNN SDK 2.28.0 and is still present in QNN SDK 2.30.0. A previous PR disabled the corresponding unit test: https://github.com/microsoft/onnxruntime/pull/22724/files#diff-57f590c6c548b073ba8cd8af6cf198799906f7059ea46b31cd33972ea9b01983R232
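To see why a scale of 1/32768 and an offset of -32768 suit Tanh, the sketch below dequantizes the extremes of the uint16 range. It assumes the convention `real = scale * (q + offset)`; that convention, and the names `kScale`, `kOffset`, and `Dequantize`, are assumptions made for this illustration and not taken from the PR.

```cpp
#include <cstdint>

// Scale and offset as quoted in the description above.
constexpr double kScale = 1.0 / 32768.0;
constexpr std::int32_t kOffset = -32768;

// Assumed convention: real = scale * (q + offset).
constexpr double Dequantize(std::uint16_t q) {
  return kScale * (static_cast<std::int32_t>(q) + kOffset);
}
```

Under that assumption, q = 0 maps to -1.0, q = 32768 maps to 0.0, and q = 65535 maps to 32767/32768, so the full uint16 range covers Tanh's output interval (-1, 1) symmetrically around zero.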

### Description

Move Android E2E test steps from macOS-13 to Ubuntu 22.04.

### Motivation and Context

Reduces the dependency on macOS, which is deprecating the x64 version.

@lixing-star

### Description

I think we should not use the system Eigen directly, but should first use the Eigen specified in deps.txt.

On Ubuntu 22.04, ORT fails to compile when libeigen3-dev (which ROS 2 Humble depends on) is installed. The error message is below:

```
[ 62%] Built target onnxruntime_lora
[ 62%] Building CXX object CMakeFiles/onnxruntime_session.dir/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/session/IOBinding.cc.o
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc: In member function ‘onnxruntime::common::Status onnxruntime::Min_6<T>::Compute(onnxruntime::OpKernelContext*) const [with T = float]’:
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:750:56: error: no matching function for call to ‘Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<float, -1, 1>, 0, Eigen::Stride<0, 0> > >::min<Eigen::PropagateNaN>(Eigen::ArrayWrapper<Eigen::Map<const Eigen::Matrix<float, -1, 1>, 0, Eigen::Stride<0, 0> > >)’
  750 | min = min.array().template min<Eigen::PropagateNaN>(EigenMap<float>(data_n).array());
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/eigen3/Eigen/Core:19,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/util/math_cpuonly.h:68,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:10,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:4:
/usr/include/eigen3/Eigen/src/Core/../plugins/ArrayCwiseBinaryOps.h:33:28: note: candidate: ‘template<class OtherDerived> const Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::traits<OtherDerived>::Scalar>, const Derived, const OtherDerived> Eigen::ArrayBase<Derived>::min(const Eigen::ArrayBase<OtherDerived>&) const [with OtherDerived = OtherDerived; Derived = Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<float, -1, 1>, 0, Eigen::Stride<0, 0> > >]’
   33 | EIGEN_MAKE_CWISE_BINARY_OP(min,min)
      | ^~~
/usr/include/eigen3/Eigen/src/Core/util/Macros.h:1339:4: note: in definition of macro ‘EIGEN_MAKE_CWISE_BINARY_OP’
 1339 | (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
      | ^~~~~~
/usr/include/eigen3/Eigen/src/Core/../plugins/ArrayCwiseBinaryOps.h:33:28: note: template argument deduction/substitution failed:
   33 | EIGEN_MAKE_CWISE_BINARY_OP(min,min)
      | ^~~
/usr/include/eigen3/Eigen/src/Core/util/Macros.h:1339:4: note: in definition of macro ‘EIGEN_MAKE_CWISE_BINARY_OP’
 1339 | (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
      | ^~~~~~
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:750:56: error: type/value mismatch at argument 1 in template parameter list for ‘template<class OtherDerived> const Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::traits<OtherDerived>::Scalar>, const Derived, const OtherDerived> Eigen::ArrayBase<Derived>::min(const Eigen::ArrayBase<OtherDerived>&) const [with OtherDerived = OtherDerived; Derived = Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<float, -1, 1>, 0, Eigen::Stride<0, 0> > >]’
  750 | min = min.array().template min<Eigen::PropagateNaN>(EigenMap<float>(data_n).array());
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:750:56: note: expected a type, got ‘Eigen::PropagateNaN’
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc: In lambda function:
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:802:77: error: no matching function for call to ‘Eigen::Map<const Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >::min<Eigen::PropagateNaN>(Eigen::half)’
  802 | output_vec_map = input_1_vec_map.template min<Eigen::PropagateNaN>(
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
  803 | static_cast<Eigen::half>(per_iter_bh.ScalarInput0<MLFloat16>()));
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/eigen3/Eigen/Core:19,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/util/math_cpuonly.h:68,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:10,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:4:
/usr/include/eigen3/Eigen/src/Core/../plugins/ArrayCwiseBinaryOps.h:33:28: note: candidate: ‘template<class OtherDerived> const Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::traits<OtherDerived>::Scalar>, const Derived, const OtherDerived> Eigen::ArrayBase<Derived>::min(const Eigen::ArrayBase<OtherDerived>&) const [with OtherDerived = OtherDerived; Derived = Eigen::Map<const Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >]’
   33 | EIGEN_MAKE_CWISE_BINARY_OP(min,min)
      | ^~~
/usr/include/eigen3/Eigen/src/Core/util/Macros.h:1339:4: note: in definition of macro ‘EIGEN_MAKE_CWISE_BINARY_OP’
 1339 | (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
      | ^~~~~~
/usr/include/eigen3/Eigen/src/Core/../plugins/ArrayCwiseBinaryOps.h:33:28: note: template argument deduction/substitution failed:
   33 | EIGEN_MAKE_CWISE_BINARY_OP(min,min)
      | ^~~
/usr/include/eigen3/Eigen/src/Core/util/Macros.h:1339:4: note: in definition of macro ‘EIGEN_MAKE_CWISE_BINARY_OP’
 1339 | (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
      | ^~~~~~
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:802:77: error: type/value mismatch at argument 1 in template parameter list for ‘template<class OtherDerived> const Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::traits<OtherDerived>::Scalar>, const Derived, const OtherDerived> Eigen::ArrayBase<Derived>::min(const Eigen::ArrayBase<OtherDerived>&) const [with OtherDerived = OtherDerived; Derived = Eigen::Map<const Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >]’
  802 | output_vec_map = input_1_vec_map.template min<Eigen::PropagateNaN>(
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
  803 | static_cast<Eigen::half>(per_iter_bh.ScalarInput0<MLFloat16>()));
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:802:77: note: expected a type, got ‘Eigen::PropagateNaN’
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:805:77: error: no matching function for call to ‘Eigen::Map<const Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >::max<Eigen::PropagateNaN>(Eigen::half)’
  805 | output_vec_map = input_1_vec_map.template max<Eigen::PropagateNaN>(
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
  806 | static_cast<Eigen::half>(per_iter_bh.ScalarInput0<MLFloat16>()));
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

### Motivation and Context

Fix #23407

@lixing-star

Bumps [clang-format](https://github.com/ssciwr/clang-format-wheel) from 19.1.6 to 19.1.7.

Commits:

- [`f865928`](https://github.com/ssciwr/clang-format-wheel/commit/f865928dd2b4510e0de77b5765f334b3d1082f1d) Bump to v19.1.7
- See full diff in [compare view](https://github.com/ssciwr/clang-format-wheel/compare/v19.1.6...v19.1.7)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…itialization (#23435)

### Description

Re-implementation of #23320 (which was reverted).

- Cleans up QNN logging resources if an error occurs during initialization.
- Updates `QnnLogging()`, which is a logging callback called by QNN libs, to handle situations in which ORT logging is unavailable, thus avoiding a segmentation fault.
- Updates `QnnBackendManager::CreateHtpPowerCfgId()` and `QnnBackendManager::SetHtpPowerConfig()` to check that backend setup is complete. These functions get called in QNN EP's `OnRunStart()` even if QNN backend setup failed and the model is assigned to a different EP. This prevents a segmentation fault. Our Android tests ran into this issue because the QNN backend setup failed, the model was then assigned to CPU EP, and the QNN EP's `OnRunStart()` was still called with an invalid backend.

### Motivation and Context

If QNN initialization fails at any point, we have to properly clean up the logging resources so that QNN does not call our `QnnLogging()` callback after the EP has been destroyed.

@qjia7

### Description

This change implements accuracy level 4 (quantize A to int8) matmul for the WebGPU EP. The matmul kernel here uses DP4A for matrix multiplication. To keep the DP4A fed, co-operative matrix multiplication is implemented, which preloads the row/col into local variables before the multiplication operation.

Credits to @qjia7 for help with the quantizer shader.

Performance metrics on Intel ADL/TGL GPU:

```
PS C:\onnxruntime> C:\model_benchmark\model_benchmark.exe -i C:\Phi-3.5-mini-instruct-onnx-web\Phi-3.5-mini-instruct-onnx-web -l 500
Batch size: 1, prompt tokens: 501, tokens to generate: 128
Prompt processing (time to first token):
        avg (us):       2.76762e+06
        **avg (tokens/s): 181.022**   <<< Prefill speed
        p50 (us):       2.74843e+06
        stddev (us):    41756.4
        n:              5 * 501 token(s)
Token generation:
        avg (us):       81500.7
        avg (tokens/s): 12.2698
        p50 (us):       81104.1
        stddev (us):    2961.31
        n:              635 * 1 token(s)
Token sampling:
        avg (us):       13.1836
        avg (tokens/s): 75851.9
        p50 (us):       12
        stddev (us):    6.47085
        n:              640 * 1 token(s)
E2E generation (entire generation loop):
        avg (ms):       13120
        p50 (ms):       13081.6
        stddev (ms):    114.689
        n:              5
Peak working set size (bytes): 5467533312
WebGPU device lost (2): Device was destroyed.
```

This kernel is 2.10x faster than its F16 counterpart for a 500 token prefill. The previous prefill record was 86 tokens/s.

To support devices with subgroup size 8/32, a no-subgroup version of the same shader is included. Its performance is slower than the subgroup version on ADL:

```
PS C:\onnxruntime> C:\model_benchmark\model_benchmark.exe -i C:\Phi-3.5-mini-instruct-onnx-web\Phi-3.5-mini-instruct-onnx-web -l 500
Batch size: 1, prompt tokens: 501, tokens to generate: 128
Prompt processing (time to first token):
        avg (us):       4.11989e+06
        avg (tokens/s): 121.605
        p50 (us):       4.11847e+06
        stddev (us):    2147.48
        n:              5 * 501 token(s)
Token generation:
        avg (us):       81174.9
        avg (tokens/s): 12.3191
        p50 (us):       81301.1
        stddev (us):    2177.2
        n:              635 * 1 token(s)
Token sampling:
        avg (us):       14.7998
        avg (tokens/s): 67568.3
        p50 (us):       12.3
        stddev (us):    11.5481
        n:              640 * 1 token(s)
E2E generation (entire generation loop):
        avg (ms):       14431.1
        p50 (ms):       14433.8
        stddev (ms):    5.02473
        n:              5
Peak working set size (bytes): 5466480640
WebGPU device lost (2): Device was destroyed.
```
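For readers unfamiliar with DP4A, the operation is a dot product of four packed 8-bit integers accumulated into a 32-bit integer. The sketch below is a scalar CPU emulation for illustration only, not the WebGPU shader from this change; the function names `Pack4xI8` and `Dp4a` are hypothetical, and the narrowing casts assume two's-complement wraparound (guaranteed from C++20, and the behavior of mainstream compilers before that).

```cpp
#include <cstdint>

// Packs four signed 8-bit values into one 32-bit word (lane 0 in the low byte).
std::uint32_t Pack4xI8(std::int8_t a, std::int8_t b, std::int8_t c, std::int8_t d) {
  return static_cast<std::uint32_t>(static_cast<std::uint8_t>(a)) |
         (static_cast<std::uint32_t>(static_cast<std::uint8_t>(b)) << 8) |
         (static_cast<std::uint32_t>(static_cast<std::uint8_t>(c)) << 16) |
         (static_cast<std::uint32_t>(static_cast<std::uint8_t>(d)) << 24);
}

// Scalar emulation of one DP4A step: acc + sum over four lanes of a_i * b_i.
std::int32_t Dp4a(std::uint32_t a, std::uint32_t b, std::int32_t acc) {
  for (int lane = 0; lane < 4; ++lane) {
    const auto x = static_cast<std::int8_t>((a >> (8 * lane)) & 0xFF);
    const auto y = static_cast<std::int8_t>((b >> (8 * lane)) & 0xFF);
    acc += static_cast<std::int32_t>(x) * static_cast<std::int32_t>(y);
  }
  return acc;
}
```

Because each step consumes four int8 products from two 32-bit words, quantizing the A matrix to int8 lets the kernel do four multiply-accumulates per instruction, which is consistent with the prefill speedup reported above.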

### Description

### Motivation and Context

### Description

Make ORT and Dawn use the same protobuf/abseil source code

The WebNN CPU device type may now target different backends, such as CoreML. Legacy special workarounds for the TFLite backend should be removed and allowed to fail as is, as these are implementation issues. Additionally, the WebNN EP should adhere to the WebNN API conformance. We assume all the WebNN ops should be supported, so remove the WebNN op support status for different device types in webnn-operators.md as well.

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.9.1 to 0.9.2. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/astral-sh/ruff/releases">ruff's releases</a>.</em></p> <blockquote> <h2>0.9.2</h2> <h2>Release Notes</h2> <h3>Preview features</h3> <ul> <li>[<code>airflow</code>] Fix typo "security_managr" to "security_manager" (<code>AIR303</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15463">#15463</a>)</li> <li>[<code>airflow</code>] extend and fix AIR302 rules (<a href="https://redirect.github.com/astral-sh/ruff/pull/15525">#15525</a>)</li> <li>[<code>fastapi</code>] Handle parameters with <code>Depends</code> correctly (<code>FAST003</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15364">#15364</a>)</li> <li>[<code>flake8-pytest-style</code>] Implement pytest.warns diagnostics (<code>PT029</code>, <code>PT030</code>, <code>PT031</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15444">#15444</a>)</li> <li>[<code>flake8-pytest-style</code>] Test function parameters with default arguments (<code>PT028</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15449">#15449</a>)</li> <li>[<code>flake8-type-checking</code>] Avoid false positives for <code>|</code> in <code>TC008</code> (<a href="https://redirect.github.com/astral-sh/ruff/pull/15201">#15201</a>)</li> </ul> <h3>Rule changes</h3> <ul> <li>[<code>flake8-todos</code>] Allow VSCode GitHub PR extension style links in <code>missing-todo-link</code> (<code>TD003</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15519">#15519</a>)</li> <li>[<code>pyflakes</code>] Show syntax error message for <code>F722</code> (<a href="https://redirect.github.com/astral-sh/ruff/pull/15523">#15523</a>)</li> </ul> <h3>Formatter</h3> <ul> <li>Fix curly bracket spacing around f-string expressions containing curly braces (<a href="https://redirect.github.com/astral-sh/ruff/pull/15471">#15471</a>)</li> <li>Fix 
joining of f-strings with different quotes when using quote style <code>Preserve</code> (<a href="https://redirect.github.com/astral-sh/ruff/pull/15524">#15524</a>)</li> </ul> <h3>Server</h3> <ul> <li>Avoid indexing the same workspace multiple times (<a href="https://redirect.github.com/astral-sh/ruff/pull/15495">#15495</a>)</li> <li>Display context for <code>ruff.configuration</code> errors (<a href="https://redirect.github.com/astral-sh/ruff/pull/15452">#15452</a>)</li> </ul> <h3>Configuration</h3> <ul> <li>Remove <code>flatten</code> to improve deserialization error messages (<a href="https://redirect.github.com/astral-sh/ruff/pull/15414">#15414</a>)</li> </ul> <h3>Bug fixes</h3> <ul> <li>Parse triple-quoted string annotations as if parenthesized (<a href="https://redirect.github.com/astral-sh/ruff/pull/15387">#15387</a>)</li> <li>[<code>fastapi</code>] Update <code>Annotated</code> fixes (<code>FAST002</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15462">#15462</a>)</li> <li>[<code>flake8-bandit</code>] Check for <code>builtins</code> instead of <code>builtin</code> (<code>S102</code>, <code>PTH123</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15443">#15443</a>)</li> <li>[<code>flake8-pathlib</code>] Fix <code>--select</code> for <code>os-path-dirname</code> (<code>PTH120</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15446">#15446</a>)</li> <li>[<code>ruff</code>] Fix false positive on global keyword (<code>RUF052</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15235">#15235</a>)</li> </ul> <h2>Contributors</h2> <ul> <li><a href="https://github.com/AlexWaygood"><code>@AlexWaygood</code></a></li> <li><a href="https://github.com/BurntSushi"><code>@BurntSushi</code></a></li> <li><a href="https://github.com/Daverball"><code>@Daverball</code></a></li> <li><a href="https://github.com/Garrett-R"><code>@Garrett-R</code></a></li> <li><a 
href="https://github.com/Glyphack"><code>@Glyphack</code></a></li> <li><a href="https://github.com/InSyncWithFoo"><code>@InSyncWithFoo</code></a></li> <li><a href="https://github.com/Lee-W"><code>@Lee-W</code></a></li> <li><a href="https://github.com/MichaReiser"><code>@MichaReiser</code></a></li> <li><a href="https://github.com/cake-monotone"><code>@cake-monotone</code></a></li> </ul>  </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's changelog</a>.</em></p> <blockquote> <h2>0.9.2</h2> <h3>Preview features</h3> <ul> <li>[<code>airflow</code>] Fix typo "security_managr" to "security_manager" (<code>AIR303</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15463">#15463</a>)</li> <li>[<code>airflow</code>] extend and fix AIR302 rules (<a href="https://redirect.github.com/astral-sh/ruff/pull/15525">#15525</a>)</li> <li>[<code>fastapi</code>] Handle parameters with <code>Depends</code> correctly (<code>FAST003</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15364">#15364</a>)</li> <li>[<code>flake8-pytest-style</code>] Implement pytest.warns diagnostics (<code>PT029</code>, <code>PT030</code>, <code>PT031</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15444">#15444</a>)</li> <li>[<code>flake8-pytest-style</code>] Test function parameters with default arguments (<code>PT028</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15449">#15449</a>)</li> <li>[<code>flake8-type-checking</code>] Avoid false positives for <code>|</code> in <code>TC008</code> (<a href="https://redirect.github.com/astral-sh/ruff/pull/15201">#15201</a>)</li> </ul> <h3>Rule changes</h3> <ul> <li>[<code>flake8-todos</code>] Allow VSCode GitHub PR extension style links in <code>missing-todo-link</code> (<code>TD003</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15519">#15519</a>)</li> 
<li>[<code>pyflakes</code>] Show syntax error message for <code>F722</code> (<a href="https://redirect.github.com/astral-sh/ruff/pull/15523">#15523</a>)</li> </ul> <h3>Formatter</h3> <ul> <li>Fix curly bracket spacing around f-string expressions containing curly braces (<a href="https://redirect.github.com/astral-sh/ruff/pull/15471">#15471</a>)</li> <li>Fix joining of f-strings with different quotes when using quote style <code>Preserve</code> (<a href="https://redirect.github.com/astral-sh/ruff/pull/15524">#15524</a>)</li> </ul> <h3>Server</h3> <ul> <li>Avoid indexing the same workspace multiple times (<a href="https://redirect.github.com/astral-sh/ruff/pull/15495">#15495</a>)</li> <li>Display context for <code>ruff.configuration</code> errors (<a href="https://redirect.github.com/astral-sh/ruff/pull/15452">#15452</a>)</li> </ul> <h3>Configuration</h3> <ul> <li>Remove <code>flatten</code> to improve deserialization error messages (<a href="https://redirect.github.com/astral-sh/ruff/pull/15414">#15414</a>)</li> </ul> <h3>Bug fixes</h3> <ul> <li>Parse triple-quoted string annotations as if parenthesized (<a href="https://redirect.github.com/astral-sh/ruff/pull/15387">#15387</a>)</li> <li>[<code>fastapi</code>] Update <code>Annotated</code> fixes (<code>FAST002</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15462">#15462</a>)</li> <li>[<code>flake8-bandit</code>] Check for <code>builtins</code> instead of <code>builtin</code> (<code>S102</code>, <code>PTH123</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15443">#15443</a>)</li> <li>[<code>flake8-pathlib</code>] Fix <code>--select</code> for <code>os-path-dirname</code> (<code>PTH120</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15446">#15446</a>)</li> <li>[<code>ruff</code>] Fix false positive on global keyword (<code>RUF052</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15235">#15235</a>)</li> </ul> </blockquote> </details> <details> 
<summary>Commits</summary> <ul> <li><a href="https://github.com/astral-sh/ruff/commit/0a393483811e0999578b5655d82e2c03238296f3"><code>0a39348</code></a> Include build binaries</li> <li><a href="https://github.com/astral-sh/ruff/commit/027f8009e557e9b5c21b812c9b70874bf02b590b"><code>027f800</code></a> Comment out non-npm-publish jobs</li> <li><a href="https://github.com/astral-sh/ruff/commit/425870df7666ef1c8e7d033cbc3195f43a54213e"><code>425870d</code></a> Upload npm publish logs when failed</li> <li><a href="https://github.com/astral-sh/ruff/commit/c20255abe4013866173ba0515ad9a3190bdfac51"><code>c20255a</code></a> Bump version to 0.9.2 (<a href="https://redirect.github.com/astral-sh/ruff/issues/15529">#15529</a>)</li> <li><a href="https://github.com/astral-sh/ruff/commit/420365811f27d597ea33a62270667ce9cee1bb5f"><code>4203658</code></a> Fix joining of f-strings with different quotes when using quote style `Preser...</li> <li><a href="https://github.com/astral-sh/ruff/commit/fc9dd63d64ebc18cdca2e9648264704da43b902e"><code>fc9dd63</code></a> [airflow] extend and fix AIR302 rules (<a href="https://redirect.github.com/astral-sh/ruff/issues/15525">#15525</a>)</li> <li><a href="https://github.com/astral-sh/ruff/commit/79e52c7fdf90597d933aea771a9cde0ad510bba6"><code>79e52c7</code></a> [<code>pyflakes</code>] Show syntax error message for <code>F722</code> (<a href="https://redirect.github.com/astral-sh/ruff/issues/15523">#15523</a>)</li> <li><a href="https://github.com/astral-sh/ruff/commit/cf4ab7cba16b25f42d9d6b2464e22eb57df0fa8c"><code>cf4ab7c</code></a> Parse triple quoted string annotations as if parenthesized (<a href="https://redirect.github.com/astral-sh/ruff/issues/15387">#15387</a>)</li> <li><a href="https://github.com/astral-sh/ruff/commit/d2656e88a3c17ca3351cd5069642253ac22490f5"><code>d2656e8</code></a> [<code>flake8-todos</code>] Allow VSCode GitHub PR extension style links in `missing-tod...</li> <li><a 
href="https://github.com/astral-sh/ruff/commit/c53ee608a1df4e471f0089e4f5d2881291e085be"><code>c53ee60</code></a> Typeshed-sync workflow: add appropriate labels, link directly to failing run ...</li> <li>Additional commits viewable in <a href="https://github.com/astral-sh/ruff/compare/0.9.1...0.9.2">compare view</a></li> </ul> </details> <br /> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

### Description

Suppress some strict-aliasing related warnings in WebGPU EP.

For example:

```
/home/chasun/src/onnxruntime/onnxruntime/core/providers/webgpu/math/unary_elementwise_ops.cc:208:30: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
  208 |   float encoded_value = *reinterpret_cast<const float*>(attr);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This PR does not really fix the problems. It only suppresses the warnings so that the build passes. Some of the strict-aliasing issues could be fixed by using std::bit_cast, which requires C++20.

### Motivation and Context

Building the code on Azure Linux 3 fails. To reproduce the issue, get an Azure Linux 3 machine and run:

```
python3 tools/ci_build/build.py --update --build --build_wheel --use_xnnpack --build_nodejs --use_webgpu --build_dir b --skip_submodule_sync --parallel --use_binskim_compliant_compile_flags --build_shared_lib --config Release
```

### Description

Use onnx_protobuf.h to suppress some GCC warnings. All the changes are autogenerated by a shell command.

```bash
find . -type f -exec sed -i 's/#include\s\+<onnx\/onnx_pb.h>/#include "core\/graph\/onnx_protobuf.h"/g' {} \;
```

### Motivation and Context

This PR is needed to make vcpkg work (without disabling all warnings). It was split from a larger PR at a reviewer's request.
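For readers who want to check what the `sed` expression above does to a single file's contents, the same substitution can be sketched in Python. This is only an illustration, not part of the PR: `rewrite_includes` is a hypothetical helper name, and the dot in `onnx_pb.h` is escaped here, which is slightly stricter than the original pattern.

```python
import re

# Mirrors the sed expression: "#include", one or more whitespace characters,
# then <onnx/onnx_pb.h>. The dot is escaped, unlike in the original sed pattern.
ONNX_PB_INCLUDE = re.compile(r"#include\s+<onnx/onnx_pb\.h>")
REPLACEMENT = '#include "core/graph/onnx_protobuf.h"'


def rewrite_includes(source: str) -> str:
    """Return source with every onnx_pb.h include redirected to the wrapper header."""
    return ONNX_PB_INCLUDE.sub(REPLACEMENT, source)


sample = "#include <vector>\n#include  <onnx/onnx_pb.h>\n"
print(rewrite_includes(sample))
```

Other includes are left untouched, so the function can be applied to any source file without first checking whether it includes the ONNX header.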

@jiangzhaoming

BUG #23273

This PR makes the following optimizations:

1. When there is one output channel:
   1. calculate the offset before the input-channel loop to reduce indices-to-offsets calculations;
   2. split `inputChannelsPerGroup` into `inputChannelsPerGroupInt` and `inputChannelsRemainder` parts so that we can always access 4 data elements at a time for `inputChannelsPerGroupInt`.
2. Use a precise initial value to reduce useless loop iterations. Thanks to @jiangzhaoming for the suggestion.

With this PR, ConvTranspose takes 3.7s, down from 8.4s, on Intel Meteor Lake. On NV RTX 2000 Ada, it takes 1.6s, down from 2.7s.
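The channel split in item 1 can be sketched in Python as follows. This is not the WebGPU shader code from the PR. It assumes that `inputChannelsPerGroupInt` is the largest multiple of 4 not exceeding `inputChannelsPerGroup`, and the function names are hypothetical.

```python
def split_channels(input_channels_per_group: int, width: int = 4) -> tuple[int, int]:
    """Split a channel count into a multiple-of-width part and a remainder."""
    remainder = input_channels_per_group % width
    return input_channels_per_group - remainder, remainder


def dot_by_fours(a: list[float], b: list[float]) -> float:
    """Dot product that consumes 4 elements per step, then a scalar tail."""
    main, remainder = split_channels(len(a))
    total = 0.0
    # Main part: every step reads exactly 4 elements, like a vec4 load.
    for i in range(0, main, 4):
        total += sum(x * y for x, y in zip(a[i:i + 4], b[i:i + 4]))
    # Tail: the leftover 0 to 3 elements are handled one at a time.
    for i in range(main, main + remainder):
        total += a[i] * b[i]
    return total


print(split_channels(10))
print(dot_by_fours([1.0] * 10, [2.0] * 10))
```

Keeping the remainder in a separate loop means the main loop never needs a bounds check inside a 4-wide access.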

### Description

Add a test project that performs an automated UI test that runs the unit tests on Android.

### Motivation

- Enables end-to-end on-device MAUI unit testing, which we want to add to the packaging pipelines

### Context

Microsoft.ML.OnnxRuntime.Tests.MAUI uses DeviceRunners.VisualRunners to allow running the unit tests (found in Microsoft.ML.OnnxRuntime.Tests.Common) across multiple devices. DeviceRunners.VisualRunners provides a simple UI with a button that will run the unit tests and a panel with the unit test results.

In order to automate the process of running the unit tests across mobile devices, Appium is used for UI testing orchestration (it provides a way to interact with the UI), and BrowserStack automatically runs these Appium tests across different mobile devices.

This project does not include the capability to start an Appium server locally or attach to a local emulator or device.

## Build & run instructions

### Requirements

* A BrowserStack account with access to App Automate
  * You can set BrowserStack credentials as environment variables as shown [here](https://www.browserstack.com/docs/app-automate/appium/getting-started/c-sharp/nunit/integrate-your-tests#CLI)
* ONNXRuntime NuGet package
  1. You can either download the [stable NuGet package](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime) then follow the instructions from [NativeLibraryInclude.props file](../Microsoft.ML.OnnxRuntime.Tests.Common/NativeLibraryInclude.props) to use the downloaded .nupkg file
  2. Or follow the [build instructions](https://onnxruntime.ai/docs/build/android.html) to build the Android package locally
* The dotnet workloads for maui and maui-android, which will not always automatically install correctly
  1. `dotnet workload install maui`
  2. `dotnet workload install maui-android`
* [Appium](https://appium.io/docs/en/latest/quickstart/) and the [UiAutomator2 driver](https://appium.io/docs/en/latest/quickstart/uiauto2-driver/)

### Run instructions

1. Build the Microsoft.ML.OnnxRuntime.Tests.MAUI project into a signed APK.
   1. Run the following: `dotnet publish -c Release -f net8.0-android` in the Microsoft.ML.OnnxRuntime.Tests.MAUI directory.
   2. Search for the APK files generated. They should be located in `bin\Release\net8.0-android\publish`.
   3. If they're in a different location, edit the `browserstack.yml` file to target the path to the signed APK.
2. Ensure you've set the BrowserStack credentials as environment variables.
3. Run the following in the Microsoft.ML.OnnxRuntime.Tests.Android.BrowserStack directory: `dotnet test`
4. Navigate to the [BrowserStack App Automate dashboard](https://app-automate.browserstack.com/dashboard/v2/builds) to see your test running!
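Step 2 of the run instructions can be checked before invoking `dotnet test` with a small pre-flight script. The sketch below is not part of the test project. It assumes the credentials are supplied through the `BROWSERSTACK_USERNAME` and `BROWSERSTACK_ACCESS_KEY` environment variables, following BrowserStack's usual convention, and `missing_credentials` is a hypothetical helper name.

```python
import os
from collections.abc import Mapping

# Assumed variable names, following BrowserStack's usual convention.
REQUIRED_VARS = ("BROWSERSTACK_USERNAME", "BROWSERSTACK_ACCESS_KEY")


def missing_credentials(env: Mapping[str, str]) -> list[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials(os.environ)
if missing:
    print("Set these before running `dotnet test`: " + ", ".join(missing))
else:
    print("BrowserStack credentials found.")
```

Passing the environment in as a mapping keeps the check easy to exercise with a plain dictionary.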

### Description

This PR updates the triplet files that manage the compile flags for vcpkg packages. All the changes are autogenerated except for the gen.py file in this PR.

Main changes:

1. Enable debug info for all Linux build configs (Release and Debug).
2. Set CMAKE_CXX_STANDARD in each triplet. The value is set to 20 for macOS targets and 17 for the others.
3. Only set _FORTIFY_SOURCE in release builds. This is to address a build issue on some platforms with the following glibc change: "Warn if user requests __FORTIFY_SOURCE but it is disabled" https://sourceware.org/git/?p=glibc.git;a=commit;f=include/features.h;h=05c2c9618f583ea4acd69b3fe5ae2a2922dd2ddc

### Motivation and Context

Address a Linux build error.
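The three rules above can be summarized as a small decision function. The sketch below is not the PR's gen.py: `triplet_settings`, its parameter values, and the dictionary keys are all hypothetical, and settings the description does not mention are left as `None` rather than guessed.

```python
from typing import Optional


def triplet_settings(target_os: str, config: str) -> dict[str, Optional[object]]:
    """Hypothetical summary of the per-triplet rules listed in the description."""
    return {
        # Rule 2: C++20 for macOS targets, C++17 for everything else.
        "cxx_standard": 20 if target_os == "macos" else 17,
        # Rule 1: debug info is on for every Linux config; the description
        # does not say what other platforms do, so that case stays None.
        "debug_info": True if target_os == "linux" else None,
        # Rule 3: _FORTIFY_SOURCE is only defined for release builds.
        "fortify_source": config == "Release",
    }


for os_name in ("linux", "macos", "windows"):
    for cfg in ("Release", "Debug"):
        print(os_name, cfg, triplet_settings(os_name, cfg))
```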

### Description

Add custom vcpkg ports for the following packages:

1. cpuinfo
2. onnx
3. pthreadpool
4. xnnpack

Because:

- The cpuinfo/pthreadpool/xnnpack packages in the official vcpkg repo are too old.
  - XNNPack's version is updated from 2022-12-22 to 2025-01-17
  - CPUINFO's version is updated from 2022-07-19 to 2024-12-09
  - Pthreadpool's version is updated from 2020-04-10 to 2024-12-17, and the source code location is changed from https://github.com/Maratyszcza/pthreadpool to https://github.com/google/pthreadpool
- The onnx package in the official repo requires building python from source, which then requires a lot of additional dependencies to be installed. This PR removes them.
- Added a disable_gcc_warning.patch file for xnnpack for addressing the issue reported in google/XNNPACK#7650. I will remove this patch when the issue is fully addressed.
- Added " -DONNX_DISABLE_STATIC_REGISTRATION=ON" to ONNX's config options.

### Description

- Makes QNN EP a shared library **by default** when building with `--use_qnn` or `--use_qnn shared_lib`. Generates the following build artifacts:
  - **Windows**: `onnxruntime_providers_qnn.dll` and `onnxruntime_providers_shared.dll`
  - **Linux**: `libonnxruntime_providers_qnn.so` and `libonnxruntime_providers_shared.so`
  - **Android**: Not supported. Must build QNN EP as a static library.
- Allows QNN EP to still be built as a static library with `--use_qnn static_lib`. This is primarily for the Android QNN AAR package.
- Unit tests run for both the static and shared QNN EP builds.

### Detailed changes

- Updates Java bindings to support both shared and static QNN EP builds.
- Provider bridge API:
  - Adds logging sink ETW to the provider bridge. Allows EPs to register ETW callbacks for ORT logging.
  - Adds a variety of methods for onnxruntime objects that are needed by QNN EP.
- QNN EP:
  - Adds `ort_api.h` and `ort_api.cc` that encapsulates the API provided by ORT in a manner that allows the EP to be built as either a shared or static library.
  - Adds custom function to transpose weights for Conv and Gemm (instead of adding util to provider bridge API).
  - Adds custom function to quantize data for LeakyRelu (instead of adding util to provider bridge API).
  - Adds custom ETW tracing for QNN profiling events:
    - shared library: defines its own TraceLogging provider handle
    - static library: uses ORT's TraceLogging provider handle and existing telemetry provider.
- ORT-QNN Packages:
  - **Python**: Pipelines build QNN EP as a shared library by default. User can build a local python wheel with QNN EP as a static library by passing `--use_qnn static_lib`.
  - **NuGet**: Pipelines build QNN EP as a shared library by default. `build.py` currently enforces QNN EP to be built as a shared library. Can add support for building a QNN NuGet package with static later if deemed necessary.
  - **Android**: Pipelines build QNN EP as a **static library**. `build.py` enforces QNN EP to be built as a static library. Packaging multiple shared libraries into an Android AAR package is not currently supported due to the added need to also distribute a shared libcpp.so library.

### Motivation and Context

#23467)

### Description

Fixes QNN EP builds due to missing function in provider bridge API: `logging::LoggingManager::HasDefaultLogger()`

### Motivation and Context

A [recent PR](#23120) made QNN EP a shared library. A [different PR](#23435) added use of a new function to QNN EP that was not part of the provider bridge API. The CI did not catch it because main was not merged into the first PR before merging.

adrianlizarraga and others added 30 commits January 23, 2025 09:02

Target py310 and modernize codebase with ruff (#23401)

0334414

Change `target-version = "py310"` and modernize the code base with ruff.

fix crash when first input of BatchNormalization is 1-D (#23387)

4cc38e0

### Description

Fix crash when first input of BatchNormalization is 1-D.

Enable comprehension simplification in ruff rules (#23414)

e866804

Enable comprehension simplification rules (C4) for ruff and apply autofix.

Update Qnn SDK default version to 2.30 (#23411)

0892c23

### Description

Update Qnn SDK default version to 2.30.

Upgrade Java version from react-native/android to Java 17 (#23066)

949fe42

### Description

Upgrade Java version from react-native/android to Java 17. This PR does not update the e2e Java 17 version.

Seperate RN andriod and IOS into 2 separated Stages. (#23400)

7f5582a

### Description

Separate RN Android and iOS into two separate stages.

### Motivation and Context

Speed up the PR process.

Moving RN_CI Android Testing to Linux (#23422)

f9440ae

### Description

Move the Android E2E test steps from macOS-13 to Ubuntu 22.04.

### Motivation and Context

Reduce the dependency on macOS, whose x64 version is being deprecated.

Change MacOS-13 to ubuntu on for android-java-api-aar-test.yml. (#23444)

b7b5792

### Description

### Motivation and Context

Make ORT and Dawn use the same protobuf/abseil source code (#23447)

220c1a2

### Description

Make ORT and Dawn use the same protobuf/abseil source code.

snnn and others added 3 commits January 23, 2025 09:02

ashrit-ms merged commit 4b5b5f7 into win-ort-main Jan 23, 2025
41 of 43 checks passed

ashrit-ms deleted the ashritms/main2win-ort-main-250123 branch January 23, 2025 17:12
