fix: add runtime check for AVX512-FP16 support instead of relying on compiler version#28236
fix: add runtime check for AVX512-FP16 support instead of relying on compiler version#28236PriscillaJCorn wants to merge 3 commits intomicrosoft:mainfrom
Conversation
@microsoft-github-policy-service agree |
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
No pipelines are associated with this pull request. |
There was a problem hiding this comment.
Pull request overview
This PR improves MLAS AVX512-FP16 feature enablement by replacing compiler-version heuristics with an actual compile-time capability check, so AVX512-FP16 codepaths are only enabled when the toolchain can assemble the required instructions (addressing build failures reported in #22519 and #24025).
Changes:
- Replace
_MSC_VER/__GNUC__version heuristics in MLAS runtime dispatch with a build-defined capability macro (MLAS_SUPPORTS_AVX512FP16). - Add a CMake compile test for AVX512-FP16 instruction support and conditionally include the AVX512-FP16 conversion assembly.
- Minor whitespace cleanup in ARM FP16 kernel selection code.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| onnxruntime/core/mlas/lib/platform.cpp | Switches runtime AVX512-FP16 dispatch gating to a build-time macro instead of compiler-version checks. |
| cmake/onnxruntime_mlas.cmake | Adds a compile-time probe for AVX512-FP16 instruction support and conditionally enables related assembly + a preprocessor define. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #ifndef __APPLE__ | ||
| #if (defined(_MSC_VER) && (_MSC_VER >= 1933)) || (defined(__GNUC__) && (__GNUC__ >= 13)) | ||
| #if defined(MLAS_SUPPORTS_AVX512FP16) | ||
| // | ||
| // Check if the processor supports AVX NE CONVERT. | ||
| // | ||
| if ((Cpuid7_1[3] & (0b1 << 5)) != 0) { | ||
| this->CastF16ToF32Kernel = &MlasCastF16ToF32KernelAvx; | ||
| } | ||
| #endif // (defined(_MSC_VER) && (_MSC_VER >= 1933)) || (defined(__GNUC__) && (__GNUC__ >= 13)) | ||
| #endif // MLAS_SUPPORTS_AVX512FP16 |
There was a problem hiding this comment.
MLAS_SUPPORTS_AVX512FP16 now gates enabling CastF16ToF32Kernel, but that macro is only defined in the non-MSVC CMake path. On Windows/MSVC, setup_mlas_source_for_windows() can still add amd64/cvtfp16Avx.asm (MSVC_VERSION>=1933), yet this block will never run, so AVX512-FP16 support may be silently disabled at runtime even when the kernel is built. Consider defining MLAS_SUPPORTS_AVX512FP16 in the MSVC path when cvtfp16Avx.asm is enabled (or reintroduce an MSVC-specific compile-time guard).
There was a problem hiding this comment.
@copilot apply changes based on this feedback Yes, in func setup_mlas_source_for_windows() , we should define MLAS_SUPPORTS_AVX512FP16 in the MSVC path when cvtfp16Avx.asm is enabled.
There was a problem hiding this comment.
Is this comment addressed ?
|
Thanks for the contribution - can you please check if copilot's review comments are relevant ? |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
No pipelines are associated with this pull request. |
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| list(APPEND mlas_private_compile_definitions MLAS_SUPPORTS_AVX512FP16) | ||
| endif() |
There was a problem hiding this comment.
The probe is hardwired to GCC/Clang-style flags (-mavx512fp16) and GNU inline asm syntax, which will fail under MSVC (and some clang-cl configurations). This can silently disable the FP16 path on Windows even when the toolchain supports it, changing behavior vs the prior _MSC_VER branch. Consider branching on CMAKE_CXX_COMPILER_ID and using a C++-based probe with compiler-appropriate flags (e.g., MSVC /arch: equivalent or a clang-cl compatible approach), so the macro is correctly set across supported toolchains.
| ) | ||
| if(CMAKE_CXX_COMPILER_VERSION GREATER_EQUAL 13.1 AND NOT(APPLE)) | ||
|
|
||
| include(CheckCSourceCompiles) |
There was a problem hiding this comment.
This feature is consumed by C++ sources (platform.cpp), but the probe uses CheckCSourceCompiles. Using CheckCXXSourceCompiles (and the corresponding include) makes the probe match the actual compilation environment (C++ compiler, flags, and frontend), and avoids coupling this logic to the presence/configuration of the C language in the build.
| #if defined(MLAS_SUPPORTS_AVX512FP16) | ||
| // | ||
| // Check if the processor supports AVX NE CONVERT. | ||
| // | ||
| if ((Cpuid7_1[3] & (0b1 << 5)) != 0) { | ||
| this->CastF16ToF32Kernel = &MlasCastF16ToF32KernelAvx; | ||
| } | ||
| #endif // (defined(_MSC_VER) && (_MSC_VER >= 1933)) || (defined(__GNUC__) && (__GNUC__ >= 13)) | ||
| #endif // MLAS_SUPPORTS_AVX512FP16 |
There was a problem hiding this comment.
The compile-time guard name MLAS_SUPPORTS_AVX512FP16 doesn’t match the runtime check/comment (“AVX NE CONVERT”) and the instruction used in the CMake probe (vcvtneeph2ps). Please align terminology so it’s clear what’s actually being required (e.g., rename the macro/probe variable to reflect AVX-NE-CONVERT, or update the comment/probe to match AVX512-FP16 if that’s the intended feature).
Description
It is inappropriate to merely guess whether the AVX512-FP16 instruction is supported based on the compiler's version information, as this can lead to numerous exceptions. Therefore, before starting, I have added a simple and practical check, which involves testing the compiler's support for the instruction through a small case study.
Motivation and Context
Fix #22519 and #24025
The current logic determines AVX512-FP16 support based on compiler version heuristics.
This is not reliable, since compiler version alone does not guarantee that the target
toolchain or hardware fully supports the instruction set.
As a result, this may lead to incorrect feature enablement, causing build failures or
runtime exceptions on certain environments.
This change improves robustness by replacing the heuristic with a compile-time check,
ensuring that AVX512-FP16 is only enabled when it is actually supported.