Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 20 additions & 2 deletions cmake/onnxruntime_mlas.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -774,11 +774,29 @@ else()
${MLAS_SRC_DIR}/rotary_embedding_kernel_avx2.cpp
${MLAS_SRC_DIR}/rotary_embedding_kernel_avx2.cpp
)
if(CMAKE_CXX_COMPILER_VERSION GREATER_EQUAL 13.1 AND NOT(APPLE))

include(CheckCSourceCompiles)
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature is consumed by C++ sources (platform.cpp), but the probe uses CheckCSourceCompiles. Using CheckCXXSourceCompiles (and the corresponding include) makes the probe match the actual compilation environment (C++ compiler, flags, and frontend), and avoids coupling this logic to the presence/configuration of the C language in the build.

Copilot uses AI. Check for mistakes.

set(MLAS_OLD_CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS}")
if(CMAKE_REQUIRED_FLAGS)
set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} -mavx512fp16")
else()
set(CMAKE_REQUIRED_FLAGS "-mavx512fp16")
endif()
check_c_source_compiles("
int main() {
__asm__ volatile(\"vcvtneeph2ps %ymm0, %ymm1\");
return 0;
}
" COMPILER_SUPPORTS_AVX512FP16)
set(CMAKE_REQUIRED_FLAGS "${MLAS_OLD_CMAKE_REQUIRED_FLAGS}")

if(COMPILER_SUPPORTS_AVX512FP16 AND NOT APPLE)
set(mlas_platform_srcs_avx2
${mlas_platform_srcs_avx2}
${MLAS_SRC_DIR}/x86_64/cvtfp16Avx.S
)
Comment thread
hariharans29 marked this conversation as resolved.
list(APPEND mlas_private_compile_definitions MLAS_SUPPORTS_AVX512FP16)
endif()
Comment on lines +799 to 800
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The probe is hardwired to GCC/Clang-style flags (-mavx512fp16) and GNU inline asm syntax, which will fail under MSVC (and some clang-cl configurations). This can silently disable the FP16 path on Windows even when the toolchain supports it, changing behavior vs the prior _MSC_VER branch. Consider branching on CMAKE_CXX_COMPILER_ID and using a C++-based probe with compiler-appropriate flags (e.g., MSVC /arch: equivalent or a clang-cl compatible approach), so the macro is correctly set across supported toolchains.

Copilot uses AI. Check for mistakes.

message(STATUS "CMAKE_CXX_COMPILER_ID: ${CMAKE_CXX_COMPILER_ID}")
Expand Down Expand Up @@ -997,4 +1015,4 @@ if (NOT onnxruntime_ORT_MINIMAL_BUILD)
endif()
endif()

endif()
endif()
6 changes: 3 additions & 3 deletions onnxruntime/core/mlas/lib/platform.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -527,14 +527,14 @@ Return Value:
}

#ifndef __APPLE__
#if (defined(_MSC_VER) && (_MSC_VER >= 1933)) || (defined(__GNUC__) && (__GNUC__ >= 13))
#if defined(MLAS_SUPPORTS_AVX512FP16)
//
// Check if the processor supports AVX NE CONVERT.
//
if ((Cpuid7_1[3] & (0b1 << 5)) != 0) {
this->CastF16ToF32Kernel = &MlasCastF16ToF32KernelAvx;
}
#endif // (defined(_MSC_VER) && (_MSC_VER >= 1933)) || (defined(__GNUC__) && (__GNUC__ >= 13))
#endif // MLAS_SUPPORTS_AVX512FP16
Comment on lines 529 to +537
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MLAS_SUPPORTS_AVX512FP16 now gates enabling CastF16ToF32Kernel, but that macro is only defined in the non-MSVC CMake path. On Windows/MSVC, setup_mlas_source_for_windows() can still add amd64/cvtfp16Avx.asm (MSVC_VERSION>=1933), yet this block will never run, so AVX512-FP16 support may be silently disabled at runtime even when the kernel is built. Consider defining MLAS_SUPPORTS_AVX512FP16 in the MSVC path when cvtfp16Avx.asm is enabled (or reintroduce an MSVC-specific compile-time guard).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback Yes, in func setup_mlas_source_for_windows() , we should define MLAS_SUPPORTS_AVX512FP16 in the MSVC path when cvtfp16Avx.asm is enabled.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment addressed ?

Comment on lines +530 to +537
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compile-time guard name MLAS_SUPPORTS_AVX512FP16 doesn’t match the runtime check/comment (“AVX NE CONVERT”) and the instruction used in the CMake probe (vcvtneeph2ps). Please align terminology so it’s clear what’s actually being required (e.g., rename the macro/probe variable to reflect AVX-NE-CONVERT, or update the comment/probe to match AVX512-FP16 if that’s the intended feature).

Copilot uses AI. Check for mistakes.


//
Expand Down Expand Up @@ -671,7 +671,7 @@ Return Value:
}
else{
this->ErfFP16KernelRoutine = MlasNeonErfFP16Kernel;
this->GeluFP16KernelRoutine = MlasNeonGeluFP16Kernel;
this->GeluFP16KernelRoutine = MlasNeonGeluFP16Kernel;
}
#else
this->ErfFP16KernelRoutine = MlasNeonErfFP16Kernel;
Expand Down
Loading