Building NumPy from source for Windows on ARM using Clang-cl compiler #28106

Closed
Mugundanmcw opened this issue Jan 6, 2025 · 21 comments · Fixed by #28234
Labels
component: SIMD Issues in SIMD (fast instruction sets) code or machinery

Comments

@Mugundanmcw
Contributor

Hello Developers,

  • I am facing an issue while trying to build NumPy for Windows on ARM (WoA) using the Clang-cl compiler. Building NumPy from source requires C and C++ compilers with proper intrinsic support.
  • Previously, I was able to successfully compile NumPy for WoA using the MSVC-optimized C/C++ CL compiler, enabling CPU baseline features that support ARM.
  • However, I encountered limitations with the MSVC C/C++ CL compiler, as it does not support certain CPU dispatcher features like ASIMDHP, ASIMDFHM, and SVE. Is there any specific reason why these CPU dispatch features are not supported for WoA in MSVC?
  • Meanwhile, I attempted to compile NumPy for WoA using the clang-cl compiler (both from the MSVC and LLVM toolchains) to check whether the CPU dispatcher features would be enabled. I found that, apart from SVE, all other test features (including the baseline features) were supported, but I ran into compilation errors due to unrecognized instructions.

Steps to Reproduce

  1. Clone the NumPy source code and check out the latest branch
  2. Install the LLVM toolchain or the MSVC clang toolset
  3. Remove clang and clang++ from the bin directory to avoid conflicts
  4. Add the bin path at the top of the PATH environment variable

Compilers used for compilation:
image

Error and Workaround:

  1. While building the meson_cpu target, I got an invalid operand error for "fstcw" in the multiarray_tests_c source file. Going through the source code, I found that fstcw is an x86 floating-point control instruction. As a workaround, I added one more condition that checks whether this is an ARM64 build. After that the build proceeded:
    Screenshot 2025-01-06 123245
    Workaround:
    Before:
    image
    After:
    image
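
Since the screenshots are not preserved here, the following is only a minimal sketch of the kind of guard described above, not the actual NumPy patch. The function name `read_x87_control_word` and the exact macro list are assumptions made for illustration; the idea is simply that the x86-only `fstcw` inline assembly is compiled only when the target is x86, so an ARM64 build falls through to a harmless default.

```c
/* Hypothetical helper, not NumPy code: read the x87 FPU control word
 * only when the target architecture actually has one. */
long read_x87_control_word(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__) || \
     defined(_M_X64) || defined(_M_IX86))
    /* x86 / x86-64 with GNU-style inline assembly available. */
    unsigned short cw = 0;
    __asm__ volatile("fstcw %0" : "=m"(cw));
    return (long)cw;
#else
    /* ARM64 and every other target: no x87 control word to read. */
    return -1;
#endif
}
```

A caller can treat `-1` as "not available on this architecture" instead of failing the build.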

Issue:

  1. Currently the build fails at 240+ targets while compiling meson_cpu due to unidentified assembly instructions:
    image

Can anyone suggest how to overcome this issue? I need to enable CPU dispatch support for NumPy on WoA to get a better-optimised build of NumPy.

Thanks!

@seberg seberg added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jan 6, 2025
@seberg
Member

seberg commented Jan 6, 2025

Ping @Mousius, might be interesting to you (or you have a quick idea).

@Mugundanmcw
Contributor Author

@Mousius Any suggestions on this issue?

@DavidSpickett

These look like MSVC intrinsics - https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170. We do not have support for all of these in clang-cl at the moment.

Just 2 days ago someone asked about this in fact - llvm/llvm-project#121689.

Without knowing the NumPy source code I can't suggest how to work around this, but the table on the Microsoft page tells you what each intrinsic does. For example, one of them produces the LDARB instruction (https://developer.arm.com/documentation/100069/0606/Data-Transfer-Instructions/LDARB). So in theory you could use alternative APIs to do the same thing.
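
As a rough illustration of the "alternative APIs" idea, and assuming the compiler accepts GCC-style builtins, `__atomic_load_n` with acquire ordering is one way to request a load-acquire of a single byte, which AArch64 compilers typically lower to LDARB. The function name `load_acquire_u8` is hypothetical and this is not taken from the NumPy source.

```c
#include <stdint.h>

/* Hypothetical helper: load one byte with acquire ordering using the
 * GCC/Clang __atomic builtin rather than an MSVC-specific intrinsic. */
uint8_t load_acquire_u8(const uint8_t *p)
{
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}
```

Whether this is an acceptable replacement depends on which compilers the header has to support, since the MSVC cl compiler does not provide these builtins.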

If anyone needs help figuring out the details of what the instructions do, I can help with that.

As for adding these intrinsics to clang-cl, I'll expand on that in the LLVM issue.

@seberg
Member

seberg commented Jan 7, 2025

Ah, sorry, I misdiagnosed thinking it was related to SIMD (the second screenshot is so small... text would be easier).

This seems to be related to the atomics definitions, ping @ngoldbaum, I thought these are borrowed from Python, so it seems a bit surprising.

@DavidSpickett

DavidSpickett commented Jan 7, 2025

As a temporary workaround, it might work to edit the source so that the #ifdef STDC_ATOMICS branch is used instead.

> I thought these are borrowed from Python

I recall Linaro doing work for Windows on Arm Python but it may not have been using clang-cl.

Edit: It was all done with msvc/Visual Studio not clang-cl.

@Mugundanmcw
Contributor Author

Are there any other workarounds I could use to compile NumPy on WoA?

@seberg
Member

seberg commented Jan 7, 2025

Unless you want to dig into it yourself, please be patient for at least a few days. This will be fixed, but don't expect it to be fixed within hours.

@ngoldbaum
Member

I can install Windows in a VM on my ARM Macbook and hopefully reproduce this. Sorry for the trouble...

By the way, what command are you using to build NumPy? IIRC you need to go a little out of your way to build with clang-cl properly.

@ngoldbaum
Member

> As a temporary workaround, it might work to edit the source so that the #ifdef STDC_ATOMICS branch is used instead

Is there a reason why STDC_ATOMICS isn't defined on the reporter's system?

@ngoldbaum
Member

ngoldbaum commented Jan 7, 2025

I just successfully built NumPy after making the patch to _multiarray_tests.c.src suggested by OP. I do not see the same error about missing intrinsics as it seems clang-cl on my system is going into the STDC_ATOMICS branch as I expected it to do originally.

I suspect that there is something subtly wrong about the OP's compilation environment. Here's how I built NumPy, doing all this in a checkout of the NumPy repo:

"[binaries]","c = 'clang-cl'","cpp = 'clang-cl'","ar = 'llvm-lib'","c_ld = 'lld-link'","cpp_ld = 'lld-link'" | Out-File $PWD/clang-cl-build.ini -Encoding ascii
pip install -r requirements/build_requirements.txt
spin build -- --vsenv --native-file=$PWD/clang-cl-build.ini
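
For reference, PowerShell's Out-File writes each element of the string array on its own line, so the one-liner above should produce a Meson native file with contents along these lines:

```ini
[binaries]
c = 'clang-cl'
cpp = 'clang-cl'
ar = 'llvm-lib'
c_ld = 'lld-link'
cpp_ld = 'lld-link'
```

Opening clang-cl-build.ini and comparing it against this is a quick way to confirm that the file was written as intended before starting a build.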

Or alternatively via pip to actually install the numpy build:

python -m pip install -v . --no-build-isolation -C'setup-args=--vsenv' -C'setup-args=--native-file='$PWD'\clang-cl-build.ini'

I did this following our CI setup on GitHub Actions for clang-cl.

@ngoldbaum
Member

OP is short for original post, I was referring to the patch you found for the multiarray tests file.

You should be able to build NumPy using one of the commands I shared in my last comment after applying the patch you suggested for the tests file.

At least right now, with clang-cl it is not sufficient to just use spin build; you need to do something a little more involved to ensure the clang toolchain is being used.

Please feel free to send in a pull request for the fix you found for the multiarray tests file.

@Mugundanmcw
Contributor Author

@ngoldbaum The following is my workflow for building NumPy natively on WoA:

  1. Installed LLVM toolchain for WoA 19.1.0 from releases.
  2. Added the LLVM\bin path to the environment variables
  3. Then I applied the patch from the OP to make sure that multiarray_umath_Test compiles without any errors
  4. Then I created the build configuration file clang-cl-build.ini using the command you mentioned:
    "[binaries]","c = 'clang-cl'","cpp = 'clang-cl'","ar = 'llvm-lib'","c_ld = 'lld-link'","cpp_ld = 'lld-link'" | Out-File $PWD/clang-cl-build.ini -Encoding ascii
  5. Before starting the build, I made sure that it uses LLVM's clang-cl rather than the MSVC build of clang-cl
    image
  6. I proceeded with build by using the following commands:
    pip install -r requirements/build_requirements.txt
    spin build -- --vsenv --native-file=$PWD/clang-cl-build.ini

But the error still points to the same issue:
image

Following your reasoning, the code should take the stdatomic branch, but the preprocessor check still fails to select it.

@ngoldbaum
Member

I used MSVC's build of clang-cl. I don't know if it's possible to use clang's. Ping @rgommers who knows more about this than me.

@rgommers
Member

rgommers commented Jan 8, 2025

It should be possible in principle; we use clang-cl from the Clang feedstock in conda-forge to build SciPy for example.

I have no knowledge specific to WoA + Clang-cl though.

@ngoldbaum
Member

I'm confused why STDC_ATOMICS isn't defined on your setup - it definitely should be on clang 19. I would try to debug why that's happening.
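
One possible way to debug this is a tiny probe that reports which predefined macros the compiler exposes for a given translation unit. This is only a sketch with hypothetical function names, not part of NumPy. Compiling the same file once as C and once as C++ with the same compiler would show whether `__STDC_VERSION__` is visible in each mode.

```c
#include <stdio.h>

/* Returns 1 if __STDC_VERSION__ is defined for this translation unit. */
int has_stdc_version(void)
{
#if defined(__STDC_VERSION__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if this translation unit is being compiled as C++. */
int is_cplusplus(void)
{
#if defined(__cplusplus)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the compiler reports MSVC compatibility via _MSC_VER. */
int reports_msc_ver(void)
{
#if defined(_MSC_VER)
    return 1;
#else
    return 0;
#endif
}

/* Print all three results so the C and C++ runs can be compared. */
void print_macro_probe(void)
{
    printf("__STDC_VERSION__ defined: %d\n", has_stdc_version());
    printf("__cplusplus defined:      %d\n", is_cplusplus());
    printf("_MSC_VER defined:         %d\n", reports_msc_ver());
}
```

Calling `print_macro_probe()` from a small `main` in each build mode gives a side-by-side view of what a header keyed on these macros would see.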

@matthew-brett
Contributor

Just to say - I also have a WoA machine, with clang-cl (from the LLVM install), and cl, and I get the same error (after applying the same fix as above to get the compilation to that point).

I think the problem occurs because - at the stage of the error - this is a C++ compile (of ../numpy/_core/src/umath/dispatching.cpp) and, sure enough, __STDC_VERSION__ is not defined for the C++ compile.

Does this relate to the comment in npy_atomic.h: // TODO: support C++ atomics as well if this header is ever needed in C++?

@ngoldbaum
Member

ngoldbaum commented Jan 27, 2025

> Does this relate to the comment in npy_atomic.h: // TODO: support C++ atomics as well if this header is ever needed in C++?

That makes perfect sense!!

Yes, I suspect making that header properly support C++ would fix this. There must be some hack or subtlety in how the defines are set up on Unix GCC and Clang that lets it find the correct define but clang-cl doesn't pick it up.
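
A minimal sketch of what "properly supporting C++" could look like is shown below. It is an illustration under stated assumptions, not the contents of npy_atomic.h or of the eventual fix: all `demo_` and `DEMO_` names are hypothetical. The point is that the header checks `__cplusplus` first and uses `<atomic>`, then falls back to C11 `<stdatomic.h>`, so a C++ translation unit never reaches a compiler-specific branch just because `__STDC_VERSION__` is undefined.

```c
#include <stdint.h>

#if defined(__cplusplus)
    /* C++ translation unit: use the C++ standard library atomics. */
    #include <atomic>
    typedef std::atomic<uint8_t> demo_atomic_u8;
    #define DEMO_LOAD_ACQUIRE(p) ((p)->load(std::memory_order_acquire))
    #define DEMO_BACKEND 1
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L \
        && !defined(__STDC_NO_ATOMICS__)
    /* C11 or newer translation unit: use the C standard atomics. */
    #include <stdatomic.h>
    typedef _Atomic uint8_t demo_atomic_u8;
    #define DEMO_LOAD_ACQUIRE(p) atomic_load_explicit((p), memory_order_acquire)
    #define DEMO_BACKEND 2
#else
    /* A real header would add compiler-specific fallbacks here. */
    #error "demo: no standard atomics available for this compiler mode"
#endif

/* Load one byte with acquire ordering through whichever backend was chosen. */
uint8_t demo_load_u8(demo_atomic_u8 *p)
{
    return DEMO_LOAD_ACQUIRE(p);
}

/* Report the selected backend: 1 for C++ <atomic>, 2 for C11 <stdatomic.h>. */
int demo_backend(void)
{
    return DEMO_BACKEND;
}
```

Ordering the checks this way keeps the choice independent of whether the compiler also defines `_MSC_VER`, which clang-cl does.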

@ngoldbaum
Member

See #28234

@ngoldbaum
Member

I marked that PR as fixing this issue, but I didn't actually test that. I'd appreciate it if someone with a WoA system and clang-cl installed outside of MSVC could test it.

@matthew-brett
Contributor

I'm sure you saw this in my comment on your PR, but yes: with my other PR (a version of @Mugundanmcw's fix above for the FPU instruction), the WoA compile runs to completion with clang-cl.

@Mugundanmcw
Contributor Author

Thanks to @matthew-brett and @ngoldbaum for addressing the issue! I am now able to build NumPy with LLVM clang-cl on a native WoA device.
