-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hipamd: SIGSEGV when compiled with -march=znver4 #18
Comments
in _mm512_stream_si512 (which requires 64-aligned allocations, but used to copy default-aligned objects). As it is seemingly difficult to change allocations for copied objects (common objects with ref-counts), the fix just replaces nontemporalMemcpy with normal memcpy, which is already optimized in most versions of C runtime. Closes ROCm#18
I will create an internal PR to fix this. The PARAMETERS_MIN_ALIGNMENT https://github.com/ROCm-Developer-Tools/clr/blob/develop/rocclr/utils/flags.hpp#L55 should be set to the native alignment. |
Hi, unfortunately, setting PARAMETERS_MIN_ALIGNMENT seems to be not enough. The traceback is:
It looks like thrust::host_vector and thrust::device_vector are using std::allocator, then it eventually appears in nontemporalMemcpy. Should I report this to rocThrust repo? Upd: according to https://docs.amd.com/projects/HIP/en/latest/.doxygen/docBin/html/group___memory.html#gab388b755a0ee2c86ca5e0c29391a584c , there are no alignment requirements for hipMemcpyAsync pointers, so I think it should be fixed in clr. |
Thank you @AngryLoki let me raise the issue internally and I will get back shortly. |
Hi @AngryLoki the nontemporal copy that is shown here is irrelevant to the passed pointers in hipMemcpyAsync. I see some offsets are still calculated based on the 16 alignment. https://github.com/ROCm-Developer-Tools/clr/blob/develop/rocclr/platform/kernel.hpp#L160 Could you please try to change these lines to the native alignment and let me know if it fixes the issue. I will followup with a permanent fix in that case. |
Hi @iassiour , sorry for delay, I think I don't fully understand what exactly I need to change in the first link (wrong line number?). I applied this:
it does not crash anymore with this change in nontemporalMemcpy, but it makes one subtest (TestBinarySearchDevice) in one rocThrust test fail. Summary:
Note the large size; test succeeds with smaller vectors and with size=1048453 ( |
By the way, here is another idea to fix this issue once and forever, which I learned recently. I described it in ROCm/rocRAND#403: if you require C++17 in CMakeLists.txt and add |
Thank you for the pointers @AngryLoki I have not managed to reproduce the issue in rocThrust TestBinarySearchDevice yet but the changes you made above in operator new and the constructors in kernel.hpp are correct and in any case are required in order to make the current implementation that uses nonTemmporalCopy copy work with avx/avx512. |
Same happens to me on Gentoo (awell) running the tests from the Orochi project (from gpuopen), using -march=znver2. It segfaults in the memset. I tried building everything with Clang at first, thinking it was an issue with GCC LTO, but when I discovered this issue, I tried disabling the AVX/2 options with: I'll subscribe here and to the Gentoo PR to get notified when it's fixed and I can enable AVX. Thank you @AngryLoki for the deep dive and finding the root cause. |
- add fix-unaligned-memcpy.patch - add exec-stack.patch - add disable-stack-protector patch - drop asan doc from the build system. Closes: https://bugs.gentoo.org/915969 Bug: ROCm/clr#18 Bug: #33400 Bug: ROCm/clr#22 Bug: ROCm/clr#21 Bug: ROCm/ROCm-CompilerSupport#61 Signed-off-by: Sv. Lockal <[email protected]> Signed-off-by: Benda Xu <[email protected]>
I just want to add that this issue also applies to |
Bug: https://bugs.gentoo.org/915969 Bug: ROCm/clr#18 Reference: gentoo#33400 Signed-off-by: Sv. Lockal <[email protected]>
Hi @AngryLoki, can you please check if your issue is fixed in the latest ROCm? If so, please close the ticket. Thanks! |
Hi @AngryLoki, sorry for the delay. I was able to reproduce the seg fault you're experiencing with the async_copy unit test in rocThrust, on ROCm 5.7 and with a CPU that supports AVX512 (Intel Core i7-11700). I can confirm that the issue is no longer present in ROCm 6.2 so I am closing it as resolved, but feel free to re-open it if you are still getting a seg fault. |
Due to unaligned allocations, library crashes in
nontemporalMemcpy
in_mm512_stream_si512
(which requires 64-aligned allocations, but used to copy default-aligned objects) in https://github.com/ROCm-Developer-Tools/clr/blob/5914ac3c6e9b3848023a7fa25e19e560b1c38541/rocclr/device/rocm/rocvirtual.cpp#L2793Originally reported to https://bugs.gentoo.org/915969 as a part of rocBLAS and miopen update (failure in hipamd module loader causes crash in dependent libraries).
The text was updated successfully, but these errors were encountered: