Bump LLVM to v19.1.1+1 #56130
Conversation
Force-pushed from c1900d0 to adc7db4.
EDIT: gcc is giving a slightly different warning on CI, but it is likely a bug as well.
In that case you can guard the offending line with something like:

```c
#pragma GCC diagnostic push
#if defined(_COMPILER_GCC_) && __GNUC__ >= 12 // if this is version-dependent
// Explain why this is being ignored...
#pragma GCC diagnostic ignored "-Wstringop-overflow"
#endif
...
#pragma GCC diagnostic pop
```
Also, all tests are passing already on aarch64-darwin 🥳
Looks like the only two issues are an incorrect warning during the build (mose's suggestion doesn't seem to work though) …
Guessing the asan problem is …
Without the push-and-pop, that becomes closer to just adding lines 23 to 25 in 80e60c8.
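For reference, a hedged guess at the kind of pattern such a file-scope suppression would be (this is not the verbatim snippet from 80e60c8, just an illustration):

```cpp
// Illustrative only: a file-wide suppression with no push/pop, so the
// diagnostic stays disabled for the rest of the translation unit.
#if defined(_COMPILER_GCC_)
#pragma GCC diagnostic ignored "-Wstringop-overflow"
#endif
```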
Force-pushed from 5c3c307 to 24f4d93.
Looks like the remaining issue is some failures in …
The second analyzegc failure looks incorrect; it seems to incorrectly think …
@nanosoldier
Need to rebase on master now that #56133 has been merged. |
Force-pushed from f6b2376 to e12026f.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
FWIW, the standard way to deal with this is to add an assert and a comment, as that will make both reviewers and bots happy.
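To make that concrete, a minimal sketch of the assert-plus-comment approach (the function and variable names here are hypothetical, not the actual debuginfo.cpp code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: document and assert the invariant that rules out the
// bogus SIZE_MAX length GCC is warning about, so both reviewers and the
// compiler's value tracking can see it.
static bool same_name(const char *lhs, const char *rhs, size_t len)
{
    // dladdr() never hands us a null file name here, so `len` is the length of
    // a real NUL-terminated string and cannot be (size_t)-1 as GCC assumes.
    assert(lhs != nullptr && rhs != nullptr && len != (size_t)-1);
    return std::memcmp(lhs, rhs, len) == 0;
}
```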
The change from

```llvm
%12 = select i1 %exactly_isa.not, float -0.000000e+00, float %immutable_union.sroa.0.0.copyload, !dbg !77
%value_phi6 = fadd float %value_phi, %12, !dbg !77
```

to

```llvm
%12 = fadd float %value_phi, %immutable_union.sroa.0.0.copyload, !dbg !77
%value_phi6 = select i1 %exactly_isa.not, float %value_phi, float %12, !dbg !77
```

prevents the loopvectorize pass from SIMDing the code. I suspect the other regressions for the union.array benchmarks are similar.
I'll try rebasing on top of #52850 and see if that fixes the regressions.
@nanosoldier
The package evaluation job you requested has completed - possible new issues were detected.
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Looks like there are a few 8x regressions here (probably vectorization).
If that helps, the signature was added with #50630, to give an idea of where to look in the code.
Thanks. On the function signature not printing: it looks like we null out the printer here, which prevents the printing.
@Zentrik what's needed to move this forward? It looks like you at least sorted the debug printing and had a decent idea of where the function signature was getting lost. I'm concerned about SVE not working on aarch64 (interestingly enough, …
I'm largely stuck on getting the function signature printing working. I'm not sure what can be done here; it looks like the rename of …

Is there any particular rush getting this in? I was assuming this would go in at the earliest after 1.12 is branched off.
No particular rush related to v1.12 (it's not going to make it at this point), but I wanted to get a sense of where we're stuck and see what people can help with.
`clang -print-runtime-dir` reports a non-existent directory as we build with `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=OFF`. See llvm/llvm-project#102834. I suspect llvm/llvm-project@b6a1473 caused the change by changing the code in Driver.cpp to not check whether the printed directory existed.
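To illustrate the behavioral difference being described (a hedged sketch only, not the actual Driver.cpp code):

```cpp
#include <llvm/Support/FileSystem.h>
#include <llvm/Support/raw_ostream.h>
#include <string>

// Sketch of the two behaviors: previously the driver effectively only reported
// a runtime directory after checking it exists; now it prints the computed
// path unconditionally, which with LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=OFF can
// be a directory that was never created.
void print_runtime_dir(const std::string &candidate, bool check_exists)
{
    if (check_exists && !llvm::sys::fs::is_directory(candidate))
        return; // old behavior (sketch): stay silent for a missing directory
    llvm::outs() << candidate << "\n"; // new behavior (sketch): always print
}
```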
Locally I get:

```
In file included from /home/rag/Documents/Code/julia/usr/include/llvm/Object/ObjectFile.h:18,
                 from /home/rag/Documents/Code/julia/usr/include/llvm/DebugInfo/DIContext.h:18,
                 from /home/rag/Documents/Code/julia/src/debuginfo.cpp:6:
In function 'bool llvm::operator==(llvm::StringRef, llvm::StringRef)',
    inlined from 'bool llvm::operator!=(llvm::StringRef, llvm::StringRef)' at /home/rag/Documents/Code/julia/usr/include/llvm/ADT/StringRef.h:874:71,
    inlined from 'objfileentry_t find_object_file(uint64_t, llvm::StringRef)' at /home/rag/Documents/Code/julia/src/debuginfo.cpp:948:43,
    inlined from 'bool jl_dylib_DI_for_fptr(size_t, llvm::object::SectionRef*, int64_t*, llvm::DIContext**, bool, bool*, uint64_t*, void**, char**, char**)' at /home/rag/Documents/Code/julia/src/debuginfo.cpp:1135:34:
/home/rag/Documents/Code/julia/usr/include/llvm/ADT/StringRef.h:871:20: warning: 'int __builtin_memcmp_eq(const void*, const void*, long unsigned int)' specified bound 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Wstringop-overread]
  871 |     return ::memcmp(LHS.data(), RHS.data(), LHS.size()) == 0;
      |            ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rag/Documents/Code/julia/src/debuginfo.cpp: In function 'bool jl_dylib_DI_for_fptr(size_t, llvm::object::SectionRef*, int64_t*, llvm::DIContext**, bool, bool*, uint64_t*, void**, char**, char**)':
/home/rag/Documents/Code/julia/src/debuginfo.cpp:1133:11: note: source object allocated here
 1133 |     fname = dlinfo.dli_fname;
```

On CI:

```
In file included from /cache/build/builder-amdci4-2/julialang/julia-master/usr/include/llvm/Object/ObjectFile.h:18,
                 from /cache/build/builder-amdci4-2/julialang/julia-master/usr/include/llvm/DebugInfo/DIContext.h:18,
                 from /cache/build/builder-amdci4-2/julialang/julia-master/src/debuginfo.cpp:6:
In function 'bool llvm::operator==(llvm::StringRef, llvm::StringRef)',
    inlined from 'bool llvm::operator!=(llvm::StringRef, llvm::StringRef)' at /cache/build/builder-amdci4-2/julialang/julia-master/usr/include/llvm/ADT/StringRef.h:874:71,
    inlined from 'objfileentry_t find_object_file(uint64_t, llvm::StringRef)' at /cache/build/builder-amdci4-2/julialang/julia-master/src/debuginfo.cpp:943:43,
    inlined from 'bool jl_dylib_DI_for_fptr(size_t, llvm::object::SectionRef*, int64_t*, llvm::DIContext**, bool, bool*, uint64_t*, void**, char**, char**)' at /cache/build/builder-amdci4-2/julialang/julia-master/src/debuginfo.cpp:1126:47:
/cache/build/builder-amdci4-2/julialang/julia-master/usr/include/llvm/ADT/StringRef.h:871:20: error: 'int __builtin_memcmp_eq(const void*, const void*, long unsigned int)' specified size 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]
  871 |     return ::memcmp(LHS.data(), RHS.data(), LHS.size()) == 0;
      |            ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
The change to `GCChecker.cpp` is so the static analyzer doesn't see e.g. a call to a function pointer in llvm and then complain that it might be a safepoint.
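As an illustration of the false positive being worked around (a hedged sketch, not code from the Julia sources; `JL_NOTSAFEPOINT` is stubbed out here so the snippet is self-contained):

```cpp
// The GC static analyzer treats any call it cannot resolve as a potential
// safepoint. Calling through a function pointer that lives inside an LLVM
// object therefore used to be flagged, even if the callee never touches the GC.
#define JL_NOTSAFEPOINT // real definition expands to an analyzer annotation

struct OpaqueLLVMCallbacks {
    void (*on_event)(void *ctx); // target unknown to the analyzer
};

void notify(OpaqueLLVMCallbacks &cb, void *ctx) JL_NOTSAFEPOINT
{
    // Indirect call: without the GCChecker.cpp change the analyzer assumes
    // this might be a safepoint and complains about the caller.
    cb.on_event(ctx);
}
```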
e7698a13e319a9919af04d3d693a6f6ea7168a44 isn't in llvm 19
Force-pushed from 128354f to a90df10.
Rebased and added your fixes with mine. The "Function Signature" text is still missing, which I couldn't investigate today.
Regarding the SVE issue, it may be due to LLVM optimization passes preferring NEON over SVE for some reason when targeting Neoverse V2 (which has 128-bit SVE registers, same width as NEON):
With both IRs, … However, for the same input, when targeting A64FX (which has 512-bit SVE registers) instead of Neoverse V2, …
Thanks @gbaraldi for suggesting doing this comparison between targets with different SVE register widths.
Finally, to reassure myself that things still work, I tried a complex dot product on half precision, for which LLVM prefers SVE over NEON also on neoverse-v2:

```julia
julia> function complex_dot(a::Vector{T}, b::Vector{T}) where {T}
           out = zero(eltype(a))
           for idx in eachindex(a, b)
               @fastmath out += conj(a[idx]) * b[idx]
           end
           out
       end
complex_dot (generic function with 1 method)

julia> code_llvm(complex_dot, NTuple{2,Vector{ComplexF16}}; debuginfo=:none)
[...]
vector.body: ; preds = %vector.body, %vector.ph
%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
%vec.phi = phi half [ 0xH0000, %vector.ph ], [ %38, %vector.body ]
%vec.phi92 = phi half [ 0xH0000, %vector.ph ], [ %40, %vector.body ]
%offset.idx = shl i64 %index, 2
%6 = or disjoint i64 %offset.idx, 4
%7 = shl i64 %4, 5
%8 = add i64 %7, %6
%9 = getelementptr i8, ptr %memoryref_data, i64 %6
%10 = getelementptr i8, ptr %memoryref_data, i64 %8
%11 = getelementptr i8, ptr %9, i64 -4
%12 = getelementptr i8, ptr %10, i64 -4
%wide.vec = load <vscale x 16 x half>, ptr %11, align 2
%wide.vec93 = load <vscale x 16 x half>, ptr %12, align 2
%strided.vec = call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %wide.vec)
%13 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec, 0
%14 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec, 1
%strided.vec94 = call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %wide.vec93)
%15 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec94, 0
%16 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec94, 1
%17 = getelementptr i8, ptr %memoryref_data14, i64 %6
%18 = getelementptr i8, ptr %memoryref_data14, i64 %8
%19 = getelementptr i8, ptr %17, i64 -4
%20 = getelementptr i8, ptr %18, i64 -4
%wide.vec95 = load <vscale x 16 x half>, ptr %19, align 2
%wide.vec96 = load <vscale x 16 x half>, ptr %20, align 2
%strided.vec97 = call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %wide.vec95)
%21 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec97, 0
%22 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec97, 1
%strided.vec98 = call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %wide.vec96)
%23 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec98, 0
%24 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec98, 1
%25 = fmul <vscale x 8 x half> %13, %21
%26 = fmul <vscale x 8 x half> %15, %23
%27 = fmul <vscale x 8 x half> %14, %22
%28 = fmul <vscale x 8 x half> %16, %24
%29 = fadd <vscale x 8 x half> %25, %27
%30 = fadd <vscale x 8 x half> %26, %28
%31 = fmul <vscale x 8 x half> %13, %22
%32 = fmul <vscale x 8 x half> %15, %24
%33 = fmul <vscale x 8 x half> %14, %21
%34 = fmul <vscale x 8 x half> %16, %23
%35 = fsub <vscale x 8 x half> %31, %33
%36 = fsub <vscale x 8 x half> %32, %34
%37 = call half @llvm.vector.reduce.fadd.nxv8f16(half %vec.phi, <vscale x 8 x half> %29)
%38 = call half @llvm.vector.reduce.fadd.nxv8f16(half %37, <vscale x 8 x half> %30)
%39 = call half @llvm.vector.reduce.fadd.nxv8f16(half %vec.phi92, <vscale x 8 x half> %35)
%40 = call half @llvm.vector.reduce.fadd.nxv8f16(half %39, <vscale x 8 x half> %36)
%index.next = add nuw i64 %index, %5
%41 = icmp eq i64 %index.next, %n.vec
br i1 %41, label %middle.block, label %vector.body
[...]

julia> code_native(complex_dot, NTuple{2,Vector{ComplexF16}}; debuginfo=:none)
.LBB0_6:                                // %vector.body
// =>This Inner Loop Header: Depth=1
ld2h { z2.h, z3.h }, p0/z, [x15]
ld2h { z4.h, z5.h }, p0/z, [x15, #2, mul vl]
addvl x15, x15, #4
addvl x14, x14, #-1
ld2h { z6.h, z7.h }, p0/z, [x16]
fmul z18.h, z2.h, z6.h
fmul z20.h, z3.h, z7.h
ld2h { z16.h, z17.h }, p0/z, [x16, #2, mul vl]
fmul z19.h, z4.h, z16.h
fmul z21.h, z5.h, z17.h
addvl x16, x16, #4
fadd z18.h, z18.h, z20.h
fmul z20.h, z2.h, z7.h
fmul z2.h, z3.h, z6.h
fadd z19.h, z19.h, z21.h
fmul z21.h, z4.h, z17.h
fmul z3.h, z5.h, z16.h
fadda h0, p0, h0, z18.h
fsub z2.h, z20.h, z2.h
fsub z3.h, z21.h, z3.h
fadda h1, p0, h1, z2.h
fadda h0, p0, h0, z19.h
fadda h1, p0, h1, z3.h
cbnz x14, .LBB0_6
// %bb.7: // %middle.block
        cbz     x13, .LBB0_10
```

Note the operations on the `z` (SVE) registers. I guess the takeaway is that LLVM changed some heuristics for choosing between SVE and NEON when targeting CPUs with 128-bit SVE registers (same width as NEON).
Yep, looks like this is an implementation issue on llvm's side introduced by that commit. I've opened a PR to undo the breaking parts of that change, restoring the functionality and removing the unnecessary API breakage there: llvm/llvm-project#122297. We can backport that to our llvm 19 branch (with some minor conflicts), or simply have that be broken until llvm 20 is released.
To summarise:
Should we run PkgEval tests again? The previous logs seem to be gone now.
I think it would be best to wait till this PR has the rest of the fixes in, but no harm in doing so. The report is at https://raw.githubusercontent.com/JuliaCI/NanosoldierReports/refs/heads/master/pkgeval/by_hash/128354f_vs_0bedaae/report.md
Presumably the fix for the analyzeGC issue is to backport llvm/llvm-project#122330.
Including #55650 till that's merged.