Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump LLVM to v19.1.1+1 #56130

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open

Bump LLVM to v19.1.1+1 #56130

wants to merge 15 commits into from

Conversation

Zentrik
Copy link
Member

@Zentrik Zentrik commented Oct 12, 2024

Including #55650 till that's merged.

@Zentrik Zentrik added compiler:llvm For issues that relate to LLVM building Build system, or building Julia or its dependencies external dependencies Involves LLVM, OpenBLAS, or other linked libraries JLLs labels Oct 12, 2024
@Zentrik
Copy link
Member Author

Zentrik commented Oct 12, 2024

The /home/rag/Documents/Code/julia/usr/include/llvm/ADT/StringRef.h:871:20: warning: 'int __builtin_memcmp_eq(const void*, const void*, long unsigned int)' specified bound 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Wstringop-overread] warning seems to be a gcc bug, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101361

EDIT: gcc is giving a slightly different warning on CI but likely a bug as well

@giordano
Copy link
Contributor

In that case you can guard the offending line with something like

#pragma GCC diagnostic push
#if defined(_COMPILER_GCC_) && __GNUC__ >= 12 // if this is version-dependent
// Explain why this is being ignored...
#pragma GCC diagnostic ignored "-Wstringop-overflow"
#endif
...
#pragma GCC diagnostic pop

@giordano
Copy link
Contributor

Also, all tests are passing already on aarch64-darwin 🥳

@Zentrik
Copy link
Member Author

Zentrik commented Oct 12, 2024

Looks like the only two issues are a incorrect warning during the build (mose's suggestion doesn't seem to work though #pragma GCC diagnostic ignored "-Wstringop-overflow" by itself does) and /cache/build/tester-demeter6-13/julialang/julia-master/tmp/test-asan/asan/usr/bin/julia: error while loading shared libraries: libclang_rt.asan-x86_64.so: cannot open shared object file: No such file or directory. (I'll push the analyzegc fix in a bit).

@Zentrik
Copy link
Member Author

Zentrik commented Oct 12, 2024

Guessing the asan problem is clang -print-runtime-dir is returning toolchain/usr/lib/clang/19/lib/x86_64-unknown-linux-gnu instead of toolchain/usr/lib/clang/19/lib/linux so we don't find libclang_rt.asan... to copy to the build folder.

@giordano
Copy link
Contributor

though #pragma GCC diagnostic ignored "-Wstringop-overflow" by itself does

Without the push-and-pop, that becomes closer to just adding -Wstringop-overflow to

julia/src/Makefile

Lines 23 to 25 in 80e60c8

ifeq ($(USEGCC),1) # GCC bug #25509 (void)__attribute__((warn_unused_result))
FLAGS += -Wno-unused-result
endif
(please always add a comment to explain why the warning is being skipped) but it'd be nicer to keep the ignore as local as possible, if feasible.

@Zentrik Zentrik force-pushed the llvm-19-actual branch 2 times, most recently from 5c3c307 to 24f4d93 Compare October 13, 2024 09:48
@Zentrik
Copy link
Member Author

Zentrik commented Oct 13, 2024

Looks like remaining issue is some failures in analyzegc that I guess weren't caught by previous versions of clang.

@giordano giordano added the needs pkgeval Tests for all registered packages should be run with this change label Oct 13, 2024
@Zentrik
Copy link
Member Author

Zentrik commented Oct 13, 2024

Second analyzegc failure looks incorrect, it seems to incorrectly think uv_dup could return 0 when dupfd is set to -1 (minimal reproducer https://godbolt.org/z/4Wo99v1nv, llvm/llvm-project#43015 looks to be the same issue).

@Zentrik
Copy link
Member Author

Zentrik commented Oct 13, 2024

@nanosoldier runbenchmarks(ALL, vs=":master")

@giordano
Copy link
Contributor

Need to rebase on master now that #56133 has been merged.

@Zentrik Zentrik force-pushed the llvm-19-actual branch 2 times, most recently from f6b2376 to e12026f Compare October 13, 2024 20:43
@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@vtjnash
Copy link
Member

vtjnash commented Oct 14, 2024

First analyzegc failure looks incorrect, it seems to incorrectly think uv_dup could return 0 when dupfd is set to -1 (minimal reproducer https://godbolt.org/z/4Wo99v1nv, llvm/llvm-project#43015 looks to be the same issue).

FWIW, the standard way to deal with this is to add an assert and a comment, as that will make both reviewers and bots happy

@Zentrik
Copy link
Member Author

Zentrik commented Oct 16, 2024

The union.array.(perf_sum, Float32, 0) regression is due to SIMD not occurring. This is due to a change in the result from the instcombine pass. See https://godbolt.org/z/qqdzM7oxh,
The change from

%12 = select i1 %exactly_isa.not, float -0.000000e+00, float %immutable_union.sroa.0.0.copyload, !dbg !77
%value_phi6 = fadd float %value_phi, %12, !dbg !77

to

%12 = fadd float %value_phi, %immutable_union.sroa.0.0.copyload, !dbg !77
%value_phi6 = select i1 %exactly_isa.not, float %value_phi, float %12, !dbg !77

prevents the loopvectorize pass from SIMDing the code.

I suspect the other regression for the union.array benchmarks are similar.

@Zentrik
Copy link
Member Author

Zentrik commented Oct 17, 2024

I'll try rebasing on top of #52850 and see if that fixes the regressions.

@Zentrik
Copy link
Member Author

Zentrik commented Oct 17, 2024

@nanosoldier runtests()

@nanosoldier
Copy link
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@oscardssmith
Copy link
Member

@nanosoldier runbenchmarks(ALL, vs=":gb/pipeline-fun")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@oscardssmith
Copy link
Member

oscardssmith commented Oct 19, 2024

looks like there are a few 8x (vectorization probably) regressions here.

@giordano
Copy link
Contributor

If that helps, the signature was added with #50630, to give an idea of where to look at in the code.

@Zentrik
Copy link
Member Author

Zentrik commented Dec 11, 2024

Thanks. On the function signature not printing it looks like we null out the printer here which prevents the printing
https://github.com/llvm/llvm-project/blob/9cdb7d2b6c333874ec969ef6ac64e0354bb3aa91/llvm/lib/CodeGen/AsmPrinter/DebugHandlerBase.cpp#L105-L108

@giordano
Copy link
Contributor

giordano commented Jan 6, 2025

@Zentrik what's needed to move this forward? It looks like you at least sorted the debug printing and had a decent idea of where the function signature was getting lost.

I'm concerned about SVE not working on aarch64 (interestingly enough, vscale vectorisation does work on riscv64 with vector 1.0 extension), but don't know how to debug that, if anyone has hints please speak up!

@Zentrik
Copy link
Member Author

Zentrik commented Jan 6, 2025

I'm largely stuck on getting the function signature printing working. I'm sure what can be done here, it looks like the rename of AsmPrinterHandler to DebugHandlerBase in llvm/llvm-project@117b53a actually changed the behaviour a bit. If someone more knowledgeable than me at LLVM once to take a look that would be great.

Is there any particular rush getting this in? I was assuming this would go in at the earliest after 1.12 is branched off.

@giordano
Copy link
Contributor

giordano commented Jan 6, 2025

Is there any particular rush getting this in? I was assuming this would go in at the earliest after 1.12 is branched off.

No particular rush related to v1.12 (it's not going to make it at this point), but I wanted to get a sense of where we're stuck and see what people can help with.

Zentrik and others added 12 commits January 8, 2025 20:34
`clang -print-runtime-dir` reports a non-existent directory as we build with `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=OFF`. See llvm/llvm-project#102834. I suspect llvm/llvm-project@b6a1473 caused the change by chaning the code in Driver.cpp to not check whether the printed directory existed.
Locally I get:
```
In file included from /home/rag/Documents/Code/julia/usr/include/llvm/Object/ObjectFile.h:18,
                 from /home/rag/Documents/Code/julia/usr/include/llvm/DebugInfo/DIContext.h:18,
                 from /home/rag/Documents/Code/julia/src/debuginfo.cpp:6:
In function 'bool llvm::operator==(llvm::StringRef, llvm::StringRef)',
    inlined from 'bool llvm::operator!=(llvm::StringRef, llvm::StringRef)' at /home/rag/Documents/Code/julia/usr/include/llvm/ADT/StringRef.h:874:71,
    inlined from 'objfileentry_t find_object_file(uint64_t, llvm::StringRef)' at /home/rag/Documents/Code/julia/src/debuginfo.cpp:948:43,
    inlined from 'bool jl_dylib_DI_for_fptr(size_t, llvm::object::SectionRef*, int64_t*, llvm::DIContext**, bool, bool*, uint64_t*, void**, char**, char**)' at /home/rag/Documents/Code/julia/src/debuginfo.cpp:1135:34:
/home/rag/Documents/Code/julia/usr/include/llvm/ADT/StringRef.h:871:20: warning: 'int __builtin_memcmp_eq(const void*, const void*, long unsigned int)' specified bound 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Wstringop-overread]
  871 |     return ::memcmp(LHS.data(), RHS.data(), LHS.size()) == 0;
      |            ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rag/Documents/Code/julia/src/debuginfo.cpp: In function 'bool jl_dylib_DI_for_fptr(size_t, llvm::object::SectionRef*, int64_t*, llvm::DIContext**, bool, bool*, uint64_t*, void**, char**, char**)':
/home/rag/Documents/Code/julia/src/debuginfo.cpp:1133:11: note: source object allocated here
 1133 |     fname = dlinfo.dli_fname;
 ```

 On CI:
```
In file included from /cache/build/builder-amdci4-2/julialang/julia-master/usr/include/llvm/Object/ObjectFile.h:18,
                 from /cache/build/builder-amdci4-2/julialang/julia-master/usr/include/llvm/DebugInfo/DIContext.h:18,
                 from /cache/build/builder-amdci4-2/julialang/julia-master/src/debuginfo.cpp:6:
In function 'bool llvm::operator==(llvm::StringRef, llvm::StringRef)',
    inlined from 'bool llvm::operator!=(llvm::StringRef, llvm::StringRef)' at /cache/build/builder-amdci4-2/julialang/julia-master/usr/include/llvm/ADT/StringRef.h:874:71,
    inlined from 'objfileentry_t find_object_file(uint64_t, llvm::StringRef)' at /cache/build/builder-amdci4-2/julialang/julia-master/src/debuginfo.cpp:943:43,
    inlined from 'bool jl_dylib_DI_for_fptr(size_t, llvm::object::SectionRef*, int64_t*, llvm::DIContext**, bool, bool*, uint64_t*, void**, char**, char**)' at /cache/build/builder-amdci4-2/julialang/julia-master/src/debuginfo.cpp:1126:47:
/cache/build/builder-amdci4-2/julialang/julia-master/usr/include/llvm/ADT/StringRef.h:871:20: error: 'int __builtin_memcmp_eq(const void*, const void*, long unsigned int)' specified size 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]
  871 |     return ::memcmp(LHS.data(), RHS.data(), LHS.size()) == 0;
      |            ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
The change to `GCChecker.cpp` is so the static analyzer doesn't see e.g. a call to a function pointer in llvm and then complain that it might be a safepoint.
e7698a13e319a9919af04d3d693a6f6ea7168a44 isn't in llvm 19
@vtjnash
Copy link
Member

vtjnash commented Jan 8, 2025

Rebased and added your fixes with mine. The "Function Signature" text is still missing which I couldn't investigate today

@giordano
Copy link
Contributor

giordano commented Jan 9, 2025

Regarding the SVE issue, it may be due to LLVM optimization passes preferring NEON over SVE for some reason when targeting Neoverse V2 (which has 128-bit SVE registers, same width as NEON):

With both IRs, opt v18.1.0 always uses vscale, opt v19.1.0 always uses fixed-width <2 x double>.

However, for the same input, when targeting A64FX (which has 512-bit SVE registers) instead of Neoverse V2, vscale is always used:

Thanks @gbaraldi for suggesting doing this comparison between targets with different SVE register widths.

@giordano
Copy link
Contributor

giordano commented Jan 9, 2025

Finally, to reassure myself that things still work, I tried a complex dot product on half precision, for which LLVM prefers SVE over NEON also on neoverse-v2:

julia> function complex_dot(a::Vector{T}, b::Vector{T}) where {T}
           out = zero(eltype(a))
           for idx in eachindex(a, b)
               @fastmath out += conj(a[idx]) * b[idx]
           end
           out
       end
complex_dot (generic function with 1 method)

julia> code_llvm(complex_dot, NTuple{2,Vector{ComplexF16}}; debuginfo=:none)
[...]
vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.phi = phi half [ 0xH0000, %vector.ph ], [ %38, %vector.body ]
  %vec.phi92 = phi half [ 0xH0000, %vector.ph ], [ %40, %vector.body ]
  %offset.idx = shl i64 %index, 2
  %6 = or disjoint i64 %offset.idx, 4
  %7 = shl i64 %4, 5
  %8 = add i64 %7, %6
  %9 = getelementptr i8, ptr %memoryref_data, i64 %6
  %10 = getelementptr i8, ptr %memoryref_data, i64 %8
  %11 = getelementptr i8, ptr %9, i64 -4
  %12 = getelementptr i8, ptr %10, i64 -4
  %wide.vec = load <vscale x 16 x half>, ptr %11, align 2
  %wide.vec93 = load <vscale x 16 x half>, ptr %12, align 2
  %strided.vec = call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %wide.vec)
  %13 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec, 0
  %14 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec, 1
  %strided.vec94 = call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %wide.vec93)
  %15 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec94, 0
  %16 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec94, 1
  %17 = getelementptr i8, ptr %memoryref_data14, i64 %6
  %18 = getelementptr i8, ptr %memoryref_data14, i64 %8
  %19 = getelementptr i8, ptr %17, i64 -4
  %20 = getelementptr i8, ptr %18, i64 -4
  %wide.vec95 = load <vscale x 16 x half>, ptr %19, align 2
  %wide.vec96 = load <vscale x 16 x half>, ptr %20, align 2
  %strided.vec97 = call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %wide.vec95)
  %21 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec97, 0
  %22 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec97, 1
  %strided.vec98 = call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %wide.vec96)
  %23 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec98, 0
  %24 = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %strided.vec98, 1
  %25 = fmul <vscale x 8 x half> %13, %21
  %26 = fmul <vscale x 8 x half> %15, %23
  %27 = fmul <vscale x 8 x half> %14, %22
  %28 = fmul <vscale x 8 x half> %16, %24
  %29 = fadd <vscale x 8 x half> %25, %27
  %30 = fadd <vscale x 8 x half> %26, %28
  %31 = fmul <vscale x 8 x half> %13, %22
  %32 = fmul <vscale x 8 x half> %15, %24
  %33 = fmul <vscale x 8 x half> %14, %21
  %34 = fmul <vscale x 8 x half> %16, %23
  %35 = fsub <vscale x 8 x half> %31, %33
  %36 = fsub <vscale x 8 x half> %32, %34
  %37 = call half @llvm.vector.reduce.fadd.nxv8f16(half %vec.phi, <vscale x 8 x half> %29)
  %38 = call half @llvm.vector.reduce.fadd.nxv8f16(half %37, <vscale x 8 x half> %30)
  %39 = call half @llvm.vector.reduce.fadd.nxv8f16(half %vec.phi92, <vscale x 8 x half> %35)
  %40 = call half @llvm.vector.reduce.fadd.nxv8f16(half %39, <vscale x 8 x half> %36)
  %index.next = add nuw i64 %index, %5
  %41 = icmp eq i64 %index.next, %n.vec
  br i1 %41, label %middle.block, label %vector.body
[...]
julia> code_native(complex_dot, NTuple{2,Vector{ComplexF16}}; debuginfo=:none)
.LBB0_6:                                // %vector.body
                                        // =>This Inner Loop Header: Depth=1
        ld2h    { z2.h, z3.h }, p0/z, [x15]
        ld2h    { z4.h, z5.h }, p0/z, [x15, #2, mul vl]
        addvl   x15, x15, #4
        addvl   x14, x14, #-1
        ld2h    { z6.h, z7.h }, p0/z, [x16]
        fmul    z18.h, z2.h, z6.h
        fmul    z20.h, z3.h, z7.h
        ld2h    { z16.h, z17.h }, p0/z, [x16, #2, mul vl]
        fmul    z19.h, z4.h, z16.h
        fmul    z21.h, z5.h, z17.h
        addvl   x16, x16, #4
        fadd    z18.h, z18.h, z20.h
        fmul    z20.h, z2.h, z7.h
        fmul    z2.h, z3.h, z6.h
        fadd    z19.h, z19.h, z21.h
        fmul    z21.h, z4.h, z17.h
        fmul    z3.h, z5.h, z16.h
        fadda   h0, p0, h0, z18.h
        fsub    z2.h, z20.h, z2.h
        fsub    z3.h, z21.h, z3.h
        fadda   h1, p0, h1, z2.h
        fadda   h0, p0, h0, z19.h
        fadda   h1, p0, h1, z3.h
        cbnz    x14, .LBB0_6
// %bb.7:                               // %middle.block
        cbz     x13, .LBB0_10

Note the operations on <vscale x 8 x half> in LLVM IR, and on z registers in native code.

I guess the takeaway is that LLVM changed some heuristics to prefer SVE vs NEON when targeting CPUs with 128-bit SVE registers (same as NEON).

@vtjnash
Copy link
Member

vtjnash commented Jan 9, 2025

I'm sure what can be done here, it looks like the rename of AsmPrinterHandler to DebugHandlerBase in llvm/llvm-project@117b53a actually changed the behaviour a bit

Yep, looks like this is an implementation issue on llvm's side introduced by that commit. I've made a PR request to undo the breaking parts of that change and restore the functionality and unnecessary API breakage there: llvm/llvm-project#122297

We can backport that to our llvm 19 branch (with some minor conflicts), or simply have that be broken until llvm 20 is released.

@giordano
Copy link
Contributor

giordano commented Jan 9, 2025

To summarise:

Should we run PkgEval tests again? The previous logs seem to be gone now.

@Zentrik
Copy link
Member Author

Zentrik commented Jan 10, 2025

Should we run PkgEval tests again? The previous logs seem to be gone now.

I think it would be best to wait till this PR has the rest of the fixes in, but no harm in doing so. The report is at https://raw.githubusercontent.com/JuliaCI/NanosoldierReports/refs/heads/master/pkgeval/by_hash/128354f_vs_0bedaae/report.md

@oscardssmith
Copy link
Member

presumably the fix for the analyzeGC issue is to backport llvm/llvm-project#122330

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
building Build system, or building Julia or its dependencies compiler:llvm For issues that relate to LLVM external dependencies Involves LLVM, OpenBLAS, or other linked libraries JLLs needs pkgeval Tests for all registered packages should be run with this change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants