performance improvements for `any` and `all` #57091
base: master
Here's the `code_llvm` output:

julia> code_llvm(all, Tuple{Tuple{Vararg{Bool, 32}}}; debuginfo=:none)
; Function Signature: all(NTuple{32, Bool})
define i8 @julia_all_2401(ptr nocapture noundef nonnull readonly align 1 dereferenceable(32) %"itr::Tuple") #0 {
top:
%0 = load <32 x i8>, ptr %"itr::Tuple", align 1
%1 = icmp eq <32 x i8> %0, zeroinitializer
%2 = bitcast <32 x i1> %1 to i32
%3 = icmp eq i32 %2, 0
%4 = zext i1 %3 to i8
ret i8 %4
}
julia> code_llvm(any, Tuple{Tuple{Vararg{Bool, 32}}}; debuginfo=:none)
; Function Signature: any(NTuple{32, Bool})
define i8 @julia_any_2403(ptr nocapture noundef nonnull readonly align 1 dereferenceable(32) %"itr::Tuple") #0 {
top:
%0 = load <32 x i8>, ptr %"itr::Tuple", align 1
%1 = call i8 @llvm.vector.reduce.or.v32i8(<32 x i8> %0)
%2 = icmp ne i8 %1, 0
%3 = zext i1 %2 to i8
ret i8 %3
}
julia> code_llvm(all, Tuple{Tuple{Vararg{Bool, 64}}}; debuginfo=:none)
; Function Signature: all(NTuple{64, Bool})
define i8 @julia_all_2405(ptr nocapture noundef nonnull readonly align 1 dereferenceable(64) %"itr::Tuple") #0 {
top:
%wide.load = load <32 x i8>, ptr %"itr::Tuple", align 1
%0 = icmp ne <32 x i8> %wide.load, zeroinitializer
%1 = getelementptr inbounds i8, ptr %"itr::Tuple", i64 32
%wide.load.1 = load <32 x i8>, ptr %1, align 1
%2 = icmp ne <32 x i8> %wide.load.1, zeroinitializer
%3 = and <32 x i1> %0, %2
%4 = bitcast <32 x i1> %3 to i32
%5 = icmp eq i32 %4, -1
%6 = zext i1 %5 to i8
ret i8 %6
}
julia> code_llvm(any, Tuple{Tuple{Vararg{Bool, 64}}}; debuginfo=:none)
; Function Signature: any(NTuple{64, Bool})
define i8 @julia_any_2407(ptr nocapture noundef nonnull readonly align 1 dereferenceable(64) %"itr::Tuple") #0 {
top:
%wide.load = load <32 x i8>, ptr %"itr::Tuple", align 1
%0 = getelementptr inbounds i8, ptr %"itr::Tuple", i64 32
%wide.load.1 = load <32 x i8>, ptr %0, align 1
%1 = or <32 x i8> %wide.load, %wide.load.1
%2 = icmp ne <32 x i8> %1, zeroinitializer
%3 = bitcast <32 x i1> %2 to i32
%4 = icmp ne i32 %3, 0
%5 = zext i1 %4 to i8
ret i8 %5
}
julia> versioninfo()
Julia Version 1.12.0-DEV.unknown
Commit 319082c (2025-01-18 07:54 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: 8 × AMD Ryzen 3 5300U with Radeon Graphics
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, znver2)
GC: Built with stock GC
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

So it vectorizes, like in #55673, but it also vectorizes for lengths above 32.
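A check like the one above can also be scripted instead of read by eye. The following is only a rough sketch: `mentions_byte_vectors` is a hypothetical helper name, and searching the IR text for a type such as `<32 x i8>` is a heuristic, not proof of efficient vectorization. The result depends on the Julia version, the LLVM version, and the CPU target.

```julia
using InteractiveUtils  # provides `code_llvm` outside the REPL

# Hypothetical helper: returns `true` if the LLVM IR generated for `f` applied
# to an `n`-element tuple of `Bool` mentions a vector-of-bytes type such as
# `<32 x i8>`. This is a text heuristic only.
function mentions_byte_vectors(f, n)
    ir = sprint(io -> code_llvm(io, f, Tuple{NTuple{n,Bool}}; debuginfo=:none))
    occursin(r"<\d+ x i8>", ir)
end

for n in (32, 64, 96)
    println("all, n = $n: ", mentions_byte_vectors(all, n))
    println("any, n = $n: ", mentions_byte_vectors(any, n))
end
```

Whether each line prints `true` depends on the build being inspected, so no expected output is given here.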
Do you have some benchmarks showing the improved performance?
Benchmarking script

    using BenchmarkTools
    for f ∈ (all, any)
        println("f: $f")
        for b ∈ (false, true)
            println(" b: $b")
            for l ∈ 32:32:96
                println(" l: $l")
                print(" ")
                @btime ($f)(t) setup=(t = ntuple(Returns($b), $l);)
            end
        end
    end

Results
Be more conservative by restricting the new methods to homogeneous tuples.
In particular:

* Help ensure vectorization for homogeneous tuples of `Bool`. Inspired by JuliaLang#55673, but more general by using a loop, thus being performant for any input length.
* Delete single-argument methods, instead define methods dispatching on `typeof(identity)`. This makes the methods more generally useful.
* Make some optimizations defined for `all` also be defined for `any` in a symmetric manner.
* Delete the methods specific to the empty tuple, as they're not required for such calls to be foldable.

Closes JuliaLang#55673
Because the short-circuiting is promised by the docs.
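To illustrate the dispatch pattern the commit messages describe, here is a minimal sketch. The name `my_any` is hypothetical and this is not the code in Base or in this PR. It assumes a generic method that keeps the documented short-circuiting, plus a specialization selected by dispatching on `typeof(identity)` for homogeneous `Bool` tuples, where skipping the early exit is unobservable because `identity` has no side effects. Handling of `missing` is left out.

```julia
# Hypothetical names; not Base's implementation.

# Generic method: short-circuits, so `f` is not called again after the
# first `true` result.
function my_any(f, t::Tuple)
    for x in t
        f(x) && return true
    end
    false
end

# Specialization chosen by dispatch on `typeof(identity)`. It reads every
# element and combines them with `|`, so there is no early exit.
function my_any(::typeof(identity), t::Tuple{Vararg{Bool}})
    r = false
    for i in eachindex(t)
        r |= @inbounds t[i]
    end
    r
end

# The single-argument form just forwards to the two-argument form.
my_any(t::Tuple) = my_any(identity, t)
```

Because the specialization is attached to the two-argument form, a call such as `my_any(identity, t)` reaches the same method as `my_any(t)`, which is one reading of "more generally useful" in the commit message.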
* [...] `any` and `all` for homogeneous tuples of `Bool`, by avoiding bounds-checking and avoiding short-circuiting. Inspired by "fast methods for `any` and `all` for `Bool` tuples" (#55673), but more general by relying on a loop instead of on recursion, thus being performant for any input length.
* [...] `Bool` or on `Missing` any more.
* [...] `all` are deleted.
* [...] `any` are deleted.

Closes #55673
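The contrast between recursion and a loop mentioned in the description can be sketched as follows. Both function names are hypothetical, and neither body is taken from this PR or from #55673. The recursive form is only an assumed illustration of peeling one element per call, and the loop form is an assumed illustration of avoiding bounds checks and early exits.

```julia
# Hypothetical names; illustrative only.

# Recursive formulation: handles one element per call via `Base.tail`.
all_rec(::Tuple{}) = true
all_rec(t::Tuple{Bool,Vararg{Bool}}) = first(t) & all_rec(Base.tail(t))

# Loop formulation: a single method body for every tuple length. `&` rather
# than `&&` means every element is read, and `@inbounds` drops the bounds
# check on `t[i]`.
function all_loop(t::Tuple{Vararg{Bool}})
    r = true
    for i in eachindex(t)
        r &= @inbounds t[i]
    end
    r
end
```

Both forms return the same answers. The difference the description points at is how well each one compiles as the tuple grows, which is best checked with `code_llvm` and a benchmark on the build in question.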