Segfaults/wrong results from matmul! #193

Open
tam724 opened this issue Oct 17, 2024 · 3 comments

tam724 commented Oct 17, 2024

Follow-up to LuxDL/Lux.jl#980.
Running the following script gives wrong results and, after a few calls, a segfault:

using Pkg
Pkg.activate(temp=true)
Pkg.add("Octavian")
using Octavian

N = 3000;
a = ones(10, N);
b = ones(N, 10);

c = zeros(10, 10);
Octavian.matmul!(c, a, b, true, false) # gives wrong results

function my_matmul(a, b)
    c = similar(a, size(a, 1), size(b, 2))
    Octavian.matmul!(c, a, b, true, false)
    return c
end

my_matmul(a, b) # segfaults after running multiple times
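
Since both inputs are all ones, every entry of the product should equal N = 3000. A minimal reference check, sketched with the standard library's `LinearAlgebra.mul!` (which takes the same five-argument form, `C = A*B*α + C*β`) and assuming nothing about Octavian's internals; the names `c_ref` and `is_exact` are only illustrative:

```julia
using LinearAlgebra

N = 3000
a = ones(10, N)
b = ones(N, 10)

# similar() returns uninitialized memory, so the call must overwrite every entry.
c_ref = similar(a, size(a, 1), size(b, 2))
mul!(c_ref, a, b, true, false)   # c_ref = a*b*true + c_ref*false

# Sums of 3000 ones are exactly representable, so an exact comparison is safe here.
is_exact = all(==(3000.0), c_ref)
```

Any entry other than 3000.0 in the Octavian result therefore indicates a real error rather than floating-point rounding.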

run/core dump/versioninfo
julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)

julia> using Pkg

julia> Pkg.activate(temp=true)
  Activating new project at `/tmp/jl_2DruNR`

julia> Pkg.add("Octavian")
   Resolving package versions...
    Updating `/tmp/jl_2DruNR/Project.toml`
  [6fd5a793] + Octavian v0.3.28
    Updating `/tmp/jl_2DruNR/Manifest.toml`
  [79e6a3ab] + Adapt v4.0.4
  [4fba245c] + ArrayInterface v7.16.0
  [62783981] + BitTwiddlingConvenienceFunctions v0.1.6
  [2a0fbf3d] + CPUSummary v0.2.6
  [fb6a15b2] + CloseOpenIntervals v0.1.13
  [f70d9fcc] + CommonWorldInvalidations v1.0.0
  [34da2185] + Compat v4.16.0
  [adafc99b] + CpuId v0.3.1
  [ffbed154] + DocStringExtensions v0.9.3
  [3e5b6fbb] + HostCPUFeatures v0.1.17
  [615f187c] + IfElse v0.1.1
  [10f19ff3] + LayoutPointers v0.1.17
  [bdcacae8] + LoopVectorization v0.12.171
  [d125e4d3] + ManualMemory v0.1.8
  [6fd5a793] + Octavian v0.3.28
  [6fe1bfb0] + OffsetArrays v1.14.1
  [1d0040c9] + PolyesterWeave v0.2.2
  [aea7be01] + PrecompileTools v1.2.1
  [21216c6a] + Preferences v1.4.3
  [ae029012] + Requires v1.3.0
  [94e857df] + SIMDTypes v0.1.0
  [476501e8] + SLEEFPirates v0.6.43
  [aedffcd0] + Static v1.1.1
  [0d7ed370] + StaticArrayInterface v1.8.0
  [8290d209] + ThreadingUtilities v0.5.2
  [3a884ed6] + UnPack v1.0.2
  [3d5dd08c] + VectorizationBase v0.21.70
  [56f22d72] + Artifacts v1.11.0
  [2a0f44e3] + Base64 v1.11.0
  [ade2ca70] + Dates v1.11.0
  [76f85450] + LibGit2 v1.11.0
  [8f399da3] + Libdl v1.11.0
  [37e2e46d] + LinearAlgebra v1.11.0
  [d6f4376e] + Markdown v1.11.0
  [ca575930] + NetworkOptions v1.2.0
  [de0858da] + Printf v1.11.0
  [9a3f8284] + Random v1.11.0
  [ea8e919c] + SHA v0.7.0
  [fa267f1f] + TOML v1.0.3
  [cf7118a7] + UUIDs v1.11.0
  [4ec0a83e] + Unicode v1.11.0
  [e66e0078] + CompilerSupportLibraries_jll v1.1.1+0
  [e37daf67] + LibGit2_jll v1.7.2+0
  [29816b5a] + LibSSH2_jll v1.11.0+1
  [c8ffd9c3] + MbedTLS_jll v2.28.6+0
  [4536629a] + OpenBLAS_jll v0.3.27+1
  [8e850b90] + libblastrampoline_jll v5.11.0+0

julia> using Octavian

julia> N = 3000;

julia> a = ones(10, N);

julia> b = ones(N, 10);

julia> c = zeros(10, 10);

julia> Octavian.matmul!(c, a, b, true, false) # gives wrong results
10×10 Matrix{Float64}:
 3000.0  3000.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  3000.0
 3000.0  3000.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  3000.0
 3000.0  3000.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  3000.0
 3000.0  3000.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  3000.0
 3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0
 3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0
 3000.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  3000.0
 3000.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  3000.0
 3000.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  3000.0
 3000.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  3000.0

julia> function my_matmul(a, b)
           c = similar(a, size(a, 1), size(b, 2))
           Octavian.matmul!(c, a, b, true, false)
           return c
       end
my_matmul (generic function with 1 method)

julia> my_matmul(a, b) # segfaults after running multiple times
10×10 
[186775] signal 11 (128): Segmentation fault
in expression starting at none:0
jl_gc_pool_alloc_inner at /cache/build/builder-demeter6-6/julialang/julia-master/src/gc.c:1335
jl_gc_pool_alloc_noinline at /cache/build/builder-demeter6-6/julialang/julia-master/src/gc.c:1392 [inlined]
jl_gc_alloc_ at /cache/build/builder-demeter6-6/julialang/julia-master/src/julia_internal.h:523 [inlined]
jl_gc_alloc at /cache/build/builder-demeter6-6/julialang/julia-master/src/gc.c:3952
_new_genericmemory_ at /cache/build/builder-demeter6-6/julialang/julia-master/src/genericmemory.c:56 [inlined]
jl_alloc_genericmemory at /cache/build/builder-demeter6-6/julialang/julia-master/src/genericmemory.c:99
ijl_array_grow_end at /cache/build/builder-demeter6-6/julialang/julia-master/src/array.c:229
ijl_module_names at /cache/build/builder-demeter6-6/julialang/julia-master/src/module.c:1001
#unsorted_names#9 at ./reflection.jl:96 [inlined]
unsorted_names at ./reflection.jl:96 [inlined]
make_typealias at ./show.jl:621
show_typealias at ./show.jl:802
_show_type at ./show.jl:967
show at ./show.jl:962
print at ./strings/io.jl:35
showarg at ./show.jl:3209 [inlined]
array_summary at ./show.jl:3152 [inlined]
summary at ./show.jl:3149 [inlined]
show at ./arrayshow.jl:368
unknown function (ip: 0x7f5d54341006)
#68 at /cache/build/builder-demeter6-6/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:367
jfptr_YY.68_10032 at /home/tamme/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_GYsA8.so (unknown line)
with_repl_linfo at /cache/build/builder-demeter6-6/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:661
jfptr_with_repl_linfo_10162 at /home/tamme/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_GYsA8.so (unknown line)
display at /cache/build/builder-demeter6-6/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:353
display at /cache/build/builder-demeter6-6/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:372 [inlined]
display at ./multimedia.jl:340
jfptr_display_13539 at /home/tamme/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_GYsA8.so (unknown line)
jl_apply at /cache/build/builder-demeter6-6/julialang/julia-master/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-demeter6-6/julialang/julia-master/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
print_response at /cache/build/builder-demeter6-6/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:409
#70 at /cache/build/builder-demeter6-6/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:378
jfptr_YY.70_10070 at /home/tamme/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_GYsA8.so (unknown line)
with_repl_linfo at /cache/build/builder-demeter6-6/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:661
jfptr_with_repl_linfo_10162 at /home/tamme/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_GYsA8.so (unknown line)
print_response at /cache/build/builder-demeter6-6/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:376
do_respond at /cache/build/builder-demeter6-6/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:1003
jfptr_do_respond_10225 at /home/tamme/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_GYsA8.so (unknown line)
jl_apply at /cache/build/builder-demeter6-6/julialang/julia-master/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-demeter6-6/julialang/julia-master/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_interface at /cache/build/builder-demeter6-6/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2755
jfptr_run_interface_8706 at /home/tamme/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_GYsA8.so (unknown line)
run_frontend at /cache/build/builder-demeter6-6/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:1471
#75 at /cache/build/builder-demeter6-6/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:480
jfptr_YY.75_10127 at /home/tamme/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_GYsA8.so (unknown line)
jl_apply at /cache/build/builder-demeter6-6/julialang/julia-master/src/julia.h:2157 [inlined]
start_task at /cache/build/builder-demeter6-6/julialang/julia-master/src/task.c:1202
Allocations: 32345489 (Pool: 32343573; Big: 1916); GC: 25
Segmentation fault (core dumped)

I was able to reproduce this with Julia v1.11.0 and v1.10.5 on my machine, and reliably with v1.11.0 on another machine.

run different machine
julia> versioninfo()
Julia Version 1.11.0
Commit 501a4f25c2b (2024-10-07 11:40 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 9374F 32-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 128 virtual cores)

julia> using Pkg

julia> Pkg.activate(temp=true)
  Activating new project at `/tmp/jl_Mfzlrb`

julia> Pkg.add("Octavian")
   Resolving package versions...
    Updating `/tmp/jl_Mfzlrb/Project.toml`
  [6fd5a793] + Octavian v0.3.28
    Updating `/tmp/jl_Mfzlrb/Manifest.toml`
  [79e6a3ab] + Adapt v4.0.4
  [4fba245c] + ArrayInterface v7.16.0
  [62783981] + BitTwiddlingConvenienceFunctions v0.1.6
  [2a0fbf3d] + CPUSummary v0.2.6
  [fb6a15b2] + CloseOpenIntervals v0.1.13
  [f70d9fcc] + CommonWorldInvalidations v1.0.0
  [34da2185] + Compat v4.16.0
  [adafc99b] + CpuId v0.3.1
  [ffbed154] + DocStringExtensions v0.9.3
  [3e5b6fbb] + HostCPUFeatures v0.1.17
  [615f187c] + IfElse v0.1.1
  [10f19ff3] + LayoutPointers v0.1.17
  [bdcacae8] + LoopVectorization v0.12.171
  [d125e4d3] + ManualMemory v0.1.8
  [6fd5a793] + Octavian v0.3.28
  [6fe1bfb0] + OffsetArrays v1.14.1
  [1d0040c9] + PolyesterWeave v0.2.2
  [aea7be01] + PrecompileTools v1.2.1
  [21216c6a] + Preferences v1.4.3
  [ae029012] + Requires v1.3.0
  [94e857df] + SIMDTypes v0.1.0
  [476501e8] + SLEEFPirates v0.6.43
  [aedffcd0] + Static v1.1.1
  [0d7ed370] + StaticArrayInterface v1.8.0
  [8290d209] + ThreadingUtilities v0.5.2
  [3a884ed6] + UnPack v1.0.2
  [3d5dd08c] + VectorizationBase v0.21.70
  [56f22d72] + Artifacts v1.11.0
  [2a0f44e3] + Base64 v1.11.0
  [ade2ca70] + Dates v1.11.0
  [76f85450] + LibGit2 v1.11.0
  [8f399da3] + Libdl v1.11.0
  [37e2e46d] + LinearAlgebra v1.11.0
  [d6f4376e] + Markdown v1.11.0
  [ca575930] + NetworkOptions v1.2.0
  [de0858da] + Printf v1.11.0
  [9a3f8284] + Random v1.11.0
  [ea8e919c] + SHA v0.7.0
  [fa267f1f] + TOML v1.0.3
  [cf7118a7] + UUIDs v1.11.0
  [4ec0a83e] + Unicode v1.11.0
  [e66e0078] + CompilerSupportLibraries_jll v1.1.1+0
  [e37daf67] + LibGit2_jll v1.7.2+0
  [29816b5a] + LibSSH2_jll v1.11.0+1
  [c8ffd9c3] + MbedTLS_jll v2.28.6+0
  [4536629a] + OpenBLAS_jll v0.3.27+1
  [8e850b90] + libblastrampoline_jll v5.11.0+0

julia> using Octavian

julia> N = 3000;

julia> a = ones(10, N);

julia> b = ones(N, 10);

julia> c = zeros(10, 10);

julia> Octavian.matmul!(c, a, b, true, false) # gives wrong results
10×10 Matrix{Float64}:
 3000.0  3000.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  3000.0
 3000.0  3000.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  3000.0
 3000.0  3000.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  3000.0
 3000.0  3000.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  2998.0  3000.0
 3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0
 3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0
 3000.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  3000.0
 3000.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  3000.0
 3000.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  3000.0
 3000.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  2999.0  3000.0

julia> function my_matmul(a, b)
           c = similar(a, size(a, 1), size(b, 2))
           Octavian.matmul!(c, a, b, true, false)
           return c
       end
my_matmul (generic function with 1 method)

julia> my_matmul(a, b) # segfaults after running multiple times
10×10 
[24457] signal 11 (128): Segmentation fault
in expression starting at none:0
jl_gc_pool_alloc_inner at /cache/build/builder-amdci5-1/julialang/julia-master/src/gc.c:1335
jl_gc_pool_alloc_noinline at /cache/build/builder-amdci5-1/julialang/julia-master/src/gc.c:1392 [inlined]
jl_gc_alloc_ at /cache/build/builder-amdci5-1/julialang/julia-master/src/julia_internal.h:523 [inlined]
jl_gc_alloc at /cache/build/builder-amdci5-1/julialang/julia-master/src/gc.c:3952
_new_genericmemory_ at /cache/build/builder-amdci5-1/julialang/julia-master/src/genericmemory.c:56 [inlined]
jl_alloc_genericmemory at /cache/build/builder-amdci5-1/julialang/julia-master/src/genericmemory.c:99
ijl_array_grow_end at /cache/build/builder-amdci5-1/julialang/julia-master/src/array.c:229
ijl_module_names at /cache/build/builder-amdci5-1/julialang/julia-master/src/module.c:1006
#unsorted_names#9 at ./reflection.jl:96 [inlined]
unsorted_names at ./reflection.jl:96 [inlined]
make_typealias at ./show.jl:629
show_typealias at ./show.jl:810
_show_type at ./show.jl:975
show at ./show.jl:970
print at ./strings/io.jl:35
showarg at ./show.jl:3217 [inlined]
array_summary at ./show.jl:3160 [inlined]
summary at ./show.jl:3157 [inlined]
show at ./arrayshow.jl:368
unknown function (ip: 0x7ff42025af26)
#68 at /cache/build/builder-amdci5-1/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:348
jfptr_YY.68_10156 at /storage/home/tam/.julia/juliaup/julia-1.11.0+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_bFCI4.so (unknown line)
with_repl_linfo at /cache/build/builder-amdci5-1/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:646
jfptr_with_repl_linfo_10298 at /storage/home/tam/.julia/juliaup/julia-1.11.0+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_bFCI4.so (unknown line)
display at /cache/build/builder-amdci5-1/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:334
display at /cache/build/builder-amdci5-1/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:353 [inlined]
display at ./multimedia.jl:340
jfptr_display_13763 at /storage/home/tam/.julia/juliaup/julia-1.11.0+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_bFCI4.so (unknown line)
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-master/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-amdci5-1/julialang/julia-master/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1054 [inlined]
invokelatest at ./essentials.jl:1051 [inlined]
print_response at /cache/build/builder-amdci5-1/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:390
#70 at /cache/build/builder-amdci5-1/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:359
jfptr_YY.70_10194 at /storage/home/tam/.julia/juliaup/julia-1.11.0+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_bFCI4.so (unknown line)
with_repl_linfo at /cache/build/builder-amdci5-1/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:646
jfptr_with_repl_linfo_10298 at /storage/home/tam/.julia/juliaup/julia-1.11.0+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_bFCI4.so (unknown line)
print_response at /cache/build/builder-amdci5-1/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:357
do_respond at /cache/build/builder-amdci5-1/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:988
jfptr_do_respond_10361 at /storage/home/tam/.julia/juliaup/julia-1.11.0+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_bFCI4.so (unknown line)
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-master/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-amdci5-1/julialang/julia-master/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1054 [inlined]
invokelatest at ./essentials.jl:1051 [inlined]
run_interface at /cache/build/builder-amdci5-1/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2749
jfptr_run_interface_8811 at /storage/home/tam/.julia/juliaup/julia-1.11.0+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_bFCI4.so (unknown line)
run_frontend at /cache/build/builder-amdci5-1/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:1456
#75 at /cache/build/builder-amdci5-1/julialang/julia-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:461
jfptr_YY.75_10252 at /storage/home/tam/.julia/juliaup/julia-1.11.0+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_bFCI4.so (unknown line)
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-master/src/julia.h:2157 [inlined]
start_task at /cache/build/builder-amdci5-1/julialang/julia-master/src/task.c:1202
Allocations: 31945795 (Pool: 31943896; Big: 1899); GC: 25
Segmentation fault (core dumped)

tam724 commented Oct 18, 2024

On every machine I was able to test, the code above segfaults whenever:

julia> Octavian.has_feature(Val(:x86_64_avx512f))
static(true)

Overriding the trait

Octavian.has_feature(::Val{:x86_64_avx512f}) = Octavian.static(false)

before calling matmul! "fixes" the segfault, although this is probably only a workaround.
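
The override works because the trait is selected by dispatch on a `Val` type, so defining a newer method with the same signature replaces the old one for all later calls. A toy sketch of that mechanism, run at top level; `toy_has_feature` and `toy_vector_lanes` are hypothetical names, not the HostCPUFeatures code, and plain `Bool` stands in for `static(...)`:

```julia
# Hypothetical trait: one method per feature symbol, chosen by dispatch on Val.
toy_has_feature(::Val{:avx512f}) = true

# Hypothetical consumer that picks a kernel width from the trait.
toy_vector_lanes() = toy_has_feature(Val(:avx512f)) ? 8 : 4

lanes_before = toy_vector_lanes()   # uses the original method

# Redefining the method with the same signature replaces it;
# later top-level calls see the new definition.
toy_has_feature(::Val{:avx512f}) = false

lanes_after = toy_vector_lanes()
```

Because the redefinition affects every caller in the session, not only `matmul!`, it is better suited to diagnosis than to production use.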

Debugging this led me to the following matmul! implementation, where the computed block and iteration sizes look wrong, which probably leads to out-of-bounds memory accesses.

(Mblock, Mblock_Mrem, Mremfinal, Mrem, Miter),

This is what I get (using Julia v1.10.5):

julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)

julia> Octavian.has_feature(Val(:x86_64_avx512f))
static(true)

julia> Octavian.matmul!(c, a, b, true, false)
((Mblock, Mblock_Mrem, Mremfinal, Mrem, Miter), (Kblock, Kblock_Krem, Krem, Kiter)) = ((24, 24, 10, 0, 0), (3000, 3001, 0, 1))

julia> Octavian.has_feature(::Val{:x86_64_avx512f}) = Octavian.static(false)

julia> Octavian.matmul!(c, a, b, true, false)
((Mblock, Mblock_Mrem, Mremfinal, Mrem, Miter), (Kblock, Kblock_Krem, Krem, Kiter)) = ((0, 0, 10, 0, 1), (600, 601, 0, 5))
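
One way to sanity-check the printed tuples is to add up the rows and reduction steps they would cover. The sketch below rests on an assumed reading that is not confirmed against Octavian's source: M is split into `Miter` leading blocks (the first `Mrem` of size `Mblock_Mrem`, the rest of size `Mblock`) plus one final block of size `Mremfinal`, and K is split into `Kiter` blocks with no separate final block. The helper names `covered_M` and `covered_K` are illustrative.

```julia
# Rows covered under the assumed reading of (Mblock, Mblock_Mrem, Mremfinal, Mrem, Miter).
covered_M((Mblock, Mblock_Mrem, Mremfinal, Mrem, Miter)) =
    Mrem * Mblock_Mrem + (Miter - Mrem) * Mblock + Mremfinal

# Reduction steps covered under the assumed reading of (Kblock, Kblock_Krem, Krem, Kiter).
covered_K((Kblock, Kblock_Krem, Krem, Kiter)) =
    Krem * Kblock_Krem + (Kiter - Krem) * Kblock

# Tuples copied from the two printed lines above (M = 10, K = 3000).
m_avx512   = covered_M((24, 24, 10, 0, 0))
m_fallback = covered_M((0, 0, 10, 0, 1))
k_avx512   = covered_K((3000, 3001, 0, 1))
k_fallback = covered_K((600, 601, 0, 5))
```

Under this reading both variants account for all of M = 10 and K = 3000, so the totals alone would not explain the failure. What stands out instead is that the AVX-512 tuple reports `Mblock = 24`, which is larger than M itself. How Octavian's loops actually consume these values is not established here.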

The fact that HostCPUFeatures.has_feature(Val(:x86_64_avx512f)) returns different values depending on the Julia version (LLVM version related?) made this hard to reproduce on another machine, but it happens consistently under v1.11.1.

versioninfo / has_feature

(v1.11.1)

julia> using HostCPUFeatures

julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 9374F 32-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 128 virtual cores)

julia> has_feature(Val(:x86_64_avx512f))
static(true)

(v1.10.5)

julia> using HostCPUFeatures

julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 9374F 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 128 virtual cores)

julia> has_feature(Val(:x86_64_avx512f))
static(false)
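
In the two transcripts above, `versioninfo()` reports the CPU target as `znver4` with LLVM 16.0.6 and as `znver3` with LLVM 15.0.7 on the same EPYC 9374F. One possible explanation, offered only as an assumption in line with the "LLVM version related?" guess, is that the older LLVM does not recognize the newer microarchitecture and falls back to a target without AVX-512. A small sketch for collecting the relevant values on any machine, using only Base:

```julia
# Report what this Julia build believes about the host CPU.
llvm_version = Base.libllvm_version   # VersionNumber of the bundled LLVM
cpu_target   = Sys.CPU_NAME           # LLVM's name for the host CPU, e.g. "znver3"

println("Julia ", VERSION, ", LLVM ", llvm_version, ", CPU target ", cpu_target)
```

Comparing this line across Julia versions on one machine shows whether the detected target changed, independently of any package.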

chriselrod (Collaborator) commented

Octavian's blocking strategy/algorithm is outright bad/suboptimal.

Anyway, why is it even trying to block?

The arrays are large overall, but only along the reduction dimension; the output is small enough that the answer can fit in registers.
It won't, though, because currently it'll store a 24 x 9 block in registers.
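
The register arithmetic behind this can be sketched as follows. It assumes AVX-512 (32 vector registers of 512 bits each) and `Float64` elements; `regs_for_block` is an illustrative helper, not an Octavian function.

```julia
vector_bits     = 512                                   # AVX-512 register width
num_vector_regs = 32                                    # zmm0 through zmm31
lanes = vector_bits ÷ (8 * sizeof(Float64))             # Float64 values per register

# Registers needed to hold an m-by-n block of C, with rows packed into vectors.
regs_for_block(m, n) = cld(m, lanes) * n

regs_24x9  = regs_for_block(24, 9)    # the block size mentioned above
regs_10x10 = regs_for_block(10, 10)   # the whole 10×10 result

fits_10x10 = regs_10x10 <= num_vector_regs
```

Under these assumptions the 24 x 9 block takes 27 registers and the full 10×10 result takes 20, so the result fits with registers left over for loading A and broadcasting B.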

You can pick a location and do a size check. A simple hotfix would be to edit this line:

(nᵣ ≥ N) && @goto LOOPMUL

so that it also checks whether W*mᵣ >= M.
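
A sketch of how such a size check might behave, written as a standalone predicate rather than as a patch. It assumes the quoted line compares nᵣ against N with ≥, reads the suggestion as an additional `||` condition, and uses hypothetical values W = 8, mᵣ = 3, nᵣ = 9 inferred from the "24 x 9 block" mentioned above; none of this is taken from Octavian's source.

```julia
# Hypothetical kernel parameters: 8 Float64 lanes, 3 row vectors, 9 columns.
W, mᵣ, nᵣ = 8, 3, 9

# Assumed current condition for jumping to the simple loop-based multiply.
takes_loopmul_current(M, N) = nᵣ ≥ N

# One reading of the hotfix: also take that path when a single W*mᵣ row block covers M.
takes_loopmul_hotfix(M, N) = (nᵣ ≥ N) || (W * mᵣ >= M)

current_10x10 = takes_loopmul_current(10, 10)
hotfix_10x10  = takes_loopmul_hotfix(10, 10)
hotfix_25x10  = takes_loopmul_hotfix(25, 10)
```

With these values the 10×10 case skips blocking only after the hotfix, while a 25×10 output would still go through the blocked path, since 25 exceeds W*mᵣ = 24.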


tam724 commented Oct 28, 2024

Thanks! This resolves the initial issue.
However, when playing with different matrix sizes I encountered the following:

julia> using Octavian

julia> M, K, N = 25, 3000, 10
(25, 3000, 10)

julia> A = ones(M, K);

julia> B = ones(K, N);

julia> C = zeros(size(A, 1), size(B, 2));

julia> Octavian.matmul!(C, A, B)
((Mblock, Mblock_Mrem, Mremfinal, Mrem, Miter), (Kblock, Kblock_Krem, Krem, Kiter)) = ((24, 24, 1, 0, 1), (3000, 3001, 0, 1))
25×10 Matrix{Float64}:
 3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
      ⋮                                       ⋮
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0

julia> @assert C  A*B
ERROR: AssertionError: C  A * B
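
To see which rows are affected rather than only that the assertion fails, the result can be compared row by row against a reference product. The sketch below does not call Octavian: `C_bad` is a stand-in that mimics the output printed above, assuming the two rows hidden by the display are zero as well, and `bad_rows` is an illustrative helper.

```julia
using LinearAlgebra

M, K, N = 25, 3000, 10
A = ones(M, K)
B = ones(K, N)
C_ref = A * B                      # reference result, every entry is 3000.0

# Stand-in for the faulty output: only the first row is filled in.
C_bad = zeros(M, N)
C_bad[1, :] .= 3000.0

# Indices of rows that do not match the reference.
bad_rows(C, C_ref) = [i for i in axes(C, 1) if !(C[i, :] ≈ C_ref[i, :])]

rows = bad_rows(C_bad, C_ref)
```

For the stand-in this reports rows 2 through 25, that is, 24 rows left unwritten and a single row computed.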

The printed block sizes are the ones computed in the following line:

) = solve_McKc(

It seems that in this case Miter should be Miter + 1.

Again, this only occurs if

julia> Octavian.has_feature(Val(:x86_64_avx512f))
static(true)

Otherwise, the computed block sizes are different.
