

@vchuravy
Member

No description provided.

@vchuravy vchuravy added the Julia v1.12 Related to compatibility with Julia v1.12 label Oct 20, 2025
@codecov

codecov bot commented Oct 20, 2025

Codecov Report

❌ Patch coverage is 98.79032% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.05%. Comparing base (27294f3) to head (7c06ed4).
⚠️ Report is 5 commits behind head on main.

File | Patch % | Lines
src/compiler/optimize.jl | 98.99% | 2 Missing ⚠️
src/llvm/transforms.jl | 96.87% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2675      +/-   ##
==========================================
- Coverage   72.58%   68.05%   -4.54%     
==========================================
  Files          58       58              
  Lines       18746    18460     -286     
==========================================
- Hits        13607    12563    -1044     
- Misses       5139     5897     +758     


@github-actions
Contributor

github-actions bot commented Oct 20, 2025

Benchmark Results

Benchmark | main | 7c06ed4... | main / 7c06ed4...
basics/make_zero/namedtuple | 0.0529 ± 0.0028 μs | 0.0535 ± 0.0024 μs | 0.988 ± 0.068
basics/make_zero/struct | 0.255 ± 0.0071 μs | 0.251 ± 0.0068 μs | 1.01 ± 0.039
basics/overhead | 4.34 ± 0.01 ns | 5.25 ± 1.8 ns | 0.826 ± 0.29
basics/remake_zero!/namedtuple | 0.241 ± 0.01 μs | 0.241 ± 0.0087 μs | 0.998 ± 0.055
basics/remake_zero!/struct | 0.238 ± 0.011 μs | 0.235 ± 0.012 μs | 1.01 ± 0.069
fold_broadcast/multidim_sum_bcast/1D | 10.3 ± 0.27 μs | 10.3 ± 0.37 μs | 0.996 ± 0.044
fold_broadcast/multidim_sum_bcast/2D | 12.1 ± 0.28 μs | 12.1 ± 0.24 μs | 1 ± 0.03
time_to_load | 1.26 ± 0.0016 s | 1.24 ± 0.014 s | 1.02 ± 0.011

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact at https://github.com/EnzymeAD/Enzyme.jl/actions/runs/18876631945/artifacts/4393981985.

@gbaraldi
Collaborator

Where are we registering the Enzyme passes?

@vchuravy
Member Author

We aren't yet, and I'm not even sure we need to. Right now I want a working legalization and pre-optimization step, which consists only of Julia-based passes.

@vchuravy
Member Author

Right now, autodiff(Forward, identity, Duplicated(1.0, 1.0)) fails because legalization is missing.

@vchuravy vchuravy marked this pull request as ready for review October 24, 2025 14:44
@vchuravy vchuravy requested a review from gbaraldi October 24, 2025 14:44
@github-actions
Contributor

github-actions bot commented Oct 24, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic main) to apply these changes.

Suggested changes:
diff --git a/src/compiler.jl b/src/compiler.jl
index 96b1c01c..2ab0e65b 100644
--- a/src/compiler.jl
+++ b/src/compiler.jl
@@ -1263,7 +1263,7 @@ function nested_codegen!(
     edges = edges::Vector{Any}
     push!(edges, funcspec)
 
-    LLVM.@dispose pb=LLVM.NewPMPassBuilder() begin
+    LLVM.@dispose pb = LLVM.NewPMPassBuilder() begin
         registerEnzymeAndPassPipeline!(pb)
         LLVM.add!(pb, LLVM.NewPMModulePassManager()) do mpm
             LLVM.add!(mpm, PreserveNVVMPass())
@@ -2755,7 +2755,7 @@ function enzyme!(
     for f in collect(functions(mod))
         API.EnzymeFixupBatchedJuliaCallingConvention(f)
     end
-    run!(DCEPass(), mod)
+            run!(DCEPass(), mod)
     fix_decayaddr!(mod)
     adjointf = adjointf == nothing ? nothing : functions(mod)[adjointfname]
     augmented_primalf =
@@ -4502,7 +4502,7 @@ function GPUCompiler.compile_unhooked(output::Symbol, job::CompilerJob{<:EnzymeT
         permit_inlining!(f)
     end
 
-    LLVM.@dispose pb=LLVM.NewPMPassBuilder() begin
+    LLVM.@dispose pb = LLVM.NewPMPassBuilder() begin
         registerEnzymeAndPassPipeline!(pb)
         LLVM.add!(pb, LLVM.NewPMModulePassManager()) do mpm
             LLVM.add!(mpm, PreserveNVVMPass())
@@ -5186,7 +5186,7 @@ end
         augmented_primalf = nothing
     end
 
-    LLVM.@dispose pb=LLVM.NewPMPassBuilder() begin
+    LLVM.@dispose pb = LLVM.NewPMPassBuilder() begin
         registerEnzymeAndPassPipeline!(pb)
         LLVM.add!(pb, LLVM.NewPMModulePassManager()) do mpm
             LLVM.add!(mpm, PreserveNVVMEndPass())
diff --git a/src/compiler/optimize.jl b/src/compiler/optimize.jl
index a4f4334f..ab3f34d4 100644
--- a/src/compiler/optimize.jl
+++ b/src/compiler/optimize.jl
@@ -1,6 +1,6 @@
 function registerEnzymeAndPassPipeline!(pb::NewPMPassBuilder)
     enzyme_callback = cglobal((:registerEnzymeAndPassPipeline, API.libEnzyme))
-    LLVM.API.LLVMPassBuilderExtensionsPushRegistrationCallbacks(pb.exts, enzyme_callback)
+    return LLVM.API.LLVMPassBuilderExtensionsPushRegistrationCallbacks(pb.exts, enzyme_callback)
 end
 
 LLVM.@function_pass "jl-inst-simplify" JLInstSimplifyPass
@@ -26,7 +26,7 @@ Addr13NoAliasPass() = NewPMModulePass("addr13_noalias", addr13NoAlias)
 RewriteGenericMemoryPass() = NewPMModulePass("rewrite_generic_memory", rewrite_generic_memory!)
 
 function optimize!(mod::LLVM.Module, tm::LLVM.TargetMachine)
-    @dispose pb = NewPMPassBuilder() begin
+    return @dispose pb = NewPMPassBuilder() begin
         registerEnzymeAndPassPipeline!(pb)
         register!(pb, Addr13NoAliasPass())
         register!(pb, RewriteGenericMemoryPass())
@@ -51,7 +51,7 @@ function optimize!(mod::LLVM.Module, tm::LLVM.TargetMachine)
             add!(mpm, AlwaysInlinerPass())
             add!(mpm, NewPMFunctionPassManager()) do fpm
                 add!(fpm, AllocOptPass())
-            end            
+            end
 
             add!(mpm, GlobalOptPass())
             add!(mpm, NewPMFunctionPassManager()) do fpm
@@ -74,7 +74,7 @@ function optimize!(mod::LLVM.Module, tm::LLVM.TargetMachine)
                 add!(fpm, ReassociatePass())
                 add!(fpm, EarlyCSEPass())
                 add!(fpm, AllocOptPass())
-                add!(fpm, NewPMLoopPassManager(use_memory_ssa=true)) do lpm
+                add!(fpm, NewPMLoopPassManager(use_memory_ssa = true)) do lpm
                     add!(lpm, LoopIdiomRecognizePass())
                     add!(lpm, LoopRotatePass())
                     add!(lpm, LowerSIMDLoopPass())
@@ -89,7 +89,7 @@ function optimize!(mod::LLVM.Module, tm::LLVM.TargetMachine)
                     add!(lpm, IndVarSimplifyPass())
                     add!(lpm, LoopDeletionPass())
                 end
-                add!(fpm, LoopUnrollPass(opt_level=2))
+                add!(fpm, LoopUnrollPass(opt_level = 2))
                 add!(fpm, AllocOptPass())
                 add!(fpm, SROAPass())
                 add!(fpm, GVNPass())
@@ -120,7 +120,7 @@ function optimize!(mod::LLVM.Module, tm::LLVM.TargetMachine)
                 add!(fpm, JLInstSimplifyPass())
 
                 # GC passes
-                add!(fpm, GCInvariantVerifierPass(strong=false))
+                add!(fpm, GCInvariantVerifierPass(strong = false))
                 add!(fpm, SimplifyCFGPass())
                 add!(fpm, InstCombinePass())
                 add!(fpm, JLInstSimplifyPass())
@@ -158,7 +158,7 @@ function addOptimizationPasses!(mpm::LLVM.NewPMPassManager)
 
     add!(mpm, AlwaysInlinerPass())
 
-    add!(mpm, NewPMFunctionPassManager()) do fpm
+    return add!(mpm, NewPMFunctionPassManager()) do fpm
         # Running `memcpyopt` between this and `sroa` seems to give `sroa` a hard time
         # merging the `alloca` for the unboxed data and the `alloca` created by the `alloc_opt`
         # pass.
@@ -182,7 +182,7 @@ function addOptimizationPasses!(mpm::LLVM.NewPMPassManager)
         # remove those before optimizing loops.
         add!(fpm, AllocOptPass())
 
-        add!(fpm, NewPMLoopPassManager(use_memory_ssa=true)) do lpm
+        add!(fpm, NewPMLoopPassManager(use_memory_ssa = true)) do lpm
             add!(lpm, LoopRotatePass())
             # moving IndVarSimplify here prevented removing the loop in perf_sumcartesian(10:-1:1)
             add!(lpm, LoopIdiomRecognizePass())
@@ -198,7 +198,7 @@ function addOptimizationPasses!(mpm::LLVM.NewPMPassManager)
             add!(lpm, IndVarSimplifyPass())
             add!(lpm, LoopDeletionPass())
         end
-        add!(fpm, LoopUnrollPass(opt_level=2))
+        add!(fpm, LoopUnrollPass(opt_level = 2))
 
         # Run our own SROA on heap objects before LLVM's
         add!(fpm, AllocOptPass())
@@ -242,18 +242,18 @@ function addOptimizationPasses!(mpm::LLVM.NewPMPassManager)
 end
 
 function addMachinePasses!(mpm::LLVM.NewPMPassManager)
-    add!(mpm, NewPMFunctionPassManager()) do fpm
+    return add!(mpm, NewPMFunctionPassManager()) do fpm
         if VERSION < v"1.12.0-DEV.1390"
             add!(fpm, CombineMulAddPass())
         end
         add!(fpm, DivRemPairsPass())
         add!(fpm, DemoteFloat16Pass())
-        add!(fpm, GVNPass())              
+        add!(fpm, GVNPass())
     end
 end
 
 function addJuliaLegalizationPasses!(mpm::LLVM.NewPMPassManager, lower_intrinsics::Bool = true)
-    if lower_intrinsics
+    return if lower_intrinsics
         add!(mpm, NewPMFunctionPassManager()) do fpm
             add!(fpm, ReinsertGCMarkerPass())
             if VERSION < v"1.13.0-DEV.36"
@@ -275,7 +275,7 @@ function addJuliaLegalizationPasses!(mpm::LLVM.NewPMPassManager, lower_intrinsic
         end
         # We need these two passes and the instcombine below
         # after GC lowering to let LLVM do some constant propagation on the tags.
-        # and remove some unnecessary write barrier checks.        
+        # and remove some unnecessary write barrier checks.
         add!(mpm, NewPMFunctionPassManager()) do fpm
             add!(fpm, GVNPass())
             add!(fpm, SCCPPass())
@@ -288,10 +288,12 @@ function addJuliaLegalizationPasses!(mpm::LLVM.NewPMPassManager, lower_intrinsic
             add!(fpm, InstCombinePass())
             add!(fpm, JLInstSimplifyPass())
             aggressiveSimplifyCFGOptions =
-                (forward_switch_cond=true,
-                   switch_range_to_icmp=true,
-                   switch_to_lookup=true,
-                   hoist_common_insts=true)
+                (
+                forward_switch_cond = true,
+                switch_range_to_icmp = true,
+                switch_to_lookup = true,
+                hoist_common_insts = true,
+            )
             add!(fpm, SimplifyCFGPass(; aggressiveSimplifyCFGOptions...))
         end
     else
diff --git a/src/llvm/transforms.jl b/src/llvm/transforms.jl
index 5462fd43..67255c3a 100644
--- a/src/llvm/transforms.jl
+++ b/src/llvm/transforms.jl
@@ -2372,7 +2372,7 @@ end
 function rewrite_generic_memory!(mod::LLVM.Module)
     @static if VERSION < v"1.11-"
         return false
-    else    
+    else
         for f in functions(mod), bb in blocks(f)
             iter = LLVM.API.LLVMGetFirstInstruction(bb)
             while iter != C_NULL
@@ -2381,7 +2381,7 @@ function rewrite_generic_memory!(mod::LLVM.Module)
                 if !isa(inst, LLVM.LoadInst)
                     continue
                 end
-        
+
                 if isa(operands(inst)[1], LLVM.ConstantExpr)
                     legal2, obj = absint(inst)
                     if legal2 && obj isa Memory && obj == typeof(obj).instance

@vchuravy
Member Author

Current status is that minimal functionality works.

autodiff(Forward, identity, Duplicated(1.0, 1.0))

This also switches the NewPM pieces on for 1.11.

@vchuravy
Member Author

Currently hitting JuliaLLVM/LLVM.jl#528

Still needs to handle:

  • API.AddPreserveNVVMPass!

@vchuravy vchuravy changed the title from "Support NewPM" to "Switch to NewPM" on Oct 28, 2025
@vchuravy
Member Author

@wsmoses the 1.11 error is https://gist.github.com/vchuravy/dc0c635a3ebcc6eb85e7990b587f7956 perhaps due to a small change in optimization. Any ideas on how to fix that?

@wsmoses
Member

wsmoses commented Oct 28, 2025

Can we perhaps split this into smaller pieces to figure out where the differences in the pass setup come from?

@vchuravy
Member Author

It's somewhere in optimize!, which is one gigantic function. I tried very hard to match the 1.10 passes 1:1, so the question is whether we could make that detection more robust. It's also wild that only one test case fails.

@wsmoses
Member

wsmoses commented Oct 28, 2025

Could we do one PR for everything except optimize, and then a separate one for optimize?

I can review them both separately in depth and try to look at the error.

@vchuravy
Member Author

Replaced by #2713, #2711, and #2710.

@vchuravy vchuravy closed this Oct 28, 2025
@vchuravy vchuravy deleted the vc/pm branch October 28, 2025 20:06

Labels

Julia v1.12 Related to compatibility with Julia v1.12


Development

Successfully merging this pull request may close these issues.

Enzyme errors on simple example from documentation

4 participants