Add heuristic to order harness codegen #4257
Conversation
Let's get #4236 merged first so that this PR can get concrete data points in its PR description.

#4236 has now been merged, so this one should be ready to be acted upon.
```rust
self.map(|mut harnesses| {
    reorder_harnesses::<H>(&mut harnesses);
    harnesses
})
```
Suggested change:

```diff
-})
+}).collect::<Vec<_>>().into_iter()
```
I honestly could not tell you why, but testing shows that forcing all of the ordering to happen up front with a `collect`, rather than lazily with a `map`, takes this change from slightly worse to significantly better.
I need to look into why this is, as collecting seems to sneakily guzzle memory (peak goes from a normal ~2 GB to ~6 GB). I think it's because it requires one transformer per codegen unit to be kept in memory. If this change is doable without that drawback it may be worthwhile; if not, it should definitely be dropped.
The cause of the memory overhead is that, now that we do reachability in a single pass up front, we're inadvertently storing the reachable call tree for every harness while we figure out which order to codegen them in. Definitely not ideal, as this takes up a ton of space :(. I'll look into how to fix this and whether other heuristics could help.
Thanks! Is it possible to add a test that ensures the order is what we expect it to be?
Kani currently has a special `handle_quantifiers()` pass after codegen, in which we recursively inline function calls inside quantifier expressions as needed by our hooks. This has to be done for every value in the symbol table, but our current implementation implicitly clones **all `Stmt` type values**, even if they don't contain any quantifiers and wouldn't need to change. This makes the pass take up **~7% of codegen on crates that don't even use quantifiers**. Now, the `handle_quantifiers_in_stmt()` function returns an `Option` with a new `Stmt` if inlining was needed, or `None` otherwise, allowing us to avoid any allocations (and even the need to insert back into the table) in the common case where a `Stmt` doesn't contain any quantifiers.

### Results

Flamegraphing shows that this change reduces the `handle_quantifiers()` pass from 7.1% to just 0.8% of codegen for the `s2n-codec` crate. There was a much less significant change on larger codebases like the standard library, which only saw **a <1% improvement in end-to-end compile time** (at std commit [177d0fd](model-checking/verify-rust-std@177d0fd), assuming #4257 & #4259 are merged into Kani). That said, there was still a significant **1-7% decrease in codegen time** for individual crates of the standard library, so it seems to make a difference even when not reflected in e2e time.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.
Internally, Kani uses the `call_with_panic_debug_info()` function to call closures alongside a string that explains what they're trying to do, allowing better error messages if they internally panic. However, this function is currently called for every harness we codegen and has to pretty-format the harness's function instance--something that calls into rustc internals and takes a non-negligible amount of time. We should instead accept a closure that can generate the debug info, but only when (and if) it's needed.

### Results

Based on initial flamegraphing, the formatting passed to `call_with_panic_debug_info()` went from taking 2.3% of codegen on `s2n-codec` to not being called a single time (since there are no compiler errors in a typical execution). This change made a solid **4.6% reduction in the end-to-end compile time of the standard library** (at std commit [177d0fd](model-checking/verify-rust-std@177d0fd); assuming #4257, #4259 & #4268 are already merged into Kani).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.
Currently, we have to create a new `BodyTransformer` for each harness individually. However, this takes a lot of time (as we have to constantly re-validate all our Kani functions and reinitialize everything based on the `QueryDb`), and the transformer's options are all shared between harnesses of the same codegen unit anyway. This PR makes the `BodyTransformer` struct `Clone`-able, allowing us to initialize a single 'template' transformer for each codegen unit and then cheaply clone the template for each harness within the unit. Based on testing, the clone takes just ~3µs when using the default number of transformation passes, which is much faster than initialization.

### Results

This change made a noticeable **4.8% reduction in the end-to-end compile time of the standard library** (at std commit [177d0fd](model-checking/verify-rust-std@177d0fd) & assuming #4257 is merged into Kani).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.
Motivation
When compiling with more than a single thread (as in #4236), the order in which we codegen harnesses can have a non-negligible impact on performance. Specifically, if we handle a harness that will generate a lot of code near the end of compilation, the main compiler thread can get stuck waiting for worker threads to export that code, slowing down overall compilation. If we could ensure that more difficult-to-export harnesses are handled first, the thread pool would have more time to write them to disk off the critical path of compilation.
Approach
This PR switches reachability analysis for all harnesses to happen before codegen. It can then use the information gathered from reachability to reorder codegen based on the total number of items reachable from each harness (`MostReachableItems`). This heuristic was chosen because it is:

- cheap: computing the ordering for a `CodegenUnit` of 66 harnesses had acceptable overhead, given it took ~20ms to codegen the fastest harnesses in that same `CodegenUnit`
- accurate: on the `s2n-codec` benchmark it only ever places a slower-to-export harness after a faster one if their actual export times were within 10ms (or 7%). Since the goal is just to roughly order harnesses by order of magnitude, this is a very good result 🎉

Results
This change made a small, but not insignificant, 2.9% reduction in the end-to-end compile time of the standard library (at std commit 177d0fd).

Below is a recreation of the graph of the compiler's per-thread CPU usage from #4248, now that codegen is ordered.

While functionally complete on its own, this change will not really make a performance impact until #4236 is merged.
Resolves #4248
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.