Summary
Bench::keep() is intended to prevent the compiler from optimizing away benchmark results, but its implementation heap-allocates a Ref and constructs a trait object on every call. This adds ~6 ns of overhead per iteration on top of the existing closure dispatch cost, severely inflating measurements for fast operations.
Reproduction
// bench_wbtest.mbt
struct Point { x : Int; y : Int } derive(Show)
fn make_point() -> Point {
  { x: 42, y: 99 }
}
test (b : @bench.T) {
  // With keep: ~10 ns
  b.bench(name="with_keep", fn() { b.keep(make_point()) })
  // With let _: ~3.5 ns
  b.bench(name="with_let", fn() { let _ = make_point() })
}
make_point() is trivial (~1 ns), but b.keep() inflates it to ~10 ns. Using let _ = instead gives ~3.5 ns (remaining cost is closure dispatch overhead, see #3407).
Root cause
Bench::keep() implementation:
pub fn[Any] Bench::keep(self : Bench, value : Any) -> Unit {
  let trait_object : &OpaqueValue = @ref.new(value)
  self._storage = trait_object
}
Every call does:
- @ref.new(value): heap-allocates a Ref[T] wrapper to box the value
- Trait object construction: creates an OpaqueValue trait object (vtable + pointer)
- moonbit_decref on the previous _storage value: frees the old Ref
- Store the new trait object into self._storage
From the generated C (per call):
// 1. Heap-allocate Ref[T]
ref_ptr = ref_new(value);
// 2. Construct trait object
trait_object = { vtable_id, ref_ptr };
// 3. Decref old storage
old = self->storage;
if (old.ptr) moonbit_decref(old.ptr);
// 4. Store new
self->storage = trait_object;
// 5. Decref self (bench object)
moonbit_decref(self);
This is 5 operations (including a heap allocation and deallocation) executed on every benchmark iteration, inside the timed region.
Overhead breakdown
Measured on x86-64 Linux:
| Approach | Time per iteration | Overhead |
| --- | --- | --- |
| b.keep(result) | ~10 ns | ~6 ns from keep + ~3 ns closure dispatch |
| sink = result (mutable local) | ~7 ns | ~3 ns from sink RC + ~3 ns closure dispatch |
| let _ = result | ~3.5 ns | ~3 ns closure dispatch only |
| Actual operation | ~1 ns | n/a |
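Reading the table as a simple additive model (an assumption: the figures are rounded, so the sums only match approximately), each row is the ~1 ns operation plus its listed overheads:

```latex
\begin{align*}
t_{\text{keep}} &\approx 1 + 3 + 6 = 10\ \text{ns} \\
t_{\text{sink}} &\approx 1 + 3 + 3 = 7\ \text{ns} \\
t_{\text{let}}  &\approx 1 + 3 = 4\ \text{ns} \quad (\text{measured: } {\sim}3.5\ \text{ns})
\end{align*}
```

Under this reading, keep() alone accounts for roughly 60% of the time reported for the with_keep benchmark.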
Comparison with Go
Go's equivalent is assigning to a package-level variable:
var sink Point
func BenchmarkMakePoint(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = makePoint() // single store, no allocation
	}
}
This is a single memory store — no heap allocation, no reference counting, no trait object construction.
Suggestion
keep() should prevent dead-code elimination without heap-allocating. Possible approaches:
- Volatile store: Write the value to a volatile memory location that the compiler cannot optimize away, without wrapping it in a Ref.
- Compiler intrinsic: A #[no_optimize] annotation or black_box() intrinsic (similar to Rust's std::hint::black_box) that marks a value as observable at little or no runtime cost.
- Opaque identity function: An extern function that takes and returns a value, preventing the compiler from reasoning about whether it's used, at the cost of at most one non-inlined call.
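To make the volatile-store idea and a black_box-style barrier concrete, here is a minimal sketch at the level of the C backend. It is an illustration under stated assumptions, not MoonBit's runtime code: point_t, make_point, keep_volatile, keep_barrier, and run_volatile are hypothetical names, and the empty-asm barrier relies on GCC/Clang extended inline assembly, so it would need a different mechanism on other compilers.

```c
#include <stdint.h>

/* Hypothetical value type mirroring the Point struct from the reproduction. */
typedef struct { int32_t x; int32_t y; } point_t;

static point_t make_point(void) {
    point_t p = {42, 99};
    return p;
}

/* Approach 1: volatile store.
   The sink has static storage, so keeping a value costs two 4-byte stores.
   No heap allocation and no reference counting are involved. */
static volatile int32_t sink_x;
static volatile int32_t sink_y;

static inline void keep_volatile(point_t p) {
    sink_x = p.x; /* volatile writes may not be elided by the compiler */
    sink_y = p.y;
}

/* Approach 2: empty inline-asm barrier (GCC/Clang extension).
   The asm emits no instructions, but the compiler must assume it reads
   the object behind `p`, so the computation producing it stays alive. */
static inline void keep_barrier(const void *p) {
    __asm__ volatile("" : : "r"(p) : "memory");
}

/* Hypothetical benchmark loop using the volatile sink. */
static int32_t run_volatile(int iterations) {
    for (int i = 0; i < iterations; i++) {
        keep_volatile(make_point());
    }
    return sink_x + sink_y;
}
```

A caller would use the barrier variant as `point_t p = make_point(); keep_barrier(&p);`. Either form keeps the result observable while leaving the timed region free of allocation, which is the property the Go sink pattern above has.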