
bench: Bench::keep() heap-allocates on every call, inflating micro-benchmark timings #3408

@bikallem


Summary

Bench::keep() is intended to prevent the compiler from optimizing away benchmark results, but its implementation heap-allocates a Ref and constructs a trait object on every call. This adds ~6 ns of overhead per iteration on top of the existing closure dispatch cost, severely inflating measurements for fast operations.

Reproduction

// bench_wbtest.mbt

struct Point { x : Int; y : Int } derive(Show)

fn make_point() -> Point {
  { x: 42, y: 99 }
}

test (b : @bench.T) {
  // With keep: ~10 ns
  b.bench(name="with_keep", fn() { b.keep(make_point()) })

  // With let _: ~3.5 ns
  b.bench(name="with_let", fn() { let _ = make_point() })
}

make_point() is trivial (~1 ns), but b.keep() inflates the measurement to ~10 ns. Using let _ = instead gives ~3.5 ns; the remaining cost is closure dispatch overhead (see #3407).

Root cause

Bench::keep() implementation:

pub fn[Any] Bench::keep(self : Bench, value : Any) -> Unit {
  let trait_object : &OpaqueValue = @ref.new(value)
  self._storage = trait_object
}

Every call does:

  1. @ref.new(value): heap-allocates a Ref[T] wrapper to box the value
  2. Trait object construction: creates an &OpaqueValue trait object (vtable + pointer)
  3. moonbit_decref on the previous _storage value: drops the old Ref, freeing it once its refcount reaches zero
  4. Store the new trait object into self._storage
  5. moonbit_decref(self): releases the reference to the Bench object itself

Simplified from the generated C (per call):

// 1. Heap-allocate Ref[T]
ref_ptr = ref_new(value);

// 2. Construct trait object
trait_object = { vtable_id, ref_ptr };

// 3. Decref old storage
old = self->storage;
if (old.ptr) moonbit_decref(old.ptr);

// 4. Store new
self->storage = trait_object;

// 5. Decref self (bench object)
moonbit_decref(self);

That is five operations, including a heap allocation and a deallocation (the old Ref is freed when its refcount drops to zero), executed on every benchmark iteration inside the timed region.

Overhead breakdown

Measured on x86-64 Linux:

| Approach | Time per iteration | Overhead |
|---|---|---|
| b.keep(result) | ~10 ns | ~6 ns from keep + ~3 ns closure dispatch |
| sink = result (mutable local) | ~7 ns | ~3 ns from sink RC + ~3 ns closure dispatch |
| let _ = result | ~3.5 ns | ~3 ns closure dispatch only |
| Actual operation | ~1 ns | |

Comparison with Go

Go's equivalent is assigning to a package-level variable:

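package bench

import "testing"

type Point struct{ X, Y int }
func makePoint() Point { return Point{X: 42, Y: 99} }

var sink Point // package-level, so the compiler keeps the store

func BenchmarkMakePoint(b *testing.B) {
    for i := 0; i < b.N; i++ {
        sink = makePoint()  // single store, no allocation
    }
}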

This is a single memory store — no heap allocation, no reference counting, no trait object construction.

Suggestion

keep() should prevent dead-code elimination without heap-allocating. Possible approaches:

  • Volatile store: Write the value to a volatile memory location that the compiler cannot optimize away, without wrapping it in a Ref.
  • Compiler intrinsic: A #[no_optimize] annotation or black_box() intrinsic (similar to Rust's std::hint::black_box) that marks a value as observable without any runtime cost.
  • Opaque identity function: An extern function that takes and returns a value, preventing the compiler from reasoning about whether it's used, with no allocation and at most the cost of a function call.
