
Vibing a more performant OCaml sexp parser than Jane Street


hdresearch/parsexp


parsexp

The goal of this project was to use cloud coding agents to vibe-code an OCaml sexp parser that is faster and more memory-efficient than Jane Street's parsexp, and it succeeded.

Phases

Phase 1

After being given the original code and existing test suite, I asked the agent to create a parser more performant than the existing one, and to include benchmarks so the improvement would be measurable. It produced performance and memory benchmarks, and beat the original parser in several categories, all in exactly 69 steps.

However, it was not explicitly instructed to beat every benchmark, and it won only 4 of the 7.

Phase 2

Using minimal context (the tests and parser-equivalence checks, but none of the underlying parser logic), a new session was started with the explicit instruction to beat all of the benchmarks on both performance and memory.

This succeeded, but the resulting parser had a few correctness issues (i.e. it mishandled specific character sequences and atoms that the test suite does not cover).

Phase 3

The prior phases reveal two key things:

  1. A coding agent can be fired off and work till a test suite is fulfilled
  2. A single agent can write more performant OCaml code than Jane Street

This led to the third phase of the project: a swarm is spun up with one subordinate agent that writes a parser which beats the benchmarks while passing the tests, a second that writes additional tests for strings not currently covered, and a third that writes additional benchmarks to compare against the original parser.

Benchmark Results

The results speak for themselves: all 7 original benchmarks won in both speed and memory (up to 3.07× faster, up to 5.75× less memory), plus the test agent generated 370 new test cases — see the Phase 3 test results for details.

Phase 2 vs Phase 3

The Phase 3 parser is 239 lines vs Phase 2's 378 (37% smaller), using direct mutual recursion instead of array-stack machinery. It trades some performance on list-heavy shapes for a simpler, better-tested parser: Phase 3 wins 4/7 benchmarks on speed (up to 47% faster on quoted strings) while Phase 2 holds an edge on large flat lists and deep nesting. The real gain is in coverage — 370 tests vs ~30, and 19 benchmark scenarios vs 7.
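To make "direct mutual recursion instead of array-stack machinery" concrete, here is a minimal sketch of that style, written for this README and not taken from the repo: `parse_sexp` and `parse_list` call each other directly, so the OCaml call stack tracks open parentheses instead of an explicit array stack of partial results. It handles only bare atoms, whitespace, and parentheses; quoted strings, comments, and error positions are omitted.

```ocaml
(* Illustrative sketch only -- not the Phase 3 parser itself. *)
type sexp = Atom of string | List of sexp list

let is_ws c = c = ' ' || c = '\t' || c = '\n' || c = '\r'
let is_atom_char c = not (is_ws c || c = '(' || c = ')')

let rec skip_ws s i =
  if i < String.length s && is_ws s.[i] then skip_ws s (i + 1) else i

(* Returns the parsed value plus the index just past it. *)
and parse_sexp s i =
  let i = skip_ws s i in
  if i >= String.length s then failwith "unexpected end of input"
  else if s.[i] = '(' then parse_list s (i + 1) []
  else begin
    let j = ref i in
    while !j < String.length s && is_atom_char s.[!j] do incr j done;
    (Atom (String.sub s i (!j - i)), !j)
  end

(* Accumulates list elements until the matching ')'. *)
and parse_list s i acc =
  let i = skip_ws s i in
  if i >= String.length s then failwith "unclosed list"
  else if s.[i] = ')' then (List (List.rev acc), i + 1)
  else
    let e, i' = parse_sexp s i in
    parse_list s i' (e :: acc)

let parse s = fst (parse_sexp s 0)
```

The trade-off matches the numbers above: the recursive form is shorter and easier to audit, but very deep nesting consumes call stack, which is one plausible reason an explicit-stack design can hold an edge on deeply nested inputs.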

Ergo, distributing work among purpose-specific agents in a swarm can result in more performant outcomes than a single agent working alone!

Verifying Benchmark Results Locally

Prerequisites

  • OCaml 5.2+ (tested with 5.3.0)
  • opam with the following packages installed:
    opam install dune sexplib0 parsexp core_bench

Running Tests

Each phase is a self-contained dune project. From the repo root:

eval $(opam env)

# Phase 1 — fast_parsexp (4/7 benchmarks won)
cd fast_parsexp && dune runtest && cd ..

# Phase 2 — benchmarked_parsexp (7/7 benchmarks won, all tests pass)
cd benchmarked_parsexp && dune runtest && cd ..

# Phase 3 — self_improving_parsexp (7/7 benchmarks won, 5 known test failures)
cd self_improving_parsexp && dune runtest && cd ..

Phase 3 will exit with code 1 due to 5 known failures in the comprehensive test suite. The 5 failures are from two bugs: ## atom parsing and comments before close-parens in lists. See the Phase 3 README for a full breakdown.

Running Benchmarks

Benchmarks use Jane Street's core_bench. Each run takes ~2–3 minutes:

eval $(opam env)

# Phase 1
cd fast_parsexp && dune exec bench/bench.exe && cd ..

# Phase 2
cd benchmarked_parsexp && dune exec bench/bench.exe && cd ..

# Phase 3 — core 7 benchmarks
cd self_improving_parsexp && dune exec bench/bench.exe && cd ..

Each benchmark prints a core_bench table comparing the custom parser against parsexp on the same inputs, followed by a memory usage analysis. Lower Time/Run and mWd/Run numbers are better.
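If you want a feel for what the mWd/Run column measures, the OCaml runtime exposes the same counter directly: `Gc.minor_words ()` returns the cumulative number of minor-heap words allocated so far. A rough hand-rolled version of the measurement (using a stand-in allocation instead of a parse call) looks like this; core_bench does the same thing with proper statistical machinery.

```ocaml
(* Minor-heap words allocated by one call to [f x]. *)
let minor_words_of f x =
  let before = Gc.minor_words () in
  let _ = f x in
  Gc.minor_words () -. before

let () =
  (* Stand-in workload; in the real benchmarks this would be a parse. *)
  let make_list n = List.init n string_of_int in
  Printf.printf "minor words: %.0f\n" (minor_words_of make_list 1000)
```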

Phase 3 also has 12 extended benchmarks (dune exec bench/bench_extra.exe), but these use large inputs (up to 100KB) and may take significantly longer to run.

What to Look For

  • Speed: In the core_bench table, compare Time/Run between (Benchmarked_parsexp) and (Parsexp) rows for each benchmark.
  • Memory: The mWd/Run column shows minor-heap words allocated per run. The memory analysis section at the bottom prints explicit ratios.
  • Correctness: dune runtest runs equivalence tests comparing output against parsexp as the reference implementation. Phases 1 and 2 should pass cleanly. Phase 3 has the 5 known failures documented above.
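The equivalence tests boil down to running two parsers over the same inputs and demanding identical output. A self-contained sketch of that harness is below; the two "parsers" here are trivial string functions so the example runs on its own, whereas the real tests compare each phase's parser against Parsexp as the reference.

```ocaml
(* Run [reference] and [candidate] on every input; fail on any mismatch. *)
let check_equiv ~reference ~candidate inputs =
  List.iter
    (fun input ->
      let r = reference input and c = candidate input in
      if r <> c then failwith (Printf.sprintf "mismatch on %S" input))
    inputs

let () =
  (* Trivial stand-ins so the sketch is runnable without parsexp. *)
  check_equiv ~reference:String.trim ~candidate:String.trim
    [ "(a b)"; "  atom  " ];
  print_endline "all inputs agree"
```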
