Refactor examples with common interface, plotting and benchmarking #188
Merged
Commits (33 total; the diff below shows changes from 2 commits)

1200f0b (adamamer20) Add Sugarscape IG examples with Mesa and Mesa-Frames backends
aadee32 (adamamer20) Update .gitignore and pyproject.toml for benchmarks and new dependencies
96e0ac8 (adamamer20) Update README.md: Remove redundant documentation section on related d…
9ebbf6a (Ben-geo) Merge branch 'main' into split/examples-benchmarks
61172f3 (Ben-geo) Merge branch 'main' into split/examples-benchmarks
288e7ed (adamamer20) Merge branch 'split/examples-benchmarks' of https://github.com/projec…
acb0d95 (pre-commit-ci[bot]) [pre-commit.ci] auto fixes from pre-commit.com hooks
d0cfd44 (adamamer20) Add clarification comment for benchmarking initialization time in _pa…
266f854 (adamamer20) Refactor run function to use Optional[Path] for results_dir and compu…
c69a15e (adamamer20) update uv.lock
ec3cccd (adamamer20) Fix typo in README for results-dir option description
af68918 (adamamer20) Add user feedback for saved results in run function
d62d440 (adamamer20) Remove unused imports from backend_mesa.py
5cf28ed (adamamer20) Add confirmation message for saved CSV results in run function
380d5fd (adamamer20) Remove unnecessary blank line in run function
c5bcce4 (adamamer20) Remove redundant seed value assignment in run function
a16a0a0 (adamamer20) Fix model type annotation in AntAgent constructor
5f86385 (adamamer20) Fix hyphenation in README for clarity on agents' population dynamics
c9f6369 (adamamer20) Remove unused pandas import from model.py
1862147 (adamamer20) Enhance legend styling in plot functions for better readability acros…
7c84351 (adamamer20) Enhance run command to support multiple model and agent inputs for im…
6b7c369 (adamamer20) Merge branch 'main' of https://github.com/projectmesa/mesa-frames int…
38bad2f (adamamer20) Merge branch 'split/examples-benchmarks' of https://github.com/projec…
7db5334 (pre-commit-ci[bot]) [pre-commit.ci] auto fixes from pre-commit.com hooks
9fa6f5c (adamamer20) Remove unused pandas import from backend_mesa.py
6e5bfbb (adamamer20) Merge branch 'split/examples-benchmarks' of https://github.com/projec…
1aab6ca (adamamer20) Fix documentation links in agents.py and model.py to point to the cor…
22cface (adamamer20) Refactor gini function to simplify sugar array sorting
d7cbbca (adamamer20) Fix order of exports in plotting.py to include plot_model_metrics
43a0160 (adamamer20) Enhance CLI output for benchmark results and add tests for CSV saving…
277884b (adamamer20) Format code for better readability in benchmark and sugarscape tests
148b31c (pre-commit-ci[bot]) [pre-commit.ci] auto fixes from pre-commit.com hooks
991eb8a (adamamer20) Merge branch 'split/examples-benchmarks' of https://github.com/projec…
New file (+88 lines):

# Benchmarks

Performance benchmarks compare Mesa Frames backends ("frames") with classic Mesa ("mesa")
implementations for a small set of representative models. They help track runtime scaling
and regressions.

Currently included models:

- **boltzmann**: Simple wealth exchange ("Boltzmann wealth") model.
- **sugarscape**: Sugarscape Immediate Growback variant (square grid sized relative to agent count).

## Quick start

```bash
uv run benchmarks/cli.py
```

That command (with defaults) will:

- Benchmark both models (`boltzmann`, `sugarscape`).
- Use agent counts 1000, 2000, 3000, 4000, 5000.
- Run 100 steps per simulation.
- Repeat each configuration once.
- Save CSV results and generate plots.
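For reference, the defaults above correspond to spelling every option out explicitly (a hypothetical but equivalent invocation; each flag is documented in the option table below):

```bash
uv run benchmarks/cli.py \
  --models all \
  --agents 1000:5000:1000 \
  --steps 100 \
  --repeats 1 \
  --seed 42 \
  --save \
  --plot
```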
## CLI options

Invoke `uv run benchmarks/cli.py --help` to see the full help text. Key options:

| Option | Default | Description |
| ------ | ------- | ----------- |
| `--models` | `all` | Comma-separated list or `all`; accepted values: `boltzmann`, `sugarscape`. |
| `--agents` | `1000:5000:1000` | Single integer or a range `start:stop:step`. |
| `--steps` | `100` | Steps per simulation run. |
| `--repeats` | `1` | Repeats per (model, backend, agents) configuration; the seed increments with each repeat. |
| `--seed` | `42` | Base RNG seed, incremented by the repeat index. |
| `--save / --no-save` | `--save` | Persist per-model CSVs. |
| `--plot / --no-plot` | `--plot` | Generate scaling plots (PNG and possibly other formats). |
| `--results-dir` | `benchmarks/results` | Root directory that receives a timestamped subdirectory. |

Range parsing: `A:B:S` expands to `A, A+S, ...` up to and including `B`; any final value greater than `B` is dropped.
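As an illustration of the option syntax (the values chosen here are arbitrary), a run restricted to the sugarscape model with a custom sweep could look like:

```bash
# Benchmark only sugarscape at 1000, 2500 and 4000 agents (5500 exceeds 5000 and is dropped),
# 200 steps per run, 3 repeats per configuration, without plots.
uv run benchmarks/cli.py \
  --models sugarscape \
  --agents 1000:5000:1500 \
  --steps 200 \
  --repeats 3 \
  --no-plot
```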
## Output layout

Each invocation uses a single UTC timestamp, e.g. `20251016_173702`:

```text
benchmarks/
  results/
    20251016_173702/
      boltzmann_perf_20251016_173702.csv
      sugarscape_perf_20251016_173702.csv
      plots/
        boltzmann_runtime_20251016_173702_dark.png
        sugarscape_runtime_20251016_173702_dark.png
        ... (other themed variants if enabled)
```

CSV schema (one row per completed run):

| Column | Meaning |
| ------ | ------- |
| `model` | Model key (`boltzmann`, `sugarscape`). |
| `backend` | `mesa` or `frames`. |
| `agents` | Agent count for that run. |
| `steps` | Steps simulated. |
| `seed` | Seed used (base seed + repeat index). |
| `repeat_idx` | Repeat counter starting at 0. |
| `runtime_seconds` | Wall-clock runtime for that run. |
| `timestamp` | Shared timestamp identifier for the benchmark batch. |
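To inspect a result file, it can be read back with polars, the same library the CLI uses to write it. A minimal sketch; the path is illustrative and should be replaced with your own timestamped directory:

```python
import polars as pl

# Illustrative path; substitute the timestamped directory produced by your run.
df = pl.read_csv("benchmarks/results/20251016_173702/boltzmann_perf_20251016_173702.csv")

# Mean runtime per backend and agent count, averaged across repeats.
summary = (
    df.group_by(["backend", "agents"])
    .agg(pl.col("runtime_seconds").mean().alias("mean_runtime_s"))
    .sort(["backend", "agents"])
)
print(summary)
```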
## Performance tips

- Ensure the environment variable `MESA_FRAMES_RUNTIME_TYPECHECKING` is **unset** or set to `0` / `false` when collecting performance numbers. Enabling it adds runtime type-validation overhead, and the CLI will warn you if it is on.
- Run multiple repeats (e.g. `--repeats 5`) to smooth out run-to-run variance.
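Putting both tips together, a typical measurement invocation (illustrative values) might be:

```bash
# Make sure runtime typechecking is off in this shell, then average over 5 repeats.
export MESA_FRAMES_RUNTIME_TYPECHECKING=0
uv run benchmarks/cli.py --repeats 5
```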
## Extending benchmarks

To benchmark an additional model:

1. Add or import both a Mesa implementation and a frames implementation, each exposing a `simulate(agents: int, steps: int, seed: int | None, ...)` function.
2. Register it in `benchmarks/cli.py` inside the `MODELS` dict with two backends (the names must be `mesa` and `frames`); a sketch follows this list.
3. Ensure any extra spatial parameters are derived from `agents` inside the runner lambda (see the sugarscape example).
4. Run the CLI to verify the new CSV columns still align.
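A minimal sketch of step 2, assuming a hypothetical `examples.schelling` package whose `backend_mesa` and `backend_frames` modules expose the `simulate(...)` interface described above (the model name and module paths are placeholders, not part of this PR):

```python
# benchmarks/cli.py (sketch): registering a hypothetical "schelling" model.
from examples.schelling import backend_frames as schelling_frames  # hypothetical module
from examples.schelling import backend_mesa as schelling_mesa      # hypothetical module

MODELS["schelling"] = ModelConfig(
    name="schelling",
    backends=[
        # Runners must match RunnerP: (agents: int, steps: int, seed: int | None) -> None.
        Backend(name="mesa", runner=schelling_mesa.simulate),
        Backend(name="frames", runner=schelling_frames.simulate),
    ],
)
```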
## Related documentation

See `docs/user-guide/5_benchmarks.md` (user-facing narrative) and the main project `README.md` for overall context.
New file (+266 lines): `benchmarks/cli.py`

```python
| """Typer CLI for running mesa vs mesa-frames performance benchmarks.""" | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from dataclasses import dataclass | ||
| from datetime import datetime, timezone | ||
| import os | ||
| from pathlib import Path | ||
| from time import perf_counter | ||
| from typing import Literal, Annotated, Protocol, Optional | ||
|
|
||
| import math | ||
| import polars as pl | ||
| import typer | ||
|
|
||
| from examples.boltzmann_wealth import backend_frames as boltzmann_frames | ||
| from examples.boltzmann_wealth import backend_mesa as boltzmann_mesa | ||
| from examples.sugarscape_ig.backend_frames import model as sugarscape_frames | ||
| from examples.sugarscape_ig.backend_mesa import model as sugarscape_mesa | ||
| from examples.plotting import ( | ||
| plot_performance as _examples_plot_performance, | ||
| ) | ||
|
|
||
| app = typer.Typer(add_completion=False) | ||
|
|
||
|
|
||
| class RunnerP(Protocol): | ||
| def __call__(self, agents: int, steps: int, seed: int | None = None) -> None: ... | ||
|
|
||
|
|
||
| @dataclass(slots=True) | ||
| class Backend: | ||
| name: Literal["mesa", "frames"] | ||
| runner: RunnerP | ||
|
|
||
|
|
||
| @dataclass(slots=True) | ||
| class ModelConfig: | ||
| name: str | ||
| backends: list[Backend] | ||
|
|
||
|
|
||
| MODELS: dict[str, ModelConfig] = { | ||
| "boltzmann": ModelConfig( | ||
| name="boltzmann", | ||
| backends=[ | ||
| Backend(name="mesa", runner=boltzmann_mesa.simulate), | ||
| Backend(name="frames", runner=boltzmann_frames.simulate), | ||
| ], | ||
| ), | ||
| "sugarscape": ModelConfig( | ||
| name="sugarscape", | ||
| backends=[ | ||
| Backend( | ||
| name="mesa", | ||
| runner=lambda agents, steps, seed=None: sugarscape_mesa.simulate( | ||
| agents=agents, | ||
| steps=steps, | ||
| width=int(max(20, math.ceil((agents) ** 0.5) * 2)), | ||
| height=int(max(20, math.ceil((agents) ** 0.5) * 2)), | ||
| seed=seed, | ||
| ), | ||
| ), | ||
| Backend( | ||
| name="frames", | ||
| # Benchmarks expect a runner signature (agents:int, steps:int, seed:int|None) | ||
| # Sugarscape frames simulate requires width/height; choose square close to agent count. | ||
| runner=lambda agents, steps, seed=None: sugarscape_frames.simulate( | ||
| agents=agents, | ||
| steps=steps, | ||
| width=int(max(20, math.ceil((agents) ** 0.5) * 2)), | ||
| height=int(max(20, math.ceil((agents) ** 0.5) * 2)), | ||
| seed=seed, | ||
| ), | ||
| ), | ||
| ], | ||
| ), | ||
| } | ||


def _parse_agents(value: str) -> list[int]:
    value = value.strip()
    if ":" in value:
        parts = value.split(":")
        if len(parts) != 3:
            raise typer.BadParameter("Ranges must use start:stop:step format")
        try:
            start, stop, step = (int(part) for part in parts)
        except ValueError as exc:
            raise typer.BadParameter("Range values must be integers") from exc
        if step <= 0:
            raise typer.BadParameter("Step must be positive")
        if start < 0 or stop <= 0:
            raise typer.BadParameter("Range endpoints must be positive")
        if start > stop:
            raise typer.BadParameter("Range start must be <= stop")
        counts = list(range(start, stop + step, step))
        if counts[-1] > stop:
            counts.pop()
        return counts
    try:
        agents = int(value)
    except ValueError as exc:  # pragma: no cover - defensive
        raise typer.BadParameter("Agent count must be an integer") from exc
    if agents <= 0:
        raise typer.BadParameter("Agent count must be positive")
    return [agents]
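
# Examples of the agent-count formats _parse_agents accepts (illustrative; mirrors the README):
#   _parse_agents("3000")           -> [3000]
#   _parse_agents("1000:5000:1500") -> [1000, 2500, 4000]  (5500 exceeds 5000 and is dropped)
#   _parse_agents("0")              -> raises typer.BadParameter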


def _parse_models(value: str) -> list[str]:
    """Parse models option into a list of model keys.

    Accepts:
    - "all" -> returns all available model keys
    - a single model name -> returns [name]
    - a comma-separated list of model names -> returns list

    Validates that each selected model exists in MODELS.
    """
    value = value.strip()
    if value == "all":
        return list(MODELS.keys())
    # support comma-separated lists
    parts = [part.strip() for part in value.split(",") if part.strip()]
    if not parts:
        raise typer.BadParameter("Model selection must not be empty")
    unknown = [p for p in parts if p not in MODELS]
    if unknown:
        raise typer.BadParameter(f"Unknown model selection: {', '.join(unknown)}")
    # preserve order and uniqueness
    seen = set()
    result: list[str] = []
    for p in parts:
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result


def _plot_performance(
    df: pl.DataFrame, model_name: str, output_dir: Path, timestamp: str
) -> None:
    """Wrap examples.plotting.plot_performance to ensure consistent theming.

    The original benchmark implementation used simple seaborn styles (whitegrid / darkgrid).
    Our example plotting utilities define a much darker, high-contrast *true* dark theme
    (custom rc params overriding bg/fg colors). Reuse that logic here so the
    benchmark dark plots match the example dark plots users see elsewhere.
    """
    if df.is_empty():
        return
    stem = f"{model_name}_runtime_{timestamp}"
    _examples_plot_performance(
        df.select(["agents", "runtime_seconds", "backend"]),
        output_dir=output_dir,
        stem=stem,
        # Prefer more concise, publication-style wording
        title=f"{model_name.title()} runtime scaling",
    )


@app.command()
def run(
    models: Annotated[
        str,
        typer.Option(
            help="Models to benchmark: boltzmann, sugarscape, or all",
            callback=_parse_models,
        ),
    ] = "all",
    agents: Annotated[
        str,
        typer.Option(
            help="Agent count or range (start:stop:step)", callback=_parse_agents
        ),
    ] = "1000:5000:1000",
    steps: Annotated[
        int,
        typer.Option(
            min=0,
            help="Number of steps per run.",
        ),
    ] = 100,
    repeats: Annotated[int, typer.Option(help="Repeats per configuration.", min=1)] = 1,
    seed: Annotated[int, typer.Option(help="Optional RNG seed.")] = 42,
    save: Annotated[bool, typer.Option(help="Persist benchmark CSV results.")] = True,
    plot: Annotated[bool, typer.Option(help="Render performance plots.")] = True,
    results_dir: Annotated[
        Path,
        typer.Option(
            help=(
                "Base directory for benchmark outputs. A timestamped subdirectory "
                "(e.g. results/20250101_120000) is created with CSV files at the root "
                "and a 'plots/' subfolder for images."
            ),
        ),
    ] = Path(__file__).resolve().parent / "results",
) -> None:
    """Run performance benchmarks for the selected models."""
    runtime_typechecking = os.environ.get("MESA_FRAMES_RUNTIME_TYPECHECKING", "")
    if runtime_typechecking and runtime_typechecking.lower() not in {"0", "false"}:
        typer.secho(
            "Warning: MESA_FRAMES_RUNTIME_TYPECHECKING is enabled; benchmarks may run significantly slower.",
            fg=typer.colors.YELLOW,
        )
    rows: list[dict[str, object]] = []
    # Single timestamp per CLI invocation so all model results are co-located.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    # Create unified output layout: <results_dir>/<timestamp>/{CSV files, plots/}
    base_results_dir = results_dir
    timestamp_dir = (base_results_dir / timestamp).resolve()
    plots_subdir: Path = timestamp_dir / "plots"
    for model in models:
        config = MODELS[model]
        typer.echo(f"Benchmarking {model} with agents {agents}")
        for agents_count in agents:
            for repeat_idx in range(repeats):
                run_seed = seed + repeat_idx
                for backend in config.backends:
                    start = perf_counter()
                    backend.runner(agents_count, steps, run_seed)
                    runtime = perf_counter() - start
                    rows.append(
                        {
                            "model": model,
                            "backend": backend.name,
                            "agents": agents_count,
                            "steps": steps,
                            "seed": run_seed,
                            "repeat_idx": repeat_idx,
                            "runtime_seconds": runtime,
                            "timestamp": timestamp,
                        }
                    )
                    # Report completion of this run to the CLI
                    typer.echo(
                        f"Completed {backend.name} for model={model} agents={agents_count} steps={steps} seed={run_seed} repeat={repeat_idx} in {runtime:.3f}s"
                    )
        # Finished all runs for this model
        typer.echo(f"Finished benchmarking model {model}")

    if not rows:
        typer.echo("No benchmark data collected.")
        return
    df = pl.DataFrame(rows)
    if save:
        timestamp_dir.mkdir(parents=True, exist_ok=True)
        for model in models:
            model_df = df.filter(pl.col("model") == model)
            csv_path = timestamp_dir / f"{model}_perf_{timestamp}.csv"
            model_df.write_csv(csv_path)
            typer.echo(f"Saved {model} results to {csv_path}")
    if plot:
        plots_subdir.mkdir(parents=True, exist_ok=True)
        for model in models:
            model_df = df.filter(pl.col("model") == model)
            _plot_performance(model_df, model, plots_subdir, timestamp)
            typer.echo(f"Saved {model} plots under {plots_subdir}")

    typer.echo(
        f"Unified benchmark outputs written under {timestamp_dir} (CSV files) and {plots_subdir} (plots)"
    )


if __name__ == "__main__":
    app()
```
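For completeness, a sketch of driving this command in-process (for example from the test suite this PR adds) with typer's test runner. The import path `benchmarks.cli` assumes the `benchmarks` directory is importable as a package from the repository root, and the option values are illustrative:

```python
# Minimal sketch using typer's built-in test runner.
from typer.testing import CliRunner

from benchmarks.cli import app  # assumes `benchmarks` is importable from the repo root

runner = CliRunner()
result = runner.invoke(
    app,
    ["--models", "boltzmann", "--agents", "1000", "--steps", "10", "--no-save", "--no-plot"],
)
assert result.exit_code == 0
print(result.output)
```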