[Feat] runtime scaling and flamegraph benchmark CLIs#117

Open
leifdenby wants to merge 2 commits into mllam:main from leifdenby:feat/runtime-scaling-flamegraph-benchmarks

Conversation


@leifdenby leifdenby commented Mar 25, 2026

Describe your changes

It is clear that the current implementation of weather-model-graphs can lead to quite long runtimes with the number of grid points in the datasets that people have been working with (for example @observingClouds and @joeloskarsson have seen this).

To try and quantify the scaling, and to try and identify where improvements could be made, this PR adds two CLI scripts in tests/benchmarks/:

  1. A runtime-scaling script (uv run python -m tests.benchmarks.graph_creation_scaling) that executes graph creation over progressively larger input grid sizes. It generates a plot of execution time vs. number of input nodes alongside a linear $O(N)$ reference line, allowing us to easily visualize how the algorithm's performance degrades as coordinate sizes grow.
  2. A hierarchical profiling script (uv run python -m tests.benchmarks.graph_creation_flamegraph) powered by pyinstrument. It generates and automatically serves an interactive HTML call-stack flamegraph, making it simple to pinpoint exactly which internal functions (e.g., scipy KDTree queries, or networkx node iteration) are responsible for performance bottlenecks.
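To make the shape of the runtime-scaling script concrete, here is a minimal self-contained sketch of the measurement loop it performs. `build_toy_graph` is a hypothetical stand-in for weather-model-graphs' graph creation (the real script calls the actual archetype constructors and plots the result with a linear reference line):

```python
# Sketch of the runtime-scaling measurement: time a graph-build step over
# progressively larger grids and derive an O(N) reference anchored at the
# smallest measurement. `build_toy_graph` is a stand-in workload, NOT the
# weather-model-graphs API.
import time

def build_toy_graph(n_side):
    """Stand-in: connect each node of an n_side x n_side grid to its
    right and down neighbours."""
    edges = []
    for i in range(n_side):
        for j in range(n_side):
            if i + 1 < n_side:
                edges.append(((i, j), (i + 1, j)))
            if j + 1 < n_side:
                edges.append(((i, j), (i, j + 1)))
    return edges

def measure_scaling(sizes):
    """Return (n_nodes, wall-time) pairs for each grid size."""
    results = []
    for n_side in sizes:
        t_start = time.perf_counter()
        build_toy_graph(n_side)
        results.append((n_side * n_side, time.perf_counter() - t_start))
    return results

results = measure_scaling([16, 32, 64])
# Linear O(N) reference line, anchored at the smallest measurement; the real
# script plots both curves so deviations from linear are easy to spot.
n0, t0 = results[0]
reference = [(n, t0 * n / n0) for n, _ in results]
```

The real script does the same thing per graph archetype and hands the pairs to matplotlib.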

I thought these CLI tools could provide a baseline for future optimisation work - we could even eventually add CI tests checking that the execution time doesn't drastically blow up as we work on the codebase.
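The CI regression test floated above could look something like the following sketch. The workload function and the 10x threshold are purely illustrative placeholders, not tuned values from this PR:

```python
# Hypothetical shape of a CI guard against runtime blow-up: run the workload
# at two sizes and fail if runtime grows much faster than linearly.
import time

def timed(fn, *args):
    """Return the wall-clock time taken by one call of fn(*args)."""
    t_start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - t_start

def workload(n):
    # Stand-in for graph creation over n input nodes (NOT the real API).
    return sorted(range(n, 0, -1))

def test_runtime_scales_roughly_linearly():
    t_small = timed(workload, 100_000)
    t_large = timed(workload, 400_000)
    # 4x the input should cost well under, say, 10x the time; the generous
    # margin keeps the test robust to CI timing noise.
    assert t_large < 10 * max(t_small, 1e-4)

test_runtime_scales_roughly_linearly()
```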

  • pyinstrument has been added to the dev dependencies in pyproject.toml to support the interactive flamegraphs.

Runtime scaling benchmark

[Scaling plots for the flat, flat_multiscale and hierarchical archetypes: scaling_keisler, scaling_graphcast, scaling_oskarsson_hierarchical]

The good news is that runtime scales roughly linearly with the number of input coordinate positions. Of course this will vary with coordinate density, layout, etc.; these results are for a regular, rectilinearly laid-out input grid.

The bad news is that for order $10^5$ nodes the runtime (at least on my laptop) is of order 6 s; at around $10^6$ nodes we are looking at a runtime of a minute, and another order of magnitude ($10^7$) would be an hour.

But, I think there are some clear optimisations we can do, which brings me to the flamegraphs...

Flamegraph profiling script

I'm attaching here two screenshots taken from the browser window that opens when you run the flamegraph benchmark script. This is the pyinstrument interface:

[Screenshots of the pyinstrument "Call Stack" and "Timeline" views, taken 2026-03-25 16:40]

I've included the .html profile result page too, in case someone wants to look further into this example.
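For anyone who wants the same "which function dominates?" answer without serving an HTML page, stdlib cProfile + pstats gives a text version of what the flamegraph shows. The workload below is a hypothetical stand-in, not code from this PR:

```python
# Dependency-free analogue of the flamegraph view: profile a workload with
# stdlib cProfile and print the top entries sorted by cumulative time.
import cProfile
import io
import pstats

def slow_relabel(n):
    """Stand-in workload loosely mimicking a node-relabelling pass."""
    mapping = {i: f"node_{i}" for i in range(n)}
    return [mapping[i] for i in range(n)]

profiler = cProfile.Profile()
profiler.enable()
slow_relabel(50_000)
profiler.disable()

# Render the top 5 entries by cumulative time into a string report.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

pyinstrument's statistical sampling gives a much more readable call tree for deep stacks, which is why the script in this PR uses it instead.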

It looks to me (as I expected) that there are repeated calls inside connect_nodes_across_graphs which could be vectorized quite easily: we would query many points at once and create all the edges with one call on the networkx.DiGraph object being worked on. However, the composing of subgraphs together and relabelling them at the end appears to be quite costly too. The former might require a different data structure than nx.DiGraph, which might require a rethink; for the latter maybe we can do something smarter (it shouldn't be that costly, I don't think...).
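To illustrate the batching pattern hinted at above: rather than one nearest-neighbour query and one add_edge call per node, query all points in a single batch and insert every edge with one bulk call (mirroring networkx's DiGraph.add_edges_from). The sketch below uses pure-Python stand-ins for scipy's KDTree and nx.DiGraph so it is self-contained; it shows the call pattern, not the actual fix:

```python
# Batched query + bulk edge insertion, with hypothetical stand-in classes.
def batch_nearest(sources, targets):
    """Stand-in for a KDTree batch query: index of the nearest target
    for every source, computed in one call."""
    return [min(range(len(targets)), key=lambda k: abs(targets[k] - s))
            for s in sources]

class TinyDiGraph:
    """Minimal stand-in for nx.DiGraph, exposing only bulk insertion."""
    def __init__(self):
        self.edges = set()

    def add_edges_from(self, edge_list):
        # One bulk insertion instead of a per-edge method call in a loop.
        self.edges.update(edge_list)

sources = [0.1, 0.9, 2.2]   # 1-D coordinates of source nodes
targets = [0.0, 1.0, 2.0]   # 1-D coordinates of target nodes
nearest = batch_nearest(sources, targets)  # single batched query
graph = TinyDiGraph()
graph.add_edges_from((("src", i), ("tgt", j)) for i, j in enumerate(nearest))
```

With scipy, `KDTree.query` already accepts an array of points, so the per-node loop could collapse into one query plus one `add_edges_from` call.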

It is not my aim to start a long discussion here about optimisation that could be done. I suggest we open a separate issue for that. Instead, I would simply like to add these tools so that we can establish a way to investigate the issue and plan a course of action.

Issue Link

This PR supplements the CPU vs GPU based benchmark that #62 is introducing.

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@leifdenby leifdenby added this to the v0.5.0 (proposed) milestone Mar 25, 2026
