[Feat] runtime scaling and flamegraph benchmark CLIs #117
Open

leifdenby wants to merge 2 commits into mllam:main
Conversation
Describe your changes
It is clear that the current implementation of `weather-model-graphs` can lead to quite long runtimes with the number of grid points in the datasets that people have been working with (for example @observingClouds and @joeloskarsson have seen this). To try to quantify the scaling, and to identify where improvements could be made, this PR adds two CLI scripts in `tests/benchmarks/`:

- A runtime scaling benchmark (`uv run python -m tests.benchmarks.graph_creation_scaling`) that executes graph creation over progressively larger input grid sizes. It generates a plot of execution time vs. number of input nodes alongside a linear reference.
- A flamegraph profiling benchmark (`uv run python -m tests.benchmarks.graph_creation_flamegraph`) powered by `pyinstrument`. It generates and automatically serves an interactive HTML call-stack flamegraph, making it simple to pinpoint exactly which internal functions (e.g. `scipy` KDTree queries, or `networkx` node iteration) are responsible for performance bottlenecks.

I thought these CLI tools could provide a baseline for future optimisation work - we could even eventually add CI tests checking that the execution time doesn't drastically blow up as we work on the codebase.
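As a rough sketch of what the scaling benchmark does (all names here are hypothetical; `create_graph` stands in for the actual `weather-model-graphs` graph-creation entry point, and here is just a dummy workload):

```python
import time


def create_graph(n_nodes):
    # Hypothetical stand-in for the real graph-creation call in
    # weather-model-graphs; here just a dummy O(n) workload.
    return [i * i for i in range(n_nodes)]


def benchmark_scaling(grid_sizes):
    """Time graph creation over progressively larger input grids."""
    timings = []
    for n in grid_sizes:
        t0 = time.perf_counter()
        create_graph(n)
        timings.append((n, time.perf_counter() - t0))
    return timings


timings = benchmark_scaling([10_000, 20_000, 40_000])
for n, dt in timings:
    print(f"{n:>7d} nodes: {dt:.4f}s")
```

The actual script then plots these (nodes, seconds) pairs; the sketch only shows the timing loop itself.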
`pyinstrument` has been added to the `dev` dependencies in `pyproject.toml` to support the interactive flamegraphs.

Runtime scaling benchmark
(Results shown for the `flat`, `flat_multiscale` and `hierarchical` graph architectures.)

The good news is that runtime scales roughly linearly with the number of input coordinate positions. Of course this will vary with coordinate density, layout etc.; this is for regular, rectilinearly laid-out input grid nodes.
The bad news is that for order $10^5$ nodes the runtime (at least on my laptop) is ~6s, at order $10^6$ we are looking at a runtime of about a minute, and another order of magnitude ($10^7$) would be an hour. But I think there are some clear optimisations we can do, which brings me to the flamegraphs...
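One way to sanity-check the "roughly linear" claim is to fit a power law $t \approx c \cdot n^k$ to the measured (nodes, seconds) pairs; $k \approx 1$ means linear scaling. A small sketch with made-up numbers (not the actual benchmark output):

```python
import math


def scaling_exponent(measurements):
    """Least-squares slope of log(t) vs log(n), i.e. the exponent k in t ~ c * n**k."""
    xs = [math.log(n) for n, _ in measurements]
    ys = [math.log(t) for _, t in measurements]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Made-up, exactly-linear data: t = 6e-5 * n
data = [(1e4, 0.6), (1e5, 6.0), (1e6, 60.0)]
k = scaling_exponent(data)
print(f"estimated exponent: {k:.2f}")  # → 1.00 for linear scaling
```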
Flamegraph profiling script
I'm attaching here two screenshots taken from the browser window that opens when you run the flamegraph benchmark script. This is the pyinstrument interface:
I've included the `.html` profile result page too, in case someone wants to look further into this example.

It looks to me (as I expected) that there are repeated calls inside `connect_nodes_across_graphs` which could be vectorized quite easily: we would query many points at once and create all the edges with one call on the `networkx.DiGraph` object being worked on. However, the composing of subgraphs together and relabelling them at the end appears to be quite costly too. The former might require a data structure other than `nx.DiGraph`, which might require a rethink; for the latter maybe we can do something smarter (it shouldn't be that costly, I don't think...).

It is not my aim to start a long discussion here about the optimisations that could be done. I suggest we open a separate issue for that. Instead, I would simply like to add these tools so that we can establish a way to investigate the issue and plan a course of action.
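To illustrate the kind of vectorisation suggested for `connect_nodes_across_graphs` (a sketch only, not the actual implementation — the node names, positions and `k=1` choice are made up): one batched `scipy` KDTree query can replace per-node queries, and all edges can then be added to the `DiGraph` with a single `add_edges_from` call:

```python
import networkx as nx
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
src_pos = rng.random((100, 2))  # e.g. grid-node positions (hypothetical)
dst_pos = rng.random((10, 2))   # e.g. mesh-node positions (hypothetical)

# One batched query for all source points at once, instead of
# one KDTree query per node inside a Python loop.
tree = KDTree(dst_pos)
_, nearest = tree.query(src_pos, k=1)

# Build all edges in one go and add them with a single call.
g = nx.DiGraph()
edges = [(f"grid_{i}", f"mesh_{j}") for i, j in enumerate(nearest)]
g.add_edges_from(edges)
print(g.number_of_edges())  # one edge per source node → 100
```

Whether this maps cleanly onto the real code depends on how the edge attributes are computed, but the batched-query pattern is the core idea.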
Issue Link
This PR supplements the CPU vs GPU based benchmark that #62 is introducing.
Type of change
Checklist before requesting a review
`pull` with `--rebase` option if possible).

Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
reflecting type of change (add section where missing):
Checklist for assignee