[Feat] runtime scaling and flamegraph benchmark CLIs #117
Open

leifdenby wants to merge 2 commits into mllam:main
Conversation
Describe your changes
It is clear that the current implementation of `weather-model-graphs` can lead to quite long runtimes with the number of grid points in the datasets that people have been working with (for example @observingClouds and @joeloskarsson have seen this). To try to quantify the scaling, and to identify where improvements could be made, this PR adds two CLI scripts in `tests/benchmarks/`:

- A runtime scaling benchmark (`uv run python -m tests.benchmarks.graph_creation_scaling`) that executes graph creation over progressively larger input grid sizes. It generates a plot of execution time vs. number of input nodes alongside a linear reference.
- A flamegraph profiling benchmark (`uv run python -m tests.benchmarks.graph_creation_flamegraph`) powered by `pyinstrument`. It generates and automatically serves an interactive HTML call-stack flamegraph, making it simple to pinpoint exactly which internal functions (e.g. `scipy` KDTree queries, or `networkx` node iteration) are responsible for performance bottlenecks.

I thought these CLI tools could provide a baseline for future optimisation work - we could even eventually add CI tests checking that the execution time doesn't drastically blow up as we work on the codebase.
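As a rough sketch of what the scaling benchmark does (all names here are hypothetical; `create_graph` stands in for the actual `weather-model-graphs` graph-creation entry point, and here is just a dummy workload):

```python
import time


def create_graph(n_nodes):
    # Hypothetical stand-in for the real graph-creation call in
    # weather-model-graphs; here just a dummy O(n) workload.
    return [i * i for i in range(n_nodes)]


def benchmark_scaling(grid_sizes):
    """Time graph creation over progressively larger input grids."""
    timings = []
    for n in grid_sizes:
        t0 = time.perf_counter()
        create_graph(n)
        timings.append((n, time.perf_counter() - t0))
    return timings


timings = benchmark_scaling([10_000, 20_000, 40_000])
for n, dt in timings:
    print(f"{n:>7d} nodes: {dt:.4f}s")
```

The actual script then plots these (nodes, seconds) pairs; the sketch only shows the timing loop itself.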
`pyinstrument` has been added to the `dev` dependencies in `pyproject.toml` to support the interactive flamegraphs.

Runtime scaling benchmark
(Results shown for the `flat`, `flat_multiscale` and `hierarchical` graph architectures.)

The good news is that runtime scales roughly linearly with the number of input coordinate positions. Of course this will vary with coordinate density, layout etc.; this is for regular, rectilinearly laid-out input grid nodes.
The bad news is that for order $10^5$ nodes the runtime (at least on my laptop) is ~6s, at order $10^6$ we are looking at a runtime of about a minute, and another order of magnitude ($10^7$) would be an hour. But I think there are some clear optimisations we can do, which brings me to the flamegraphs...
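One way to sanity-check the "roughly linear" claim is to fit a power law $t \approx c \cdot n^k$ to the measured (nodes, seconds) pairs; $k \approx 1$ means linear scaling. A small sketch with made-up numbers (not the actual benchmark output):

```python
import math


def scaling_exponent(measurements):
    """Least-squares slope of log(t) vs log(n), i.e. the exponent k in t ~ c * n**k."""
    xs = [math.log(n) for n, _ in measurements]
    ys = [math.log(t) for _, t in measurements]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Made-up, exactly-linear data: t = 6e-5 * n
data = [(1e4, 0.6), (1e5, 6.0), (1e6, 60.0)]
k = scaling_exponent(data)
print(f"estimated exponent: {k:.2f}")  # → 1.00 for linear scaling
```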
Flamegraph profiling script
I'm attaching here two screenshots taken from the browser window that opens when you run the flamegraph benchmark script. This is the pyinstrument interface:
I've included the `.html` profile result page too, in case someone wants to look further into this example.

It looks to me (as I expected) that there are repeated calls inside `connect_nodes_across_graphs` which could be vectorized quite easily: we would query many points at once and create all the edges with one call on the `networkx.DiGraph` object being worked on. However, the composing of subgraphs together and relabelling them at the end appears to be quite costly too. The former might require a data structure other than `nx.DiGraph`, which might require a rethink; for the latter maybe we can do something smarter (it shouldn't be that costly, I don't think...).

It is not my aim to start a long discussion here about the optimisations that could be done. I suggest we open a separate issue for that. Instead, I would simply like to add these tools so that we can establish a way to investigate the issue and plan a course of action.
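To illustrate the kind of vectorisation suggested for `connect_nodes_across_graphs` (a sketch only, not the actual implementation — the node names, positions and `k=1` choice are made up): one batched `scipy` KDTree query can replace per-node queries, and all edges can then be added to the `DiGraph` with a single `add_edges_from` call:

```python
import networkx as nx
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
src_pos = rng.random((100, 2))  # e.g. grid-node positions (hypothetical)
dst_pos = rng.random((10, 2))   # e.g. mesh-node positions (hypothetical)

# One batched query for all source points at once, instead of
# one KDTree query per node inside a Python loop.
tree = KDTree(dst_pos)
_, nearest = tree.query(src_pos, k=1)

# Build all edges in one go and add them with a single call.
g = nx.DiGraph()
edges = [(f"grid_{i}", f"mesh_{j}") for i, j in enumerate(nearest)]
g.add_edges_from(edges)
print(g.number_of_edges())  # one edge per source node → 100
```

Whether this maps cleanly onto the real code depends on how the edge attributes are computed, but the batched-query pattern is the core idea.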
Issue Link
This PR supplements the CPU vs GPU based benchmark that #62 is introducing.
Type of change
Checklist before requesting a review
`pull` with `--rebase` option if possible).

Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
reflecting type of change (add section where missing):
Checklist for assignee