add benchmarks of Graph using asv #797


Status: Open · wants to merge 4 commits into base: main

Conversation

@martinfleis (Member) commented Jul 18, 2025

Starting with some simple benchmarks of Graph. My interest is to compare performance on main vs in a PR, not necessarily to compare it with W. Hence the assumption is that we will be comparing different JSON outputs.

Given I need to disable caching in base.py, let's not merge this.

Any suggestions on what to add? Something that could be sensitive to changes in the way we represent adjacency, since I am mostly interested in removing the reindexing overhead we have now.

@martinfleis martinfleis changed the title add benchmarks of Graph add benchmarks of Graph (not to be merged) Jul 18, 2025

codecov bot commented Jul 18, 2025

Codecov Report

Attention: Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 85.4%. Comparing base (44ede54) to head (0b9b4cd).
Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| libpysal/graph/base.py | 66.7% | 1 Missing ⚠️ |
Additional details and impacted files


@@          Coverage Diff          @@
##            main    #797   +/-   ##
=====================================
  Coverage   85.4%   85.4%           
=====================================
  Files        150     150           
  Lines      15971   15951   -20     
=====================================
- Hits       13632   13617   -15     
+ Misses      2339    2334    -5     
| Files with missing lines | Coverage Δ | |
|---|---|---|
| libpysal/graph/base.py | 96.5% <66.7%> (-0.2%) | ⬇️ |

@jGaboardi (Member)

possible/beneficial to add in memory profiling?

@martinfleis (Member, Author)

Possibly, though I am probably not the one who's going to write it.

@knaaptime (Member)

I think we need to test lags and such. The graph ordering is critical to the sparse representation/alignment, and we need to maintain the integrity of that for all the linear algebra stuff. I.e., if we want to test indexing, we need to test not only the creation of sparse, but the math that goes through it.
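The "math that goes through it" here is chiefly the spatial lag, a sparse matrix-vector product whose result silently depends on row/column ordering. A tiny standalone illustration (toy data, not libpysal code):

```python
import numpy as np
from scipy import sparse

# three observations: 0 neighbors {1, 2}; 1 and 2 each neighbor {0}
rows = np.array([0, 0, 1, 2])
cols = np.array([1, 2, 0, 0])
weights = np.array([0.5, 0.5, 1.0, 1.0])  # row-standardised
W = sparse.csr_matrix((weights, (rows, cols)), shape=(3, 3))

y = np.array([10.0, 20.0, 30.0])
lag = W @ y  # spatial lag: each value replaced by weighted neighbor values
# lag == [25., 10., 10.]
```

If the rows of `W` were built in a different order than `y`, the product would be computed against the wrong observations, which is why ordering integrity matters for everything downstream.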

@martinfleis (Member, Author)

Why? Everything beyond sparse creation is out of our reach.

@martinfleis (Member, Author)

I mean, correctness is covered by tests, if that's what you mean. And performance on top of sparse has nothing to do with graph.

@knaaptime (Member)

I guess what I'm getting at is: in my gut, I don't think we're going to be able to avoid the performance overhead from sorting the graph (sorting for sparse will be unavoidable, but maybe we'll have a faster way than a pandas reindex or something). The question is when the hit occurs (when the graph is instantiated, when the sparse attribute is called, etc.), so it might be nice to compare different options.
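The "when the hit occurs" question can be framed as eager vs. lazy sorting. A toy sketch of the lazy option (hypothetical class, not `base.py`; it assumes integer labels `0..n-1`), where the sort cost is deferred to the first `.sparse` access and cached afterwards:

```python
import pandas as pd
from scipy import sparse


class LazyGraph:
    """Toy model (not base.py): defer the sort cost to first .sparse access."""

    def __init__(self, adjacency):
        self._adjacency = adjacency  # stored as-is: no cost at instantiation
        self._cache = None

    @property
    def sparse(self):
        if self._cache is None:  # sorting cost paid here, once, then cached
            adj = self._adjacency.sort_index()
            focal = adj.index.get_level_values("focal").to_numpy()
            neighbor = adj.index.get_level_values("neighbor").to_numpy()
            n = int(max(focal.max(), neighbor.max())) + 1
            self._cache = sparse.csr_matrix(
                (adj.to_numpy(), (focal, neighbor)), shape=(n, n)
            )
        return self._cache
```

Benchmarking instantiation and `.sparse` access separately would show exactly where an eager design moves that cost.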

@martinfleis (Member, Author)

Yeah, that's precisely the aim of this benchmark. And I think to capture that, benchmarking lag or any other op on top of sparse is redundant.

@knaaptime (Member)

Cool, I trust your lead then. I was imagining a scenario where we considered things like reimplementing lag or transform using groupby/apply instead of a matrix operation, which would have implications for when you need to build sparse (i.e. try to skip re-ordering by avoiding sparse whenever possible), and then for which tasks/packages that burden falls hardest on.
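The groupby/apply alternative mentioned here can be sketched on a toy adjacency (this is an illustration of the idea, not libpysal's implementation): both routes produce the same lag, but the groupby route never materialises a sparse matrix, so it never triggers the reordering:

```python
import numpy as np
import pandas as pd
from scipy import sparse

# toy long-form adjacency: (focal, neighbor) -> weight
adjacency = pd.Series(
    [0.5, 0.5, 1.0, 1.0],
    index=pd.MultiIndex.from_tuples(
        [(0, 1), (0, 2), (1, 0), (2, 0)], names=["focal", "neighbor"]
    ),
)
y = pd.Series([10.0, 20.0, 30.0])

# route 1: sparse matrix-vector product (needs a consistently ordered matrix)
f = adjacency.index.get_level_values("focal")
n_ = adjacency.index.get_level_values("neighbor")
W = sparse.csr_matrix((adjacency.to_numpy(), (f, n_)), shape=(3, 3))
lag_sparse = W @ y.to_numpy()

# route 2: groupby over the long form, no sparse matrix and no reordering
lag_groupby = (
    (adjacency * y.reindex(n_).to_numpy()).groupby(level="focal").sum()
)
```

Both give `[25., 10., 10.]` here; which route is faster at scale is exactly the kind of question these benchmarks could answer if that refactor ever happens.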

@martinfleis (Member, Author)

Ah, yeah. If that happens we should certainly include these in the benchmarks.

@martinfleis martinfleis changed the title add benchmarks of Graph (not to be merged) add benchmarks of Graph using asv Jul 19, 2025
@martinfleis (Member, Author)

martinfleis commented Jul 19, 2025

I have updated this to use asv for benchmarks, which also means that this is now mergeable.

@jGaboardi in the same way, asv can do memory profiling if we want to do it.
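asv does support this out of the box: besides `time_*`, methods prefixed `mem_` (asv reports the size of the returned object) and `peakmem_` (asv reports the process's peak memory while the method runs) are collected automatically. A hypothetical sketch, again with synthetic data rather than the PR's actual benchmark code:

```python
import numpy as np
import pandas as pd


class MemGraph:
    """Illustrative asv memory benchmarks (`mem_*` and `peakmem_*` methods)."""

    def setup(self):
        # synthetic adjacency index: 8 ring neighbors per focal observation
        focal = np.repeat(np.arange(1_000), 8)
        neighbor = (focal + np.tile(np.arange(1, 9), 1_000)) % 1_000
        self.index = pd.MultiIndex.from_arrays(
            [focal, neighbor], names=["focal", "neighbor"]
        )

    def mem_adjacency(self):
        # asv records the memory footprint of the returned object
        return pd.Series(np.ones(len(self.index)), index=self.index)

    def peakmem_sort(self):
        # asv records peak memory consumed while this method runs
        pd.Series(np.ones(len(self.index)), index=self.index).sort_index()
```

These live alongside the timing benchmarks in the same asv suite, so no extra tooling would be needed.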

@martinfleis martinfleis marked this pull request as ready for review July 19, 2025 20:21
@jGaboardi (Member)

> Given I need to disable caching in base.py, let's not merge this.
>
> Which also means that this is not mergeable.

So we can keep this PR open in the docket and simply trigger a re-run of it after other PRs are merged to compare vs. main. Am I understanding correctly?

@martinfleis (Member, Author)

martinfleis commented Jul 19, 2025

Nope, those comments are now obsolete. This is completely mergeable now, following the same model as geopandas does with benchmarks (which do not work at the moment but that is not relevant here).

("not" was supposed to be "now" :)
