Skip to content

Exploring SciPy sparse array migration from sparse matrices #785

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

dschult
Copy link

@dschult dschult commented May 12, 2025

This PR explores changes needed for migrating to SciPy sparse arrays instead of sparse matrices. The migration guide suggests that the process be done in 2 "passes", and so probably in 2 PRs. This PR implements "Pass 1" where we find use of * and ** and other spmatrix idioms and replace them with e.g. @ and other idioms that work for both spmatrix and sparray.

Specifically, this PR:

  • adds a pytest conftest.py file that monkey-patches SciPy to raise whenever spmatrix objects use * and other idioms.
  • adjusts the code to update to @ and other idioms that work for both spmatrix and sparray.

The additional conftest.py file should not be merged to your codebase -- but I wanted to show what that file would look like because you have many repos to be converter -- hence the "Draft" nature of this PR. We should revert that commit before merging this. Running pytest locally with this file in place raises exceptions until the fixes are in place. If there's a better place to store this conftest.py file let me know.

SciPy version: While sparray is implemented in SciPy 1.8, the constructor functions like random_array() are not available until 1.12. There are workarounds for the constructor functions. But you should think about bumping the minimum version to SciPy 1.12.

More about the "harder" decisions needed for this migration:
Note that "pass 1" doesn't actually switch anything to sparray. The goal is code that will work with sparray sparse objects. The next step is "pass 2" where we change to using sparray internally. It will require deciding what to do about functions that currently return spmatrix objects.

For example:

  • functions returning spmatrix and having a sparse input: return the same sparray/spmatrix style as the input .
  • functions returning spmatrix and no sparse input: add a kwarg something like as_spmatrix=True -- or find some other way to determine whether to return sparray or spmatrix.

If the library uses any sparse libraries that require int32 index arrays for CSR index arrays, we will need to pay attention to potential changes. The spmatrix and sparray apis choose index array dtypes slightly differently. So let me know if you have use any 32-bit sparse libraries. E.g. some scipy.csgraph functions, pyamg and others.

@sjsrey sjsrey self-assigned this May 12, 2025
Copy link

codecov bot commented May 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.4%. Comparing base (4b9970d) to head (8881bf4).
Report is 9 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##            main    #785     +/-   ##
=======================================
- Coverage   85.4%   85.4%   -0.0%     
=======================================
  Files        150     150             
  Lines      15989   16036     +47     
=======================================
+ Hits       13655   13688     +33     
- Misses      2334    2348     +14     
Files with missing lines Coverage Δ
libpysal/weights/raster.py 59.3% <100.0%> (+0.5%) ⬆️
libpysal/weights/set_operations.py 59.2% <100.0%> (ø)
libpysal/weights/tests/test_util.py 100.0% <100.0%> (ø)
libpysal/weights/util.py 78.7% <100.0%> (+0.7%) ⬆️
libpysal/weights/weights.py 93.3% <100.0%> (ø)

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@martinfleis
Copy link
Member

martinfleis commented May 13, 2025

This looks good to me. Regarding the decision to make, we mostly need to figure out how to migrate W.sparse which now returns a matrix and is a property, so does not allow us to add a temporary keyword facilitating the migration. We certainly need to ensure that all the PySAL's downstream packages like spreg are ready for sparse arrays before doing this.

cc @pedrovma

@dschult
Copy link
Author

dschult commented May 13, 2025

I agree that the property W.sparse is important to get right. People could easily be using that for arithmetic that could use * instead of @ or multiply. But maybe they don't:

The W class is already a little hybrid in its spmatrix/sparray handling. The W.to_sparse() method returns an sparray. But the property gives an spmatrix. So if you haven't heard many complaints/questions about that, its a good sign. (Also the graph interface uses sparray while the weights interface uses spmatrix, so presumably there is some cross-talk there and you would know if there are concerns.)

As for providing user control of the property, one proposal is to provide a temporary keyword arg to the W class __init__ method. Presumably people will want to use spmatrix or sparray consistently through a workflow -- so they can choose which they want the property to provide during construction. You would need to store that value as an attribute, so in a pinch a user could switch the selection by changing the attribute value True<->False. I guess making it a public attribute is a friendly approach, but a private attribute might be better because you will want to remove this option when spmatrix is removed in the long run. Something like: w = W(...., as_spmatrix=True) and then if self._as_spmatrix: .... That name is a little strange because the attribute only affects the property -- and not the output of to_sparse(). So maybe use_spmatrix_property? ...

If there isn't a ecosystem problem/discussion about W.sparse vs W.to_sparse() being different, you might be able to get away with shifting the property to sparray so long as the pysal repos are all compatible with both. But I don't know how much your users hack on the W.sparse property in a way that depends on * and ** being matrix operations.

@martinfleis
Copy link
Member

I didn't even know we have W.to_sparse to be fair and never heard any complaint. For a context, W is a legacy implementation of spatial graphs while Graph is a new re-implementation of the same concept. That is also why it is readily using sparse arrays.

Copy link
Author

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments on specific places in the code. (details below)

  • the PR handling using matrix_power only works after scipy1.12, but there is a workaround shown below.
  • the line flagged as untested could be tested by running the same test with row_st set.

Should I put (either of) these changes into this PR? I should also remove the conftest tool. I'll put it into the scipy migration to sparray document and link to it here. Actually I could put it into a PR/issue comment too. Maybe an issue on the pysal/pysal repo?

Comment on lines 526 to 529
wk = sum(w**x for x in range(1, k + 1))
wk = sum(sparse.linalg.matrix_power(w, k) for k in range(1, k+1))
shortest_path = False
else:
wk = w**k
wk = sparse.linalg.matrix_power(w, k)
Copy link
Author

@dschult dschult May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes to the matrix power computation are being raised in the "oldest" CI run. That makes sense because scipy.sparse.linalg.matrix_power is a scipy 1.12 feature. We can work around using this matrix_power function if you want to keep the minimum scipy requirement at 1.8.

if lower_order:
        wk = w.copy()
        for _ in range(k):
            wk += wk @ w
        shortest_path = False
    else:
        # work around for scipy.sparse.linalg.matrix_power needed until scipy1.12
        wk  = w.copy()
        x = 1
        while 2 * x < k:
            wk = wk @ wk
            x *= 2
        while x < k:
            wk = wk @ w
            x += 1

BTW the graph interface uses scipy.linalg.matrix_power, so that interface will not run for scipy<1.12. Another reason to shift the min scipy version I guess.

@@ -1225,7 +1225,7 @@ def lat2SW(nrows=3, ncols=5, criterion="rook", row_st=False):
m = sparse.dia_matrix((data, offsets), shape=(n, n), dtype=np.int8)
m = m + m.T
if row_st:
m = sparse.spdiags(1.0 / m.sum(1).T, 0, *m.shape) * m
m = sparse.dia_matrix(((1.0 / m.sum(1).T), [0]), shape=m.shape) @ m
Copy link
Author

@dschult dschult May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add [have added] a test that calls lat2SW with row_st=True to make sure this is working/tested. I tried it locally and The line is correct.

@dschult dschult force-pushed the sparray_convert_demo branch from febaf09 to 8881bf4 Compare May 14, 2025 17:41
@dschult dschult marked this pull request as ready for review May 14, 2025 17:41
@dschult
Copy link
Author

dschult commented May 14, 2025

I have removed Draft Status for this PR -- it is ready for review as a "pass 1" part of migration to sparray. ("pass 1" as discussed in the SciPy migration to sparray guide, means make sure the code can work with either spmatrix or sparray.)

This PR is constructed to work with the current minimum SciPy version of 1.8. But a couple of places will be simplified by bumping the minimum version of SciPy to 1.12. These places are commented as such with the one-line replacement code and the version of scipy that it needs. This mostly involves replacing W**k with repeated @ operations -- but avoiding the accumulation of round-off under k matmuls by squaring repeatedly followed by repeated @ up to a total of k. This is what the new function scipy.linalg.matrix_power does in scipy >=1.12.

This PR adds a test of the row_st feature within the lat2SW function. That wasn't previously covered and a bot complained when I updated the untested code. The test is the same as the test without row_st -- I just set row_st=True and unscale the results to enable comparison. There may be a better way to test this feature -- but at least it now has coverage.

The remaining failure in the "Oldest" CI seems to be unrelated to the changes here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants