Add specification for the __binsparse__ protocol #912

Draft: wants to merge 1 commit into main

hameerabbasi commented Mar 10, 2025

This pull request adds the specification for the binsparse protocol (closes #840).

@willow-ahrens @BenBrock from the binsparse team.
@mtsokol @ivirshup for scipy.sparse
@leofang for cupyx.sparse
@pearu for torch.sparse
@jakevdp for JAX/TensorFlow

Introduction

Binsparse is a specification for the on-disk storage of N-D sparse arrays; this protocol reuses its format for in-memory interchange. It requires only two capabilities from a back-end that implements it:

a. A way to store 1-D and 2-D dense arrays (available via DLPack)
b. A way to parse and interpret JSON (available via the json module)
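To make the two requirements concrete, here is a minimal producer-side sketch. `ToyCSR` is a hypothetical class, and the descriptor layout (a top-level `"binsparse"` key with `format`, `shape`, `number_of_stored_values` and `data_types`, plus the CSR array names `pointers_to_1`, `indices_1` and `values`) is assumed from the binsparse v0.1 format rather than taken from this PR. The metadata is plain JSON, and each constituent array is a dense NumPy array, which implements `__dlpack__`.

```python
import json

import numpy as np


class ToyCSR:
    """Hypothetical 2-D CSR container exposing the binsparse protocol."""

    def __init__(self, indptr, indices, values, shape):
        self.indptr = np.asarray(indptr, dtype=np.int64)
        self.indices = np.asarray(indices, dtype=np.int64)
        self.values = np.asarray(values, dtype=np.float64)
        self.shape = tuple(shape)

    def __binsparse_descriptor__(self) -> dict:
        # (b) JSON-compatible metadata, assumed to follow a binsparse v0.1 descriptor.
        return {
            "binsparse": {
                "version": "0.1",
                "format": "CSR",
                "shape": list(self.shape),
                "number_of_stored_values": int(self.values.size),
                "data_types": {
                    "pointers_to_1": str(self.indptr.dtype),
                    "indices_1": str(self.indices.dtype),
                    "values": str(self.values.dtype),
                },
            }
        }

    def __binsparse__(self) -> dict:
        # (a) The dense constituent arrays; NumPy arrays implement __dlpack__.
        return {
            "pointers_to_1": self.indptr,
            "indices_1": self.indices,
            "values": self.values,
        }


# [[1, 0, 2, 0],
#  [0, 0, 0, 0],
#  [0, 3, 0, 4]]
x = ToyCSR([0, 2, 2, 4], [0, 2, 1, 3], [1.0, 2.0, 3.0, 4.0], shape=(3, 4))
descriptor = x.__binsparse_descriptor__()
# The descriptor survives a JSON round-trip unchanged.
assert json.loads(json.dumps(descriptor)) == descriptor
```

Keeping the shape as a list (not a tuple) is what lets the descriptor compare equal after the JSON round-trip.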

Pseudocode implementation

Here's a pseudocode example using two libraries, xp1 and xp2, both of which support sparse arrays:

# In library code:
xp2_sparray = xp2.from_binsparse(xp1_sparray, ...)

# Or
xp2_sparray = xp2.asarray(xp1_sparray, ...)

# This pseudocode implementation is common to `xp1` and `xp2`.
# `_type_from_binsparse_descriptor` and `from_strided_arrays` stand for library-specific internals.
def from_binsparse(x: object, /, *, device: device | None = None, copy: bool | None = None) -> array:
    # Both protocol methods are required; reject objects that lack either one.
    binsparse_descr = getattr(x, "__binsparse_descriptor__", None)
    binsparse_impl = getattr(x, "__binsparse__", None)
    if binsparse_impl is None or binsparse_descr is None:
        raise TypeError(...)

    # The descriptor is JSON-compatible metadata (format, shape, dtypes).
    binsparse_descriptor = binsparse_descr()
    # Will raise an error if the format/descriptor is unsupported.
    sparse_type = _type_from_binsparse_descriptor(binsparse_descriptor)
    # The named dense constituent arrays are imported through DLPack.
    constituent_arrays = binsparse_impl()
    my_constituent_arrays = {
        k: from_dlpack(arr, device=device, copy=copy) for k, arr in constituent_arrays.items()
    }
    return sparse_type.from_strided_arrays(my_constituent_arrays, shape=...)
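The pseudocode above can be exercised end to end with NumPy alone. This is a sketch under stated assumptions: `_Producer`, `_FORMATS` and `_csr_to_dense` are hypothetical names, the result is materialized as a dense array instead of a library sparse type, and `np.from_dlpack` is called without `device`/`copy` so it works across NumPy versions. The CSR array names are assumed from the binsparse v0.1 format.

```python
import numpy as np


class _Producer:
    """Stand-in for another library's CSR array (hypothetical)."""

    def __binsparse_descriptor__(self):
        return {"binsparse": {"version": "0.1", "format": "CSR", "shape": [3, 4],
                              "number_of_stored_values": 4,
                              "data_types": {"pointers_to_1": "int64",
                                             "indices_1": "int64",
                                             "values": "float64"}}}

    def __binsparse__(self):
        return {"pointers_to_1": np.array([0, 2, 2, 4], dtype=np.int64),
                "indices_1": np.array([0, 2, 1, 3], dtype=np.int64),
                "values": np.array([1.0, 2.0, 3.0, 4.0])}


def _csr_to_dense(arrays, shape):
    # Scatter each row's stored values into a zero-filled dense array.
    indptr, indices, values = arrays["pointers_to_1"], arrays["indices_1"], arrays["values"]
    out = np.zeros(shape, dtype=values.dtype)
    for row in range(shape[0]):
        start, stop = int(indptr[row]), int(indptr[row + 1])
        out[row, indices[start:stop]] = values[start:stop]
    return out


# Hypothetical registry mapping binsparse format names to constructors.
_FORMATS = {"CSR": _csr_to_dense}


def from_binsparse(x):
    descr_fn = getattr(x, "__binsparse_descriptor__", None)
    impl_fn = getattr(x, "__binsparse__", None)
    if descr_fn is None or impl_fn is None:
        raise TypeError(f"{type(x).__name__} does not implement the binsparse protocol")
    meta = descr_fn()["binsparse"]
    try:
        build = _FORMATS[meta["format"]]
    except KeyError:
        raise TypeError(f"unsupported binsparse format: {meta['format']!r}") from None
    # Import each constituent array through DLPack.
    arrays = {k: np.from_dlpack(a) for k, a in impl_fn().items()}
    return build(arrays, tuple(meta["shape"]))


dense = from_binsparse(_Producer())
```

Dispatching on the descriptor's `format` field before touching the arrays mirrors the pseudocode: an unsupported format fails early with a `TypeError`.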

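Compare this with converting a SciPy COO array to PyData/Sparse by hand, which requires knowing both libraries' internal layouts:

import numpy as np
import scipy.sparse as sps
import sparse

sps_array = sps.coo_array(np.eye(3))  # example input

# SciPy stores a tuple of coordinate arrays; PyData/Sparse expects one stacked array.
sparse_array = sparse.COO(np.stack(sps_array.coords), sps_array.data, shape=sps_array.shape)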
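With the protocol, that library-specific knowledge moves into a named-array layout both sides agree on. The sketch below assumes binsparse's COO naming (`indices_0`, `indices_1`, `values`); `coo_to_binsparse` is a hypothetical helper, and NumPy stands in for both libraries so the block runs without SciPy or sparse installed. It also assumes there are no duplicate coordinates.

```python
import numpy as np


def coo_to_binsparse(coords, data, shape):
    """Package 2-D COO data as a (descriptor, arrays) pair (hypothetical helper)."""
    rows, cols = (np.asarray(c) for c in coords)
    data = np.asarray(data)
    arrays = {"indices_0": rows, "indices_1": cols, "values": data}
    descriptor = {
        "binsparse": {
            "version": "0.1",
            "format": "COO",
            "shape": list(shape),
            "number_of_stored_values": int(data.size),
            "data_types": {k: str(v.dtype) for k, v in arrays.items()},
        }
    }
    return descriptor, arrays


# The same (coords, data, shape) triple a SciPy COO array carries.
coords = (np.array([0, 2, 2]), np.array([1, 0, 3]))
data = np.array([5.0, 6.0, 7.0])
descr, arrays = coo_to_binsparse(coords, data, (3, 4))

# A consumer can rebuild the matrix from the named arrays alone
# (plain assignment, so duplicate coordinates would not be summed).
dense = np.zeros(descr["binsparse"]["shape"], dtype=arrays["values"].dtype)
dense[arrays["indices_0"], arrays["indices_1"]] = arrays["values"]
```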

Parallel implementation in sparse: pydata/sparse#764
Parallel implementation in SciPy: scipy/scipy#22553

hameerabbasi force-pushed the binsparse-protocol branch 2 times, most recently from f5d1642 to f01c98d on March 10, 2025 at 10:50.
Successfully merging this pull request may close this issue: RFC: In-memory sparse array interchange