diff --git a/.gitignore b/.gitignore index cc40a3b43..c261fc313 100644 --- a/.gitignore +++ b/.gitignore @@ -35,3 +35,7 @@ tmp/ *.egg dist/ .DS_STORE + +# pixi environments +.pixi +*.egg-info diff --git a/spec/draft/design_topics/data_interchange.rst b/spec/draft/design_topics/data_interchange.rst index 3b3040672..f3d297c10 100644 --- a/spec/draft/design_topics/data_interchange.rst +++ b/spec/draft/design_topics/data_interchange.rst @@ -85,17 +85,40 @@ page gives a high-level specification for data exchange in Python using DLPack. below. They are not required to return an array object from ``from_dlpack`` which conforms to this standard. +binsparse: Extending to sparse arrays +------------------------------------- + +Sparse arrays can be represented in-memory by a collection of 1-dimensional and 2-dimensional +dense arrays, alongside some metadata on how to interpret these arrays. This allows us to re-use +the DLPack protocol for the storage of the constituent arrays. The work of specifying the +accompanying metadata has already been performed by the +`binsparse specification `_. + +While initially intended to target file formats, binsparse has relatively few requirements from +back-ends: + +1. The ability to represent and parse JSON. +2. The ability to represent/store a key-value store of 1-dimensional (and optionally 2-dimensional) + arrays. + +It is the only such specification for sparse representations to have these minimal requirements. +We can satisfy both: the former with the ``json`` built-in Python module or a Python ``dict`` and +the latter with the DLPack protocol. + +.. note:: + See the `RFC to adopt binsparse `_ + for discussion that preceded the adoption of the binsparse protocol. + + See :ref:`sparse_interchange` for the Python specification of this protocol. + + Non-supported use cases ----------------------- Use of DLPack requires that the data can be represented by a strided, in-memory layout on a single device. 
This covers usage by a large range of, but not all, known and possible array libraries. Use cases that are not supported by DLPack -include: - -- Distributed arrays, i.e., the data residing on multiple nodes or devices, -- Sparse arrays, i.e., sparse representations where a data value (typically - zero) is implicit. +include distributed arrays, i.e., the data residing on multiple nodes or devices. There may be other reasons why it is not possible or desirable for an implementation to materialize the array as strided data in memory. In such diff --git a/spec/draft/extensions/index.rst b/spec/draft/extensions/index.rst index 3b9409954..d6deb4683 100644 --- a/spec/draft/extensions/index.rst +++ b/spec/draft/extensions/index.rst @@ -32,3 +32,4 @@ the array API standard. See :ref:`api-specification`. fourier_transform_functions linear_algebra_functions + sparse_interchange diff --git a/spec/draft/extensions/sparse_interchange.rst b/spec/draft/extensions/sparse_interchange.rst new file mode 100644 index 000000000..dade7231a --- /dev/null +++ b/spec/draft/extensions/sparse_interchange.rst @@ -0,0 +1,99 @@ +.. _sparse_interchange: + +Sparse interchange +================== + + Array API specification for sparse interchange functions using `binsparse `_. + +Extension name and usage +------------------------ + +If implemented, this extension must be retrievable via:: + + >>> xp = x.__array_namespace__() + >>> if hasattr(xp, 'sparse'): + >>> # Use the extension + +To convert an object from another library that also supports the sparse interchange extension:: + + >>> x1 = xp1.sparse.from_binsparse(xp2_array) # Convert with the same formats + >>> x1 = xp1.sparse.from_binsparse(xp2_array, descriptor=binsparse_descriptor) + +.. _binsparse_descriptor_examples: + +Examples of binsparse descriptors +--------------------------------- + +While the `binsparse specification `_ uses JSON for its descriptor, +we will work with equivalent Python objects instead. 
Here are some examples of binsparse descriptors:: + + >>> coo_2d_descriptor = { + "binsparse": { + "version": "0.1", + "format": "COOR", + "shape": [10, 12], + "number_of_stored_values": 20, + "data_types": { + "indices_0": "uint64", + "indices_1": "uint64", + "values": "float32", + }, + }, + "original_source": f"{library_name!s}, version {library_version!s}", + } + >>> csr_2d_descriptor = { + "binsparse": { + "version": "0.1", + "format": "CSR", + "shape": [20, 24], + "number_of_stored_values": 20, + "data_types": { + "pointers_to_1": "uint64", + "indices_1": "uint64", + "values": "float32", + }, + }, + "original_source": f"{library_name!s}, version {library_version!s}", + } + >>> compressed_vector_descriptor = { + "binsparse": { + "version": "0.1", + "format": "CVEC", + "shape": [30], + "number_of_stored_values": 3, + "data_types": { + "indices_0": "uint64", + "values": "float32", + }, + }, + "original_source": f"{library_name!s}, version {library_version!s}", + } + +Objects in API +-------------- + +.. currentmodule:: array_api + +A conforming implementation of this extension must provide and support the following +functions/methods. In addition, the ``asarray`` method must also be able to convert +objects with supported formats which implement the protocol. + +.. + NOTE: please keep the functions and their inverse together + +.. currentmodule:: array_api.sparse + +.. autosummary:: + :toctree: generated + :template: method.rst + + from_binsparse + +.. currentmodule:: array_api + +.. autosummary:: + :toctree: generated + :template: property.rst + + array.__binsparse__ + array.__binsparse_descriptor__ diff --git a/src/array_api_stubs/_draft/__init__.py b/src/array_api_stubs/_draft/__init__.py index 537ea8f85..08dc15ad2 100644 --- a/src/array_api_stubs/_draft/__init__.py +++ b/src/array_api_stubs/_draft/__init__.py @@ -16,6 +16,7 @@ from .utility_functions import * from . import linalg from . import fft +from . 
import sparse from .info import __array_namespace_info__ diff --git a/src/array_api_stubs/_draft/array_object.py b/src/array_api_stubs/_draft/array_object.py index 08d5c0b6e..1bae49bec 100644 --- a/src/array_api_stubs/_draft/array_object.py +++ b/src/array_api_stubs/_draft/array_object.py @@ -1246,5 +1246,52 @@ def to_device( Clarified behavior when a provided ``device`` object corresponds to the device on which an array instance resides. """ + def __binsparse_descriptor__(self) -> dict: + """ + Returns a ``dict`` equivalent to a parsed `binsparse JSON descriptor `_. + + Parameters + ---------- + self: array + array instance. + + Returns + ------- + out: dict + A ``dict`` equivalent to a parsed JSON binsparse descriptor of an array. See :ref:`sparse_interchange` for details. + """ + + def __binsparse__( + self, /, *, descriptor: Optional[dict] = None + ) -> dict[str, array]: + """ + Returns a key-value store of the constituent arrays of a sparse array, as specified by the `binsparse specification `_. + + Parameters + ---------- + self: array + array instance. + descriptor: Optional[dict] + If ``descriptor`` is not ``None``, the data returned must be in the format specified by it. + + Returns + ------- + out: dict[str, array] + A ``dict`` mapping the name of each constituent array to the array itself. See :ref:`sparse_interchange` for details. + + Raises + ------ + TypeError + If ``descriptor`` is not ``None``, and the array library does not support converting to a format specified by it. + ValueError + If ``descriptor`` is not a valid binsparse descriptor. + + Notes + ----- + + - ``x.__binsparse_descriptor__()["binsparse"]["data_types"].keys() == x.__binsparse__().keys()`` must hold. + - ``descriptor["binsparse"]["data_types"].keys() == x.__binsparse__(descriptor=descriptor).keys()`` must hold. 
+ """ + array = _array diff --git a/src/array_api_stubs/_draft/sparse.py b/src/array_api_stubs/_draft/sparse.py new file mode 100644 index 000000000..2dd1f0323 --- /dev/null +++ b/src/array_api_stubs/_draft/sparse.py @@ -0,0 +1,78 @@ +from __future__ import annotations + +from typing import Optional +from ._types import array, device + +__all__ = ["from_binsparse"] + + +def from_binsparse( + x: object, + /, + *, + descriptor: Optional[dict] = None, + device: Optional[device] = None, + copy: Optional[bool] = None, +) -> array: + """ + Returns a new array containing the data from another (array) object with a ``__binsparse__`` method, + assuming the format specified in `descriptor` is supported in this library. + + Parameters + ---------- + x: object + input (array) object. + descriptor: Optional[dict] + If ``descriptor`` is ``None``, the array must be returned in the format in which it is stored or materializable to. + Otherwise, it must be converted to the format specified by ``descriptor``. + + If ``copy`` is ``False``, no conversion should be performed, and only stored data should be returned. + + If the format specified by ``descriptor`` is unsupported by the library, a ``TypeError`` must be raised. + device: Optional[device] + device on which to place the created array. If ``device`` is ``None`` and ``x`` supports binsparse, the output array + must be on the same device as ``x``. Default: ``None``. + + The v2023.12 standard only mandates that a compliant library should offer a way for ``from_binsparse`` to return an array + whose underlying memory is accessible to the Python interpreter, when the corresponding ``device`` is provided. If the + array library does not support such cases at all, the function must raise ``BufferError``. If a copy must be made to + enable this support but ``copy`` is set to ``False``, the function must raise ``ValueError``. + + Other device kinds will be considered for standardization in a future version of this API standard. 
+ copy: Optional[bool] + boolean indicating whether or not to copy the input. If ``True``, the function must always copy. If ``False``, the function must never copy, and raise ``BufferError`` in case a copy is deemed necessary (e.g. if a cross-device data movement is requested, and it is not possible without a copy). If ``None``, the function must reuse the existing memory buffer if possible and copy otherwise. Default: ``None``. + + + Returns + ------- + out: array + an array containing the data in ``x`` with a format specified by ``descriptor``. + + .. admonition:: Note + :class: note + + The returned array may be either a copy or a view. See :ref:`data-interchange` for details. + + Raises + ------ + BufferError + The ``__binsparse__``, ``__binsparse_descriptor__``, ``__dlpack__`` or ``__dlpack_device__`` + methods on the input array or constituent arrays may raise ``BufferError`` when the data + cannot be exported as a binsparse-compatible array (e.g., incompatible dtype, strides, or + device). They may also raise other errors when export fails for other reasons (e.g., not + enough memory available to materialize the data). ``from_binsparse`` must propagate such + exceptions. + AttributeError + If the ``__binsparse__`` and ``__binsparse_descriptor__`` methods are not present + on the input array. This may happen for libraries that are never able + to export their data with binsparse. + ValueError + If data exchange is possible via an explicit copy but ``copy`` is set to ``False``, or if the specified + descriptor is not valid. + TypeError + If ``descriptor`` is ``None``, the data received from the source library is not guaranteed to + be in a format that the target array library supports. If the received format is unsupported, + a ``TypeError`` must be raised. + Additionally, if ``descriptor`` is not ``None``, it must be passed along to ``__binsparse__``, which + may raise a ``TypeError`` if the conversion is unsupported by the source library, which + ``from_binsparse`` must propagate. + """
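
Illustrative sketch (not part of the patch): the round trip described by the ``__binsparse_descriptor__``, ``__binsparse__`` and ``from_binsparse`` docstrings can be pictured with a toy producer and consumer. Everything here is an assumption made for illustration: the ``ToyCOO`` class is hypothetical, plain Python lists stand in for DLPack-capable constituent arrays, and the ``device`` and ``copy`` parameters are left out. It only shows the descriptor/key invariant from the Notes section and which exception is raised in which case.

```python
from __future__ import annotations

from typing import Optional


class ToyCOO:
    """Hypothetical 2-D COO container; lists stand in for real arrays."""

    def __init__(self, shape, rows, cols, values):
        self.shape = tuple(shape)
        self.rows, self.cols, self.values = list(rows), list(cols), list(values)

    def __binsparse_descriptor__(self) -> dict:
        # Python equivalent of a parsed binsparse JSON descriptor.
        return {
            "binsparse": {
                "version": "0.1",
                "format": "COOR",
                "shape": list(self.shape),
                "number_of_stored_values": len(self.values),
                "data_types": {
                    "indices_0": "uint64",
                    "indices_1": "uint64",
                    "values": "float32",
                },
            },
            "original_source": "toy example",
        }

    def __binsparse__(self, /, *, descriptor: Optional[dict] = None) -> dict:
        if descriptor is not None:
            try:
                fmt = descriptor["binsparse"]["format"]
            except (KeyError, TypeError) as exc:
                # Malformed descriptor -> ValueError.
                raise ValueError("not a valid binsparse descriptor") from exc
            if fmt != "COOR":
                # This toy producer cannot convert to other formats.
                raise TypeError(f"cannot convert to format {fmt!r}")
        # Keys must match the descriptor's "data_types" keys.
        return {
            "indices_0": self.rows,
            "indices_1": self.cols,
            "values": self.values,
        }


def from_binsparse(x, /, *, descriptor: Optional[dict] = None) -> ToyCOO:
    """Toy consumer that only understands the COOR format."""
    # AttributeError propagates if x lacks the protocol methods.
    desc = descriptor if descriptor is not None else x.__binsparse_descriptor__()
    # The descriptor is passed along; producer errors propagate unchanged.
    arrays = x.__binsparse__(descriptor=descriptor)
    meta = desc["binsparse"]
    if meta["format"] != "COOR":
        raise TypeError(f"unsupported format {meta['format']!r}")
    if arrays.keys() != meta["data_types"].keys():
        raise ValueError("constituent arrays do not match the descriptor")
    return ToyCOO(
        meta["shape"], arrays["indices_0"], arrays["indices_1"], arrays["values"]
    )


# Three stored values in row-major order, as "COOR" implies.
src = ToyCOO((3, 4), rows=[0, 1, 2], cols=[1, 0, 3], values=[1.5, 2.0, -4.0])
dst = from_binsparse(src)
```

A real implementation would additionally exchange each constituent array through DLPack and honor ``device`` and ``copy``; the sketch skips both to keep the control flow visible.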