Commit fd91ea3

Add __dataframe_namespace__
Taken over from the array API standard approach
1 parent 8268072 commit fd91ea3

2 files changed: +134 -2 lines changed

spec/API_specification/dataframe_api/dataframe_object.py

+26
@@ -36,6 +36,32 @@ class DataFrame:
     **Methods and Attributes**
 
     """
+    def __dataframe_namespace__(
+        self: DataFrame, /, *, api_version: Optional[str] = None
+    ) -> Any:
+        """
+        Returns an object that has all the dataframe API functions on it.
+
+        Parameters
+        ----------
+        api_version: Optional[str]
+            String representing the version of the dataframe API specification
+            to be returned, in ``'YYYY.MM'`` form, for example, ``'2023.04'``.
+            If it is ``None``, it should return the namespace corresponding to
+            the latest version of the dataframe API specification. If the given
+            version is invalid or not implemented for the given module, an
+            error should be raised. Default: ``None``.
+
+        Returns
+        -------
+        namespace: Any
+            An object representing the dataframe API namespace. It should have
+            every top-level function defined in the specification as an
+            attribute. It may contain other public names as well, but it is
+            recommended to only include those names that are part of the
+            specification.
+
+        """
 
     @classmethod
     def from_dict(cls, data: Mapping[str, Column]) -> DataFrame:

spec/purpose_and_scope.md

+108 -2
@@ -239,17 +239,123 @@ sugar required for fast analysis of data.
 
 ## How to read this document
 
+The API specification itself can be found under {ref}`api-specification`.
 
+For guidance on how to read and understand the type annotations included in
+this specification, consult the Python
+[documentation](https://docs.python.org/3/library/typing.html).
 
 
+(how-to-adopt-this-api)=
 ## How to adopt this API
 
+Most (if not all) existing dataframe libraries will find something in this API
+standard that is incompatible with a current implementation, and that they cannot
+change due to backwards compatibility concerns. Therefore we expect that each
+of those libraries will want to offer a standard-compliant API in a _new
+namespace_. The question then becomes: how does a user access this namespace?
 
+The simplest method is to document the import that gives direct access to the
+namespace (e.g. `import package_name.dataframe_api`). This has two issues
+though:
 
+1. Dataframe-consuming libraries that want to support multiple dataframe
+   libraries then have to explicitly import each library.
+2. It is difficult to _version_ the dataframe API standard implementation (see
+   {ref}`api-versioning`).
 
-## Definitions
+To address both issues, a uniform way must be provided by a conforming
+implementation to access the API namespace, namely a [method on the dataframe
+object](DataFrame.__dataframe_namespace__):
 
+```
+xp = x.__dataframe_namespace__()
+```
 
+The method must take one keyword, `api_version=None`, to make it possible to
+request a specific API version:
 
+```
+xp = x.__dataframe_namespace__(api_version='2023.04')
+```
+
+The `xp` namespace must contain all functionality specified in
+{ref}`api-specification`. The namespace may contain other functionality; however,
+including additional functionality is not recommended as doing so may hinder
+portability and inter-operation of dataframe libraries within user code.
+
+### Checking a dataframe object for compliance
+
+Dataframe-consuming libraries are likely to want a mechanism for determining
+whether a provided dataframe is specification compliant. The recommended
+approach is to check whether a dataframe object has
+a `__dataframe_namespace__` attribute, as this is the one distinguishing
+feature of a specification-compliant dataframe object.
+
+Checking for a `__dataframe_namespace__` attribute can be implemented as a
+small utility function similar to the following.
+
+```python
+def is_dataframe_api_obj(x):
+    return hasattr(x, '__dataframe_namespace__')
+```
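
A consuming library could build on this check to validate its inputs and obtain one shared namespace for them. The following is a sketch rather than part of the specification: `common_namespace` is a hypothetical helper, and comparing namespaces by identity assumes that a library returns the same namespace object on every call, which the specification does not guarantee.

```python
def is_dataframe_api_obj(x):
    return hasattr(x, '__dataframe_namespace__')


def common_namespace(*frames, api_version=None):
    """Return the namespace shared by all frames, or raise TypeError."""
    if not frames:
        raise TypeError("at least one dataframe is required")

    namespaces = []
    for frame in frames:
        if not is_dataframe_api_obj(frame):
            raise TypeError(
                f"{type(frame).__name__} does not implement "
                "__dataframe_namespace__"
            )
        namespaces.append(
            frame.__dataframe_namespace__(api_version=api_version)
        )

    # Assumption: the same library hands back the same namespace object.
    first = namespaces[0]
    if any(ns is not first for ns in namespaces[1:]):
        raise TypeError("dataframes come from different namespaces")
    return first
```

Raising `TypeError` for non-compliant inputs is a design choice made for this sketch; a library may prefer to convert such inputs instead.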
+
+
+### Discoverability of conforming implementations
+
+It may be useful to have a way to discover all packages in a Python
+environment which provide a conforming dataframe API implementation, and the
+namespace in which each implementation resides.
+To assist dataframe-consuming libraries which need to create dataframes originating
+from multiple conforming dataframe implementations, or developers who want to perform
+for example cross-library testing, libraries may provide an
+{pypa}`entry point <specifications/entry-points/>` in order to make a dataframe API
+namespace discoverable.
+
+:::{admonition} Optional feature
+Given that entry points typically require build system & package installer
+specific implementation, this standard chooses to recommend rather than
+mandate providing an entry point.
+:::
+
+The following code is an example of how one can discover installed
+conforming libraries:
+
+```python
+from importlib.metadata import entry_points
+
+try:
+    eps = entry_points()['dataframe_api']
+    ep = next(ep for ep in eps if ep.name == 'package_name')
+except (TypeError, KeyError):
+    # The dict interface for entry_points() is deprecated in py3.10 (and
+    # removed in py3.12), supplanted by a new select interface.
+    (ep,) = entry_points(group='dataframe_api', name='package_name')
+
+xp = ep.load()
+```
+
+An entry point must have the following properties:
+
+- **group**: equal to `dataframe_api`.
+- **name**: equal to the package name.
+- **object reference**: equal to the dataframe API namespace import path.
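
Given these properties, a consumer that does not know the package name in advance could enumerate every entry point in the group. This is a sketch under stated assumptions: `discover_dataframe_api_entry_points` is a hypothetical helper name, and the version check reflects the `importlib.metadata` selection interface added in Python 3.10.

```python
import sys
from importlib.metadata import entry_points


def discover_dataframe_api_entry_points(group="dataframe_api"):
    """Map package name -> entry point for every provider in `group`."""
    if sys.version_info >= (3, 10):
        # Selection interface: returns only entry points in the given group.
        eps = entry_points(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        eps = entry_points().get(group, [])
    return {ep.name: ep for ep in eps}


# Loading is deferred so that importing every provider is not forced:
# namespaces = {name: ep.load() for name, ep in found.items()}
found = discover_dataframe_api_entry_points()
```

Returning the entry points unloaded keeps discovery cheap; each namespace is imported only when `ep.load()` is called.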
+
+
+* * *
+
+## Conformance
+
+A conforming implementation of the dataframe API standard must provide and
+support all the functions, arguments, data types, syntax, and semantics
+described in this specification.
+
+A conforming implementation of the dataframe API standard may provide
+additional values, objects, properties, data types, and functions beyond those
+described in this specification.
+
+Libraries which aim to provide a conforming implementation but haven't yet
+completed such an implementation may, and are encouraged to, provide details on
+the level of (non-)conformance. For details on how to do this, see
+[Verification - measuring conformance](verification_test_suite.md).
 
-## References
