How to enable the Standard?

This is probably the most fundamental question.

Say we have
- `dfpd`, which is a pandas DataFrame
- `PandasStandardDataFrame`, which is the pandas implementation of the Standard. It wraps a pandas DataFrame and exposes only the methods supported by the Standard. Say this is available on PyPI as `pandas-standard`

(...and similarly for modin / cudf / vaex / anyone else)

Right. Say I want to write a function which can accept any DataFrame, like
```python
def clean_column_names(df: DataFrame) -> DataFrame:
    df_standard = <get the relevant Standard implementation>
    mapping = {}
    for column in df_standard.get_column_names():
        mapping[column] = column.lower()
    df_standard = df_standard.rename(mapping)
    return df_standard.dataframe
```

How do we do the first line, i.e. getting `df_standard`?

I think the ideal place to get to would be
```python
df_standard = df.__dataframe_standard__()
```

but this can't happen overnight, especially if we want to stick to the mantra @jbrockmendel mentioned [here](https://github.com/data-apis/dataframe-api/issues/108#issuecomment-1468878516):

> id like to find an alternative that fits with the "assume pandas changes nothing" mantra
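
To make the target concrete, here is a minimal sketch of the ideal end state, using toy stand-ins instead of a real library. `ToyFrame` and `ToyStandardFrame` are hypothetical names invented for this illustration; only `get_column_names`, `rename`, `.dataframe` and `__dataframe_standard__` come from the discussion above, and their exact semantics here are an assumption.

```python
class ToyStandardFrame:
    """Hypothetical stand-in exposing only the Standard-style methods used below."""

    def __init__(self, columns):
        self._columns = dict(columns)

    @property
    def dataframe(self):
        # Hand back the library's own DataFrame type.
        return ToyFrame(self._columns)

    def get_column_names(self):
        return list(self._columns)

    def rename(self, mapping):
        # Columns missing from the mapping keep their current name.
        renamed = {mapping.get(name, name): values
                   for name, values in self._columns.items()}
        return ToyStandardFrame(renamed)


class ToyFrame:
    """Hypothetical library DataFrame that opts in to the Standard."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __dataframe_standard__(self):
        return ToyStandardFrame(self.columns)


def clean_column_names(df):
    df_standard = df.__dataframe_standard__()
    mapping = {column: column.lower()
               for column in df_standard.get_column_names()}
    return df_standard.rename(mapping).dataframe


cleaned = clean_column_names(ToyFrame({"Name": ["a", "b"], "AGE": [1, 2]}))
print(list(cleaned.columns))  # ['name', 'age']
```

The consumer function never mentions `ToyFrame` by name, which is the portability we're after.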

### Phase 1

This is hacky, but allows for quick experimentation without needing to depend on pandas' approval, on pandas' relatively slow release cycle, or on those of any other library.

Consumers of the Standard would need to write something like
```python
def enable_standard(df: DataFrame):
    if type(df).__module__.split('.')[0] == 'pandas':
        from pandas_standard import PandasStandardDataFrame  # TODO raise if not installed
        return PandasStandardDataFrame(df)
    # and similarly for any other package which might not
    # want to introduce `__dataframe_standard__` right away
    try:
        return df.__dataframe_standard__()
    except AttributeError:
        raise TypeError(f'Expected DataFrame Standard compliant DataFrame, got {type(df)}')
```

At the moment the Consortium is still relatively small, so enumerating the options in `if-then` statements should be manageable.
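
If the chain of `if` statements did start to grow, one possible variant is a lookup table keyed on the top-level package name. The sketch below is only an illustration under assumptions: `StandardWrapper`, `LegacyFrame`, `NativeFrame`, `legacylib` and `WRAPPERS` are hypothetical names, and the stand-in classes replace real libraries so that the snippet runs without any of them installed.

```python
class StandardWrapper:
    """Hypothetical Standard-compliant wrapper (stand-in for e.g. PandasStandardDataFrame)."""

    def __init__(self, df):
        self.dataframe = df


class LegacyFrame:
    """Hypothetical DataFrame from a library without `__dataframe_standard__`."""


# Pretend this class lives in a third-party package called `legacylib`.
LegacyFrame.__module__ = "legacylib.core.frame"


class NativeFrame:
    """Hypothetical DataFrame from a library that already opts in."""

    def __dataframe_standard__(self):
        return StandardWrapper(self)


# Top-level package name -> callable that wraps a DataFrame from that package.
WRAPPERS = {"legacylib": StandardWrapper}


def enable_standard(df):
    package = type(df).__module__.split(".")[0]
    wrapper = WRAPPERS.get(package)
    if wrapper is not None:
        return wrapper(df)
    # Fall back to the opt-in method for libraries that provide it.
    method = getattr(df, "__dataframe_standard__", None)
    if method is None:
        raise TypeError(
            f"Expected DataFrame Standard compliant DataFrame, got {type(df)}"
        )
    return method()
```

Looking the method up with `getattr` instead of catching `AttributeError` around the call avoids swallowing an unrelated `AttributeError` raised inside the method itself.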

### Phase 2

Once we've seen that some libraries are actually able to use it to write portable code, then pandas could add a method like
```python
def __dataframe_standard__(self):
    import_optional_dependency("pandas_standard")  # this will raise if not installed
    from pandas_standard import PandasStandardDataFrame
    return PandasStandardDataFrame(self)
```
Other libraries could do the same.

I'd like to think that something so small would be a relatively easy sell to the pandas-dev team. @jbrockmendel, @jorisvandenbossche, do you agree? Would you be OK with this? Do you have other suggestions for how to opt in to the Standard?

Note that consumers would need to have `pandas-standard` installed for this to work.
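
As a rough illustration of that requirement, the sketch below shows how such a method could fail with a readable message when the optional package is missing. It uses only `importlib` from the standard library; `load_optional` and `Frame` are hypothetical names made up for this example, not pandas API.

```python
import importlib


def load_optional(name):
    """Import `name`, raising an informative ImportError if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise ImportError(
            f"Missing optional dependency '{name}'. "
            "Install it to use the DataFrame Standard."
        ) from err


class Frame:
    """Hypothetical DataFrame class standing in for a real library's DataFrame."""

    def __dataframe_standard__(self):
        # Assumes the `pandas_standard` package described above exists and
        # exposes `PandasStandardDataFrame`.
        module = load_optional("pandas_standard")
        return module.PandasStandardDataFrame(self)
```

The import happens at call time, so a library adding this method would not take on a new import-time dependency.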

### (Optional) Phase 3

`dfpd.__dataframe_standard__()` would work without requiring extra dependencies (either because `pandas_standard` has become a runtime dependency of pandas, or because the Standard is implemented within pandas).
I think this is what some, such as @aregm, would like to see happen.

Usual reminder that I think this is unlikely to pass - nonetheless, it's not off the table, and some participants would find it desirable, so I've kept it in.
