Should Series[Any] be used internally instead of Series? #1133

Open
MarcoGorelli opened this issue Feb 28, 2025 · 4 comments

@MarcoGorelli
Member

Currently, Series is used in several places where the inner type of the Series isn't known, e.g.:

    @overload
    def compare(
        self,
        other: Series,
        align_axis: AxisColumn = ...,
        keep_shape: bool = ...,
        keep_equal: bool = ...,
    ) -> DataFrame: ...

There are a couple of issues I'm running into with this.

First, the pyright-strict job marks this as partially unknown:

/home/runner/work/pandas-stubs/pandas-stubs/tests/test_series.py:1039:5 - error: Type of "compare" is partially unknown
Type of "compare" is "Overload[(other: Series[Unknown], align_axis: Literal['index', 0], keep_shape: bool = ..., keep_equal: bool = ...) -> Series[Unknown], (other: Series[Unknown], align_axis: Literal['columns', 1] = ..., keep_shape: bool = ..., keep_equal: bool = ...) -> DataFrame]" (reportUnknownMemberType)
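To illustrate the distinction behind this error, here is a minimal sketch using a hypothetical stand-in `Series` class (not the real pandas-stubs definition). The assumption is that pyright treats an omitted type argument as `Unknown`, which strict mode reports, while an explicitly written `Any` counts as a known type. The function names are made up for the example.

```python
from typing import Any, Generic, TypeVar

S1 = TypeVar("S1")


class Series(Generic[S1]):
    """Hypothetical stand-in for the generic Series class in the stubs."""

    def __init__(self, values: list[S1]) -> None:
        self.values = values


def count_bare(other: Series) -> int:
    # Type argument omitted: a strict checker is expected to see
    # Series[Unknown] here and flag the parameter as partially unknown.
    return len(other.values)


def count_any(other: Series[Any]) -> int:
    # Type argument spelled out as Any: the annotation is complete,
    # even though it still accepts a Series of anything.
    return len(other.values)
```

Both functions behave identically at runtime; only what the type checker reports differs.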

Second, when using pyright with --verifytypes to look for uncovered parts of the public API, this is flagged as "unknown type":

            {
                "category": "function",
                "name": "pandas.testing.assert_series_equal",
                "referenceCount": 1,
                "isExported": true,
                "isTypeKnown": false,
                "isTypeAmbiguous": false,
                "diagnostics": [
                    {
                        "file": "/home/marcogorelli/type_coverage_py/.pyright_env_pandas/lib/python3.12/site-packages/pandas/_testing/__init__.pyi",
                        "severity": "error",
                        "message": "Type of parameter \"left\" is partially unknown\n  Parameter type is \"Series[Unknown]\"\n    Type argument 1 for class \"Series\" has unknown type",
                        "range": {
                            "start": {
                                "line": 4,
                                "character": 27
                            },
                            "end": {
                                "line": 4,
                                "character": 46
                            }
                        }
                    },

Would it be OK to use Series[Any] instead of just Series in such cases? Or, as some libraries do, to introduce a type alias Incomplete: TypeAlias = Any, meaning "we should be able to narrow down the type but for now we're not doing so", and use that in some cases?

The latter use case (--verifytypes) can, I think, really help to prioritise which stubs to add.
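One way such prioritisation could work is to filter the symbol entries from the report and sort them by reference count. The sketch below assumes the entries have already been extracted into a list and that each one has the fields shown in the excerpt above (`name`, `referenceCount`, `isExported`, `isTypeKnown`). The function name and the two `example.*` entries are hypothetical.

```python
import json
from typing import Any


def unknown_symbols(symbols: list[dict[str, Any]]) -> list[tuple[str, int]]:
    """Return (name, referenceCount) for exported symbols whose type is
    not fully known, most referenced first."""
    flagged = [
        (sym["name"], sym.get("referenceCount", 0))
        for sym in symbols
        if sym.get("isExported") and not sym.get("isTypeKnown")
    ]
    # Sort by descending reference count, then by name for stable output.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# The first entry mirrors the excerpt above; the other two are made up.
sample = json.loads("""
[
  {"name": "pandas.testing.assert_series_equal", "referenceCount": 1,
   "isExported": true, "isTypeKnown": false},
  {"name": "example.fully_typed", "referenceCount": 7,
   "isExported": true, "isTypeKnown": true},
  {"name": "example.heavily_used_unknown", "referenceCount": 5,
   "isExported": true, "isTypeKnown": false}
]
""")

for name, refs in unknown_symbols(sample):
    print(f"{refs:>3}  {name}")
```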

@Dr-Irv
Collaborator

Dr-Irv commented Feb 28, 2025

We've had a discussion about this in another PR. See #1093 (comment)

The idea is to create an UnknownSeries type to use wherever we don't know the inner type. I think this may solve the problem you raise above.

@Jeitan

Jeitan commented Mar 18, 2025

Is there something similar happening for DataFrames? I'm also running afoul of things being partially unknown using strict with basedpyright, specifically in the return of read_excel for a particular overload (a single string passed into the io parameter).

@Dr-Irv
Collaborator

Dr-Irv commented Mar 19, 2025

> Is there something similar happening for DataFrames? I'm also running afoul of things being partially unknown using strict with basedpyright, specifically in the return of read_excel for a particular overload (a single string passed into the io parameter).

I don't think the return types of read_excel() should be seen as partially unknown. Can you provide a simple example that illustrates what you are seeing?

What might be happening is the following. Let's say you do df = pd.read_excel("your string"). The result of that is either a dict of DataFrame objects or a single DataFrame. Let's say it is the latter. If you then do s = df["a"], then s will be partially unknown, since it will have type Series[Any]. There are some contributions under way to change all references in pandas-stubs from Series to UnknownSeries. This is an ongoing effort.

@Jeitan

Jeitan commented Mar 20, 2025

@Dr-Irv Okay gotcha. I probably need to do some better things on my end also, so I'll see if I can address it that way first. Thanks!
