Introduce UnknownSeries and UnknownIndex, type core.strings.pyi using them #1146

Merged: 40 commits, Mar 11, 2025

@MarcoGorelli MarcoGorelli commented Mar 6, 2025

One step towards #1133

I think one way to address this issue is to do it incrementally: whenever a module is typed strictly, add it to pyproject.toml so that it stays strictly typed. The partially unknown types will then gradually go away.

  • Closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added: Please use assert_type() to assert the type of any return value

🤔 this isn't quite working, trying to fix it up

@MarcoGorelli MarcoGorelli changed the title make typing in pandas_stubs.core.strings.pyi strict, add UnknownSeries and UnknownIndex Introduce UnknownSeries and UnknownIndex, type core.strings.pyi using them Mar 6, 2025
@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 7, 2025 10:59
Collaborator

@Dr-Irv Dr-Irv left a comment

I couldn't comment on the code changes suggested below, but I'd like to suggest the following:

  1. Change the references to StringMethods in core/series.pyi and core/indexes/base.pyi to make Series[str] and Index[str] the first argument. Then in core/strings.pyi the first type parameter of the Generic, called T, will be bound to that type.
  2. Update the tests for the string methods in test_series.py and test_indexes.py to test for the return type of Series[str] and Index[str] as appropriate.
  3. For test_indexes.py, we could use a set of tests on the string methods similar to the ones in test_series.py

If you think we should do this in a separate PR, I'm OK with that as well.
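
For illustration, tests of the kind described in points 2 and 3 might look like the sketch below. It is not taken from the pandas-stubs test suite: `Series`, `Index` and `StringMethods` are small stand-ins rather than the real pandas classes, `check` is a hypothetical runtime helper, and `_str` is an alias introduced only because the stand-ins define an attribute named `str`.

```python
from __future__ import annotations

from typing import Any, Callable, Generic, TypeVar

try:
    from typing import assert_type  # available from Python 3.11
except ImportError:  # older interpreters: same runtime behaviour, no static check
    def assert_type(val, typ):
        return val

_str = str  # alias, because the classes below define an attribute named `str`
S = TypeVar("S")  # element type of the stand-in containers
R = TypeVar("R")  # container type returned by the accessor's string methods


class StringMethods(Generic[R]):
    """Toy accessor, generic over the type its string methods return."""

    def __init__(self, values: list[_str], wrap: Callable[[list[_str]], R]) -> None:
        self._values = values
        self._wrap = wrap

    def upper(self) -> R:
        return self._wrap([v.upper() for v in self._values])


class Series(Generic[S]):
    """Stand-in for pandas.Series (not the real class)."""

    def __init__(self, values: list[S]) -> None:
        self.values = values

    @property
    def str(self) -> StringMethods[Series[_str]]:
        return StringMethods([_str(v) for v in self.values], Series)


class Index(Generic[S]):
    """Stand-in for pandas.Index (not the real class)."""

    def __init__(self, values: list[S]) -> None:
        self.values = values

    @property
    def str(self) -> StringMethods[Index[_str]]:
        return StringMethods([_str(v) for v in self.values], Index)


def check(actual: Any, klass: type, dtype: type) -> Any:
    """Hypothetical runtime check to pair with the static assert_type."""
    assert isinstance(actual, klass), type(actual)
    assert all(isinstance(v, dtype) for v in getattr(actual, "values"))
    return actual


def test_series_upper() -> None:
    s = Series(["a", "b"])
    # assert_type is verified by the type checker; check runs at test time.
    check(assert_type(s.str.upper(), "Series[str]"), Series, str)


def test_index_upper() -> None:
    idx = Index(["a", "b"])
    check(assert_type(idx.str.upper(), "Index[str]"), Index, str)
```

Pairing the two checks means a wrong stub annotation fails under the type checker, while a wrong runtime result fails when the test runs.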

@MarcoGorelli MarcoGorelli marked this pull request as draft March 7, 2025 22:23
@MarcoGorelli MarcoGorelli force-pushed the strict-strings-typing branch from c8e6d8f to 92dc75d Compare March 7, 2025 22:34
@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 7, 2025 22:56
@MarcoGorelli MarcoGorelli force-pushed the strict-strings-typing branch from c7e8187 to 17e280f Compare March 8, 2025 12:06
Collaborator

@Dr-Irv Dr-Irv left a comment

Thanks for doing all of this work. It is a nice improvement to the stubs.

I think that all the methods that have -> T should be -> _TSTR

That is because we know these are string methods, so even if the type of the Series (or Index) is unknown, we know we will be returning Series[str] or Index[str].

@MarcoGorelli
Member Author

thanks, have updated

I think that all the methods that have -> T should be -> _TSTR

I think the only exception is str.slice, which preserves the type, but I've gone ahead and done this for the others 👍
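
A minimal sketch of that distinction, using stand-in classes rather than the real stubs: `T_STR` plays the role of `_TSTR`, `Series` and `StringMethods` are simplified toys, and the constructor arguments `same` and `as_str` are hypothetical helpers that exist only so the example runs.

```python
from __future__ import annotations

from typing import Any, Callable, Generic, TypeVar

_str = str  # alias, because `Series` below defines an attribute named `str`
S = TypeVar("S")  # element type of the stand-in Series
T = TypeVar("T")  # type of the object that owns the accessor
T_STR = TypeVar("T_STR")  # stand-in for the `_TSTR` discussed in the review


class StringMethods(Generic[T, T_STR]):
    """Toy accessor: T is the owner's type, T_STR the string-result type."""

    def __init__(
        self,
        values: list[Any],
        same: Callable[[list[Any]], T],
        as_str: Callable[[list[_str]], T_STR],
    ) -> None:
        self._values = values
        self._same = same
        self._as_str = as_str

    def upper(self) -> T_STR:
        # String-producing method: the result holds strings whatever T was.
        return self._as_str([v.upper() for v in self._values])

    def slice(self, start: int | None = None, stop: int | None = None) -> T:
        # Slicing keeps each element's own type (str stays str, list stays list).
        return self._same([v[start:stop] for v in self._values])


class Series(Generic[S]):
    """Stand-in for pandas.Series (not the real class)."""

    def __init__(self, values: list[S]) -> None:
        self.values = values

    @property
    def str(self) -> StringMethods[Series[S], Series[_str]]:
        return StringMethods(self.values, Series, Series)


words = Series(["hello", "world"])
lists = Series([[1, 2, 3], [4, 5, 6]])

upper_words = words.str.upper()      # annotated as Series[str]
sliced_lists = lists.str.slice(0, 2)  # annotated as Series[list[int]]
```

In this toy, `upper_words.values` is `['HELLO', 'WORLD']` and `sliced_lists.values` is `[[1, 2], [4, 5]]`, so only `slice` needs the owner's type `T` as its return annotation.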

@MarcoGorelli
Member Author

Regarding #1146 (comment), is it OK if we leave that to a separate PR please?

Partially because I feel like the scope here keeps increasing, and partially because I'm not sure it's correct - for example, if I have

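# Keep annotations unevaluated, so pd.Series[Any] is only seen by the type checker.
from __future__ import annotations

import pandas as pd
from typing import Any

def func(a: pd.Series[Any]) -> None:
    # reveal_type is interpreted by the type checker (mypy here).
    reveal_type(a.str.upper())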

then with that commit, we get Revealed type is "Any", whereas without it we get Revealed type is "pandas.core.series.Series[builtins.str]"

@Dr-Irv
Collaborator

Dr-Irv commented Mar 11, 2025

OK. It's a mypy bug. Ugh.

python/mypy#15921

Collaborator

@Dr-Irv Dr-Irv left a comment

just a couple of tests to change, otherwise OK

Collaborator

@Dr-Irv Dr-Irv left a comment

Thanks @MarcoGorelli. Long journey, but a really nice improvement to the stubs!

@Dr-Irv Dr-Irv merged commit 2b0279e into pandas-dev:main Mar 11, 2025
13 checks passed
@MarcoGorelli
Member Author

thanks for your careful review, much appreciated!
