From f0898cb533708e8cf7b9d86a715a82ee27571710 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Tue, 24 Jun 2025 10:34:07 +0200
Subject: [PATCH 1/2] move whatsnew items from 2.3.0 to 2.3.1

---
 doc/source/whatsnew/v2.3.0.rst | 35 -------------------------------
 doc/source/whatsnew/v2.3.1.rst | 38 ++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
index 6433fe8d2b060..8ca6c0006a604 100644
--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -31,39 +31,6 @@ Other enhancements
 - The :meth:`~Series.cumsum`, :meth:`~Series.cummin`, and :meth:`~Series.cummax` reductions are now implemented for :class:`StringDtype` columns (:issue:`60633`)
 - The :meth:`~Series.sum` reduction is now implemented for :class:`StringDtype` columns (:issue:`59853`)
 
-.. ---------------------------------------------------------------------------
-.. _whatsnew_230.notable_bug_fixes:
-
-Notable bug fixes
-~~~~~~~~~~~~~~~~~
-
-These are bug fixes that might have notable behavior changes.
-
-.. _whatsnew_230.notable_bug_fixes.string_comparisons:
-
-Comparisons between different string dtypes
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-In previous versions, comparing :class:`Series` of different string dtypes (e.g. ``pd.StringDtype("pyarrow", na_value=pd.NA)`` against ``pd.StringDtype("python", na_value=np.nan)``) would result in inconsistent resulting dtype or incorrectly raise. pandas will now use the hierarchy
-
-    object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)
-
-in determining the result dtype when there are different string dtypes compared. Some examples:
-
-- When ``pd.StringDtype("pyarrow", na_value=pd.NA)`` is compared against any other string dtype, the result will always be ``boolean[pyarrow]``.
-- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("pyarrow", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
-- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("python", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
-
-.. _whatsnew_230.api_changes:
-
-API changes
-~~~~~~~~~~~
-
-- When enabling the ``future.infer_string`` option, :class:`Index` set operations (like
-  union or intersection) will now ignore the dtype of an empty :class:`RangeIndex` or
-  empty :class:`Index` with ``object`` dtype when determining the dtype of the resulting
-  Index (:issue:`60797`)
-
 .. ---------------------------------------------------------------------------
 .. _whatsnew_230.deprecations:
 
@@ -85,8 +52,6 @@ Numeric
 
 Strings
 ^^^^^^^
-- Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all NA values of string dtype would return float instead of string dtype (:issue:`60810`)
-- Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` with all NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
 - Bug in :meth:`Series.__pos__` and :meth:`DataFrame.__pos__` where an ``Exception`` was not raised for :class:`StringDtype` with ``storage="pyarrow"`` (:issue:`60710`)
 - Bug in :meth:`Series.rank` for :class:`StringDtype` with ``storage="pyarrow"`` that incorrectly returned integer results with ``method="average"`` and raised an error if it would truncate results (:issue:`59768`)
 - Bug in :meth:`Series.replace` with :class:`StringDtype` when replacing with a non-string value was not upcasting to ``object`` dtype (:issue:`60282`)
diff --git a/doc/source/whatsnew/v2.3.1.rst b/doc/source/whatsnew/v2.3.1.rst
index 77d6825bccbdf..f8954acc852cf 100644
--- a/doc/source/whatsnew/v2.3.1.rst
+++ b/doc/source/whatsnew/v2.3.1.rst
@@ -15,6 +15,42 @@ Enhancements
 ~~~~~~~~~~~~
 -
 
+.. ---------------------------------------------------------------------------
+.. _whatsnew_231.notable_bug_fixes:
+
+Notable bug fixes
+~~~~~~~~~~~~~~~~~
+
+These are bug fixes that might have notable behavior changes.
+
+
+.. _whatsnew_231.notable_bug_fixes.string_comparisons:
+
+Comparisons between different string dtypes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In previous versions, comparing :class:`Series` of different string dtypes (e.g. ``pd.StringDtype("pyarrow", na_value=pd.NA)`` against ``pd.StringDtype("python", na_value=np.nan)``) would result in inconsistent resulting dtype or incorrectly raise. pandas will now use the hierarchy
+
+    object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)
+
+in determining the result dtype when there are different string dtypes compared. Some examples:
+
+- When ``pd.StringDtype("pyarrow", na_value=pd.NA)`` is compared against any other string dtype, the result will always be ``boolean[pyarrow]``.
+- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("pyarrow", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
+- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("python", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
+
+
+.. _whatsnew_231.api_changes:
+
+API changes
+~~~~~~~~~~~
+
+- When enabling the ``future.infer_string`` option, :class:`Index` set operations (like
+  union or intersection) will now ignore the dtype of an empty :class:`RangeIndex` or
+  empty :class:`Index` with ``object`` dtype when determining the dtype of the resulting
+  Index (:issue:`60797`)
+
+
 .. _whatsnew_231.regressions:
 
 Fixed regressions
@@ -26,6 +62,8 @@ Fixed regressions
 
 Bug fixes
 ~~~~~~~~~
+- Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all NA values of string dtype would return float instead of string dtype (:issue:`60810`)
+- Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` with all NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
 - Fixed bug in :meth:`DataFrame.explode` and :meth:`Series.explode` where methods would fail with ``dtype="str"`` (:issue:`61623`)
 
 .. ---------------------------------------------------------------------------
From 3d72a090b086f59df45471828be2f27945860757 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Tue, 24 Jun 2025 10:44:23 +0200
Subject: [PATCH 2/2] restructure to focus on string dtype changes/fixes

---
 doc/source/whatsnew/v2.3.1.rst | 56 +++++++++++++++++++---------------
 1 file changed, 32 insertions(+), 24 deletions(-)

diff --git a/doc/source/whatsnew/v2.3.1.rst b/doc/source/whatsnew/v2.3.1.rst
index f8954acc852cf..64e5c1510e1da 100644
--- a/doc/source/whatsnew/v2.3.1.rst
+++ b/doc/source/whatsnew/v2.3.1.rst
@@ -9,22 +9,12 @@ including other versions of pandas.
 {{ header }}
 
 .. ---------------------------------------------------------------------------
-.. _whatsnew_231.enhancements:
-
-Enhancements
-~~~~~~~~~~~~
--
-
-.. ---------------------------------------------------------------------------
-.. _whatsnew_231.notable_bug_fixes:
-
-Notable bug fixes
-~~~~~~~~~~~~~~~~~
-
-These are bug fixes that might have notable behavior changes.
+.. _whatsnew_231.string_fixes:
 
+Improvements and fixes for the StringDtype
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-.. _whatsnew_231.notable_bug_fixes.string_comparisons:
+.. _whatsnew_231.string_fixes.string_comparisons:
 
 Comparisons between different string dtypes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -39,16 +29,36 @@ in determining the result dtype when there are different string dtypes compared.
 - When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("pyarrow", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
 - When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("python", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
 
+.. _whatsnew_231.string_fixes.ignore_empty:
+
+Index set operations ignore empty RangeIndex and object dtype Index
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When enabling the ``future.infer_string`` option, :class:`Index` set operations (like
+union or intersection) will now ignore the dtype of an empty :class:`RangeIndex` or
+empty :class:`Index` with ``object`` dtype when determining the dtype of the resulting
+Index (:issue:`60797`).
 
-.. _whatsnew_231.api_changes:
+This ensures that combining such an empty Index with strings will infer the string dtype
+correctly, rather than defaulting to ``object`` dtype. For example:
 
-API changes
-~~~~~~~~~~~
+.. code-block:: python
 
-- When enabling the ``future.infer_string`` option, :class:`Index` set operations (like
-  union or intersection) will now ignore the dtype of an empty :class:`RangeIndex` or
-  empty :class:`Index` with ``object`` dtype when determining the dtype of the resulting
-  Index (:issue:`60797`)
+    >>> pd.options.mode.infer_string = True
+    >>> df = pd.DataFrame()
+    >>> df.columns.dtype
+    dtype('int64')  # default RangeIndex for empty columns
+    >>> df["a"] = [1, 2, 3]
+    >>> df.columns.dtype
+    # new columns use string dtype instead of object dtype
+
+.. _whatsnew_231.string_fixes.bugs:
+
+Bug fixes
+^^^^^^^^^
+- Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all NA values of string dtype would return float instead of string dtype (:issue:`60810`)
+- Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` with all NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
+- Fixed bug in :meth:`DataFrame.explode` and :meth:`Series.explode` where methods would fail with ``dtype="str"`` (:issue:`61623`)
 
 
 .. _whatsnew_231.regressions:
@@ -62,9 +72,7 @@ Fixed regressions
 
 Bug fixes
 ~~~~~~~~~
-- Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all NA values of string dtype would return float instead of string dtype (:issue:`60810`)
-- Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` with all NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
-- Fixed bug in :meth:`DataFrame.explode` and :meth:`Series.explode` where methods would fail with ``dtype="str"`` (:issue:`61623`)
+-
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_231.other:
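The string-dtype comparison rule described in these patches (the ``object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)`` hierarchy) amounts to "the higher-ranked operand decides the result dtype". As a rough illustration only, the sketch below models that rule in plain Python; it is not how pandas implements it. The ``(storage, na_value)`` tuples, the use of ``None`` for ``object`` dtype, and the helper names ``winning_dtype`` and ``comparison_result_dtype`` are all hypothetical. The two NA-variant results come from the release note; mapping the NaN variants to NumPy ``bool`` is an assumption, not something the note states.

```python
from typing import Optional, Tuple

# Hypothetical model, NOT pandas' code: a string dtype is a
# (storage, na_value) pair, and None stands in for plain object dtype.
DtypeKey = Optional[Tuple[str, str]]

# Ranking from the release note, lowest first:
# object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)
HIERARCHY = [
    None,
    ("python", "NaN"),
    ("pyarrow", "NaN"),
    ("python", "NA"),
    ("pyarrow", "NA"),
]

# Result dtype keyed by the higher-ranked operand. The NA entries follow the
# release note; the NaN entries (NumPy "bool") are an assumption.
RESULT_DTYPE = {
    ("pyarrow", "NA"): "boolean[pyarrow]",
    ("python", "NA"): "boolean",
    ("pyarrow", "NaN"): "bool",  # assumption
    ("python", "NaN"): "bool",  # assumption
}


def winning_dtype(left: DtypeKey, right: DtypeKey) -> DtypeKey:
    """Return whichever operand ranks higher in HIERARCHY."""
    return max(left, right, key=HIERARCHY.index)


def comparison_result_dtype(left: DtypeKey, right: DtypeKey) -> str:
    """Name of the modeled result dtype when comparing left against right."""
    winner = winning_dtype(left, right)
    if winner is None:
        # object vs object is not a string-dtype comparison; out of scope here.
        raise ValueError("at least one operand must be a string dtype")
    return RESULT_DTYPE[winner]


# The three examples listed in the release note:
print(comparison_result_dtype(("pyarrow", "NA"), ("python", "NaN")))  # boolean[pyarrow]
print(comparison_result_dtype(("python", "NA"), ("pyarrow", "NaN")))  # boolean
print(comparison_result_dtype(("python", "NA"), ("python", "NaN")))  # boolean
```

Keeping the ranking in one ordered list keeps the rule itself a one-liner (``max`` with ``HIERARCHY.index`` as the key); the lookup table only records what each winning dtype produces.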