Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd # version 2.1.3
df1 = pd.DataFrame([[None, 1.0], [None, 2.0]])
df2 = pd.DataFrame([[3.0, None], [4.0, None]])
df3 = pd.concat([df1, df2], keys=["D1", "D2"])
df3.dtypes # 0:f64, 1:f64
df3
#         0    1
# D1 0  NaN  1.0
#    1  NaN  2.0
# D2 0  3.0  NaN
#    1  4.0  NaN
Issue Description
After moving to 2.1.0, a FutureWarning is raised when concatenating DataFrames in which a column of one frame is empty or all-NA.
The warning provided is:
FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
Expected Behavior
If you run the above code on 2.0.3 you obtain exactly the same DataFrame, but without the warning.
If you exclude the empty columns before concatenating on 2.1.3, you get exactly the same DataFrame (with the columns re-ordered) but without the FutureWarning:
df3 = pd.concat([df1[[1]], df2[[0]]], keys=["D1", "D2"])
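A more general version of that workaround might look like the sketch below. The helper name `concat_without_all_na` is hypothetical (it is not part of pandas); it drops all-NA columns from each frame with `DataFrame.dropna`, concatenates, and then reindexes to restore the original column order. I am assuming here that dropping the columns is enough to avoid the warning, as it was in the one-line example above. Note that a column that is all-NA in every input frame would come back as a NaN-filled column, which may not match its original dtype.

```python
import pandas as pd


def concat_without_all_na(frames, keys=None):
    """Hypothetical helper: concat after dropping all-NA columns from each frame."""
    # Record the union of columns in first-seen order so it can be restored later.
    columns = []
    for frame in frames:
        for col in frame.columns:
            if col not in columns:
                columns.append(col)

    # Drop columns that contain only missing values in each individual frame.
    trimmed = [frame.dropna(axis="columns", how="all") for frame in frames]

    result = pd.concat(trimmed, keys=keys)
    # Restore the original column order (and any column dropped from every frame).
    return result.reindex(columns=columns)


df1 = pd.DataFrame([[None, 1.0], [None, 2.0]])
df2 = pd.DataFrame([[3.0, None], [4.0, None]])
df3 = concat_without_all_na([df1, df2], keys=["D1", "D2"])
```

This keeps the column order of the original output, but it is exactly the kind of extra code I would prefer not to carry.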
I cannot allow these FutureWarnings to propagate through to end users (pandas is used only internally for my purposes). I must therefore either filter them out (which I don't want to add unnecessary code for) or perform checks to remove empty columns from DataFrames (which also adds code overhead, and can actually change the representation of my output, which I don't want).
Is there anything else that can be done - is this really a bug?