lib/datautils: `groupby_agg` on string column with missing values introduces 0 #3515
While running the tests with the next version of pandas, I ran into this case:
etl/lib/datautils/tests/test_dataframes.py
Lines 309 to 329 in f9e2f93
where the expected result `df_out` column `"value_02"` is a mixed integer/string column, because a zero is introduced by the groupby's `sum` on a group with only missing values.

Although this is tested as the expected result, I am wondering whether getting a `0` in there is the intended or desired behaviour (I assume it is not). Or is this just a consequence of using dummy data in the test, and not something you encounter or want on the actual datasets? (For example, I don't know if you ever actually use the "sum" operation on a string column in practice.)

The reason I ran into this case is that this test fails with pandas' future string dtype enabled. It fails because the result no longer has an integer `0` but a string `"0"` (and hence the `assert equals(..)` fails), which I think is even worse (so I opened an upstream issue to fix this in pandas: pandas-dev/pandas#60229, thanks to running your test suite!)
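To illustrate the mechanism described above, here is a minimal plain-Python sketch. It is an analogy only, not pandas' or `groupby_agg`'s actual implementation: `sum_skipping_missing` is a hypothetical helper, `None` stands in for the missing-value marker, and the group data is made up. It shows how a sum that skips missing values and starts from `0` returns that `0` untouched for an all-missing group, which leaves a mixed integer/string result.

```python
def sum_skipping_missing(values, start=0):
    """Hypothetical helper (not a pandas API): sum values, skipping None."""
    total = start          # additive identity, returned as-is if nothing is added
    seen_value = False
    for v in values:
        if v is None:      # None plays the role of a missing value here
            continue
        # The first real value replaces the start value; later ones are added
        # (for strings, "+" concatenates).
        total = v if not seen_value else total + v
        seen_value = True
    return total


# Made-up groups: "a" has string values, "b" has only missing values.
groups = {"a": ["x", "y"], "b": [None, None]}
result = {key: sum_skipping_missing(vals) for key, vals in groups.items()}

# The all-missing group keeps the integer start value, so the result mixes types.
result_types = {type(v).__name__ for v in result.values()}
print(result, result_types)
```

In this sketch, group `"b"` ends up as the integer `0` next to the string from group `"a"`. Converting that start value to a string would give `"0"`, which is analogous to the result reported with the future string dtype.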