
BUG: assignment via loc silently fails with differing dtypes #61346

zbs opened this issue Apr 23, 2025 · 5 comments
Labels
Bug, Dtype Conversions (unexpected or buggy dtype conversions), Needs Discussion (requires discussion from core team before further action)

Comments

@zbs

zbs commented Apr 23, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
print(pd.__version__)
df = pd.DataFrame({'foo': ['2025-04-23', '2025-04-22']})
df['bar'] = pd.to_datetime(df['foo'], format='%Y-%m-%d')
df.loc[:, 'bar'] = df.loc[:, 'bar'].dt.strftime('%Y%m%d')
print(df)

# Yields
# 2.2.3
#           foo        bar
# 0  2025-04-23 2025-04-23
# 1  2025-04-22 2025-04-22

Issue Description

I expect bar to look like

20250423
20250422

instead of

2025-04-23
2025-04-22

Expected Behavior

bar should look like

20250423
20250422

Installed Versions

[ins] In [2]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit                : 0691c5cf90477d3503834d983f69350f250a6ff7
python                : 3.12.10
python-bits           : 64
OS                    : Linux
OS-release            : 4.18.0-372.32.1.el8_6.x86_64
Version               : #1 SMP Fri Oct 7 12:35:10 EDT 2022
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.3
numpy                 : 1.26.4
pytz                  : 2025.2
dateutil              : 2.9.0.post0
pip                   : 25.0.1
Cython                : 3.0.12
sphinx                : None
IPython               : 8.35.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.13.3
blosc                 : None
bottleneck            : 1.4.2
dataframe-api-compat  : None
fastparquet           : None
fsspec                : 2024.9.0
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : 3.1.6
lxml.etree            : 5.3.2
matplotlib            : 3.10.1
numba                 : 0.61.2
numexpr               : 2.10.2
odfpy                 : None
openpyxl              : 3.1.5
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : 15.0.2
pyreadstat            : None
pytest                : 8.3.5
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.15.2
sqlalchemy            : 2.0.39
tables                : 3.9.2
tabulate              : 0.9.0
xarray                : 2025.3.1
xlrd                  : 2.0.1
xlsxwriter            : 3.2.2
zstandard             : 0.23.0
tzdata                : 2025.2
qtpy                  : None
pyqt5                 : None
zbs added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels on Apr 23, 2025
@arthurlw
Contributor

arthurlw commented Apr 23, 2025

Confirmed on main. Assigning values of a differing dtype to a column via .loc still goes through silently (in the example above, strings assigned to a datetime64 column).

It seems to me that this should be raising an error, consistent with the behavior introduced for other dtype mismatches (e.g., int64 ← str, which now raises a LossySetitemError when assigning with .loc).

@rhshadrach
Member

rhshadrach commented Apr 24, 2025

This may be well known, but just in case, df['bar'] = df.loc[:, 'bar'].dt.strftime('%Y%m%d') gives the desired behavior for the OP and is what should be used when you want to overwrite a column with a (possibly) different dtype.
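A minimal sketch of that suggestion, reusing the column names from the report. Plain `df['bar'] = ...` replaces the whole column, so the formatted strings are stored as they are instead of being set into the existing datetime64 column. The exact dtype label of the result (object or a string dtype) may depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['2025-04-23', '2025-04-22']})
df['bar'] = pd.to_datetime(df['foo'], format='%Y-%m-%d')

# Column replacement: the datetime64 column is swapped out for the new
# Series, so no values are parsed back into datetimes.
df['bar'] = df['bar'].dt.strftime('%Y%m%d')

print(df['bar'].tolist())  # ['20250423', '20250422']
```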

It seems to me that this should be raising an error, consistent with the behavior introduced for other dtype mismatches

This isn't so clear to me, e.g.

df = pd.DataFrame({"a": [1.0, 2.5, 3.0]})
df.loc[:, "a"] = 5
print(df)
#      a
# 0  5.0
# 1  5.0
# 2  5.0

Should this raise? I personally think the answer there is no. But I'm not sure we ever made any decisions on which implicit conversions should and should not be allowed. This is somewhat related to PDEP-6.

@rhshadrach
Member

cc @pandas-dev/pandas-core

rhshadrach added the Needs Discussion and Dtype Conversions labels and removed the Needs Triage label on Apr 24, 2025
@Dr-Irv
Contributor

Dr-Irv commented Apr 24, 2025

I don't think that is what is going on here. It's not about incompatible types not being recognized. It's about the automatic conversion that is done with strings that are formatted datetime objects being assigned to a series that has datetime64 dtype. With these statements:

df['bar'] = pd.to_datetime(df['foo'], format='%Y-%m-%d')
df.loc[:, 'bar'] = df.loc[:, 'bar'].dt.strftime('%Y%m%d')

the first statement sets the dtype of "bar" to datetime64. In the second statement, the expression df.loc[:, 'bar'].dt.strftime('%Y%m%d') has object dtype: it is a set of strings. But because it is being assigned to a column with datetime64 dtype, we first try to parse the strings to see whether they are valid dates. They are, so the dtype stays datetime64.
For example:

>>> df.loc[:, "bar"] = ["290102", "300304"]
>>> df
          foo        bar
0  2025-04-23 2002-01-29
1  2025-04-22 2004-03-30

I'm not sure if we want to change the behavior in this case. If .loc is used to change values in a column with datetime64 dtype, the ability to parse a string is useful as it lets you fix individual values (or selected rows) without having to parse the strings into dates.

On the other hand, as shown in the example, if a user did something like that, it is unclear whether they wanted the dates parsed as YYMMDD or DDMMYY. So maybe we should be warning if things are ambiguous??
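One way to sidestep the ambiguity, sketched here as an illustration rather than a proposed fix, is to parse the strings explicitly with `pd.to_datetime` and a `format` before assigning. The variable names are made up for this example; the two format strings correspond to the YYMMDD and DDMMYY readings mentioned above.

```python
import pandas as pd

raw = ["290102", "300304"]

# Read the strings as YYMMDD (two-digit year first).
as_yymmdd = pd.to_datetime(raw, format="%y%m%d")

# Read the same strings as DDMMYY (two-digit year last).
as_ddmmyy = pd.to_datetime(raw, format="%d%m%y")

print(as_yymmdd)  # dates 2029-01-02 and 2030-03-04
print(as_ddmmyy)  # dates 2002-01-29 and 2004-03-30
```

With the format stated, the result no longer depends on how the strings happen to be inferred, and the parsed values can then be assigned to the datetime64 column.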

@MarcoGorelli
Member

Yup, looks like it's going down the mixed formats path (🙀 )

In [8]: df = pd.DataFrame({'foo': ['2025-04-23', '2025-04-22']}); df['bar'] = pd.to_datetime(df['foo'], format='%Y-%m-%d')

In [9]: df.loc[:, 'bar'] = ['12/01/2020', '13/01/2020']

In [10]: df
Out[10]:
          foo        bar
0  2025-04-23 2020-12-01
1  2025-04-22 2020-01-13
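For contrast, a small sketch of what strict parsing does with the same two strings when a single `format` is passed to `pd.to_datetime`. This only illustrates the explicit API; it is not a claim about how `.loc` assignment should behave. The names `values`, `parsed` and `month_first_failed` are invented for the example.

```python
import pandas as pd

values = ['12/01/2020', '13/01/2020']

# Day-first format: both strings are read the same way.
parsed = pd.to_datetime(values, format='%d/%m/%Y')
print(parsed)  # dates 2020-01-12 and 2020-01-13

# Month-first format: '13/01/2020' has no month 13, so parsing raises
# instead of silently switching interpretation for that element.
try:
    pd.to_datetime(values, format='%m/%d/%Y')
    month_first_failed = False
except ValueError as exc:
    month_first_failed = True
    print(f"rejected: {exc}")
```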
