
BUG: PeriodIndex inconsistent deserialization with HDF5 - PyTables #41978

Open
@ra1nty

Description

  • I have checked that this issue has not already been reported.
    There was an issue 5 years ago noting that .to_hdf() behaves inconsistently across Python 2 and 3 for a PeriodIndex written in the fixed format:
    DataFrame with PeriodIndex written in Python2 gets an Int64Index when read back in Python3 #16781

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.
    The bug exists, but the behavior is different (see next comment).


I noticed that deserializing a pandas Series/DataFrame with a PeriodIndex from an HDF5 file is inconsistent when using the PyTables table format (format='table'): the index of the retrieved Series/DataFrame comes back as an Int64Index instead of a PeriodIndex. See the code below for an example:

import pandas as pd
store = pd.HDFStore('test.h5')
# Monthly PeriodIndex covering 2015-01 through 2015-12 (12 periods)
series = pd.Series(index=pd.date_range(start='2015-01', end='2016-01', freq='M'), data=0).to_period('M')
df = pd.DataFrame(index=pd.date_range(start='2015-01', end='2016-01', freq='M'), data=0, columns=['a']).to_period('M')
# Write both objects in the queryable "table" format
store.put('/a/a', series, format='table')
store.put('/a/b', df, format='table')
# Read the Series back: the index is no longer a PeriodIndex
store.select('/a/a')

Output:

540    0
541    0
542    0
543    0
544    0
545    0
546    0
547    0
548    0
549    0
550    0
551    0
dtype: int64
store.select('/a/b').index

Output:

Int64Index([540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551], dtype='int64')
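The integers in both outputs look like the Periods' internal ordinals rather than arbitrary values. A quick check, assuming monthly frequency, where ordinal 0 is the epoch period 1970-01, so 2015-01 is 45 * 12 = 540 months later:

```python
import pandas as pd

# The twelve monthly periods produced by the report's date_range(...).to_period('M') call.
periods = pd.period_range(start="2015-01", periods=12, freq="M")

# Each Period carries an integer ordinal: the number of periods since the epoch period.
ordinals = [p.ordinal for p in periods]
print(ordinals)  # 540 through 551, matching the index read back from the store

# For monthly frequency the epoch period (ordinal 0) is 1970-01.
print(pd.Period(ordinal=0, freq="M"))  # 1970-01
```

So the table-format reader appears to return the stored ordinals without re-attaching the period frequency.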

Problem description

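Reading a Series or DataFrame with a PeriodIndex back from an HDFStore written with format='table' returns an Int64Index (the period ordinals) instead of the original PeriodIndex, so the round trip does not preserve the index type.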
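Until the reader restores the index type, one possible workaround is to rebuild the PeriodIndex after reading. This is a minimal sketch, assuming the frequency is known to the caller and every stored integer is an ordinal of that frequency; restore_period_index is a hypothetical helper, not a pandas API:

```python
import pandas as pd


def restore_period_index(obj, freq):
    """Hypothetical helper: convert an integer index of Period ordinals back to a PeriodIndex.

    Assumes every index value is an ordinal for `freq` (e.g. 540 == Period('2015-01', 'M').ordinal).
    """
    restored = obj.copy()
    # pd.Period(ordinal=n, freq=freq) builds the Period whose internal ordinal is n.
    restored.index = pd.PeriodIndex(
        [pd.Period(ordinal=int(n), freq=freq) for n in obj.index],
        name=obj.index.name,
    )
    return restored


# Stand-in for what store.select('/a/a') returned: zeros on an integer index.
read_back = pd.Series(0, index=pd.Index(range(540, 552), dtype="int64"))
fixed = restore_period_index(read_back, "M")
print(fixed.index[0], fixed.index[-1])  # 2015-01 2015-12
```

The same helper works on a DataFrame, since it only touches the index; it cannot recover the frequency itself, which is why freq must be supplied.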

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.9.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United States.1252

pandas : 1.2.4
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 20.3.1
setuptools : 51.0.0.post20201207
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.4.3
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.24.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None

Metadata

Labels: Bug, IO HDF5 (read_hdf, HDFStore), Period (Period data type)