Description
- I have checked that this issue has not already been reported.
  An issue from 5 years ago mentioned that .to_hdf() acts inconsistently across Python 2 and 3 on PeriodIndex for the fixed format: "DataFrame with PeriodIndex written in Python2 gets an Int64Index when read back in Python3" (#16781).
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
  The bug exists, but the behavior is different; see the next comment.
Deserializing a pandas Series/DataFrame with a PeriodIndex from an HDF5 file is inconsistent when using the PyTables table format (format='table'): the index of the retrieved Series/DataFrame comes back as an Int64Index instead of a PeriodIndex. See the code below for an example.
import pandas as pd
store = pd.HDFStore('test.h5')
# Monthly Series and DataFrame, both converted to a PeriodIndex
series = pd.Series(index=pd.date_range(start='2015-01', end='2016-01', freq='M'), data=0).to_period('M')
df = pd.DataFrame(index=pd.date_range(start='2015-01', end='2016-01', freq='M'), data=0, columns=['a']).to_period('M')
# Write both objects using the PyTables table format
store.put('/a/a', series, format='table')
store.put('/a/b', df, format='table')
# Read the Series back; a PeriodIndex is expected here
store.select('/a/a')
Output:
540 0
541 0
542 0
543 0
544 0
545 0
546 0
547 0
548 0
549 0
550 0
551 0
dtype: int64
store.select('/a/b').index
Output:
Int64Index([540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551], dtype='int64')
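The integers 540 to 551 look like the underlying monthly period ordinals, counted in months since 1970-01. If so, the period values survive the round trip and only the index type is lost. A minimal sketch of that decoding in plain Python (the helper name `ordinal_to_year_month` is made up for illustration and is not a pandas function):

```python
def ordinal_to_year_month(ordinal):
    """Convert a monthly period ordinal (months since 1970-01) to (year, month)."""
    years_since_1970, month_index = divmod(ordinal, 12)
    return 1970 + years_since_1970, month_index + 1


# The index values returned by store.select() in the report
ordinals = range(540, 552)
labels = ["%04d-%02d" % ordinal_to_year_month(o) for o in ordinals]
print(labels[0], labels[-1])  # 2015-01 2015-12
```

The decoded range matches the twelve monthly periods that were written (2015-01 through 2015-12).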
Problem description
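Reading the objects back from the HDF5 file with format='table' is inconsistent with what was written: the index comes back as an Int64Index of the underlying monthly ordinals (540 to 551) instead of the original PeriodIndex.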
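Until the round trip is fixed, one possible workaround is to rebuild the PeriodIndex from the integer index after reading. The sketch below assumes the integers are period ordinals and that the original frequency ('M' here) is known to the caller, since it is not recovered from the file. The helper `restore_period_index` is hypothetical and not part of pandas, and the `loaded` Series only simulates what `store.select('/a/a')` returned in the report.

```python
import pandas as pd


def restore_period_index(obj, freq="M"):
    """Return a copy of obj whose integer index is reinterpreted as period ordinals.

    Assumption: obj.index holds period ordinals for the given freq.
    """
    periods = [pd.Period(ordinal=int(o), freq=freq) for o in obj.index]
    restored = obj.copy()
    restored.index = pd.PeriodIndex(periods, freq=freq)
    return restored


# Stand-in for the Series read back from the store (ordinals 540..551)
loaded = pd.Series(0, index=range(540, 552))
fixed = restore_period_index(loaded)
print(type(fixed.index).__name__, fixed.index[0], fixed.index[-1])
```

This only patches the result on the reading side; it does not change what is stored in the file.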
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb9652
python : 3.9.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United States.1252
pandas : 1.2.4
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 20.3.1
setuptools : 51.0.0.post20201207
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.4.3
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.24.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None