Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import fsspec
import pandas as pd

path = ('filecache::s3://ncei-wcsd-archive/data/processed/SH1305/18kHz'
        '/SaKe2013-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv')
storage_options = {'s3': {'anon': True}, 'filecache': {'cache_storage': 'TMP', 'expiry_time': 259200}}

# Passing the chained URL directly to pandas raises the ValueError
pd.read_csv(path, storage_options=storage_options)  # <-- this produces the error

# Opening the same URL with fsspec first and passing the file object works
with fsspec.open(path, **storage_options) as f:
    df = pd.read_csv(f)  # <-- this will work properly
Issue Description
The provided code snippet fails and raises a ValueError, even though the provided URL is a valid fsspec file path:
Traceback (most recent call last):
  File "/server_code/playground.py", line 31, in <module>
    pd.read_csv(path, storage_options=storage_options)
  File "/usr/local/lib/python3.11/dist-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
    self.handles = get_handle(
                   ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pandas/io/common.py", line 718, in get_handle
    ioargs = _get_filepath_or_buffer(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pandas/io/common.py", line 441, in _get_filepath_or_buffer
    raise ValueError(
ValueError: storage_options passed with file object or non-fsspec file path
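A possible explanation, offered as an assumption and not something I have confirmed in the pandas source: if the "is this an fsspec path" check only accepts a single scheme prefix of the form scheme://, then the :: used by fsspec for protocol chaining would make the check fail. The sketch below illustrates that idea only. SINGLE_SCHEME and looks_like_url are hypothetical names, and the pattern is an illustrative guess at such a check.

```python
import re

# Hypothetical single-scheme pattern (an assumption, not copied from pandas):
# one RFC 3986 style scheme immediately followed by "://".
SINGLE_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+\-.]*://")


def looks_like_url(path: str) -> bool:
    """Return True if the path starts with a single 'scheme://' prefix."""
    return SINGLE_SCHEME.match(path) is not None


plain = ("s3://ncei-wcsd-archive/data/processed/SH1305/18kHz"
         "/SaKe2013-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv")
chained = "filecache::" + plain  # the chained URL from the example above

plain_ok = looks_like_url(plain)      # single scheme is accepted
chained_ok = looks_like_url(chained)  # "::" is not part of a scheme, so no match
```

Under this assumption the plain s3:// URL passes the check while the filecache::s3:// URL does not, which would be consistent with the "non-fsspec file path" message in the traceback.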
Expected Behavior
pd.read_* should work out of the box with the provided storage_options, since they contain valid keyword arguments for fsspec.open.
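To make the expectation concrete, here is a minimal sketch of a path check that would also accept chained fsspec URLs such as filecache::s3://. This is not pandas code and not a proposed patch. CHAINED_SCHEME and is_fsspec_like are hypothetical names, and the pattern simply allows any number of protocol:: prefixes before the final scheme://.

```python
import re

# Hypothetical chain-aware pattern: a scheme, then zero or more "::scheme"
# segments, then "://". Illustrative only.
CHAINED_SCHEME = re.compile(
    r"^[A-Za-z][A-Za-z0-9+\-.]*(::[A-Za-z][A-Za-z0-9+\-.]*)*://"
)


def is_fsspec_like(path: str) -> bool:
    """Return True for 'scheme://...' and chained 'a::b://...' style paths."""
    return CHAINED_SCHEME.match(path) is not None


url = ("filecache::s3://ncei-wcsd-archive/data/processed/SH1305/18kHz"
       "/SaKe2013-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv")

accepted = is_fsspec_like(url)                        # chained URL from this report
local_rejected = not is_fsspec_like("/tmp/data.csv")  # plain local path still rejected
```

With a check along these lines, the URL from this report would be treated as an fsspec path and storage_options could be forwarded to fsspec.open instead of raising.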
Installed Versions
INSTALLED VERSIONS
commit : a671b5a
python : 3.11.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.153.1-microsoft-standard-WSL2
Version : #1 SMP Fri Mar 29 23:14:13 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.4
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 72.1.0
pip : 24.2
Cython : 3.0.11
pytest : 8.2.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.25.0
pandas_datareader : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : 2024.6.1
gcsfs : None
matplotlib : 3.7.5
numba : 0.60.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 17.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2024.6.1
scipy : 1.11.4
sqlalchemy : 2.0.32
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None