Hi all! 😄 Is there support for loading Parquet datasets with partitioning, as described in the Arrow docs?
In particular, `load_dataset()` currently does not seem able to reconstruct the partition columns as described in the docs. For example, suppose I create a folder structure made of `key=value` directories (by year, month, etc.).
Using PyArrow datasets I can load this dataset, specify the partition columns, and it will automatically add `year`, `month`, etc. as columns. I don't see a way to do this through `load_dataset()`, which ignores the `key=value` folder structure. Does anyone know of a way around this?
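To make the `key=value` convention concrete, here is a minimal standard-library sketch of how partition values can be read back from a file path. The helper name `hive_partition_values` and the example path are hypothetical, made up for illustration; this is not how PyArrow or `load_dataset()` implement it, and the values are kept as plain strings.

```python
from pathlib import PurePosixPath


def hive_partition_values(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a file path (hypothetical helper)."""
    values: dict[str, str] = {}
    # Only inspect directory components; the last component is the file name.
    for segment in PurePosixPath(path).parts[:-1]:
        key, sep, value = segment.partition("=")
        # Keep the segment only if it really has the form key=value.
        if sep and key:
            values[key] = value
    return values


# Hypothetical path, assumed for illustration only.
example = "my_dataset/year=2023/month=01/part-0.parquet"
print(hive_partition_values(example))
```

One possible workaround along these lines would be to attach the parsed values as extra columns after loading each file, though whether that suits a given dataset is a judgment call.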
Thanks so much! 😄