Loading partitioned dataset from S3 to Redshift #1712
Unanswered
dhatch-niv asked this question in Q&A
Replies: 1 comment
-
Instead of using wr.redshift.copy_from_files, did you consider `wr.redshift.copy`?
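For reference, a minimal sketch of the `wr.redshift.copy` route with an illustrative DataFrame; the bucket, connection name, table, schema, and IAM role below are placeholders:

```python
import awswrangler as wr
import pandas as pd

# Illustrative frame standing in for one new period of data (placeholder values).
df = pd.DataFrame({"period": ["2022-10", "2022-10"], "id": [1, 2], "value": [1.5, 2.5]})

# Placeholder Glue Catalog connection name.
con = wr.redshift.connect("my-redshift-connection")

# copy() stages the DataFrame as Parquet under `path` and then issues a COPY,
# so the partition column never has to survive a separate to_parquet step.
wr.redshift.copy(
    df=df,
    path="s3://my-bucket/redshift-staging/",  # temporary staging prefix (placeholder)
    con=con,
    table="my_table",
    schema="public",
    mode="append",
    iam_role="arn:aws:iam::123456789012:role/my-redshift-copy-role",  # placeholder
)
con.close()
```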
-
Hi all -
I have a dataset that I am looking to incrementally extract to Redshift. For each new period of data I have broken this into two steps: first write the period to S3 as a partitioned Parquet dataset with `wr.s3.to_parquet`, then load those files into Redshift with `wr.redshift.copy_from_files`. The implementation roughly looks like the sketch below.
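A minimal sketch of those two steps (the DataFrame, bucket, partition column, connection name, table, schema, and IAM role are placeholders):

```python
import awswrangler as wr
import pandas as pd

# Illustrative frame standing in for one new period of data (placeholder values).
df = pd.DataFrame({"period": ["2022-10", "2022-10"], "id": [1, 2], "value": [1.5, 2.5]})

# Step 1: write the new period to S3 as a Hive-partitioned Parquet dataset.
# The partition value ends up in the object key (period=2022-10/...), and the
# "period" column is dropped from the Parquet files themselves.
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/my-dataset/",  # placeholder prefix
    dataset=True,
    mode="append",
    partition_cols=["period"],
)

# Step 2: COPY the newly written files into Redshift.
con = wr.redshift.connect("my-redshift-connection")  # placeholder connection name
wr.redshift.copy_from_files(
    path="s3://my-bucket/my-dataset/period=2022-10/",
    con=con,
    table="my_table",
    schema="public",
    iam_role="arn:aws:iam::123456789012:role/my-redshift-copy-role",  # placeholder
)
con.close()
```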
The `copy_from_files` function fails because Redshift does not understand the partitions created by `to_parquet`, and `wr.s3.to_parquet` strips the partition column from the data files when writing to S3.

A possible solution would be to add a keyword argument to `to_parquet`, something like `preserve_partition_columns=True`, which would keep the partition columns in the Parquet file instead of dropping them (sketched below). Is there a better way to achieve this without changes to the library?
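For concreteness, the proposed call might look like this (note that `preserve_partition_columns` does not exist in awswrangler today; it is only the keyword suggested here, continuing from the sketch above):

```python
# Hypothetical: preserve_partition_columns is the *proposed* keyword,
# not an existing awswrangler parameter. Same imports, df, and placeholder
# path as in the sketch above.
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/my-dataset/",
    dataset=True,
    mode="append",
    partition_cols=["period"],
    preserve_partition_columns=True,  # keep "period" inside each Parquet file as well
)
```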
Thanks.