-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend deltacat interface to support Iceberg bucketing #320
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 2
.
Benchmark suite | Current: ffb7357 | Previous: d77a208 | Ratio |
---|---|---|---|
deltacat/tests/compute/test_compact_partition_rebase_then_incremental.py::test_compact_partition_rebase_then_incremental[8-rebase-then-incremental-empty-csv-delta-case_V2] |
1.175799089098844 iter/sec (stddev: 0.11696467988849833 ) |
2.384150249629209 iter/sec (stddev: 0.07360279983044431 ) |
2.03 |
deltacat/tests/compute/test_compact_partition_rebase_then_incremental.py::test_compact_partition_rebase_then_incremental[9-rebase-then-incremental-single-hash-bucket_V2] |
1.5744004105431237 iter/sec (stddev: 0.09830580721307909 ) |
3.248115145578071 iter/sec (stddev: 0.08641540906761959 ) |
2.06 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks like good incremental progress. I think we can merge this as long as we refactor it to not modify the Redshift Manifest model directly. Next, I'll start working on merging the larger set of proposed changes to the internal storage model currently at https://github.com/pdames/deltacat/blob/iceberg/deltacat/storage/model/ with the more mature modeling of transforms in this PR.
@raghumdani LMK if you want to merge the changes as-is then incrementally work toward the recommended changes, since that's also a viable option here. |
Sure, I will incrementally work towards this. Created this task: #322 |
@pdames Can you approve so I can merge this? |
This PR contains interface changes and local deltacat support to read/writer delta partition spec. The motivation for this change is explained in a different document. However, to summarize we have a usecase where Delta must encapsulate Iceberg Manifest but there is no way to represent iceberg partition in DeltaCAT today. This PR adds partition spec model which allows us to specify identity and bucketing partition strategies where latter is what we will also be using with Iceberg. The interface changes are backward compatible.