Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend deltacat interface to support Iceberg bucketing #320

Merged
merged 4 commits into from
Jun 20, 2024

Conversation

raghumdani
Copy link
Collaborator

@raghumdani raghumdani commented Jun 15, 2024

This PR contains interface changes and local deltacat support to read/writer delta partition spec. The motivation for this change is explained in a different document. However, to summarize we have a usecase where Delta must encapsulate Iceberg Manifest but there is no way to represent iceberg partition in DeltaCAT today. This PR adds partition spec model which allows us to specify identity and bucketing partition strategies where latter is what we will also be using with Iceberg. The interface changes are backward compatible.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 2.

Benchmark suite Current: ffb7357 Previous: d77a208 Ratio
deltacat/tests/compute/test_compact_partition_rebase_then_incremental.py::test_compact_partition_rebase_then_incremental[8-rebase-then-incremental-empty-csv-delta-case_V2] 1.175799089098844 iter/sec (stddev: 0.11696467988849833) 2.384150249629209 iter/sec (stddev: 0.07360279983044431) 2.03
deltacat/tests/compute/test_compact_partition_rebase_then_incremental.py::test_compact_partition_rebase_then_incremental[9-rebase-then-incremental-single-hash-bucket_V2] 1.5744004105431237 iter/sec (stddev: 0.09830580721307909) 3.248115145578071 iter/sec (stddev: 0.08641540906761959) 2.06

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Member

@pdames pdames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks like good incremental progress. I think we can merge this as long as we refactor it to not modify the Redshift Manifest model directly. Next, I'll start working on merging the larger set of proposed changes to the internal storage model currently at https://github.com/pdames/deltacat/blob/iceberg/deltacat/storage/model/ with the more mature modeling of transforms in this PR.

deltacat/aws/redshift/model/manifest.py Show resolved Hide resolved
deltacat/storage/model/partition_spec.py Show resolved Hide resolved
@pdames
Copy link
Member

pdames commented Jun 20, 2024

@raghumdani LMK if you want to merge the changes as-is then incrementally work toward the recommended changes, since that's also a viable option here.

@raghumdani
Copy link
Collaborator Author

Sure, I will incrementally work towards this. Created this task: #322

@raghumdani
Copy link
Collaborator Author

@pdames Can you approve so I can merge this?

@pdames pdames merged commit 55ce837 into main Jun 20, 2024
4 checks passed
@raghumdani raghumdani deleted the iceberg-write-support branch June 20, 2024 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants