[EPIC] Iceberg-rust Write support #700
Comments
Could I work on supporting table property updates? I'm also interested in working on the commit path, but it's not clear to me which tasks can be started in parallel with the ones that are currently in progress.
I am also interested in the write path. Are there any issues that are independent of the ongoing work and could be worked on in parallel? I'd be happy to take one of them too.
So am I! Rewriting in Rust is the new trend, and the community and contributors are truly interested in the progress of write support!
@barronw Sure thing! I've created a separate ticket for it: #730. Feel free to comment on it so I can assign it to you. @flaneur2020 Sounds good, what do you think of the summary generation? See #724 for details.
@barronw Sure! Just comment on the one you prefer, then we can assign it to you.
Sorry, I'm late after a business trip. If there's still an open task available, please assign it to me 🙏
Sure :)
Hey everyone :) It's a bit hard to follow the progress here - would it make sense to create a GitHub project or maybe convert this to a tracker issue with child issues?
@ZENOTME You're completely right, I've updated the post. Thanks for creating the new ticket.
Hi! I’m really interested in getting involved. If there’s an available issue, could you assign it to me?
@takaebato Feel free to comment under an issue that you want to work on that hasn't been touched yet, and a maintainer can assign it to you!
@jonathanc-n Thanks!!
I'm interested in helping with the project. Is there a clear roadmap for the Python bindings and the current state of the project? Having worked on delta-rs for some time now, I can properly apply some learnings from there to iceberg-rs and Python.
@ion-elgreco Great to see you here! Big fan of delta-rs and its Python bindings :) This issue just tracks write support for the Rust project.
Not yet! But there are a couple of things we're working on. I can start an issue to track it. For context, pyiceberg is more feature-rich than iceberg-rust, but we're trying to use the iceberg-rust Python bindings in a few places.
Happy to chat more about other areas you might be interested in.
@kevinjqliu Happy to hear that :) Yes, I think it would be easier to build the bindings from the ground up instead of trying to squeeze them into pyiceberg. One of the issues with delta-rs/deltalake was the long-lasting dependency on pyarrow for the writer, and from what I see, pyiceberg is mostly pyarrow. I need to read a bit more about what Iceberg supports, and then I can see where to start. But I would like to work on write support with a focus on exposing it in a Python package (maybe iceberg-rs); this would at least distinguish it from pyiceberg. For the writer, I think some DataFusion code can be reused from delta-rs, especially the lazy table providers for streamed writes.
Iceberg-rust Write support
I've noticed a lot of interest in write support in Iceberg-rust. This issue aims to break this down into smaller pieces so they can be picked up in parallel.
Appetizers
If you're not super familiar with the codebase, feel free to pick up one of the appetizers below. They may or may not be related to the write path, but they are worth getting in and are a good way to get to know the code base:
DataFileWriterBuilder tests #726
Commit path
The commit path entails writing a new metadata JSON.
initial-default #737
add_files.
This is done with the Java API, where during writing the upper and lower bounds are tracked and the number of null and NaN records is counted. Most of this is in, except the NaN counts: Implement nan_value_counts && distinct_counts metrics in parquet writer #417
Related operations
These are not on the critical path to enable writes, but are related to it:
SchemaUpdate logic to Iceberg-Rust #697
unionByName to easily union two schemas to provide easy schema evolution: Update a TableSchema from a Schema #698
Metadata tables
Metadata tables are used to inspect the table. Having these tables also allows easy implementation of the maintenance procedures since you can easily list all the snapshots, and expire the ones that are older than a certain threshold.
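To illustrate why listing snapshots makes expiry straightforward, here is a minimal sketch in plain Rust. It does not use the iceberg-rust API: `SnapshotRow`, its fields, and `split_expired` are hypothetical names invented for this example, and the retention rule (always keep the current snapshot, expire anything committed before a cutoff) is a simplifying assumption rather than the full Iceberg retention logic.

```rust
/// Hypothetical, simplified row of a snapshots metadata table.
/// These field names are assumptions for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct SnapshotRow {
    snapshot_id: i64,
    committed_at_ms: i64,
}

/// Splits the listed snapshots into `(retained, expired)`.
///
/// A snapshot is retained if it is the table's current snapshot or if it was
/// committed at or after `older_than_ms`; everything else is expired.
fn split_expired(
    snapshots: &[SnapshotRow],
    current_snapshot_id: i64,
    older_than_ms: i64,
) -> (Vec<SnapshotRow>, Vec<SnapshotRow>) {
    snapshots.iter().cloned().partition(|s| {
        s.snapshot_id == current_snapshot_id || s.committed_at_ms >= older_than_ms
    })
}
```

Calling `split_expired(&rows, current_id, cutoff_ms)` on the rows read from such a table yields the snapshots to keep and the ones a maintenance procedure could remove. A real implementation would also need to respect branch and tag references and clean up files that are no longer reachable.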
Integration Tests
Integration tests with other engines like Spark.
Contribute
If you want to contribute to the upcoming milestone, feel free to comment on this issue. If there is anything unclear or missing, feel free to reach out here as well 👍