Description
Why
There has been increased interest in Iceberg's Puffin file format recently. This is partly because the Iceberg V3 spec added support for deletion vectors, which are expected to be stored in Puffin files.
However, as was recently noted on the dev-list, only the iceberg-java SDK currently supports reading or writing Puffin files. iceberg-rust has no support for the Puffin file format today. The purpose of this ticket is to change that by adding Puffin support to iceberg-rust and to iceberg-python (through exposed bindings).
How
I have already raised a PR adding Puffin support to iceberg-rust here: #714. However, as that PR is quite large, I am splitting the code up and submitting it as multiple PRs in the following order:
- feat(puffin): Add Puffin crate and CompressionCodec #745
- refactor(puffin): Move puffin crate contents inside iceberg crate #789
- feat(puffin): Parse Puffin FileMetadata #765
- feat(puffin): Add PuffinReader #892
- feat(puffin): Add PuffinWriter #959
- feat(puffin): Make Puffin APIs public #1165
- feat(puffin): Add Python bindings - Let's wait for someone to request this feature first. I already see iceberg-python has started work on its own Puffin implementation.
- Optimize puffin file metadata parsing (see comment for more details) - Moved into a separate ticket: Optimize Puffin file metadata parsing #1198