Generation of Snapshot Summaries #724

Closed
Fokko opened this issue Nov 27, 2024 · 3 comments · Fixed by #1139

@Fokko
Contributor

Fokko commented Nov 27, 2024

With each snapshot comes a summary map, which is optional in V1 and required in V2 and later.

The summary records information such as which kinds of files (data or delete) the snapshot contains and how it changes the row and byte counts. The best way to replicate this metrics collection is to follow the Java implementation: https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotSummary.java
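To make the shape of that map concrete, here is a minimal Rust sketch that reads a summary map and derives the net row and byte change of a snapshot. It assumes the map uses the spec's string keys (`operation`, `added-records`, `deleted-records`, `added-files-size`, `removed-files-size`) with decimal string values, and treats a missing counter as 0. The functions `counter` and `net_changes` are hypothetical helpers for illustration, not part of iceberg-rust.

```rust
use std::collections::HashMap;

/// Values the spec allows for the required "operation" entry.
const OPERATIONS: [&str; 4] = ["append", "replace", "overwrite", "delete"];

/// Hypothetical helper: parses an optional numeric entry; a missing key counts as 0.
fn counter(summary: &HashMap<String, String>, key: &str) -> Result<i64, String> {
    match summary.get(key) {
        None => Ok(0),
        Some(v) => v.parse::<i64>().map_err(|e| format!("{key}: {e}")),
    }
}

/// Hypothetical helper: validates "operation" and returns the net
/// (rows, bytes) change recorded by a snapshot summary.
fn net_changes(summary: &HashMap<String, String>) -> Result<(i64, i64), String> {
    let op = summary.get("operation").ok_or("missing operation")?;
    if !OPERATIONS.contains(&op.as_str()) {
        return Err(format!("unknown operation: {op}"));
    }
    let rows = counter(summary, "added-records")? - counter(summary, "deleted-records")?;
    let bytes = counter(summary, "added-files-size")? - counter(summary, "removed-files-size")?;
    Ok((rows, bytes))
}
```

For an `overwrite` summary with `added-records=100`, `deleted-records=40` and `added-files-size=2048`, `net_changes` returns `Ok((60, 2048))`.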

This class works closely with the SnapshotProducer and tracks what each snapshot changes.
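A rough sketch of that tracking role, loosely modeled on the idea of the Java class rather than its actual API: a collector is notified of every file the snapshot adds or removes and, at commit time, turns its counters into the summary map. `FileKind`, `FileInfo` and `SummaryCollector` are hypothetical types; real delete files would also feed position/equality delete counters, which this sketch leaves out.

```rust
use std::collections::HashMap;

/// Hypothetical file kind: data file or delete file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileKind {
    Data,
    Delete,
}

/// Hypothetical, simplified metadata of a file touched by a snapshot.
#[derive(Debug, Clone, Copy)]
struct FileInfo {
    kind: FileKind,
    record_count: u64,
    file_size_in_bytes: u64,
}

/// Hypothetical collector that accumulates per-snapshot metrics.
#[derive(Debug, Default)]
struct SummaryCollector {
    added_data_files: u64,
    deleted_data_files: u64,
    added_delete_files: u64,
    removed_delete_files: u64,
    added_records: u64,
    deleted_records: u64,
    added_files_size: u64,
    removed_files_size: u64,
}

impl SummaryCollector {
    /// Called for every file the snapshot adds.
    fn add_file(&mut self, f: &FileInfo) {
        match f.kind {
            FileKind::Data => {
                self.added_data_files += 1;
                self.added_records += f.record_count;
            }
            // Delete-file row counts are deleted rows, so they are not added here.
            FileKind::Delete => self.added_delete_files += 1,
        }
        self.added_files_size += f.file_size_in_bytes;
    }

    /// Called for every file the snapshot removes.
    fn remove_file(&mut self, f: &FileInfo) {
        match f.kind {
            FileKind::Data => {
                self.deleted_data_files += 1;
                self.deleted_records += f.record_count;
            }
            FileKind::Delete => self.removed_delete_files += 1,
        }
        self.removed_files_size += f.file_size_in_bytes;
    }

    /// Produces the summary map using the spec's key names.
    fn build(&self, operation: &str) -> HashMap<String, String> {
        let mut summary = HashMap::new();
        summary.insert("operation".to_string(), operation.to_string());
        let counters = [
            ("added-data-files", self.added_data_files),
            ("deleted-data-files", self.deleted_data_files),
            ("added-delete-files", self.added_delete_files),
            ("removed-delete-files", self.removed_delete_files),
            ("added-records", self.added_records),
            ("deleted-records", self.deleted_records),
            ("added-files-size", self.added_files_size),
            ("removed-files-size", self.removed_files_size),
        ];
        for (key, value) in counters {
            // Zero counters are skipped to keep the map small (a choice of this sketch).
            if value > 0 {
                summary.insert(key.to_string(), value.to_string());
            }
        }
        summary
    }
}
```

Keeping the collector separate from the producer means the producer only has to report file-level events, while the key names and formatting live in one place.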

@barronw
Contributor

barronw commented Nov 29, 2024

Can I pick this up?

@c-thiel
Collaborator

c-thiel commented Nov 30, 2024

@barronw gladly! Assigned the issue to you :)

@barronw
Contributor

barronw commented Feb 9, 2025

@c-thiel Feel free to re-assign this to someone else (maybe @jonathanc-n). I won't be able to complete this soon, so I pushed what I have to my fork for reference (link).

liurenjie1024 pushed a commit that referenced this issue Mar 21, 2025
## Which issue does this PR close?

- Related to #724 

## What changes are included in this PR?
This is the building block for implementing snapshot summaries. Most of
the implementation code is built on @barronw's very nice commit, and
most of the added lines are tests. A follow-up PR will integrate it
with the rest of the codebase.

## Are these changes tested?
Yes, with unit tests.
liurenjie1024 added a commit that referenced this issue Mar 25, 2025
…mmary` (#1122)

## Which issue does this PR close?
- Part of #724.

## What changes are included in this PR?
Added functionality for merging summaries and adding manifest file
information.


## Are these changes tested?

One unit test covering the merge and add-manifest functionality.

---------

Co-authored-by: Renjie Liu <[email protected]>
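The two operations added in #1122 can be illustrated with a small sketch. It assumes a manifest can be folded in using only the per-manifest counts the manifest list already stores (such as `added_files_count` and `added_rows_count`), so the individual files never need to be read, and that merging two collectors is a field-wise sum. `ManifestContent`, `ManifestSummary` and `Metrics` are hypothetical names, not the types used in iceberg-rust.

```rust
/// Hypothetical manifest content type: data files or delete files.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ManifestContent {
    Data,
    Deletes,
}

/// Hypothetical subset of a manifest-list entry's counts.
struct ManifestSummary {
    content: ManifestContent,
    added_files_count: u64,
    added_rows_count: u64,
}

/// Hypothetical, reduced set of summary counters.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
struct Metrics {
    added_data_files: u64,
    added_delete_files: u64,
    added_records: u64,
    deleted_records: u64,
}

impl Metrics {
    /// Folds a whole manifest's counts in without reading its files.
    /// File sizes are not available from these counts, so they are not updated.
    fn add_manifest(&mut self, m: &ManifestSummary) {
        match m.content {
            ManifestContent::Data => {
                self.added_data_files += m.added_files_count;
                self.added_records += m.added_rows_count;
            }
            // Rows in delete manifests are deleted rows, not new data rows.
            ManifestContent::Deletes => self.added_delete_files += m.added_files_count,
        }
    }

    /// Merges another collector's counters into this one (field-wise sum).
    fn merge(&mut self, other: &Metrics) {
        self.added_data_files += other.added_data_files;
        self.added_delete_files += other.added_delete_files;
        self.added_records += other.added_records;
        self.deleted_records += other.deleted_records;
    }
}
```

Summing field by field keeps merging order-independent, which is convenient when several writers each collect metrics for part of the same snapshot.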