Skip to content

Rename the Snapshot Retention policy #1284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

flyrain
Copy link
Contributor

@flyrain flyrain commented Mar 31, 2025

I'm proposing to rename the table maintenance policy "snapshot-retention" to "snapshot-expiry". While "snapshot-retention" is a reasonable name, it's a bit too general. I believe "snapshot-expiry" more precisely captures the intent of the policy, which is specifically about removing expired snapshots.

I think this is the right time to make the change before we include the policy schemas in an upcoming release.

Will change related classes in a followup PR.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Mar 31, 2025
Copy link
Collaborator

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well, Thanks @flyrain !

"description": "Inheritable Polaris policy schema for Iceberg table snapshot retention",
"$id": "https://polaris.apache.org/schemas/policies/system/snapshot-expiry/2025-02-03.json",
"title": "Snapshot Expiry Policy",
"description": "Inheritable Polaris policy schema for Iceberg table snapshot expiry",
Copy link
Contributor

@dimas-b dimas-b Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for me, given the example below, the term "retention" looks more relevant. Expiry means something is not valid/usable anymore, while the purpose of this policy (although unstated) appears to be controlling when snapshots are discarded, while still being logically valid.

(edit: removed "minor", let's discuss)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.
Retention and expiration are two different things.
The term retention is more correct for this use case.

Copy link
Contributor

@eric-maynard eric-maynard Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little ambiguous. It seems like the intent of the policy is to determine under what conditions expiry should happen. Similarly, we talk about masking policies, or compaction policies. Expiry seems marginally more in line with this, while retention describes the state when no maintenance action is taken.

Maybe the contention is around the exact definition of "expire"? If it's interpreted as something that happens passively, such as when a policy becomes too old, then I agree calling this an expiry policy doesn't make as much sense. I was interpreting expire to be used in the same way that Iceberg uses it, where "expire snapshots" is an action wherein you remove certain snapshots. The Iceberg docs say, for example:

Expiring old snapshots removes them from metadata

Copy link
Contributor

@dimas-b dimas-b Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the behaviour of this policy defined somewhere? (sorry, if I'm missing the obvious). It would help a lot in reasoning about names and descriptions... TBH, the current description is rather obscure (also properties is just a bag of arbitrary entries).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example below demonstrates a free-form config map—its key-value pairs can be anything, like expire-in-days. So we shouldn't base a strong argument on this example alone. Using retention here isn't technically wrong, it's a valid term. However, the main reason for renaming it to expiry is to make the intent of this table maintenance policy more precise: it's specifically about removing expired snapshots. Retention is a broader concept and could imply that the maintenance job does more than just snapshot cleanup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flyrain : Could you clarify what "removing expired snapshots" involves and what makes snapshots "expired" (ideally in the description of the policy we're renaming in this PR)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me add it here first:
What are involved?

  1. A table with snapshots
  2. A Polaris policy which indicates the snapshot expiry is enabled for the table.
  3. Table configuration or policy config indicates how to expire snapshots, including parameters like max age of a snapshot.
  4. An engine or client to able to remove expired snapshots, clean up files for the table.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @flyrain .

With that description the term "expiry" makes sense to me.

However, I believe it would be nice to have that information in the description of the policy (in code). Referring back to GH discussions is not practical.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I have definitely been a little confused about the semantics of policies. Big +1 on better descriptions (in code)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants