Uniform: Remove snapshot expiration patch, replace with custom delete callback which checks against metadata location #4059
Which Delta project/connector is this regarding?
Description
This change removes the Uniform patch to Iceberg snapshot expiration, which currently prevents expiration from deleting any data files shared with the Delta table.
We can achieve the same behavior by passing in a custom delete callback that checks whether each path to be deleted is under the Iceberg metadata location. If it is, the file is cleaned up; if it is not, the file is treated as a data file and left in place.
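A minimal Java sketch of the path check, assuming the callback is plugged in through Iceberg's `ExpireSnapshots.deleteWith(Consumer<String>)` hook. The class `MetadataOnlyDeleter` and the `s3://` paths are hypothetical and not the code in this PR; the wrapped `delegate` stands in for whatever actually removes a file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: wraps a real delete function so that only files under the
 * Iceberg metadata location are deleted. Any other path is assumed to be a data
 * file shared with the Delta table and is skipped.
 */
public final class MetadataOnlyDeleter implements Consumer<String> {
    private final String metadataPrefix;      // metadata location, normalized to end with "/"
    private final Consumer<String> delegate;  // the function that actually deletes a file

    public MetadataOnlyDeleter(String metadataLocation, Consumer<String> delegate) {
        // Append a trailing slash so ".../metadata" does not also match ".../metadata2/...".
        this.metadataPrefix =
            metadataLocation.endsWith("/") ? metadataLocation : metadataLocation + "/";
        this.delegate = delegate;
    }

    /** Returns true if the path lies under the metadata location. */
    public boolean isMetadataPath(String path) {
        return path.startsWith(metadataPrefix);
    }

    @Override
    public void accept(String path) {
        if (isMetadataPath(path)) {
            delegate.accept(path); // metadata file: safe to clean up
        }
        // Otherwise the path is treated as a data file and left in place.
    }

    public static void main(String[] args) {
        List<String> deleted = new ArrayList<>();
        MetadataOnlyDeleter deleter =
            new MetadataOnlyDeleter("s3://bucket/tbl/metadata", deleted::add);
        deleter.accept("s3://bucket/tbl/metadata/snap-1.avro");
        deleter.accept("s3://bucket/tbl/data/part-0000.parquet");
        deleter.accept("s3://bucket/tbl/metadata2/x.avro");
        System.out.println(deleted); // prints [s3://bucket/tbl/metadata/snap-1.avro]
    }
}
```

Normalizing the prefix with a trailing slash keeps a sibling directory whose name merely starts with the metadata directory's name from being treated as metadata.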
This change also handles the edge case where a user has configured the Iceberg metadata location to be the same as the data location. In that case, we take the conservative approach and skip metadata cleanup entirely.
Such a configuration is rare in practice, since keeping metadata under a separate prefix is in the user's own interest (for example, it improves throughput). Defending against it adds little complexity, however, so the check is included.
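The conservative handling of the shared-location case could look like the following hypothetical factory. `ExpirationDeleteCallbacks.forLocations` is an illustrative name, and comparing the two locations as strings after appending a trailing slash is an assumed normalization, not necessarily how this PR compares them.

```java
import java.util.function.Consumer;

/** Hypothetical sketch of choosing the delete callback for snapshot expiration. */
public final class ExpirationDeleteCallbacks {
    private ExpirationDeleteCallbacks() {}

    // Normalize a location so "s3://b/t" and "s3://b/t/" compare equal.
    private static String withTrailingSlash(String location) {
        return location.endsWith("/") ? location : location + "/";
    }

    /**
     * Returns the callback to hand to expiration. When metadata and data share the
     * same location, a prefix check cannot tell them apart, so nothing is deleted.
     */
    public static Consumer<String> forLocations(
            String metadataLocation, String dataLocation, Consumer<String> delete) {
        String metadataPrefix = withTrailingSlash(metadataLocation);
        if (metadataPrefix.equals(withTrailingSlash(dataLocation))) {
            return path -> { }; // conservative: skip all cleanup
        }
        return path -> {
            if (path.startsWith(metadataPrefix)) {
                delete.accept(path); // under the metadata location: clean up
            }
        };
    }
}
```

Returning a no-op callback keeps the expiration call itself unchanged; only what gets deleted differs.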
In general, we want to remove Uniform's custom Iceberg patches, since doing so puts us on a path to upgrade Iceberg more easily and makes the code more maintainable.
How was this patch tested?
Added integration tests for expiration to ConvertToIcebergSuite
Does this PR introduce any user-facing changes?
No, behavior is preserved