Remove redundant locations when constructing access policies #2149
polaris-core/src/main/java/org/apache/polaris/core/storage/StorageUtil.java
/** Removes "redundant" locations, so {/a/b/, /a/b/c, /a/b/d} will be reduced to just {/a/b/} */
private static @Nonnull Set<String> removeRedundantLocations(Set<String> locationStrings) {
  HashSet<String> result = new HashSet<>(locationStrings);
Since this is a new collection, it can be produced by
locationStrings.stream()
.filter(Objects::nonNull)
.map(StorageLocation::of)
.collect(Collectors.toCollection(HashSet::new));
That would remove duplicate locations, but not redundant locations like we want to. We'd still need to loop over the collection.
Yes, but you'd save the repeated instantiation of StorageLocation objects.
Actually, you could save the inner loop with a sorted collection, if the locations end with a /.
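For illustration, the sorted-collection idea might look roughly like the sketch below. It is not the code from this PR: the class and method names are hypothetical, it works on plain strings rather than StorageLocation, and it assumes that normalizing every location to end with a / is acceptable. Because lexicographic order places every child directly after its parent, each location only has to be compared with the last one that was kept.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not the StorageUtil implementation from the PR.
public class RedundantLocationsSketch {

  /** Appends a trailing slash so "/a/b" is not mistaken for a parent of "/a/bc". */
  static String withTrailingSlash(String location) {
    return location.endsWith("/") ? location : location + "/";
  }

  /** Returns the locations that are not children of another location, normalized to end with "/". */
  static Set<String> removeRedundant(Set<String> locations) {
    // Sorting groups every child immediately after its parent.
    TreeSet<String> sorted = new TreeSet<>();
    for (String location : locations) {
      if (location != null) {
        sorted.add(withTrailingSlash(location));
      }
    }

    Set<String> result = new TreeSet<>();
    String lastKept = null;
    for (String candidate : sorted) {
      // A candidate that starts with the last kept location is its child, so skip it.
      if (lastKept == null || !candidate.startsWith(lastKept)) {
        result.add(candidate);
        lastKept = candidate;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(removeRedundant(Set.of("/a/b/", "/a/b/c", "/a/b/d")));
  }
}
```

The trade-off is that the returned strings carry a trailing slash even if the inputs did not, which may or may not suit the caller.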
service/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java
@@ -2612,16 +2572,6 @@ protected FileIO loadFileIO(String ioImpl, Map<String, String> properties) {
        callContext, ioImpl, properties, identifier, locations, storageActions, resolvedPath);
  }

  private void blockedUserSpecifiedWriteLocation(Map<String, String> properties) {
loadFileIO is also unused and should be removed as well.
Agreed that it can be removed, but unlike blockedUserSpecifiedWriteLocation it's not related to this PR, so I think we should do that separately.
Iceberg tables can technically store data across any number of paths, but Polaris currently uses 3 different locations for credential vending:
1. The table's base location
2. write.data.path, if set
3. write.metadata.path, if set
This was intended to capture scenarios where, e.g., (2) is not a child path of (1), so that the vended credentials can still be valid for reading the entire table. However, there are systems that seem to always set (2) and (3), such as:
s3:/my-bucket/base/iceberg
s3:/my-bucket/base/iceberg/data
s3:/my-bucket/base/iceberg/metadata
In such cases the extra paths (e.g. extra resources in the AWS Policy) are redundant. In one such case, these redundant paths caused the policy to exceed the maximum allowable 2048 characters.
This PR removes redundant paths -- those that are the child of another path -- from the list of accessible locations tracked for a given table and does some slight refactoring to consolidate the logic for extracting these paths from a TableMetadata.
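As a rough illustration of what "child of another path" means here, the sketch below applies a straightforward pairwise check to the example locations above. The class and helper names are hypothetical and this is not the PR's implementation; it assumes that a plain string-prefix test on a slash-terminated parent is a good enough notion of "child".

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names are illustrative and not taken from Polaris.
public class ChildPathFilterSketch {

  /** True if child lies strictly below parent, compared on a slash-terminated prefix. */
  static boolean isChildOf(String child, String parent) {
    String prefix = parent.endsWith("/") ? parent : parent + "/";
    return !child.equals(parent) && child.startsWith(prefix);
  }

  /** Keeps only the locations that are not a child of any other location in the set. */
  static Set<String> withoutChildren(Set<String> locations) {
    Set<String> result = new HashSet<>();
    for (String candidate : locations) {
      boolean redundant = false;
      for (String other : locations) {
        if (isChildOf(candidate, other)) {
          redundant = true;
          break;
        }
      }
      if (!redundant) {
        result.add(candidate);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> locations =
        Set.of(
            "s3:/my-bucket/base/iceberg",
            "s3:/my-bucket/base/iceberg/data",
            "s3:/my-bucket/base/iceberg/metadata");
    // prints [s3:/my-bucket/base/iceberg]
    System.out.println(withoutChildren(locations));
  }
}
```

With only the base location left, a generated policy would need a single resource entry for the table instead of three.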