Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions polaris-core/build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@ dependencies {

testFixturesApi(platform(libs.junit.bom))
testFixturesApi("org.junit.jupiter:junit-jupiter")
testFixturesApi("org.junit.jupiter:junit-jupiter-params")
testFixturesApi(libs.assertj.core)
testFixturesApi(libs.mockito.core)
testFixturesApi("com.fasterxml.jackson.core:jackson-core")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,12 @@ public enum PolarisCredentialProperty {
"the azure storage account host",
"the azure account name + endpoint that will append to the ADLS_SAS_TOKEN_PREFIX"),
EXPIRATION_TIME(
Long.class, "expiration-time", "the expiration time for the access token, in milliseconds");
Long.class, "expiration-time", "the expiration time for the access token, in milliseconds"),
ADDITIONAL_STORAGE_CONFIG(
String.class,
"additional-storage-config",
"the additional storage config for the cloud storage"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a fairly generic property... why does it need to be part of "credential" properties?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the only place where we can dynamically override some storage config used by the FileIO. The key idea is that the getScopedCredentials response should be capable of carrying additional storage config settings that can be interpreted later in a context-specific way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fully support the idea of allowing custom (configurable) additional properties to be sent to Iceberg clients for their FileIO configuration. However, it might be preferable to refactor Polaris code to avoid mixing "credentials" and general properties. IIRC, the "load table" and "config" responses in the Iceberg REST API are not limited to just "credential" properties. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed @dimas-b -- it makes sense for the catalog to be able to inject additional properties, but I think these may not necessarily be credentials and that it's better to avoid baking that assumption into the code.

Copy link
Member Author

@XJDKC XJDKC Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to expose these additional properties to the iceberg client, this is for BasePolarisCatalog to access the storage (doCommit, doRefresh, registerTable). The PolarisMetaStoreManager we implements will provide some additional storage config for custom FileIO to access the storage.
BasePolarisCatalog.java#L1607-L1631

But thanks for pointing this out, it seems that these properties will be also passed to the client. Maybe we should filter this property when we return the creds in the loadTable response.

For combining the additional properties with creds, the getSubscopedCredentials api actually currently represents a request for a dynamic connection-configuration, which happens to mostly be credentials right now. We should treat it as an api that we can use to get storage config so the Polaris's FileIO can use these info to access the storage.

Copy link
Contributor

@dimas-b dimas-b Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misunderstood the intent of this PR.

I'm not sure about exposing general FileIO properties for users to configure the internal Polaris access path to table storage. Polaris may not always use Iceberg FileIO for its internal access, but if general properties are exposed, refactoring that area will be much harder.

@XJDKC : Would you be able to share some more details about your intended use case?

Copy link
Member Author

@XJDKC XJDKC Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context
getSubscopedCreds returns an EnumMap<PolarisCredentialProperty, String>, it serves two purposes:

  1. For Polaris: Supplies credentials to initialize FileIO, enabling storage access for Polaris's operations such as doCommit, doRefresh, and registerTable
  2. For REST Client: Vends credentials to the REST Client, enabling it to access storage.

Our Intention
The PolarisMetaStoreManager we implement may need to dynamically select AWS S3 or Azure storage endpoints at runtime based on routing requirements (e.g., proxy, PrivateLink). It interprets these details contextually. This requires updating the routing logic to ensure Polaris can access the storage by sending requests to the correct Hostnames/IPs, so we implement our own FileIO.

To achieve this, we need to pass additional information along with credentials to Polaris (From PolarisMetaStoreManager to FileIO). We consider getSubscopedCreds a suitable mechanism for providing these details, as it effectively informs Polaris how to access storage. But right now, this api happens to mostly be credentials.

Problem
However, as you mentioned, including additional props in getSubscopedCreds also exposes them to the REST client, which uses the same API for a different purpose. Since the client (query engine) operates in the user’s environment, these info is unnecessary and should not be shared with the REST client due to security and relevance concerns.

Proposed Solution
To resolve this, we can filter out the additional storage configuration from the loadTable response, ensuring that it is only utilized internally by Polaris and not sent to the REST client.

cc: @dimas-b: Hope this answers your question!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, @XJDKC !

The use case makes total sense from my POV. However, I'm still not sure about overloading credential properties with this information.

Do you think a larger refactoring could be beneficial to allow customizing storage access inside Polaris, without affecting credentials specifically? Perhaps with such a refactoring having to excluding the new information from REST client responses will not be necessary... WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @dimas-b

We come up with another solution to achieve that, details can be found in this PR: #724
Can you pls take a look? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#724 certainly looks preferable from my POV. Let's continue the discussion there.

;

private final Class valueType;
private final String propertyName;
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.polaris.core.storage.cache;

import static org.apache.polaris.core.storage.PolarisCredentialVendor.*;

import java.util.EnumMap;
import java.util.Map;
import java.util.stream.Stream;
import org.apache.iceberg.aws.s3.S3FileIOProperties;
import org.apache.iceberg.azure.AzureProperties;
import org.apache.iceberg.gcp.GCPProperties;
import org.apache.polaris.core.persistence.*;
import org.apache.polaris.core.storage.PolarisCredentialProperty;
import org.assertj.core.api.Assertions;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.Arguments;
import org.junit.jupiter.params.provider.MethodSource;

public class StorageCredentialCacheEntryTest {

public StorageCredentialCacheEntryTest() {}

@Test
public void testBadResult() {
ScopedCredentialsResult badResult =
new ScopedCredentialsResult(
BaseResult.ReturnStatus.SUBSCOPE_CREDS_ERROR, "extra_error_info");
StorageCredentialCacheEntry cacheEntry = new StorageCredentialCacheEntry(badResult);
Assertions.assertThatThrownBy(() -> cacheEntry.convertToMapOfString())
.isInstanceOf(NullPointerException.class)
.hasMessageContaining("because \"this.credsMap\" is null");
}

@ParameterizedTest
@MethodSource("testConvertToMapOfStringParams")
public void testConvertToMapOfString(
Map<PolarisCredentialProperty, String> original, Map<String, String> expected) {
EnumMap<PolarisCredentialProperty, String> props =
new EnumMap<>(PolarisCredentialProperty.class);
props.putAll(original);

ScopedCredentialsResult goodResult = new ScopedCredentialsResult(props);
StorageCredentialCacheEntry cacheEntry = new StorageCredentialCacheEntry(goodResult);
Map<String, String> properties = cacheEntry.convertToMapOfString();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open Question: The PolarisMetaStoreManager can set any string as ADDITIONAL_STORAGE_CONFIG, and FileIO will handle it. However, in most cases, this string is expected to be a JSON object. Should we consider deserializing ADDITIONAL_STORAGE_CONFIG into a string map and merging it with other properties within the convertToMapOfString method?

Assertions.assertThat(properties).isEqualTo(expected);
System.out.println(properties);
}

private static Stream<Arguments> testConvertToMapOfStringParams() {
return Stream.of(
Arguments.of(
Map.of(
PolarisCredentialProperty.AWS_KEY_ID,
"aws_key_id",
PolarisCredentialProperty.AWS_SECRET_KEY,
"aws_secret_access_key",
PolarisCredentialProperty.AWS_TOKEN,
"aws_session_token"),
Map.of(
S3FileIOProperties.ACCESS_KEY_ID,
"aws_key_id",
S3FileIOProperties.SECRET_ACCESS_KEY,
"aws_secret_access_key",
S3FileIOProperties.SESSION_TOKEN,
"aws_session_token")),
Arguments.of(
Map.of(
PolarisCredentialProperty.GCS_ACCESS_TOKEN,
"gcs_oauth2_token",
PolarisCredentialProperty.GCS_ACCESS_TOKEN_EXPIRES_AT,
"gcs_token_expires_at"),
Map.of(
GCPProperties.GCS_OAUTH2_TOKEN,
"gcs_oauth2_token",
GCPProperties.GCS_OAUTH2_TOKEN_EXPIRES_AT,
"gcs_token_expires_at")),
Arguments.of(
Map.of(
PolarisCredentialProperty.AZURE_ACCOUNT_HOST,
"azure_storage_account",
PolarisCredentialProperty.AZURE_SAS_TOKEN,
"azure_sas_token"),
Map.of(
AzureProperties.ADLS_SAS_TOKEN_PREFIX + "azure_storage_account",
"azure_sas_token")),
Arguments.of(
Map.of(PolarisCredentialProperty.AZURE_ACCESS_TOKEN, "azure_access_token"),
Map.of("", "azure_access_token")),
Arguments.of(
Map.of(PolarisCredentialProperty.EXPIRATION_TIME, "expiration_time"),
Map.of("expiration-time", "expiration_time")),
Arguments.of(
Map.of(
PolarisCredentialProperty.ADDITIONAL_STORAGE_CONFIG,
"{\"s3.session-token-expires-at-ms\": \"expires_at\"}"),
Map.of(
"additional-storage-config",
"{\"s3.session-token-expires-at-ms\": \"expires_at\"}")),
Arguments.of(Map.of(), Map.of()));
}
}
Loading