Amazon S3 Tables Integration #577
Comments
@flyrain Thank you for starting the issue, I have two questions:
BTW, the link for get_table_metadata_location seems wrong. It should point to the S3 Tables API doc.
That's right. This integration won't change the source of truth (the S3 table in this case), and other tools or pipelines that work against the source catalog should still work as is.
The Polaris client will get the failure message in that case, and it can then retry or fail itself. The table is still consistent. The failure may leave orphan files, which is fine, since other clients also leave orphan files when a commit fails. Thanks for pointing out the wrong link. Updated.
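To illustrate the "retry or fail" choice above, here is a minimal sketch of a client-side retry loop. It is only an illustration under assumptions: `CommitConflict` and `commit_with_retry` are hypothetical names, not part of Polaris or the S3 Tables API, and each attempt is assumed to reload the table state and write a fresh metadata file before committing.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CommitConflict(Exception):
    """Hypothetical error for a commit rejected by the source of truth."""


def commit_with_retry(attempt: Callable[[], T], max_attempts: int = 3) -> T:
    """Run `attempt` until it succeeds or `max_attempts` is exhausted.

    `attempt` is assumed to reload the table, write a new metadata file,
    and commit it. Files written by failed attempts are not cleaned up
    here, so they remain as orphan files.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Exception = CommitConflict("no attempt was made")
    for _ in range(max_attempts):
        try:
            return attempt()
        except CommitConflict as err:
            last_error = err  # remember the failure and try again
    # Give up: the table itself is unchanged and still consistent.
    raise last_error
```

A caller that prefers to fail immediately can pass `max_attempts=1`.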
Do you mean that "update-table-metadata-location" will be part of the commit to the source catalog (e.g. HMS)?
Yes, you could consider Polaris a proxy when the source of truth is a remote catalog. Here are two different commit paths, depending on the client type:
@flyrain Got it, so in this proposal, HMS and S3 Tables are mutually exclusive? Sorry for being unclear. The reason I asked the original question is to see whether it is possible to operate HMS and S3 Tables at the same time, using HMS as the source of truth (because we want to keep our source-of-truth data in HMS) while leveraging some of S3 Tables' additional features.
Yes, they are mutually exclusive. I think it's better to have only one source of truth. In the case of S3 tables, the S3 service is the source of truth for sure. An integration with HMS would look like:
HMS will have to invoke the S3 API. It's another topic anyway. We may discuss it elsewhere.
I wanted to kindly check whether there are any updates on when these items are expected to be released.
Thank you for bringing this up @flyrain! In general, I agree with the implementation details you described for the read and write paths. I think the challenge here is less about implementation and more about what federation looks like for Polaris. We probably want to reach consensus on the federation proposal first, before proceeding further, because there could be the option to mount an entire table bucket directly or to mount individual S3 tables.
Polaris is designed to act as a REST facade for S3 tables, enabling both read and write operations by interacting with the S3 table API. Polaris registers an S3 table using its metadata location. A flag will be needed to mark the new Iceberg table as an S3 table. Below is a summary of the proposed approach:
Read Path
1. Call get_table_metadata_location to fetch the metadata.json file.
2. Convert metadata.json into a LoadTableResponse to return to the client.
Write Path
When the table is updated, Polaris will:
1. Write a new metadata.json.
2. Call update-table-metadata-location to commit the new metadata.
AuthZ/AuthN
We need to ensure that the AWS role used to create the Polaris catalog has read and write privileges on the S3 table.
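The read and write paths described above could be sketched roughly as follows. This is a sketch under assumptions, not the Polaris implementation: `load_table`, `commit_table`, `read_object`, and `write_object` are hypothetical names, the `s3tables` argument is assumed to behave like a boto3 `s3tables` client (for example `boto3.client("s3tables")`), and the returned dictionary only loosely mirrors a LoadTableResponse.

```python
import json
from typing import Any, Callable, Dict, Tuple


def load_table(
    s3tables: Any,
    read_object: Callable[[str], bytes],
    bucket_arn: str,
    namespace: str,
    name: str,
) -> Tuple[Dict[str, Any], str]:
    """Read path: resolve the current metadata.json and wrap it for the client."""
    loc = s3tables.get_table_metadata_location(
        tableBucketARN=bucket_arn, namespace=namespace, name=name
    )
    # read_object is a hypothetical helper that fetches the file's bytes.
    metadata = json.loads(read_object(loc["metadataLocation"]))
    response = {
        "metadata-location": loc["metadataLocation"],
        "metadata": metadata,
    }
    # The version token is kept so a later commit can detect concurrent changes.
    return response, loc["versionToken"]


def commit_table(
    s3tables: Any,
    write_object: Callable[[str, bytes], None],
    bucket_arn: str,
    namespace: str,
    name: str,
    version_token: str,
    new_metadata: Dict[str, Any],
    new_location: str,
) -> Dict[str, Any]:
    """Write path: write a new metadata.json, then commit its location."""
    # write_object is a hypothetical helper that stores bytes at a location.
    write_object(new_location, json.dumps(new_metadata).encode("utf-8"))
    # If this call fails, the file written above is left behind as an orphan.
    return s3tables.update_table_metadata_location(
        tableBucketARN=bucket_arn,
        namespace=namespace,
        name=name,
        versionToken=version_token,
        metadataLocation=new_location,
    )
```

Passing the client and the storage helpers in as arguments keeps the sketch independent of any particular credentials setup, which is where the AWS role mentioned above would come in.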
Describe alternatives you've considered
No response