-
Notifications
You must be signed in to change notification settings - Fork 226
Feature: Rollback compaction on conflict #1285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feature: Rollback compaction on conflict #1285
Conversation
468051c
to
05f9bad
Compare
01571dc
to
285cc29
Compare
285cc29
to
eb06c77
Compare
...e/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
...e/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Show resolved
Hide resolved
|
||
try { | ||
Tasks.foreach(new TableOperations[] {ops}) | ||
.retry(4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, can we make this configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more comments on the top, idea was to not diverge in terms of diff from the impl in CatalogHandle, hence didn't make this, this is ditto copy of what we have there.
Please let me know your thoughts considering this, happy to make this configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be the same as the commit-retry param? Not sure polaris needs to obey that but it does exist
...e/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
...e/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
...e/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
|
||
// TODO: Clean this up when CatalogHandler become extensible. | ||
// Copy of CatalogHandler#update | ||
private static LoadTableResponse updateTableWithRollback( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just use baseCatalog
instead of making this static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly taken CatalogHandler#updateTable and it calling CatalogHandler#commit which is protected, so they are exact same except the part of rollback-logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the name is a little confusing, but I don't think we really want to copy too much CatalogHandler
logic into IcebergCatalogHandler
-- previously, this class was named like IcebergCatalogHandlerWrapper
. It's okay to copy small methods where we need to override behavior imo, but this is a lot of logic.
I think your comment is correct though that once CatalogHandler
is extensible a lot of this goes away. So maybe it's ok for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, I agree, I hope this methods were public and hence extensible.
...e/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Show resolved
Hide resolved
354b12b
to
f1de46c
Compare
@@ -762,7 +777,190 @@ public LoadTableResponse updateTable( | |||
if (isStaticFacade(catalog)) { | |||
throw new BadRequestException("Cannot update table on static-facade external catalogs."); | |||
} | |||
return CatalogHandlers.updateTable(baseCatalog, tableIdentifier, applyUpdateFilters(request)); | |||
// TODO: pending discussion if table property is right way, or a writer specific knob is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer this was a "catalog" owned table property or policy rather than something actually in the table metadata.json. It's the kind of thing I don't think is safe to allow for a generic user to alter/set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1417 :)
...e/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
...e/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
...e/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
...e/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
...e/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
request.updates().forEach((update) -> update.applyTo(newMetadataBuilder)); | ||
TableMetadata updated = newMetadataBuilder.build(); | ||
// always commit this | ||
taskOps.commit(base, updated); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an interesting approach and probably the right one, but I'm wondering if you considered just reloading the metadata.json from the time of the rollback and applying the commit to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posting Discussion offline : Idea was to not skip schema changes that could have potentially landed in between and for which we don't create a new snapshot we just create a new metadata.json so re-using metadata.json was something which we wanted. Though reloading the metadata.json from the time of rollback is also possible we can also checks like if current_schem_id, stats files etc are same or not to ensure we are not loosing any updates which doesn't have snapshot of their own, though this idea looks a bit easy with implementation wise hence choosing above.
TODO: think more on how else can this be achieved.
b7c6d2f
to
aa7d04b
Compare
aa7d04b
to
0db3770
Compare
About The Change
Intention is make the catalog smarter, to revert the compaction commits in case of crunch to let the writers who are actually adding or removing the data to the table succeed. In a sense treating compaction as always a lower priority process.
Presently the rest catalog client creates the snapshot and asks the Rest Server to apply the snapshot and gives this in a combination of requirement and update.
Polaris could apply some basic inference and generate some updates to metadata given a property is enabled at a table level, by saying that It will revert back the commit which was created by compaction and let the write succeed.
I had this PR in OSS, which was essentially doing this at the client end, but we think its best if we do this as server end. to support more such clients.
Devlist: https://lists.apache.org/thread/8k8t77dgk1vc124fnb61932bdp9kf1lc
New Scenario :