Description
Apache Iceberg version
main (development)
Please describe the bug 🐞
When using Glue with a large message schema, we get this error on every updateCatalog:
Glue: UpdateTable, https response error StatusCode: 400, RequestID: 21d099dd-a972-488f-b3ab-7a6a2a1a4c35, InvalidInputException: Payload size of request exceeded limit
I believe this is because constructTableInput always includes all schema columns via schemasToGlueColumns(staged.Metadata()) on every UpdateTable call. We have roughly 3600 fields. This is by design: we use these records to write to Elasticsearch in the Elastic Common Schema format, which has a lot of fields. A typical message has only 20-50 fields set, but all fields exist in the schema.
Would it make sense to allow omitting the schema from the UpdateTable call? I believe the Glue columns are only metadata used by Athena, and the real schema would still exist in the table metadata and Parquet files?
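
To make the idea concrete, here is a minimal sketch of a size guard that drops the column list only when the request would be too large. It does not use the real Glue SDK types: `column`, `tableInput`, `payloadSize`, and `fitToLimit` are hypothetical stand-ins, the JSON size is only an approximation of the actual request size, and `maxBytes` is a caller-supplied assumption rather than a documented Glue limit. Whether Glue and Athena behave acceptably with an empty column list is exactly the open question above.

```go
package main

import "encoding/json"

// column is a hypothetical, simplified stand-in for a Glue column entry.
type column struct {
	Name string `json:"Name"`
	Type string `json:"Type"`
}

// tableInput is a hypothetical, simplified stand-in for the UpdateTable
// request body. It is not the AWS SDK type.
type tableInput struct {
	Name       string            `json:"Name"`
	Parameters map[string]string `json:"Parameters,omitempty"`
	Columns    []column          `json:"Columns,omitempty"`
}

// payloadSize returns the JSON-encoded size of the input in bytes.
// This only approximates what would actually be sent over the wire.
func payloadSize(in tableInput) (int, error) {
	b, err := json.Marshal(in)
	if err != nil {
		return 0, err
	}
	return len(b), nil
}

// fitToLimit returns the input unchanged when it fits within maxBytes.
// Otherwise it returns a copy with the column list omitted and reports
// that the columns were dropped. The caller's slice is not modified.
func fitToLimit(in tableInput, maxBytes int) (tableInput, bool, error) {
	size, err := payloadSize(in)
	if err != nil {
		return in, false, err
	}
	if size <= maxBytes {
		return in, false, nil
	}
	in.Columns = nil // in is a copy, so the caller keeps its columns
	return in, true, nil
}
```

With a guard like this, small schemas would keep their Glue columns, and only oversized ones like ours would fall back to relying on the schema in the table metadata.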