updateTable using glue with a large schema fails #701

@vrecan

Apache Iceberg version

main (development)

Please describe the bug 🐞

When using the Glue catalog with a large message schema, we get this error on every UpdateTable call:

Glue: UpdateTable, https response error StatusCode: 400, RequestID: 21d099dd-a972-488f-b3ab-7a6a2a1a4c35, InvalidInputException: Payload size of request exceeded limit

I believe this happens because constructTableInput always includes every schema column, via schemasToGlueColumns(staged.Metadata()), on every UpdateTable call. Our schema has roughly 3,600 fields. This is by design: we use these records to write to Elasticsearch in the Elastic Common Schema (ECS) format, which defines a lot of fields. A typical message sets only 20-50 of them, but all of them exist in the schema.
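To illustrate why the request size tracks the schema rather than the data, here is a minimal Go sketch. It is not iceberg-go or AWS SDK code: glueColumn, columnsFor and payloadSize are hypothetical names, and every column is assumed to be a flat string. Real ECS columns have longer names and may carry nested type strings, comments or parameters, so real payloads would be larger. The point is only that the column list grows linearly with the number of schema fields, no matter how many fields a given message actually sets.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// glueColumn is a simplified, hypothetical stand-in for a Glue column
// (it is not the AWS SDK type).
type glueColumn struct {
	Name    string `json:"Name"`
	Type    string `json:"Type"`
	Comment string `json:"Comment,omitempty"`
}

// columnsFor builds one column per schema field, mirroring the idea that
// every field in the schema is sent, however sparsely it is populated.
func columnsFor(numFields int) []glueColumn {
	cols := make([]glueColumn, 0, numFields)
	for i := 0; i < numFields; i++ {
		cols = append(cols, glueColumn{Name: fmt.Sprintf("field_%04d", i), Type: "string"})
	}
	return cols
}

// payloadSize returns the JSON-encoded size in bytes of the column list alone.
func payloadSize(numFields int) int {
	b, err := json.Marshal(columnsFor(numFields))
	if err != nil {
		panic(err) // not expected for plain string fields
	}
	return len(b)
}

func main() {
	for _, n := range []int{50, 3600} {
		fmt.Printf("%5d fields -> %d bytes of column JSON\n", n, payloadSize(n))
	}
}
```

With these simplified columns each entry encodes to the same number of bytes, so going from 50 to 3,600 fields multiplies the column payload by roughly 72 even if each message only populates a few dozen fields.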

Would it make sense to allow omitting the schema columns from the UpdateTable request? As far as I know they are only used as metadata by Athena; the real schema would still exist in the Iceberg metadata and the Parquet files, right?
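One possible shape for such an opt-out, sketched in Go under stated assumptions: updateOptions, SkipColumns, buildTableInput and tableInput are hypothetical names, not existing iceberg-go configuration or AWS SDK types, and the table name and S3 path are placeholders. Whether Athena or other Glue consumers behave correctly with an empty column list would still need to be verified.

```go
package main

import "fmt"

// glueColumn is a simplified, hypothetical stand-in for a Glue column.
type glueColumn struct {
	Name string
	Type string
}

// tableInput is a hypothetical, trimmed-down UpdateTable request body.
type tableInput struct {
	Name             string
	MetadataLocation string       // pointer to the Iceberg metadata file
	Columns          []glueColumn // informational copy of the schema
}

// updateOptions is a hypothetical knob, not an existing iceberg-go setting.
type updateOptions struct {
	// SkipColumns omits the Glue column list from the request; the schema
	// still lives in the Iceberg metadata file and the data files.
	SkipColumns bool
}

// buildTableInput assembles the request, leaving Columns nil when skipped.
func buildTableInput(name, metadataLocation string, cols []glueColumn, opts updateOptions) tableInput {
	in := tableInput{Name: name, MetadataLocation: metadataLocation}
	if !opts.SkipColumns {
		in.Columns = cols
	}
	return in
}

func main() {
	cols := []glueColumn{{Name: "message", Type: "string"}, {Name: "event_id", Type: "string"}}
	loc := "s3://bucket/metadata/v2.metadata.json" // placeholder path
	full := buildTableInput("events", loc, cols, updateOptions{})
	slim := buildTableInput("events", loc, cols, updateOptions{SkipColumns: true})
	fmt.Println(len(full.Columns), len(slim.Columns))
}
```

Keeping the default as "send columns" would preserve today's behavior for tables that rely on Glue columns, while very wide schemas could opt out.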
