From f417586cf01152350b6e4a726917907d7d597182 Mon Sep 17 00:00:00 2001
From: jiahao-db
Date: Mon, 17 Nov 2025 17:47:10 +0000
Subject: [PATCH 1/6] init

---
 .../materialize-partition-columns.md | 33 +++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 protocol_rfcs/materialize-partition-columns.md

diff --git a/protocol_rfcs/materialize-partition-columns.md b/protocol_rfcs/materialize-partition-columns.md
new file mode 100644
index 00000000000..98a87bfb536
--- /dev/null
+++ b/protocol_rfcs/materialize-partition-columns.md
@@ -0,0 +1,33 @@
+# Materialize Partition Columns
+
+## Overview
+
+Currently, Delta tables store partition column values primarily in the table metadata (specifically in the `partitionValues` field of `AddFile` actions), and by default these columns are not physically written into the Parquet data files themselves.
+
+This RFC proposes a new writer-only table feature called `materializePartitionColumns`. When enabled, this feature requires partition columns to be physically materialized in Parquet data files alongside the data columns.
+
+## Motivation
+
+This feature provides a mechanism to require partition column materialization at the protocol level, ensuring all writers to the table comply with this requirement during the period when the feature is enabled.
+
+Materializing partition columns enhances compatibility with Parquet readers that access Parquet files directly and do not interpret Delta’s AddFile metadata, as well as with Iceberg readers, which expect partition columns to be stored within the data files.
+
+Additionally, having partition information embedded in the data files themselves enables more flexible data reorganization strategies, as files can be physically rearranged without strict partition directory constraints while still maintaining partition information.
+
+**For further discussions about this protocol change, please refer to the Github issue - https://github.com/delta-io/delta/issues/XXXX**
+
+--------
+
+
+> ***New Section after Identity Columns section***
+## Materialize Partition Columns
+
+When this feature is enabled, partition columns are physically written to Parquet files alongside the data columns, which can improve flexibility with respect to data layout changes in the future, and make these data files easier to interpret for readers unfamiliar with partition values. To support this feature:
+ - The table must be on Writer Version 7, and a feature name `materializePartitionColumns` must exist in the table `protocol`'s `writerFeatures`.
+
+When supported:
+ - The table respects metadata property `delta.enableMaterializePartitionColumnsFeature` for enablement of this feature. The writer feature `materializePartitionColumns` is auto-supported when this property is set to `true`.
+ - When the writer feature `materializePartitionColumns` is set in the protocol, writers must require that partition column values are materialized into any newly created data file, placed after the data columns in the parquet schema.
+ - When the writer feature `materializePartitionColumns` is not set in the table protocol, writers are not required to write partition columns to data files. Note that other features might still require materialization of partition values, such as [Iceberg Compatibility V1](#iceberg-compatibility-v1)
+
+This feature does not impose any requirements on readers. All Delta readers must be able to read the table regardless of whether partition columns are materialized in the data files. If partition values are present in both parquet and AddFile metadata, readers should continue to read partition values from AddFile metadata.
From 7654b30b2c964ed1f290a93260e005a9fbc99241 Mon Sep 17 00:00:00 2001
From: jiahao-db
Date: Fri, 21 Nov 2025 06:05:16 +0000
Subject: [PATCH 2/6] Added link to the RFC issue

---
 protocol_rfcs/materialize-partition-columns.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/protocol_rfcs/materialize-partition-columns.md b/protocol_rfcs/materialize-partition-columns.md
index 98a87bfb536..d89c0234829 100644
--- a/protocol_rfcs/materialize-partition-columns.md
+++ b/protocol_rfcs/materialize-partition-columns.md
@@ -14,7 +14,7 @@ Materializing partition columns enhances compatibility with Parquet readers that
 
 Additionally, having partition information embedded in the data files themselves enables more flexible data reorganization strategies, as files can be physically rearranged without strict partition directory constraints while still maintaining partition information.
 
-**For further discussions about this protocol change, please refer to the Github issue - https://github.com/delta-io/delta/issues/XXXX**
+**For further discussions about this protocol change, please refer to the Github issue - https://github.com/delta-io/delta/issues/5555**
 
 --------
 
From 690f113ab9081cee63b5191ee566c726c91ad29d Mon Sep 17 00:00:00 2001
From: jiahao-db
Date: Fri, 21 Nov 2025 06:13:08 +0000
Subject: [PATCH 3/6] Updated README.md

---
 protocol_rfcs/README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/protocol_rfcs/README.md b/protocol_rfcs/README.md
index d9ff7c9f4f3..6350952566e 100644
--- a/protocol_rfcs/README.md
+++ b/protocol_rfcs/README.md
@@ -25,6 +25,7 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024,
 | 2025-03-18 | [iceberg-writer-compat-v1.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/iceberg-writer-compat-v1.md) | https://github.com/delta-io/delta/issues/4284 | IcebergWriterCompatV1 |
 | 2025-04-07 | [catalog-managed.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/catalog-managed.md) | https://github.com/delta-io/delta/issues/4381 | Catalog-Managed Tables |
 | 2025-05-06 | [variant-shredding.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-shredding.md) | https://github.com/delta-io/delta/issues/4032 | Variant Shredding |
+| 2025-11-20 | [materialize-partition-columns.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/materialize-partition-columns.md) | https://github.com/delta-io/delta/issues/5555 | Materialize Partition Columns |
 
 ### Accepted RFCs
 
From 810244ba9e79b508512d046e6e1297cf94050a5e Mon Sep 17 00:00:00 2001
From: jiahao-db
Date: Fri, 21 Nov 2025 06:18:20 +0000
Subject: [PATCH 4/6] Updated issue linke

---
 protocol_rfcs/materialize-partition-columns.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/protocol_rfcs/materialize-partition-columns.md b/protocol_rfcs/materialize-partition-columns.md
index d89c0234829..8e75e629253 100644
--- a/protocol_rfcs/materialize-partition-columns.md
+++ b/protocol_rfcs/materialize-partition-columns.md
@@ -1,5 +1,7 @@
 # Materialize Partition Columns
 
+**Associated Github issue for discussions: https://github.com/delta-io/delta/issues/5555**
+
 ## Overview
 
 Currently, Delta tables store partition column values primarily in the table metadata (specifically in the `partitionValues` field of `AddFile` actions), and by default these columns are not physically written into the Parquet data files themselves.
@@ -14,8 +16,6 @@ Materializing partition columns enhances compatibility with Parquet readers that
 
 Additionally, having partition information embedded in the data files themselves enables more flexible data reorganization strategies, as files can be physically rearranged without strict partition directory constraints while still maintaining partition information.
 
-**For further discussions about this protocol change, please refer to the Github issue - https://github.com/delta-io/delta/issues/5555**
-
 --------
 
 
From e1f8069ad76ec8ad36c541cc5b64695f4ff3a66b Mon Sep 17 00:00:00 2001
From: jiahao-db
Date: Mon, 24 Nov 2025 18:16:25 +0000
Subject: [PATCH 5/6] updated rfc description

---
 protocol_rfcs/materialize-partition-columns.md | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/protocol_rfcs/materialize-partition-columns.md b/protocol_rfcs/materialize-partition-columns.md
index 8e75e629253..5379be502af 100644
--- a/protocol_rfcs/materialize-partition-columns.md
+++ b/protocol_rfcs/materialize-partition-columns.md
@@ -14,7 +14,7 @@ This feature provides a mechanism to require partition column materialization at
 
 Materializing partition columns enhances compatibility with Parquet readers that access Parquet files directly and do not interpret Delta’s AddFile metadata, as well as with Iceberg readers, which expect partition columns to be stored within the data files.
 
-Additionally, having partition information embedded in the data files themselves enables more flexible data reorganization strategies, as files can be physically rearranged without strict partition directory constraints while still maintaining partition information.
+Additionally, having partition information embedded in the data files themselves enables more flexible data reorganization strategies. The same parquet files could be linked in future versions of a table that do not have the same (or any) partition columns.
 
 --------
 
@@ -26,8 +26,12 @@ When this feature is enabled, partition columns are physically written to Parque
  - The table must be on Writer Version 7, and a feature name `materializePartitionColumns` must exist in the table `protocol`'s `writerFeatures`.
 
 When supported:
- - The table respects metadata property `delta.enableMaterializePartitionColumnsFeature` for enablement of this feature. The writer feature `materializePartitionColumns` is auto-supported when this property is set to `true`.
- - When the writer feature `materializePartitionColumns` is set in the protocol, writers must require that partition column values are materialized into any newly created data file, placed after the data columns in the parquet schema.
- - When the writer feature `materializePartitionColumns` is not set in the table protocol, writers are not required to write partition columns to data files. Note that other features might still require materialization of partition values, such as [Iceberg Compatibility V1](#iceberg-compatibility-v1)
-
-This feature does not impose any requirements on readers. All Delta readers must be able to read the table regardless of whether partition columns are materialized in the data files. If partition values are present in both parquet and AddFile metadata, readers should continue to read partition values from AddFile metadata.
+ - The table respects metadata property `delta.enableMaterializePartitionColumnsFeature` for enablement of this feature. The writer feature `materializePartitionColumns` is auto-enabled when this property is set to `true`.
+ - When the writer feature `materializePartitionColumns` is set in the protocol, writers must require that partition column values are materialized into any newly created data file, placed after the data columns in the parquet
+ schema. This mimics the same partition column materialization requirement from [IcebergCompatV1](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#iceberg-compatibility-v1)
+and
+[IcebergCompatV2](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#iceberg-compatibility-v2). As such, the `materializePartitionColumns` feature can be seen as a subset of the requirements imposed by those features, providing the partition column materialization guarantee independently without requiring full
+ Iceberg compatibility.
+ - When the writer feature `materializePartitionColumns` is not set in the table protocol, writers are not required to write partition columns to data files. Note that other features might still require materialization of partition values, such as [IcebergCompatV1](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#iceberg-compatibility-v1)
+
+This feature does not impose any requirements on readers. All Delta readers must be able to read the table regardless of whether partition columns are materialized in the data files. If partition values are present in both parquet and AddFile metadata, Delta readers should continue to read partition values from AddFile metadata.
From c2dc3b2b72a304d7b85f230d3767a30866d5676e Mon Sep 17 00:00:00 2001
From: jiahao-db
Date: Wed, 26 Nov 2025 19:44:30 +0000
Subject: [PATCH 6/6] updated

---
 protocol_rfcs/materialize-partition-columns.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/protocol_rfcs/materialize-partition-columns.md b/protocol_rfcs/materialize-partition-columns.md
index 5379be502af..585ea6b59ed 100644
--- a/protocol_rfcs/materialize-partition-columns.md
+++ b/protocol_rfcs/materialize-partition-columns.md
@@ -27,7 +27,7 @@ When this feature is enabled, partition columns are physically written to Parque
 
 When supported:
  - The table respects metadata property `delta.enableMaterializePartitionColumnsFeature` for enablement of this feature. The writer feature `materializePartitionColumns` is auto-enabled when this property is set to `true`.
- - When the writer feature `materializePartitionColumns` is set in the protocol, writers must require that partition column values are materialized into any newly created data file, placed after the data columns in the parquet
+ - When the writer feature `materializePartitionColumns` is set in the protocol, writers must materialize partition columns into any newly created data file, placing them after the data columns in the parquet
  schema. This mimics the same partition column materialization requirement from [IcebergCompatV1](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#iceberg-compatibility-v1)
 and
 [IcebergCompatV2](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#iceberg-compatibility-v2). As such, the `materializePartitionColumns` feature can be seen as a subset of the requirements imposed by those features, providing the partition column materialization guarantee independently without requiring full