|
62 | 62 | - [Reader Requirements for Vacuum Protocol Check](#reader-requirements-for-vacuum-protocol-check)
|
63 | 63 | - [Clustered Table](#clustered-table)
|
64 | 64 | - [Writer Requirements for Clustered Table](#writer-requirements-for-clustered-table)
|
| 65 | +- [Variant Data Type](#variant-data-type) |
| 66 | + - [Variant data in Parquet](#variant-data-in-parquet) |
| 67 | + - [Writer Requirements for Variant Data Type](#writer-requirements-for-variant-data-type) |
| 68 | + - [Reader Requirements for Variant Data Type](#reader-requirements-for-variant-data-type) |
| 69 | + - [Compatibility with other Delta Features](#compatibility-with-other-delta-features) |
65 | 70 | - [Requirements for Writers](#requirements-for-writers)
|
66 | 71 | - [Creation of New Log Entries](#creation-of-new-log-entries)
|
67 | 72 | - [Consistency Between Table Metadata and Data Files](#consistency-between-table-metadata-and-data-files)
|
|
100 | 105 | - [Struct Field](#struct-field)
|
101 | 106 | - [Array Type](#array-type)
|
102 | 107 | - [Map Type](#map-type)
|
| 108 | + - [Variant Type](#variant-type) |
103 | 109 | - [Column Metadata](#column-metadata)
|
104 | 110 | - [Example](#example)
|
105 | 111 | - [Checkpoint Schema](#checkpoint-schema)
|
@@ -1353,6 +1359,86 @@ The example above converts `configuration` field into JSON format, including esc
|
1353 | 1359 | }
|
1354 | 1360 | ```
|
1355 | 1361 |
|
| 1362 | + |
| 1363 | +# Variant Data Type |
| 1364 | + |
| 1365 | +This feature enables support for the `variant` data type, which stores semi-structured data. |
| 1366 | +The schema serialization method is described in [Schema Serialization Format](#schema-serialization-format). |
| 1367 | + |
| 1368 | +To support this feature: |
| 1369 | +- The table must be on Reader Version 3 and Writer Version 7. |
| 1370 | +- The feature `variantType` must exist in the table `protocol`'s `readerFeatures` and `writerFeatures`. |
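The two conditions above can be sketched as a single check on a parsed `protocol` action. This is a minimal illustration, not a reference implementation: the function name `supports_variant` is hypothetical, and the field names `minReaderVersion` and `minWriterVersion` are assumed to match the `protocol` action as defined elsewhere in this specification.

```python
def supports_variant(protocol: dict) -> bool:
    """Return True if a parsed `protocol` action enables the variantType feature.

    Assumes the action is a plain dict with the keys minReaderVersion,
    minWriterVersion, readerFeatures and writerFeatures.
    """
    # Table features are listed explicitly only at Reader Version 3 / Writer Version 7.
    on_required_versions = (
        protocol.get("minReaderVersion") == 3
        and protocol.get("minWriterVersion") == 7
    )
    # Treat a missing or null feature list as empty.
    reader_features = protocol.get("readerFeatures") or []
    writer_features = protocol.get("writerFeatures") or []
    return (
        on_required_versions
        and "variantType" in reader_features
        and "variantType" in writer_features
    )
```

The feature must appear in both lists; a table that lists `variantType` only under `writerFeatures` does not satisfy the requirement.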
| 1371 | + |
| 1372 | +## Example JSON-Encoded Delta Table Schema with Variant types |
| 1373 | + |
| 1374 | +``` |
| 1375 | +{ |
| 1376 | + "type" : "struct", |
| 1377 | + "fields" : [ { |
| 1378 | + "name" : "raw_data", |
| 1379 | + "type" : "variant", |
| 1380 | + "nullable" : true, |
| 1381 | + "metadata" : { } |
| 1382 | + }, { |
| 1383 | + "name" : "variant_array", |
| 1384 | + "type" : { |
| 1385 | + "type" : "array", |
| 1386 | + "elementType" : { |
| 1387 | + "type" : "variant" |
| 1388 | + }, |
| 1389 | + "containsNull" : false |
| 1390 | + }, |
| 1391 | + "nullable" : false, |
| 1392 | + "metadata" : { } |
| 1393 | + } ] |
| 1394 | +} |
| 1395 | +``` |
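As a rough sketch of how an implementation might locate Variant columns in a schema like the one above, the following walks the parsed JSON recursively. The function name `variant_paths` and the `.element`, `.key` and `.value` path suffixes are illustrative choices, not part of the specification. The walk accepts both the bare string `"variant"` and the object form `{"type": "variant"}`, since both appear in the example.

```python
def variant_paths(dtype, path=""):
    """Yield a dotted path for every `variant` type nested in a Delta schema type."""
    if dtype == "variant":
        # Type given as a bare type-name string.
        yield path
    elif isinstance(dtype, dict):
        kind = dtype.get("type")
        if kind == "variant":
            # Type given as an object, e.g. {"type": "variant"}.
            yield path
        elif kind == "struct":
            for field in dtype["fields"]:
                child = f"{path}.{field['name']}" if path else field["name"]
                yield from variant_paths(field["type"], child)
        elif kind == "array":
            yield from variant_paths(dtype["elementType"], path + ".element")
        elif kind == "map":
            yield from variant_paths(dtype["keyType"], path + ".key")
            yield from variant_paths(dtype["valueType"], path + ".value")
```

Applied to the example schema (parsed with `json.loads`), `list(variant_paths(schema))` returns `["raw_data", "variant_array.element"]`.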
| 1396 | + |
| 1397 | +## Variant data in Parquet |
| 1398 | + |
| 1399 | +The Variant data type is represented as two binary encoded values, according to the [Spark Variant binary encoding specification](https://github.com/apache/spark/blob/master/common/variant/README.md). |
| 1400 | +The two binary values are named `value` and `metadata`. |
| 1401 | + |
| 1402 | +When writing Variant data to parquet files, the Variant data is written as a single Parquet struct, with the following fields: |
| 1403 | + |
| 1404 | +Struct field name | Parquet primitive type | Description |
| 1405 | +-|-|- |
| 1406 | +value | binary | The binary-encoded Variant value, as described in [Variant binary encoding](https://github.com/apache/spark/blob/master/common/variant/README.md) |
| 1407 | +metadata | binary | The binary-encoded Variant metadata, as described in [Variant binary encoding](https://github.com/apache/spark/blob/master/common/variant/README.md) |
| 1408 | + |
| 1409 | +The Parquet struct must include the two struct fields `value` and `metadata`. |
| 1410 | +Writers that support this feature must write both binary fields, and readers that support it must read both binary fields. |
| 1411 | + |
| 1412 | +[Variant shredding](https://github.com/apache/parquet-format/blob/master/VariantShredding.md) will be introduced later, as a separate `variantShredding` table feature. |
| 1413 | + |
| 1414 | +## Writer Requirements for Variant Data Type |
| 1415 | + |
| 1416 | +When Variant type is supported (`writerFeatures` field of a table's `protocol` action contains `variantType`), writers: |
| 1417 | +- must write a column of type `variant` to parquet as a struct containing the fields `value` and `metadata` and storing values that conform to the [Variant binary encoding specification](https://github.com/apache/spark/blob/master/common/variant/README.md) |
| 1418 | +- must not write a parquet struct field named `typed_value` to avoid confusion with the field required by [Variant shredding](https://github.com/apache/parquet-format/blob/master/VariantShredding.md) with the same name. |
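The writer rules above could be enforced by a pre-write check along these lines. This is a sketch under stated assumptions: the physical struct is modeled as a plain dict from Parquet field name to primitive type name, and `check_variant_struct` is a hypothetical helper rather than part of any Parquet or Delta library. It does not validate that the bytes conform to the Variant binary encoding.

```python
def check_variant_struct(fields: dict) -> list:
    """Return a list of problems found in a Variant column's physical struct.

    `fields` maps each Parquet struct field name to its primitive type name,
    e.g. {"value": "binary", "metadata": "binary"}.
    """
    problems = []
    # Both required fields must be present and binary.
    for name in ("value", "metadata"):
        if fields.get(name) != "binary":
            problems.append(f"missing or non-binary field: {name}")
    # typed_value is reserved for Variant shredding and must not be written.
    if "typed_value" in fields:
        problems.append("typed_value must not be written without Variant shredding")
    return problems
```

An empty list means the struct layout satisfies both requirements.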
| 1419 | + |
| 1420 | +## Reader Requirements for Variant Data Type |
| 1421 | + |
| 1422 | +When Variant type is supported (`readerFeatures` field of a table's `protocol` action contains `variantType`), readers: |
| 1423 | +- must recognize and tolerate a `variant` data type in a Delta schema |
| 1424 | +- must use the correct physical schema (struct-of-binary, with fields `value` and `metadata`) when reading a Variant data type from file |
| 1425 | +- must make the column available to the engine: |
| 1426 | + - [Recommended] Expose and interpret the struct-of-binary as a single Variant field in accordance with the [Spark Variant binary encoding specification](https://github.com/apache/spark/blob/master/common/variant/README.md). |
| 1427 | + - [Alternate] Expose the raw physical struct-of-binary, e.g. if the engine does not support Variant. |
| 1428 | + - [Alternate] Convert the struct-of-binary to a string, and expose the string representation, e.g. if the engine does not support Variant. |
| 1429 | + |
| 1430 | +## Compatibility with other Delta Features |
| 1431 | + |
| 1432 | +Feature | Support for Variant Data Type |
| 1433 | +-|- |
| 1434 | +Partition Columns | **Supported:** A Variant column is allowed to be a non-partitioning column of a partitioned table. <br/> **Unsupported:** Variant is not a comparable data type, so it cannot be used as a partition column. |
| 1435 | +Clustered Tables | **Supported:** A Variant column is allowed to be a non-clustering column of a clustered table. <br/> **Unsupported:** Variant is not a comparable data type, so it cannot be used as a clustering column. |
| 1436 | +Delta Column Statistics | **Supported:** A Variant column supports the `nullCount` statistic. <br/> **Unsupported:** Variant is not a comparable data type, so a Variant column does not support the `minValues` and `maxValues` statistics. |
| 1437 | +Generated Columns | **Supported:** A Variant column is allowed to be used as a source in a generated column expression, as long as the Variant type is not the result type of the generated column expression. <br/> **Unsupported:** The Variant data type is not allowed to be the result type of a generated column expression. |
| 1438 | +Delta CHECK Constraints | **Supported:** A Variant column is allowed to be used for a CHECK constraint expression. |
| 1439 | +Default Column Values | **Supported:** A Variant column is allowed to have a default column value. |
| 1440 | +Change Data Feed | **Supported:** A table using the Variant data type is allowed to enable the Delta Change Data Feed. |
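To illustrate the Delta Column Statistics row, a writer might drop `minValues` and `maxValues` entries for Variant columns while keeping `nullCount`. The sketch below assumes per-file statistics parsed into a dict keyed by `minValues`, `maxValues` and `nullCount`, and handles top-level columns only; `strip_variant_bounds` is a hypothetical helper name.

```python
def strip_variant_bounds(stats: dict, variant_columns: set) -> dict:
    """Return a copy of per-file stats without min/max bounds for Variant columns."""
    cleaned = dict(stats)  # shallow copy; the input dict is left unmodified
    for key in ("minValues", "maxValues"):
        if key in cleaned:
            # Rebuild the per-column map, skipping Variant columns.
            cleaned[key] = {
                col: value
                for col, value in cleaned[key].items()
                if col not in variant_columns
            }
    # nullCount is supported for Variant columns, so it is kept as-is.
    return cleaned
```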
| 1441 | + |
1356 | 1442 | # In-Commit Timestamps
|
1357 | 1443 |
|
1358 | 1444 | The In-Commit Timestamps writer feature strongly associates a monotonically increasing timestamp with each commit by storing it in the commit's metadata.
|
@@ -1965,6 +2051,14 @@ type| Always the string "map".
|
1965 | 2051 | keyType| The type of element used for the key of this map, represented as a string containing the name of a primitive type, a struct definition, an array definition or a map definition
|
1966 | 2052 | valueType| The type of element used for the value of this map, represented as a string containing the name of a primitive type, a struct definition, an array definition or a map definition
|
1967 | 2053 |
|
| 2054 | +### Variant Type |
| 2055 | + |
| 2056 | +Variant data uses the Delta type name `variant` for Delta schema serialization. |
| 2057 | + |
| 2058 | +Field Name | Description |
| 2059 | +-|- |
| 2060 | +type | Always the string "variant" |
| 2061 | + |
1968 | 2062 | ### Column Metadata
|
1969 | 2063 | A column metadata stores various information about the column.
|
1970 | 2064 | For example, this MAY contain some keys like [`delta.columnMapping`](#column-mapping) or [`delta.generationExpression`](#generated-columns) or [`CURRENT_DEFAULT`](#default-columns).
|
|