Skip to content

fix(spec): validate schema types by format version#2417

Open
manuzhang wants to merge 1 commit into
apache:mainfrom
manuzhang:codex/block-creating-v1/v2-tables-with-timestamp_ns
Open

fix(spec): validate schema types by format version#2417
manuzhang wants to merge 1 commit into
apache:mainfrom
manuzhang:codex/block-creating-v1/v2-tables-with-timestamp_ns

Conversation

@manuzhang
Copy link
Copy Markdown
Member

@manuzhang manuzhang commented May 9, 2026

Which issue does this PR close?

What changes are included in this PR?

Reject schema types that require a newer table format (e.g. v3). This keeps ordinary CREATE TABLE defaults on v2 while allowing timestamp_ns schemas to pass validation.

Are these changes tested?

Add UTs


Co-authored-by: @codex

@manuzhang
Copy link
Copy Markdown
Member Author

manuzhang commented May 9, 2026

@CTTY @liurenjie1024 please help take a look.

@CTTY
Copy link
Copy Markdown
Collaborator

CTTY commented May 11, 2026

Hi Manu, in terms of implementation, I think implementing trait SchemaVisitor would be a cleaner approach compared to having multiple functions for different data types. wdyt?

@manuzhang manuzhang force-pushed the codex/block-creating-v1/v2-tables-with-timestamp_ns branch from ffd534c to f5201b1 Compare May 12, 2026 13:11
Comment thread crates/iceberg/src/spec/table_metadata.rs Outdated
Comment thread crates/iceberg/src/spec/table_metadata_builder.rs Outdated
@manuzhang manuzhang force-pushed the codex/block-creating-v1/v2-tables-with-timestamp_ns branch 2 times, most recently from 31327b2 to 13b6d45 Compare May 13, 2026 14:42
@manuzhang
Copy link
Copy Markdown
Member Author

@CTTY Please take another look. Thanks!

@manuzhang manuzhang force-pushed the codex/block-creating-v1/v2-tables-with-timestamp_ns branch from 13b6d45 to cb2d9e8 Compare May 18, 2026 16:29
Comment thread crates/iceberg/src/spec/table_metadata.rs Outdated
Comment thread crates/iceberg/src/spec/table_metadata.rs Outdated
Comment thread crates/iceberg/src/spec/table_metadata_builder.rs Outdated
Reject schema types that require a newer table format and update the DataFusion catalog registration path to request v3 only when the converted schema needs it. This keeps ordinary CREATE TABLE defaults on v2 while allowing timestamp_ns schemas to pass validation.

Co-authored-by: Codex <codex@openai.com>
@manuzhang manuzhang force-pushed the codex/block-creating-v1/v2-tables-with-timestamp_ns branch from cb2d9e8 to 7d855c2 Compare May 29, 2026 07:10
@manuzhang manuzhang requested review from CTTY and Kurtiscwright May 29, 2026 07:29
@Kurtiscwright
Copy link
Copy Markdown
Contributor

I would recommend for future commits, please don't use force push in Iceberg repos. It removes the ability to do a diff on what's changed since last commit. My understanding is that Github will squash commits when merging into Main if that's the behavior you were attempting to manage by using force-push.

}

#[test]
fn test_table_metadata_builder_rejects_v2_list_element_requiring_v3() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for including, I was going to request a nested data type unit test.

}

/// Validates that all schema types are supported by the table format version.
pub fn validate_compatible_with_format_version(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a 4+ layer deep unit test and an integration test with 100+ columns and multiple deeply nested structs to see how this validations affects Table creation times.

let primitive = field_type.as_primitive_type()?;
let min_format_version = primitive.min_format_version()?;
(format_version < min_format_version).then(|| {
let column_name = self.name_by_field_id(field.id).unwrap_or(field.name.as_str());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to cause this unwrap_or to throw an error? I believe that would cause a panic and a crash at runtime, but I can't figure out if its possible to pass a field.name that would cause the as_str() to panic.

Comment on lines +448 to +449
let min_format_version = primitive.min_format_version()?;
(format_version < min_format_version).then(|| {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to do a comparison here? I thought the point of implementing the match in primitive.min_format_version() was so it would do the comparison for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

No validation that prevents creating a V2 table with timestamp_ns/timestamptz_ns types

3 participants