fix(spec): validate schema types by format version#2417
Conversation
|
@CTTY @liurenjie1024 please help take a look. |
|
Hi Manu, in terms of implementation, I think implementing |
ffd534c to
f5201b1
Compare
31327b2 to
13b6d45
Compare
|
@CTTY Please take another look. Thanks! |
13b6d45 to
cb2d9e8
Compare
Reject schema types that require a newer table format and update the DataFusion catalog registration path to request v3 only when the converted schema needs it. This keeps ordinary CREATE TABLE defaults on v2 while allowing timestamp_ns schemas to pass validation. Co-authored-by: Codex <codex@openai.com>
cb2d9e8 to
7d855c2
Compare
|
I would recommend for future commits, please don't use force push in Iceberg repos. It removes the ability to do a diff on what's changed since last commit. My understanding is that Github will squash commits when merging into |
| } | ||
|
|
||
| #[test] | ||
| fn test_table_metadata_builder_rejects_v2_list_element_requiring_v3() { |
There was a problem hiding this comment.
Thank you for including, I was going to request a nested data type unit test.
| } | ||
|
|
||
| /// Validates that all schema types are supported by the table format version. | ||
| pub fn validate_compatible_with_format_version( |
There was a problem hiding this comment.
It would be nice to have a 4+ layer deep unit test and an integration test with 100+ columns and multiple deeply nested structs to see how this validations affects Table creation times.
| let primitive = field_type.as_primitive_type()?; | ||
| let min_format_version = primitive.min_format_version()?; | ||
| (format_version < min_format_version).then(|| { | ||
| let column_name = self.name_by_field_id(field.id).unwrap_or(field.name.as_str()); |
There was a problem hiding this comment.
Is it possible to cause this unwrap_or to throw an error? I believe that would cause a panic and a crash at runtime, but I can't figure out if its possible to pass a field.name that would cause the as_str() to panic.
| let min_format_version = primitive.min_format_version()?; | ||
| (format_version < min_format_version).then(|| { |
There was a problem hiding this comment.
Do you need to do a comparison here? I thought the point of implementing the match in primitive.min_format_version() was so it would do the comparison for you.
Which issue does this PR close?
timestamp_ns/timestamptz_nstypes #2418.What changes are included in this PR?
Reject schema types that require a newer table format (e.g. v3). This keeps ordinary CREATE TABLE defaults on v2 while allowing timestamp_ns schemas to pass validation.
Are these changes tested?
Add UTs
Co-authored-by: @codex