Skip to content

Conversation

martin-g
Copy link
Member

Extend the test case with an Option field with a default.

#252 (comment)

@martin-g
Copy link
Member Author

Hm!
I think there is a bug here.
Currently this produces:

RecordField {
   ...,
   default: Some(
     String("null"),
   ),
   ...

while it should be default: Some(Value::Null).

I see two ways to improve/fix it:

  1. Update our darling usage to support #[avro(default = null)], notice that null is not a String!
  2. If 1) does not work for some reason then check whether the value is "null" and the first union variant is Null then use Value::Null

@Kriskras99
Copy link
Contributor

Kriskras99 commented Aug 12, 2025

Shouldn't it only parse r#""{something}""# as a string? "{something}" shoud always be a type or integer literal right?

@martin-g
Copy link
Member Author

Shouldn't it only parse r#""{something}""# as a string?

Yes! Debugging confirms this.

But still the default's value must be Value::Null, not Value::String("null")... Still debugging.

@Kriskras99
Copy link
Contributor

I think the bug is in the schema generated. It should be "type": null not "type": "null". See the field default values: https://avro.apache.org/docs/1.11.1/specification/#schema-record

@Kriskras99
Copy link
Contributor

Kriskras99 commented Aug 12, 2025

(Note that when a default value is specified for a record field whose type is a union, the type of the default value must match the first element of the union. Thus, for unions containing “null”, the “null” is usually listed first, since the default value of such unions is typically null.)
From the Union part of the reference. So that would put the bug in the schema parser.

@Kriskras99
Copy link
Contributor

Kriskras99 commented Aug 12, 2025

So thinking about it for a bit, there are two bugs:

  • In Avro derive it should generate a JSON null when the default is "null" (when it's ""null"", it should generate a JSON string)
  • In the schema parser it should check if the default matches the type of the first element of the union.
    • Thus for a schema with "type": ["null", "string"], default: "something" should return an error

@martin-g
Copy link
Member Author

  • In Avro derive it should generate a JSON null when the default is "null" (instead of ""null"")

It does produce serde_json::Value::Null

@Kriskras99
Copy link
Contributor

But that should generate a "default": null which should make this test fail: https://github.com/apache/avro-rs/pull/260/files#diff-2e1b9de8fab3f41491702f9d75b536a4a208002e0d84c489c8404a6d8bb7ec45R1452

@martin-g
Copy link
Member Author

Yes! There is a bug somewhere!

avro-rs/avro/src/schema.rs

Lines 7203 to 7226 in 23a9af7

let valid_schema = r#"
{
"type": "record",
"name": "SampleSchema",
"fields": [
{
"name": "order",
"type": {
"type": "record",
"name": "Order",
"fields": [
{
"name": "order_number",
"type": ["null", "string"],
"default": null
},
{ "name": "order_date", "type": "string" }
]
}
}
]
}"#;
let schema = Schema::parse_str(valid_schema);
assert!(schema.is_ok());
produces:

fields: [
                                RecordField {
                                    name: "order_number",
                                    doc: None,
                                    aliases: None,
                                    default: Some(
                                        Null,
                                    ),
                                    schema: Union(
                                        UnionSchema {
                                            schemas: [
                                                Null,
                                                String,
                                            ],
                                            variant_index: {
                                                Null: 0,
                                                String: 1,
                                            },
                                        },
                                    ),

so it seems to be somewhere in the derive macro...

@martin-g
Copy link
Member Author

I think the bug is in schema_equality.rs

@martin-g
Copy link
Member Author

Right! The default equality (SpecificationEq) uses the canonical form of the schema and as we already know the default field is not part of it ...

It should be `null` instead of `"null"`

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
@martin-g
Copy link
Member Author

Another bug: the default equality impl is StructFieldEq and it seems it also fails to detect it!

@martin-g
Copy link
Member Author

Well, I am not sure whether it is a bug or not.
StructFieldEq behaves like SpecificationEq and compares only the schemas' names and their fields' schemas.

@Kriskras99
Copy link
Contributor

Looking at StructFieldEq it doesn't seem to check the default field

@Kriskras99
Copy link
Contributor

I think the only way to compare this would be with serde_json::to_string{_pretty}(schema)

@Kriskras99
Copy link
Contributor

Or change StructFieldEq to also look at default (and potentially other things like doc)

@martin-g
Copy link
Member Author

Right!

git diff
diff --git i/avro/src/schema_equality.rs w/avro/src/schema_equality.rs
index 1097594..83bf969 100644
--- i/avro/src/schema_equality.rs
+++ w/avro/src/schema_equality.rs
@@ -217,7 +217,9 @@ impl StructFieldEq {
             && fields_one
                 .iter()
                 .zip(fields_two.iter())
-                .all(|(f1, f2)| self.compare(&f1.schema, &f2.schema))
+                .all(|(f1, f2)| {
+                    self.compare(&f1.schema, &f2.schema) && f1.default == f2.default
+                })
     }
 }
 
diff --git i/avro_derive/tests/derive.rs w/avro_derive/tests/derive.rs
index a85caa0..aea426f 100644
--- i/avro_derive/tests/derive.rs
+++ w/avro_derive/tests/derive.rs
@@ -1449,7 +1449,7 @@ mod test_derive {
                 {
                     "name":"optional",
                     "type": ["null", "string"],
-                    "default": null
+                    "default": "null"
                 }
             ]

detects the issue!
But I am not sure whether it this should be done!

Hopefully someone will explain why the default should be STRIPed for canonical form at https://lists.apache.org/thread/j0no5h8sjlyo1xdzw47dftgcn2k84w2h

@Kriskras99
Copy link
Contributor

Hmmm. I think I understand why default is stripped from the PCF. It is because the PCF is supposed to be used to check the equality of the writer and reader schema, not compatibility. As the writer does nothing with the default field when writing, it's not relevant for equality (if a reader wants to use a compatible schema it first needs to read with the original schema and then convert to the compatible schema).
As the PFC (and the fingerprints that use it as a base) are only for checking equality with the writer the don't include the default field.

For our testing purposes it's probably better to use something like this:

fn strict_schema_equality(expected: &str, schema: Schema) {
    let expected: Value = serde_json::from_str(expected).unwrap();
    let got = serde_json::to_value(schema).unwrap();
    assert_eq!(got, expected)
}

This checks that all fields are as expected, including doc, default and other fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants