Skip to content

Conversation

beseder1
Copy link

@beseder1 beseder1 commented Sep 18, 2025

With the current implementation of the RecordVisitor, Avro schema unions are handled in a non-deterministic way. This can cause deserialization errors when different AvroMapper instances are used for serialization and deserialization.

Specifically, because the subtype ordering is unstable, deserialization may attempt to resolve a serialized object against the wrong subtype, resulting in errors.

This PR makes the ordering deterministic by using an ArrayList instead of a Set, ensuring consistent behavior across serialization and deserialization.

@beseder1 beseder1 force-pushed the fix/601-deterministic-avro-unions branch from 94b2377 to 0fcc23d Compare September 18, 2025 06:25
final List<Schema> unionSchemas = new ArrayList<>();
// Initialize with this schema
if (_type.isConcrete()) {
unionSchemas.add(_typeSchema);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not prevent duplicate adds here instead? Check if unionSchemas has exact instance (iterate over, compare with ==), only add if not yet included?

final Set<Schema> seenSchemas = Collections.newSetFromMap(new IdentityHashMap<>());

for(Schema s : schemas) {
if(!seenSchemas.contains(s)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to use Set, instead of set.contains() should rather try to add, check response value (true means entry was added, i.e. was not yet included).

@cowtowncoder cowtowncoder added the cla-needed PR looks good (although may also require code review), but CLA needed from submitter label Sep 18, 2025
@cowtowncoder
Copy link
Member

Good start! PR much appreciated.
I think we can avoid separate de-duping by checking for existence on-the-fly, but aside from that I think this makes sense.

One process thing: we need CLA (from either https://github.com/FasterXML/jackson/blob/main/contributor-agreement.pdf (individual) or https://github.com/FasterXML/jackson/blob/main/contributor-agreement.pdf (Corporate CLA)). This is only needed once per contributor, one is good for all future contributions.

The usual way is to print, fill & sign, email to cla at fasterxml dot com. Once this is done I can proceed with merging (pending final review etc).

Looking forward to merging this PR (and backporting in 2.19, I think).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-needed PR looks good (although may also require code review), but CLA needed from submitter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants