Glue serde avro to json deserialization includes namespaces and union types #3237

Open
Ronserruya opened this issue Jan 15, 2023 · 7 comments · Fixed by #3931
Labels: area/serde, scope/backend, status/accepted, type/bug

Comments

@Ronserruya

Originally reported in #3224, split into a separate issue following the discussion in #3235.

When the Glue serde deserializes Avro to JSON, it includes record namespaces and, for union values, the type names. This is the first time I've encountered this behaviour; neither the Python deserializer nor the one used in kafka-connect does this.

Example:

Original msg:

{"name": {"first": "ron", "last": "serruya", "full": "ron serruya"}, "ids1": [5,6], "ids2": ["abc", 123]}

schema used:

{
  "type": "record",
  "name": "generation",
  "namespace": "top_level",
  "fields": [
    {
      "name": "name",
      "type": [
        {
          "type": "record",
          "name": "name",
          "namespace": "top_level.generation",
          "fields": [
            {
              "name": "raw",
              "type": [
                "string",
                "null"
              ]
            },
            {
              "name": "first",
              "type": "string"
            },
            {
              "name": "full",
              "type": "string"
            },
            {
              "name": "last",
              "type": ["string"]
            }
          ]
        },
        "null"
      ]
    },
    {
      "name": "ids1",
      "type": {"type": "array", "items": "int"}
    },
    {
      "name": "ids2",
      "type": {"type": "array", "items": ["string", "int"]}
    }
  ]
}

Base64-encoded Avro msg (payload only, without the Glue-related bytes at the start):
AAIGcm9uFnJvbiBzZXJydXlhAA5zZXJydXlhBAoMAAQABmFiYwL2AQA=
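
For anyone who wants to reproduce this without the UI, here is a minimal sketch in plain Python (standard library only) that decodes the payload above against the schema. The helper names `read_long` and `decode` are made up for illustration, and the decoder covers only the types this schema uses (record, union, array, string, int, null). It is not the Glue serde's code or any Avro library's API, just a hand-rolled reading of the Avro binary encoding.

```python
import base64
import io
import json

# The schema from this issue, written compactly.
SCHEMA = {
    "type": "record", "name": "generation", "namespace": "top_level",
    "fields": [
        {"name": "name", "type": [
            {"type": "record", "name": "name",
             "namespace": "top_level.generation",
             "fields": [
                 {"name": "raw", "type": ["string", "null"]},
                 {"name": "first", "type": "string"},
                 {"name": "full", "type": "string"},
                 {"name": "last", "type": ["string"]},
             ]},
            "null",
        ]},
        {"name": "ids1", "type": {"type": "array", "items": "int"}},
        {"name": "ids2", "type": {"type": "array", "items": ["string", "int"]}},
    ],
}

PAYLOAD = "AAIGcm9uFnJvbiBzZXJydXlhAA5zZXJydXlhBAoMAAQABmFiYwL2AQA="


def read_long(buf):
    """Read one zigzag-encoded, variable-length integer (Avro int/long)."""
    shift, acc = 0, 0
    while True:
        byte = buf.read(1)[0]
        acc |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)


def decode(schema, buf):
    """Decode one value of the given schema into plain Python data."""
    if isinstance(schema, list):
        # Union: a branch index, followed by the value of that branch.
        return decode(schema[read_long(buf)], buf)
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return None
    if kind in ("int", "long"):
        return read_long(buf)
    if kind == "string":
        return buf.read(read_long(buf)).decode("utf-8")
    if kind == "array":
        items = []
        count = read_long(buf)
        while count != 0:  # a zero count terminates the array
            if count < 0:
                read_long(buf)  # block size in bytes, not needed here
                count = -count
            for _ in range(count):
                items.append(decode(schema["items"], buf))
            count = read_long(buf)
        return items
    if kind == "record":
        return {f["name"]: decode(f["type"], buf) for f in schema["fields"]}
    raise ValueError(f"type not handled by this sketch: {kind}")


buf = io.BytesIO(base64.b64decode(PAYLOAD))
message = decode(SCHEMA, buf)
print(json.dumps(message))
```

The result carries plain values only (`"last": "serruya"`, `"ids2": ["abc", 123]`), with no type or namespace wrappers.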

The current glue deserializer shows this msg as:

{
  "name": {
    "top_level.generation.name": {
      "raw": null,
      "first": "ron",
      "full": "ron serruya",
      "last": {
        "string": "serruya"
      }
    }
  },
  "ids1": [
    5,
    6
  ],
  "ids2": [
    {
      "string": "abc"
    },
    {
      "int": 123
    }
  ]
}

As you can see, it wraps each union value in its type name (string, int) or, for records, in the namespaced record name (top_level.generation.name).
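
To make the difference concrete, here is a schema-aware sketch in Python that strips those wrappers from the output shown above. The function names `full_name` and `unwrap` are hypothetical. It assumes each non-null union value is a single-key object whose key is the type name for primitives and `namespace.name` for records, as in the output above, and it ignores namespace inheritance for nested named types. It is a workaround for reading the JSON, not the fix in the serde itself.

```python
# The schema from this issue, written compactly.
SCHEMA = {
    "type": "record", "name": "generation", "namespace": "top_level",
    "fields": [
        {"name": "name", "type": [
            {"type": "record", "name": "name",
             "namespace": "top_level.generation",
             "fields": [
                 {"name": "raw", "type": ["string", "null"]},
                 {"name": "first", "type": "string"},
                 {"name": "full", "type": "string"},
                 {"name": "last", "type": ["string"]},
             ]},
            "null",
        ]},
        {"name": "ids1", "type": {"type": "array", "items": "int"}},
        {"name": "ids2", "type": {"type": "array", "items": ["string", "int"]}},
    ],
}

# The JSON currently shown for the example message.
GLUE_OUTPUT = {
    "name": {"top_level.generation.name": {
        "raw": None, "first": "ron", "full": "ron serruya",
        "last": {"string": "serruya"},
    }},
    "ids1": [5, 6],
    "ids2": [{"string": "abc"}, {"int": 123}],
}


def full_name(schema):
    """Label expected as the wrapper key for a union branch (assumption)."""
    if isinstance(schema, str):
        return schema
    if schema["type"] in ("record", "enum", "fixed"):
        namespace = schema.get("namespace")
        return f"{namespace}.{schema['name']}" if namespace else schema["name"]
    return schema["type"]


def unwrap(schema, value):
    """Return value with every union wrapper removed, guided by the schema."""
    if isinstance(schema, list):
        if value is None:  # null branches are not wrapped
            return None
        label, inner = next(iter(value.items()))
        branch = next(b for b in schema if full_name(b) == label)
        return unwrap(branch, inner)
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "array":
        return [unwrap(schema["items"], item) for item in value]
    if kind == "record":
        return {f["name"]: unwrap(f["type"], value[f["name"]])
                for f in schema["fields"]}
    return value  # primitives pass through unchanged


plain = unwrap(SCHEMA, GLUE_OUTPUT)
print(plain)
```

Walking the schema rather than guessing from the JSON alone avoids mistaking a genuine single-field record (for example one whose only field is called string) for a union wrapper.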

I fixed this locally by adding the line encoder.setIncludeNamespace(false); to the avroRecordToJson method.

However, according to the comment in #3235, that's not a completely valid fix, since it could break other things?

Before and after the fix: [two screenshots, not reproduced here]

Ronserruya added the status/triage and type/bug labels on Jan 15, 2023
Haarolean added the scope/backend and status/accepted labels and removed the status/triage label on Jan 23, 2023
@Haarolean
Contributor

Hey, thanks, we'll take a look.

@S1M0NM

S1M0NM commented Apr 27, 2023

This seems to also apply to the default SchemaRegistry Serde.

Is there a way I can fix this for the included SchemaRegistry serde?

Haarolean added the area/serde label on May 18, 2023
@frankgrimes97

@Haarolean Any update on this? We're also being affected by this odd display behavior and would like to see it fixed.

@Haarolean
Contributor

@frankgrimes97 planned for 0.8

Haarolean added this to the 0.8 milestone on Jun 5, 2023
iliax linked a pull request on Jun 12, 2023 that will close this issue
iliax reopened this on Jun 21, 2023
@iliax
Contributor

iliax commented Jun 21, 2023

Reopening, since this is fixed only for the Kafka Schema Registry serde, not for Glue.

@frankgrimes97

@Haarolean Any update on when we might see a fix and 0.8 release?

@Haarolean
Contributor

> @Haarolean Any update on when we might see a fix and 0.8 release?

@frankgrimes97
#4255
kafbat/kafka-ui#23
