train_pgo: Add json and avro training #29487

StephanDollberg · 2026-02-02T16:28:21Z

Add Avro and JSON training to further help with iceberg perf when those are in use.

Backports Required

Release Notes

none

Copilot

Pull request overview

This PR enhances PGO (Profile-Guided Optimization) training by adding Avro and JSON schema support alongside the existing Protobuf training. This helps optimize Redpanda's performance when Iceberg is used with these different serialization formats.

Changes:

Added Avro and JSON schema definitions with corresponding sample payloads
Refactored the Protobuf-specific setup function into a generic schema setup function that supports multiple formats
Added a new function to send messages via rpk for testing Avro and JSON schemas
Updated topic names to be format-specific and fixed hardcoded references
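A minimal sketch of what a format-generic setup table could look like, assuming one entry per serialization format. Only the `iceberg-json-topic` name and the JSON sample record come from this PR; `SchemaFormat`, `topic_name_for`, `FORMATS`, and the Avro schema and JSON schema text below are hypothetical illustrations, not the code in `train_pgo.py`.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaFormat:
    """One serialization format to train on (hypothetical structure)."""

    schema_type: str     # schema type label, e.g. "avro" or "json"
    topic_name: str      # format-specific topic so formats do not share one
    schema: str          # schema definition as text
    sample_payload: str  # one JSON-encoded sample record


def topic_name_for(fmt: str) -> str:
    """Derive a format-specific topic name, e.g. 'iceberg-json-topic'."""
    return f"iceberg-{fmt.lower()}-topic"


# The record mirrors the sample payload shown in the review diff.
SAMPLE_RECORD = {"name": "hello my name is json shady", "id": 13579, "ts": 1625079045123456}

# Hypothetical Avro schema matching the sample record.
AVRO_SCHEMA = json.dumps(
    {
        "type": "record",
        "name": "Sample",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "id", "type": "long"},
            {"name": "ts", "type": "long"},
        ],
    }
)

# Hypothetical JSON Schema matching the sample record.
JSON_SCHEMA = json.dumps(
    {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "id": {"type": "integer"},
            "ts": {"type": "integer"},
        },
    }
)

FORMATS = {
    "avro": SchemaFormat("avro", topic_name_for("avro"), AVRO_SCHEMA, json.dumps(SAMPLE_RECORD)),
    "json": SchemaFormat("json", topic_name_for("json"), JSON_SCHEMA, json.dumps(SAMPLE_RECORD)),
}
```

Keeping the per-format data in one table means a single setup function can loop over `FORMATS.values()` instead of duplicating the Protobuf-specific logic for each new format.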

tools/pgo_bolt/train_pgo.py

Iceberg is generally still a massive performance concern. Adding Avro training gives another perf bump to help with that. We do a very minimal rpk-based training, which seems to result in good enough training coverage.
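A rough sketch of rpk-based message sending, under stated assumptions: it assumes that `rpk topic produce` reads newline-delimited records from stdin and that `--schema-id=topic` selects the schema registered for the topic. The helper names `rpk_produce_argv` and `send_messages` and the `runner` parameter are hypothetical and are not taken from `train_pgo.py`.

```python
import subprocess
from typing import Callable, List


def rpk_produce_argv(topic: str) -> List[str]:
    """Build the rpk command line for producing to a topic.

    Assumption: '--schema-id=topic' makes rpk encode each record with the
    schema registered under the topic's subject.
    """
    return ["rpk", "topic", "produce", topic, "--schema-id=topic"]


def send_messages(
    topic: str,
    payload: str,
    count: int,
    runner: Callable[..., object] = subprocess.run,
) -> int:
    """Send `count` copies of `payload`, one record per line on stdin."""
    stdin_data = "".join(payload + "\n" for _ in range(count))
    # `runner` defaults to subprocess.run; tests can pass a fake instead.
    runner(rpk_produce_argv(topic), input=stdin_data, text=True, check=True)
    return count
```

Injecting `runner` keeps the sketch testable on a machine without rpk installed; in a real training run the default `subprocess.run` would be used.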

travisdowns · 2026-02-02T19:36:41Z

tools/pgo_bolt/train_pgo.py

+}"""
+JSON_TOPIC_NAME = "iceberg-json-topic"
+JSON_SAMPLE_PAYLOAD = json.dumps(
+    {"name": "hello my name is json shady", "id": 13579, "ts": 1625079045123456}


put an array and null in there so we hit those paths?

I could add it, but it seems to make very little difference either way. E.g., in our datalake OMB test we use a message with only string fields, and training on just a single integer id field results in the same perf as what is shown above. So it seems to not make much difference. Whatever you prefer.

"So it seems to not make much difference."

The effort to do it is close to zero, so yes I think we should. We always suffer from the problem of our tests being much narrower than real world scenarios so we need to augment our tests (which unfortunately are very similar between training and validation) with our judgment and guesses: imagine someone has a schema which is mostly one giant array. Then it may matter.
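As an illustration of the suggestion above, a widened sample payload could look like the sketch below. The `tags` and `note` fields and the `JSON_SAMPLE_PAYLOAD_WIDE` name are hypothetical; the thread does not show which fields were actually added, and the registered schema would need matching array and nullable definitions.

```python
import json

# Sketch only: the first three fields come from the payload in the diff,
# the last two are hypothetical additions to exercise array and null paths.
JSON_SAMPLE_PAYLOAD_WIDE = json.dumps(
    {
        "name": "hello my name is json shady",
        "id": 13579,
        "ts": 1625079045123456,
        "tags": ["a", "b", "c"],  # hypothetical array field
        "note": None,             # hypothetical nullable field, serialized as null
    }
)
```

Python's `None` is serialized as JSON `null`, so the decoder on the broker side sees an explicit null value rather than a missing key.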

Similar to the avro training also add a json equivalent.

StephanDollberg requested review from ballard26, Copilot and travisdowns February 2, 2026 16:28

Copilot AI reviewed Feb 2, 2026


train_pgo: Add minimal avro training

945edef

Iceberg is generally still a massive performance concern. Adding Avro training gives another perf bump to help with that. We do a very minimal rpk-based training, which seems to result in good enough training coverage.

StephanDollberg force-pushed the stephan/avro-json-training branch from c5c32b6 to 92eb520 Compare February 2, 2026 16:33

travisdowns reviewed Feb 2, 2026


train_pgo: Add json iceberg training

af44bed

Similar to the avro training also add a json equivalent.

StephanDollberg force-pushed the stephan/avro-json-training branch from 92eb520 to af44bed Compare February 3, 2026 16:42

travisdowns approved these changes Feb 3, 2026

