[WIP] Fix query streaming block #1079
> [!IMPORTANT]
> This is a work in progress: I am investigating an issue that is only noticeable in versions after v0.62.0.
>
> TL;DR: Column values get mixed up and end up in the wrong database column, i.e. a value from input column A lands in column B. This does not happen on every table or every write, but typically when the input block is large (over ~900k rows).
## Summary
**Problem:** When streaming large datasets using `OnInput` callbacks, columns could get mixed up (column A receiving values intended for column B). This affected production workloads with complex column types such as `Map(String, String)`. The streaming pattern involved is sketched below.

**Root Cause:** Column type inference was applied only once, at the start of the query, but `OnInput` callbacks reset column data between blocks, causing subsequent blocks to be encoded with stale type information.

**Solution:** Extract the inference logic into an `applyInference()` function and re-apply it for each block in the streaming loop, ensuring fresh type information while preserving column order.

**Testing:** Added a comprehensive test suite, including production-scale validation (900k rows) and `Map` column integration tests.

**Impact:** Fixes column mixing in high-volume streaming scenarios while maintaining backward compatibility and performance.
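
For context, here is a minimal sketch of the kind of `OnInput` streaming insert affected by this bug. It assumes the `ch.Query`/`proto.Input` style API; the table name, column names, and row counts are illustrative only:

```go
package main

import (
	"context"
	"io"

	"github.com/ClickHouse/ch-go"
	"github.com/ClickHouse/ch-go/proto"
)

func main() {
	ctx := context.Background()
	client, err := ch.Dial(ctx, ch.Options{})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// One plain column plus a Map(String, String) column, the type that
	// most visibly exhibited the mix-up.
	body := new(proto.ColStr)
	attrs := proto.NewMap[string, string](new(proto.ColStr), new(proto.ColStr))
	input := proto.Input{
		{Name: "body", Data: body},
		{Name: "attrs", Data: attrs},
	}

	blocks := 0
	err = client.Do(ctx, ch.Query{
		Body:  input.Into("logs"), // generates the INSERT statement
		Input: input,
		OnInput: func(ctx context.Context) error {
			// Called to prepare each block; columns must be reset
			// manually, which is what invalidated the one-shot type
			// inference before this fix.
			input.Reset()
			if blocks >= 10 {
				return io.EOF // stop streaming
			}
			for i := 0; i < 100_000; i++ {
				body.Append("message")
				attrs.Append(map[string]string{"k": "v"})
			}
			blocks++
			return nil
		},
	})
	if err != nil {
		panic(err)
	}
}
```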
## CHANGELOG Description
**Fixed a column-mixing bug in high-volume streaming scenarios.**

When streaming large datasets to ClickHouse using the `OnInput` callback, columns could get mixed up, with column A receiving values intended for column B. This was particularly noticeable with complex column types such as `Map(String, String)` and affected production workloads with 900k+ rows.

The issue was caused by column type inference being applied only once, at the beginning of the streaming process. When using `OnInput` callbacks, subsequent blocks would use stale type information, corrupting the column order.
**Changes:**
- Extract the column inference logic into an `applyInference()` function
- Re-apply inference for each block in the streaming loop (see the sketch below)
- Preserve column order by processing input columns in their original order
- Add a comprehensive test suite, including production-scale validation
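
The shape of the fix, using the function names from this PR (`sendInput`, `applyInference`), is sketched below. The sketch is illustrative only, with hypothetical stand-ins for the client's internal plumbing; it is not the actual diff:

```go
package sketch

import (
	"context"
	"errors"
	"io"

	"github.com/ClickHouse/ch-go/proto"
)

// applyInference re-derives type information for every input column,
// walking the columns in their original order so the name-to-type
// mapping cannot drift between blocks.
func applyInference(input []proto.InputColumn) error {
	for i := range input {
		_ = input[i] // refresh the inferred type info for this column
	}
	return nil
}

// sendInput streams input blocks. The onInput and sendBlock parameters
// are hypothetical stand-ins for the client's internals.
func sendInput(
	ctx context.Context,
	input []proto.InputColumn,
	onInput func(ctx context.Context) error,
	sendBlock func([]proto.InputColumn) error,
) error {
	for {
		// Before this fix, inference ran once ahead of the loop; after an
		// OnInput callback reset the columns, later blocks were encoded
		// with stale types and values could land in the wrong column.
		if err := applyInference(input); err != nil {
			return err
		}
		if err := sendBlock(input); err != nil {
			return err
		}
		if onInput == nil {
			return nil // single block, nothing more to stream
		}
		if err := onInput(ctx); err != nil {
			if errors.Is(err, io.EOF) {
				return nil // callback signaled end of stream
			}
			return err
		}
	}
}
```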
**Impact:**
- Fixes column mixing in high-volume streaming scenarios
- Maintains backward compatibility: no API changes
- Preserves performance, with minimal overhead
- Validated with production-scale tests (900k rows, `Map` columns)
**Files Changed:**
- `query.go`: fixed the `sendInput` function to re-apply inference per block
- `query_test.go`: added a comprehensive test suite for validation
This fix ensures that column type inference is applied fresh for each block, preventing data corruption in high-volume streaming scenarios while maintaining the existing API and performance characteristics.