evenyag
Contributor

@evenyag evenyag commented Sep 17, 2025

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#6078

What's changed and what's your intention?

This PR converts batches to the flat format when reading files written in the old format.

  • FlatReadFormat now checks whether a file needs format conversion
  • Refactor ReadFormat::new to take num_columns and file_path for FlatReadFormat
  • FlatFormat now uses a ParquetAdapter enum to get stats and compute the projection for the parquet file
  • ParquetAdapter delegates these operations to PrimaryKeyReadFormat if the file uses the old format
  • Support sparse encoding in the flat format (BulkMemtable will also convert the flat format)
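
A minimal sketch of the delegation pattern described in the list above, for illustration only. The names ParquetAdapter, ParquetFlat, ParquetPrimaryKeyToFlat and PrimaryKeyReadFormat come from this PR, but every field, method, and signature below (`columns`, `format`, `projection_indices`, `needs_conversion`, `indices_of`) is a hypothetical stand-in and not the actual GreptimeDB code.

```rust
// Hypothetical, simplified stand-ins; not the actual GreptimeDB types.

/// Returns the positions of the `wanted` columns within `columns`.
fn indices_of(columns: &[String], wanted: &[&str]) -> Vec<usize> {
    columns
        .iter()
        .enumerate()
        .filter(|(_, name)| wanted.contains(&name.as_str()))
        .map(|(i, _)| i)
        .collect()
}

/// Stand-in for the existing reader of the old (primary key) format.
struct PrimaryKeyReadFormat {
    /// Column names in the order they appear in the old-format file (assumed).
    columns: Vec<String>,
}

impl PrimaryKeyReadFormat {
    /// Computes the projection against the old-format file layout.
    fn projection_indices(&self, wanted: &[&str]) -> Vec<usize> {
        indices_of(&self.columns, wanted)
    }
}

/// Adapter for files already written in the flat format.
struct ParquetFlat {
    columns: Vec<String>,
}

/// Adapter for old-format files; it wraps the old reader and delegates to it.
struct ParquetPrimaryKeyToFlat {
    format: PrimaryKeyReadFormat,
}

/// Chooses how to read a parquet file based on the format it was written in.
enum ParquetAdapter {
    Flat(ParquetFlat),
    PrimaryKeyToFlat(ParquetPrimaryKeyToFlat),
}

impl ParquetAdapter {
    /// Computes the projection, delegating to the old reader when needed.
    fn projection_indices(&self, wanted: &[&str]) -> Vec<usize> {
        match self {
            ParquetAdapter::Flat(flat) => indices_of(&flat.columns, wanted),
            ParquetAdapter::PrimaryKeyToFlat(old) => old.format.projection_indices(wanted),
        }
    }

    /// Reports whether batches read from this file must be converted to flat.
    fn needs_conversion(&self) -> bool {
        matches!(self, ParquetAdapter::PrimaryKeyToFlat(_))
    }
}
```

Using an enum rather than a trait object keeps the set of file layouts closed, so each operation can match on the variant and the old-format path can reuse the existing reader unchanged.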

Other minor fixes:

  • Only use the dictionary type for strings in compute_input_arrow_schema()
  • Avoid adding the same file to the file cache index twice

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@github-actions github-actions bot added the size/M and docs-not-required (This change does not impact docs.) labels Sep 17, 2025
Signed-off-by: evenyag <[email protected]>
Adds a method flat_sst_arrow_schema_column_num() to get the number of fields

Signed-off-by: evenyag <[email protected]>
Adds two structs ParquetFlat and ParquetPrimaryKeyToFlat.
ParquetPrimaryKeyToFlat delegates stats and projection to the
PrimaryKeyReadFormat.

Signed-off-by: evenyag <[email protected]>
Signed-off-by: evenyag <[email protected]>
@evenyag evenyag marked this pull request as ready for review September 19, 2025 06:14
@evenyag evenyag requested a review from fengjiachun September 19, 2025 06:19
Collaborator

@fengjiachun fengjiachun left a comment

LGTM

@evenyag evenyag marked this pull request as draft September 19, 2025 09:54
@evenyag evenyag marked this pull request as ready for review September 19, 2025 13:13