Flesh out schema evolution support for variant #18285

### Task Description

**What needs to be done:**

https://github.com/apache/hudi/pull/17833/changes#r2874744476

https://github.com/apache/hudi/pull/17833#discussion_r2957128901

**Why this task is needed:**

Comment from @vinothchandar:
What kind of evolution are we able to support with Variant columns? Can we ensure there are tests around it?

- Adding a Variant column to an existing table
- Removing a Variant column
- `maxColumnId` calculation when Variant fields (with negative IDs) coexist with regular fields
- Round-trip: `HoodieSchema → InternalSchema → HoodieSchema` for Variant types
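The `maxColumnId` case can be pinned down with a small test-style sketch. This is a toy model, not Hudi code: `MaxColumnIdSketch` and its method are hypothetical, and it assumes the intended behavior is that negative (Variant) IDs never raise or replace the maximum, with `-1` returned when no non-negative ID exists. The real expectation should be confirmed against `InternalSchema` before writing the test.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: field IDs are plain integers, not Hudi schema fields.
final class MaxColumnIdSketch {

    // Assumption: negative IDs belong to Variant-internal fields and are skipped.
    // Returns -1 when the list has no non-negative ID.
    static int maxColumnId(List<Integer> fieldIds) {
        int max = -1;
        for (int id : fieldIds) {
            if (id >= 0) {
                max = Math.max(max, id);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Regular fields 0, 1, 5 coexist with Variant fields -1, -2.
        System.out.println(maxColumnId(Arrays.asList(0, 1, -1, -2, 5))); // prints 5
        // Only Variant-internal IDs present.
        System.out.println(maxColumnId(Arrays.asList(-1, -2)));          // prints -1
    }
}
```

A real test would build the schema with the actual Hudi types and assert the same two outcomes, plus the case where a Variant column is added to a table whose regular fields already have IDs assigned.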

### Additional consideration: variant projection vs schema-evolution ordering (from PR #18923)

PR #18923 moved the Spark 4.1 `PushVariantIntoScan` projection out of the engine-neutral merge buffer and into the log readers (native projection in the parquet reader, `deserializeRecords` in the avro reader). This flips the order of schema-evolution projection and variant projection in the MOR read path:

- Before: evolve then project. `composeEvolvedSchemaTransformer` ran against full VariantType rows (consistent with `dataBlock.getSchema()`), then variant projection rewrote them.
- After: project then evolve. Rows reach the buffer already projected, but `composeEvolvedSchemaTransformer` still builds its projector with `from = dataBlock.getSchema()` (variant typed as VariantType). When `internalSchema` is non-empty, the evolve step would mis-decode the projected struct bytes as VariantVal.

This triggers only when all four conditions hold: schema-on-read is active, the table has a variant column, `PushVariantIntoScan` applies, and the read goes through MOR log blocks. That is the same unsupported combination this task covers. When designing variant schema evolution, either make `composeEvolvedSchemaTransformer` aware that variant fields may already be projected (build the evolve step against the projected schema) or otherwise reconcile the ordering. Refs: `FileGroupRecordBuffer.composeEvolvedSchemaTransformer`, `InternalSchemaConverter` line 510.
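One way to read "build the evolve step against the projected schema" is sketched below. It is a toy model under stated assumptions: schemas are ordered maps from field name to a type-name string, and `EvolveFromSchemaSketch`, `evolveFromSchema`, and the `"variant"` / `"struct<a:int>"` type strings are all hypothetical, not Hudi or Spark APIs. The point is only that the `from` side of the evolve projector should match what the rows actually look like when they reach the buffer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a schema is an ordered field-name -> type-name map.
final class EvolveFromSchemaSketch {

    // Picks the "from" schema for the evolve step. When variants were already
    // projected upstream, variant-typed fields are replaced by their projected
    // struct type; otherwise the data block schema is used unchanged.
    static Map<String, String> evolveFromSchema(
            Map<String, String> blockSchema,
            Map<String, String> projectedVariantTypes,
            boolean shouldProjectVariants) {
        Map<String, String> from = new LinkedHashMap<>(blockSchema); // defensive copy
        if (shouldProjectVariants) {
            for (Map.Entry<String, String> e : projectedVariantTypes.entrySet()) {
                // Only rewrite fields the block schema types as variant.
                if ("variant".equals(from.get(e.getKey()))) {
                    from.put(e.getKey(), e.getValue());
                }
            }
        }
        return from;
    }

    public static void main(String[] args) {
        Map<String, String> block = new LinkedHashMap<>();
        block.put("id", "long");
        block.put("payload", "variant");

        Map<String, String> projected = new LinkedHashMap<>();
        projected.put("payload", "struct<a:int>");

        System.out.println(evolveFromSchema(block, projected, true));  // prints {id=long, payload=struct<a:int>}
        System.out.println(evolveFromSchema(block, projected, false)); // prints {id=long, payload=variant}
    }
}
```

Whether the projected schema is available at the point where `composeEvolvedSchemaTransformer` runs is an open design question. The alternative is to restore the old evolve-then-project ordering for this combination.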

Exact trigger: `internalSchema` is non-empty AND `shouldProjectVariants` is true (Spark only). If either condition is false, there is no mismatch:

- Empty `internalSchema`: the evolve step is the identity (`composeEvolvedSchemaTransformer` returns empty), so rows pass through untouched.
- `shouldProjectVariants` false: rows are never projected upstream (both the avro `projectLogBlockRecords` and the parquet `getUnsafeRowIterator` overload are gated on it), so the evolve step sees consistent full VariantType rows.

Guard placement caveat: a fail-fast guard is awkward to add today because `internalSchema` lives in the engine-neutral buffer while `shouldProjectVariants` lives in the Spark reader context. The buffer cannot see the variant gate directly, so a clean guard would need a small reader-context hook (e.g. `readerContext.hasPendingVariantProjection()`) checked alongside the non-empty `internalSchema`.
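A minimal sketch of such a guard follows. Everything here is hypothetical: `VariantAwareReaderContext` stands in for whatever reader-context type would carry the hook, `hasPendingVariantProjection()` is the name floated above rather than an existing method, and the empty/non-empty state of `internalSchema` is reduced to a boolean.

```java
// Hypothetical hook: exposes the Spark-side variant gate to the engine-neutral buffer.
interface VariantAwareReaderContext {
    boolean hasPendingVariantProjection();
}

final class VariantEvolutionGuardSketch {

    // True only for the unsupported combination:
    // non-empty internalSchema AND variants already projected upstream.
    static boolean isUnsupported(boolean internalSchemaIsEmpty, VariantAwareReaderContext ctx) {
        return !internalSchemaIsEmpty && ctx.hasPendingVariantProjection();
    }

    // Fail fast instead of mis-decoding projected struct bytes as VariantVal.
    static void failFastIfUnsupported(boolean internalSchemaIsEmpty, VariantAwareReaderContext ctx) {
        if (isUnsupported(internalSchemaIsEmpty, ctx)) {
            throw new UnsupportedOperationException(
                "Schema-on-read evolution with pushed-down variant projection is not supported");
        }
    }

    public static void main(String[] args) {
        VariantAwareReaderContext projecting = () -> true;
        VariantAwareReaderContext notProjecting = () -> false;

        System.out.println(isUnsupported(false, projecting));    // prints true
        System.out.println(isUnsupported(true, projecting));     // prints false
        System.out.println(isUnsupported(false, notProjecting)); // prints false
    }
}
```

Engines other than Spark could return `false` from the hook by default, which would keep the buffer engine-neutral.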

### Task Type

Code improvement/refactoring

### Related Issues

**Parent feature issue:** (if applicable)
**Related issues:**
NOTE: Use `Relationships` button to add parent/blocking issues after issue is created.

