[BUG][Spark] UniForm Iceberg incremental conversion diagnostics report the latest snapshot version instead of the offending commit version #7029

@AnudeepKonaboina

Description

Bug

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Describe the problem

UniForm performs incremental Iceberg conversion by translating Delta commits one
at a time. IcebergConverter.runIcebergConversionForActions processes a single
commit, identified by its deltaVersion parameter.

However, the diagnostics emitted from this path report targetSnapshot.version
(the latest, or head, snapshot being synced) rather than deltaVersion (the
commit actually being processed). This affects:

  • the delta.iceberg.conversion.unsupportedActions event and its accompanying
    logError
  • the delta.iceberg.conversion.convertActions success event

As a result, when conversion fails on a specific commit, the error and telemetry
attribute the failure to the table head (frequently an unrelated operation such
as a MERGE) rather than to the commit that actually failed. This makes
conversion failures misleading and difficult to triage.
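
The mis-attribution can be illustrated with a small self-contained model. This is only a sketch: `Snapshot`, `Diagnostic`, `convertCommit` and `convertRange` are hypothetical stand-ins, not the actual Delta Lake classes or the real body of runIcebergConversionForActions. It assumes one conversion call per commit up to the head, as described above.

```scala
object BuggyDiagnosticsSketch {
  // Hypothetical stand-ins; these are NOT the real Delta Lake types.
  final case class Snapshot(version: Long)
  final case class Diagnostic(opType: String, data: Map[String, Any])

  // Models the reported behaviour: the commit being converted is
  // `deltaVersion`, but the event is tagged with the head snapshot version.
  def convertCommit(targetSnapshot: Snapshot, deltaVersion: Long): Diagnostic =
    Diagnostic(
      "delta.iceberg.conversion.convertActions",
      Map("version" -> targetSnapshot.version) // deltaVersion is never reported
    )

  // Incremental conversion modelled as one call per commit up to the head.
  def convertRange(from: Long, head: Snapshot): Seq[Diagnostic] =
    (from to head.version).map(v => convertCommit(head, v))
}
```

In this model, `BuggyDiagnosticsSketch.convertRange(5L, BuggyDiagnosticsSketch.Snapshot(7L))` produces three events that all carry version 7, so commits 5 and 6 cannot be told apart from the head in the emitted diagnostics.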

Steps to reproduce

  1. Create a Delta table with UniForm Iceberg enabled
    (delta.universalFormat.enabledFormats = 'iceberg') so incremental Iceberg
    conversion runs on each commit.
  2. Produce a sequence of commits such that an earlier commit (e.g. a stats/
    metadata commit) is converted while the table head is a later, unrelated
    commit (e.g. a MERGE).
  3. Trigger a conversion failure on the earlier commit (an unsupported
    combination of actions), or simply inspect the convertActions success
    event for a converted commit.
  4. Inspect the driver ERROR log line / the emitted Delta event.

Observed results

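The diagnostics report the head snapshot version, not the commit being
processed. For example, in the log below the version field carries the head
snapshot version, while commitInfo describes the commit being converted: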

26/06/13 02:00:10 ERROR IcebergConverterEdge: Unsupported combination of actions for incremental conversion. Context:
version -> 62111,
commitInfo -> COMPUTE STATS,
hasAdd -> true,
hasRemove -> false,
dataChange -> Some, 
hasDv -> true

Further details

The relevant code is in
iceberg/src/main/scala/org/apache/spark/sql/delta/icebergShaded/IcebergConverter.scala,
in runIcebergConversionForActions. Both the unsupportedActions /
convertActions recordDeltaEvent calls and the logError string interpolate
targetSnapshot.version where they should report the per-commit deltaVersion.

This is a diagnostics/logging-only issue; conversion behavior itself is correct
and unchanged. The fix is to report both the head version and the per-commit
version, and to rename the fields so the two are unambiguous.
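
One possible shape for the proposed fix is sketched below. The field names `targetSnapshotVersion` and `commitVersion`, and the helpers `conversionContext` and `render`, are assumptions made for illustration. The issue only asks that both versions be reported under unambiguous names, and the actual patch may choose different ones.

```scala
object ConversionContextSketch {
  // Hypothetical field names: the head version and the per-commit version
  // are both reported, under names that cannot be confused.
  def conversionContext(
      targetSnapshotVersion: Long,
      deltaVersion: Long): Seq[(String, Any)] =
    Seq(
      "targetSnapshotVersion" -> targetSnapshotVersion, // head being synced
      "commitVersion" -> deltaVersion                   // commit being converted
    )

  // Renders the context in the same "key -> value" style as the log above.
  def render(ctx: Seq[(String, Any)]): String =
    ctx.map { case (k, v) => s"$k -> $v" }.mkString(",\n")
}
```

With made-up versions, `render(conversionContext(targetSnapshotVersion = 120L, deltaVersion = 117L))` yields two lines, one per field, so a failure on commit 117 is no longer attributed to head version 120.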

Environment information

  • Delta Lake version: master (also affects released versions where incremental
    UniForm Iceberg conversion is present)
  • Spark version: 3.5.x / 4.x
  • Scala version: 2.13

Willingness to contribute

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.

Metadata

Labels

bug (Something isn't working)
