[Feature Request][Delta-Spark] Add configurable recordDeltaEvent posting in SparkListener #6994

@sanskar-agrawal2018

Summary

This proposal adds support for posting recordDeltaEvent from a SparkListener, gated by a configuration flag. The listener tracks the files read from Delta tables and emits events without requiring any modifications to existing pipeline code.

Problem

The initial approach was to capture Delta read file information with the observe function. However, observe has to be called from inside each pipeline implementation, which makes the approach intrusive and forces changes to user workloads.

A listener-based solution is preferred to keep the functionality transparent and independent of pipeline code changes.

Proposed Solution

The proposal introduces a SparkListener-based mechanism that captures Delta read file information when a query completes and, if the feature is enabled in configuration, posts a recordDeltaEvent.

The listener will:

  • Monitor query execution completion.
  • Collect Delta read file information.
  • Post recordDeltaEvent when the feature is enabled through configuration.

This approach centralizes event generation and avoids modifications to existing application logic.
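The control flow above can be modeled in a few lines. This is a minimal, Spark-free sketch in plain Python, not the proposed implementation: a real version would presumably extend Spark's `SparkListener` on the JVM. The names `QueryCompleted`, `DeltaReadFileListener`, the `"delta.readFiles"` operation type, and the payload shape are all hypothetical, and `post_event` is only a stand-in for recordDeltaEvent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class QueryCompleted:
    """Hypothetical stand-in for the query-completion event a listener receives."""
    query_id: int
    # Delta data files read by the query (assumed to be available at completion).
    delta_read_files: List[str] = field(default_factory=list)


class DeltaReadFileListener:
    """Posts a recordDeltaEvent-style payload on query completion if enabled."""

    def __init__(self, enabled: bool, post_event: Callable[[str, Dict], None]):
        self.enabled = enabled
        self.post_event = post_event  # stand-in for recordDeltaEvent

    def on_query_completed(self, event: QueryCompleted) -> None:
        # Flag off: return before doing any collection work.
        if not self.enabled:
            return
        payload = {"queryId": event.query_id, "files": list(event.delta_read_files)}
        self.post_event("delta.readFiles", payload)  # hypothetical opType


posted = []


def collect(op_type: str, data: Dict) -> None:
    posted.append((op_type, data))


event = QueryCompleted(1, ["part-00000.parquet", "part-00001.parquet"])
DeltaReadFileListener(True, collect).on_query_completed(event)   # posts one event
DeltaReadFileListener(False, collect).on_query_completed(event)  # posts nothing
```

The pipeline code that produced the query never appears in the sketch, which is the point of the listener-based design: only the listener knows about event posting.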

Configuration

The feature is opt-in and controlled through a configuration flag.

When enabled:

  • The SparkListener captures Delta read file information.
  • recordDeltaEvent is posted with the collected details.

When disabled:

  • No additional event processing occurs.
  • Existing behavior remains unchanged.

This ensures that users only incur the overhead of event collection when the functionality is explicitly required.
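One way to get "no additional event processing" when the flag is off is to skip registering the listener altogether. The sketch below illustrates that idea in plain Python under stated assumptions: the key `example.delta.readFileEvents.enabled` is hypothetical (the proposal does not name the actual flag), and `register_listeners` is an illustrative helper, not a Spark or Delta API.

```python
from typing import Callable, Dict, List

# Hypothetical configuration key; the proposal does not name the actual flag.
FLAG_KEY = "example.delta.readFileEvents.enabled"


def is_enabled(conf: Dict[str, str]) -> bool:
    """Opt-in semantics: a missing or non-"true" value means disabled."""
    return conf.get(FLAG_KEY, "false").strip().lower() == "true"


def register_listeners(
    conf: Dict[str, str],
    listeners: List[Callable[[dict], None]],
    delta_listener: Callable[[dict], None],
) -> None:
    """Attach the Delta read-file listener only when the flag is set."""
    if is_enabled(conf):
        listeners.append(delta_listener)


def noop_listener(event: dict) -> None:
    """Placeholder for the listener that would post recordDeltaEvent."""


default_listeners: List[Callable[[dict], None]] = []
register_listeners({}, default_listeners, noop_listener)  # flag absent

opted_in_listeners: List[Callable[[dict], None]] = []
register_listeners({FLAG_KEY: "true"}, opted_in_listeners, noop_listener)
```

With the flag absent, the listener list stays empty, so disabled workloads pay no per-query cost. Checking the flag inside the listener on every event would also work, at the price of a small check per query.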

Benefits

  • No changes required in existing pipeline code.
  • Configurable and opt-in behavior.
  • Enables Delta read file tracking through SparkListener.
  • Supports observability and monitoring use cases through recordDeltaEvent.
  • Minimizes impact on existing workloads.

Expected Behavior

Configuration Enabled

  • The SparkListener is triggered on query completion.
  • Delta read file information is collected.
  • recordDeltaEvent is posted with the relevant metadata.

Configuration Disabled

  • No Delta read file tracking is performed.
  • No recordDeltaEvent is posted.
  • Existing execution flow remains unchanged.

Notes

This implementation avoids using observe within pipeline logic and provides a non-intrusive, configuration-driven mechanism for posting recordDeltaEvent through SparkListener.

Labels: enhancement (New feature or request)
