Summary
This PR adds support for posting recordDeltaEvent from a SparkListener, controlled by a configuration flag. It enables tracking of the files read from Delta tables, and emission of a corresponding event, without requiring any changes to existing pipeline code.
Problem
The initial approach considered using the observe function to capture Delta read file information. However, observe requires adding logic directly to each pipeline implementation, which is intrusive and forces changes to user workloads.
A listener-based solution is preferred because it keeps the functionality transparent and independent of pipeline code.
Proposed Solution
This PR introduces a SparkListener-based mechanism that captures Delta read file information when a query completes and posts a recordDeltaEvent based on configuration.
The listener will:
- Monitor query execution completion.
- Collect Delta read file information.
- Post recordDeltaEvent when the feature is enabled through configuration.
This approach centralizes event generation and avoids modifications to existing application logic.
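The collection step can be illustrated with a small, Spark-free model. In Spark itself, file-based scans (including Delta batch reads) typically appear in the executed physical plan as FileSourceScanExec nodes, which can be found with TreeNode.collect. The PR does not say how it extracts the files, so the PlanNode, FileScan and Operator types and the DeltaReadFiles.collect helper below are hypothetical stand-ins. They only show the idea of walking a completed query's plan and grouping Delta files by table.

```scala
// Hypothetical, simplified stand-in for a physical plan tree (real plans are SparkPlan nodes).
sealed trait PlanNode { def children: Seq[PlanNode] }

// Hypothetical scan node: source format, table root, and the files selected for reading.
final case class FileScan(format: String, tableRoot: String, files: Seq[String]) extends PlanNode {
  def children: Seq[PlanNode] = Nil
}

// Any other operator (join, filter, project, ...); only its children matter here.
final case class Operator(name: String, children: Seq[PlanNode]) extends PlanNode

object DeltaReadFiles {
  // Walks the tree depth-first, keeps only Delta scans, and groups their files by table root,
  // dropping duplicates when the same table is scanned more than once.
  def collect(root: PlanNode): Map[String, Seq[String]] = {
    def scans(node: PlanNode): Seq[FileScan] = node match {
      case s: FileScan if s.format == "delta" => Seq(s)
      case other                              => other.children.flatMap(scans)
    }
    scans(root)
      .groupBy(_.tableRoot)
      .map { case (table, tableScans) => table -> tableScans.flatMap(_.files).distinct }
  }
}
```

Non-Delta scans (here, any format other than "delta") contribute nothing, so a query that reads no Delta tables produces an empty map and there is nothing to report.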
Configuration
The feature is controlled through a configuration flag.
When enabled:
- SparkListener captures Delta read file information.
- recordDeltaEvent is posted with the collected details.
When disabled:
- No additional event processing occurs.
- Existing behavior remains unchanged.
This ensures that users incur the overhead of event collection only when they explicitly enable the feature.
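The enabled/disabled behavior described above can be sketched as follows, assuming the listener receives per-table file lists from a collection step when a query ends. The config key spark.delta.readFileTracking.enabled, the DeltaReadEvent payload, the op type string and the DeltaReadEventListener class are all hypothetical, since the PR names none of them. The emit callback stands in for recordDeltaEvent, whose real signature is internal to Delta and is not reproduced here.

```scala
// Hypothetical payload standing in for what the PR passes to recordDeltaEvent.
final case class DeltaReadEvent(opType: String, tableRoot: String, data: Map[String, Any])

// Hypothetical, Spark-free stand-in for the listener's decision logic.
final class DeltaReadEventListener(conf: Map[String, String], emit: DeltaReadEvent => Unit) {
  import DeltaReadEventListener._

  // Read once at construction; a missing or non-"true" value leaves the feature off.
  private val enabled: Boolean =
    conf.get(EnabledKey).exists(_.trim.equalsIgnoreCase("true"))

  // Called when a query completes; emits one event per Delta table read, only if enabled.
  def onQueryEnd(executionId: Long, filesByTable: Map[String, Seq[String]]): Unit =
    if (enabled) {
      filesByTable.foreach { case (table, files) =>
        emit(DeltaReadEvent(OpType, table,
          Map("executionId" -> executionId, "numFiles" -> files.size, "files" -> files)))
      }
    }
}

object DeltaReadEventListener {
  // Hypothetical config key and op type; the PR does not name either.
  val EnabledKey = "spark.delta.readFileTracking.enabled"
  val OpType = "delta.readFiles"
}
```

Passing emit as a function keeps the gating logic testable without a SparkSession. In a real deployment the Spark-facing part would be a SparkListener registered with SparkContext.addSparkListener or through the spark.extraListeners setting, and the callback would post recordDeltaEvent.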
Benefits
- No changes required in existing pipeline code.
- Configurable and opt-in behavior.
- Enables Delta read file tracking through SparkListener.
- Supports observability and monitoring use cases through recordDeltaEvent.
- Minimizes impact on existing workloads.
Expected Behavior
Configuration Enabled
- SparkListener is triggered on query completion.
- Delta read file information is collected.
- recordDeltaEvent is posted with the relevant metadata.
Configuration Disabled
- No Delta read file tracking is performed.
- No recordDeltaEvent is posted.
- Existing execution flow remains unchanged.
Notes
This implementation avoids using observe within pipeline logic and provides a non-intrusive, configuration-driven mechanism for posting recordDeltaEvent through SparkListener.