feat(client): enrich write commit callback message and fire it for table-service commits#18988

Open
codope wants to merge 2 commits into apache:master from codope:enrich-write-commit-callback

Conversation

@codope

@codope codope commented Jun 12, 2026

Member

Describe the issue this Pull Request addresses

Today the write commit callback (HoodieWriteCommitCallback) has two limitations that make it awkward for consumers that want to react to what a commit actually changed on storage:

  1. The callback message doesn't say which files a commit replaced. HoodieWriteCommitCallbackMessage carries the write stats, but a consumer that wants to correlate each newly written base file with the existing base file (and bootstrap source) it superseded has to rebuild a FileSystemView itself, duplicating I/O the write client already paid for.
  2. The callback only fires for data commits. Compaction and clustering completions never invoke the callback, so consumers get no signal for table-service commits.

This PR addresses both, backward-compatibly (no breaking change to the public API).

Summary and Changelog

What users gain: callback implementations now receive, per updated file group, the previous base file path (and bootstrap source path, if any) pre-resolved by the write client without rebuilding a file-system view. The callback also fires on compaction and clustering completions, not just data commits.
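
A minimal sketch of how a consumer might read the pre-resolved pairing. `CallbackMessage`, `PrevFilePaths`, and `describeReplacements` below are simplified stand-ins written for this illustration, not the Hudi classes; only the field names `prevBaseFilePath` and `bootstrapBaseFilePath` come from the description. Representing a missing bootstrap source as `null` and keying the new paths by file ID are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LineageCallbackSketch {

  /** Hypothetical stand-in for the PrevFilePaths holder described above. */
  static final class PrevFilePaths {
    final String prevBaseFilePath;
    final String bootstrapBaseFilePath; // assumed null when there is no bootstrap source

    PrevFilePaths(String prevBaseFilePath, String bootstrapBaseFilePath) {
      this.prevBaseFilePath = prevBaseFilePath;
      this.bootstrapBaseFilePath = bootstrapBaseFilePath;
    }
  }

  /** Hypothetical, trimmed-down message: new path per file ID plus the pre-resolved pairing. */
  static final class CallbackMessage {
    final String commitTime;
    final Map<String, String> newPathByFileId;
    final Map<String, PrevFilePaths> prevFilePaths;

    CallbackMessage(String commitTime, Map<String, String> newPathByFileId,
                    Map<String, PrevFilePaths> prevFilePaths) {
      this.commitTime = commitTime;
      this.newPathByFileId = newPathByFileId;
      this.prevFilePaths = prevFilePaths == null
          ? Collections.<String, PrevFilePaths>emptyMap()
          : prevFilePaths;
    }
  }

  /** Emits one line per updated file group; file groups without an entry (inserts) are skipped. */
  static List<String> describeReplacements(CallbackMessage msg) {
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, String> written : msg.newPathByFileId.entrySet()) {
      PrevFilePaths prev = msg.prevFilePaths.get(written.getKey());
      if (prev == null) {
        continue; // nothing was replaced for this file group
      }
      StringBuilder line = new StringBuilder(msg.commitTime)
          .append(": ").append(prev.prevBaseFilePath)
          .append(" -> ").append(written.getValue());
      if (prev.bootstrapBaseFilePath != null) {
        line.append(" (bootstrap source: ").append(prev.bootstrapBaseFilePath).append(")");
      }
      lines.add(line.toString());
    }
    return lines;
  }

  public static void main(String[] args) {
    Map<String, String> written = new LinkedHashMap<>();
    written.put("fg-1", "p1/fg-1_002.parquet");
    written.put("fg-2", "p1/fg-2_002.parquet");
    Map<String, PrevFilePaths> prev = new LinkedHashMap<>();
    prev.put("fg-1", new PrevFilePaths("p1/fg-1_001.parquet", null));
    for (String line : describeReplacements(new CallbackMessage("002", written, prev))) {
      System.out.println(line);
    }
  }
}
```

The point of the sketch is that the consumer does a plain map lookup per file group and never touches a file-system view.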

Changelog (all changes under hudi-client/hudi-client-common/):

  • HoodieWriteCommitCallbackMessage: added two optional fields:
    • prevFilePaths: Map<fileId, PrevFilePaths>, where PrevFilePaths holds prevBaseFilePath and bootstrapBaseFilePath.
    • extraContext: Map<String,String> for producer-attached context.

Both default to empty maps (never null). The existing ctors are preserved; a new all-args ctor is generated via the existing lombok @AllArgsConstructor.

  • BaseHoodieClient: lifted the commitCallback field up from BaseHoodieWriteClient, and added two shared methods:
    • fireCommitCallback(commitTime, commitActionType, stats, BaseFileOnlyView, extraMetadata) which lazily constructs the callback from hoodie.write.commit.callback.class and invokes it.
    • resolvePrevFilePaths(stats, BaseFileOnlyView), which, for each update stat, looks up the previous base file via the cached view (getBaseFileOn) and captures its path plus the bootstrap path, if any.
  • BaseHoodieWriteClient: removed the inline callback block from commitStats; the callback now fires from postCommit. postCommit takes the resolved commitActionType so the message reports the actual action (e.g. replacecommit for insert_overwrite) rather than the table's base action type.
  • BaseHoodieTableServiceClient: fires the callback after successful compaction (commit action) and clustering (replacecommit action).
  • TestBaseHoodieClient (new): Unit tests covering resolvePrevFilePaths (inserts skipped, update resolution, bootstrap capture, missing-file skip, best-effort on view failure, null inputs) and the message default/retention contract.
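
The resolution rules listed for resolvePrevFilePaths (inserts skipped, missing files skipped, best-effort on view failure, null inputs tolerated) can be sketched roughly as follows. All types here are hypothetical stand-ins, not the Hudi classes: modelling an insert as `prevCommit == null`, the parameter order of `getBaseFileOn`, and catching per stat rather than around the whole loop are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class ResolvePrevFilePathsSketch {

  /** Hypothetical write stat; prevCommit == null models an insert in this sketch. */
  static final class WriteStat {
    final String fileId;
    final String partitionPath;
    final String prevCommit;

    WriteStat(String fileId, String partitionPath, String prevCommit) {
      this.fileId = fileId;
      this.partitionPath = partitionPath;
      this.prevCommit = prevCommit;
    }
  }

  /** Hypothetical base file; bootstrapPath is null when there is no bootstrap source. */
  static final class BaseFile {
    final String path;
    final String bootstrapPath;

    BaseFile(String path, String bootstrapPath) {
      this.path = path;
      this.bootstrapPath = bootstrapPath;
    }
  }

  static final class PrevFilePaths {
    final String prevBaseFilePath;
    final String bootstrapBaseFilePath;

    PrevFilePaths(String prevBaseFilePath, String bootstrapBaseFilePath) {
      this.prevBaseFilePath = prevBaseFilePath;
      this.bootstrapBaseFilePath = bootstrapBaseFilePath;
    }
  }

  /** Stand-in for the cached view lookup; the signature is an assumption. */
  interface BaseFileView {
    Optional<BaseFile> getBaseFileOn(String partitionPath, String instantTime, String fileId);
  }

  static Map<String, PrevFilePaths> resolvePrevFilePaths(List<WriteStat> stats, BaseFileView view) {
    Map<String, PrevFilePaths> resolved = new HashMap<>();
    if (stats == null || view == null) {
      return resolved; // null inputs yield an empty map, never null
    }
    for (WriteStat stat : stats) {
      if (stat.prevCommit == null) {
        continue; // insert: no previous base file to report
      }
      try {
        Optional<BaseFile> prev = view.getBaseFileOn(stat.partitionPath, stat.prevCommit, stat.fileId);
        if (!prev.isPresent()) {
          continue; // previous file not in the view: skip rather than fail
        }
        BaseFile prevFile = prev.get(); // cached in a local, read once
        resolved.put(stat.fileId, new PrevFilePaths(prevFile.path, prevFile.bootstrapPath));
      } catch (RuntimeException e) {
        // Best-effort: a stale or failing view must not fail the commit.
      }
    }
    return resolved;
  }

  public static void main(String[] args) {
    BaseFileView view = (partition, instant, fileId) ->
        Optional.of(new BaseFile(partition + "/" + fileId + "_" + instant + ".parquet", null));
    List<WriteStat> stats = Arrays.asList(
        new WriteStat("fg-1", "p1", "001"),  // update
        new WriteStat("fg-2", "p1", null));  // insert
    System.out.println(resolvePrevFilePaths(stats, view).keySet());
  }
}
```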

No code was copied from third-party sources.

Impact

  • Public API: HoodieWriteCommitCallbackMessage is @PublicAPIClass(EVOLVING). The change is additive and backward compatible, i.e., existing ctors and getters are unchanged, and new fields default to empty maps. Existing callback implementations compile and run unchanged.
  • Behavior: the callback now also fires for (a) the executor auto-commit path and (b) compaction/clustering completions, which previously did not fire. Consumers that assumed the callback fired only for explicit data commits will now see additional invocations.
  • Performance: prev-file resolution reuses the already cached fs view. No additional I/O beyond what the writer already performed. Resolution and callback invocation are best-effort and never fail the write.
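
The additive-compatibility claim (existing ctors preserved, new fields defaulting to empty maps) follows a common pattern, sketched below. `Message` is a hypothetical, trimmed-down class written out by hand; the actual class uses Lombok and has more fields. Normalizing null arguments to empty maps and simplifying the map value type to `String` are assumptions of this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class MessageCompatSketch {

  /** Hypothetical, trimmed-down message used only to show the constructor pattern. */
  static final class Message {
    private final String commitTime;
    private final String tableName;
    private final Map<String, String> prevFilePaths; // value type simplified to String
    private final Map<String, String> extraContext;

    /** Older-style constructor: unchanged for existing callers; new fields become empty maps. */
    Message(String commitTime, String tableName) {
      this(commitTime, tableName, null, null);
    }

    /** New all-args constructor; null arguments are normalized to empty maps (an assumption). */
    Message(String commitTime, String tableName,
            Map<String, String> prevFilePaths, Map<String, String> extraContext) {
      this.commitTime = commitTime;
      this.tableName = tableName;
      this.prevFilePaths = copyOrEmpty(prevFilePaths);
      this.extraContext = copyOrEmpty(extraContext);
    }

    private static Map<String, String> copyOrEmpty(Map<String, String> in) {
      return in == null
          ? Collections.<String, String>emptyMap()
          : Collections.unmodifiableMap(new HashMap<>(in));
    }

    String getCommitTime() {
      return commitTime;
    }

    String getTableName() {
      return tableName;
    }

    Map<String, String> getPrevFilePaths() {
      return prevFilePaths;
    }

    Map<String, String> getExtraContext() {
      return extraContext;
    }
  }

  public static void main(String[] args) {
    Message legacy = new Message("001", "trips");
    // Callers written against the old constructor never observe a null map.
    System.out.println(legacy.getPrevFilePaths().isEmpty() && legacy.getExtraContext().isEmpty());
  }
}
```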

Risk Level

low

Changes are confined to one module (hudi-client-common) and the callback path. All callback/resolution failures are caught and logged, so a misbehaving callback or a stale/remote view cannot fail a commit. Verified with the apache/hudi default build profile. Also:

  • Added a UT: TestBaseHoodieClient
  • Repo-wide check confirms all existing HoodieWriteCommitCallbackMessage ctor/getter call sites remain compatible.
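
The "caught and logged, never fails the commit" behavior can be pictured as a small wrapper like the one below. This is a sketch under assumptions, not the PR's code: `CommitCallback` and `fireBestEffort` are hypothetical names, and `java.util.logging` stands in for whatever logger the client uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class BestEffortCallbackSketch {
  private static final Logger LOG = Logger.getLogger(BestEffortCallbackSketch.class.getName());

  /** Hypothetical callback shape; the real interface takes a message object. */
  interface CommitCallback {
    void call(String commitTime, String actionType) throws Exception;
  }

  /** Returns true if the callback ran to completion, false if it was absent or threw. */
  static boolean fireBestEffort(CommitCallback callback, String commitTime, String actionType) {
    if (callback == null) {
      return false; // callback not configured: nothing to do
    }
    try {
      callback.call(commitTime, actionType);
      return true;
    } catch (Exception e) {
      // A misbehaving observer is logged and ignored so the commit still succeeds.
      LOG.log(Level.WARNING, "Commit callback failed for " + commitTime + "; ignoring", e);
      return false;
    }
  }

  public static void main(String[] args) {
    boolean ok = fireBestEffort((t, a) -> System.out.println(a + " @ " + t), "001", "commit");
    boolean failed = fireBestEffort((t, a) -> {
      throw new IllegalStateException("observer bug");
    }, "002", "commit");
    System.out.println(ok + " " + failed);
  }
}
```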

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions Bot added the size:L PR with lines of changes in (300, 1000] label Jun 12, 2026
…ble-service commits

Two backward-compatible improvements to the post-commit write callback mechanism:

1. Enrich HoodieWriteCommitCallbackMessage with two optional fields so callback
   implementations no longer have to rebuild a FileSystemView or reach into engine
   config:
   - prevFilePaths: Map<fileId, PrevFilePaths> -- the previous base file (and
     bootstrap source, if any) each updated file group replaces, pre-resolved by the
     write client from its cached file-system view.
   - extraContext: Map<String,String> -- free-form context producers can attach.
   Both default to empty maps; the existing 4-arg and 6-arg constructors are preserved.

2. Fire the callback for table-service commits too (compaction and clustering
   completion), not just data commits. The shared firing logic (fireCommitCallback)
   and prev-file resolution (resolvePrevFilePaths) are lifted into BaseHoodieClient so
   both BaseHoodieWriteClient (data commits, via postCommit) and
   BaseHoodieTableServiceClient (compaction/clustering completion) reuse them. The
   commitCallback field is lifted up from BaseHoodieWriteClient.

postCommit now receives the resolved commit action type so the callback reports the
actual action (e.g. replacecommit for insert_overwrite) rather than the table's base
action type.

Best-effort by design: callback and prev-file resolution failures are logged and never
fail the write.

Adds TestBaseHoodieClient covering resolvePrevFilePaths (inserts, updates, bootstrap
capture, missing-file skip, best-effort on view failure, null inputs) and the message
default/retention contract.
@codope codope force-pushed the enrich-write-commit-callback branch from f2f674d to 127d370 Compare June 13, 2026 06:21
* {@code FileSystemView}.
*/
protected static Map<String, PrevFilePaths> resolvePrevFilePaths(List<HoodieWriteStat> stats,
BaseFileOnlyView fsView) {

Contributor

Querying the fsview is costly; can we make it a supplier so that it is only called when necessary?

Member Author

Switched to a Supplier so the view is only resolved after the callback-enabled check, keeping the default (callback-off) path free of any view lookup. Even before, on the enabled path, it reused the same cached view the writer already populated, so no extra I/O.
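
A rough illustration of the Supplier approach discussed in this thread: the view is only materialized once the callback is known to be enabled. `fireIfEnabled` and the `List<String>` "view" are hypothetical simplifications for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class LazyViewSketch {

  /**
   * Returns the number of file IDs handed to the callback,
   * or -1 when callbacks are disabled (the supplier is never invoked in that case).
   */
  static int fireIfEnabled(boolean callbackEnabled, Supplier<List<String>> viewSupplier) {
    if (!callbackEnabled) {
      return -1; // default path: no view lookup at all
    }
    List<String> fileIds = viewSupplier.get(); // resolved only when needed
    return fileIds.size();
  }

  public static void main(String[] args) {
    AtomicInteger lookups = new AtomicInteger();
    Supplier<List<String>> view = () -> {
      lookups.incrementAndGet();
      return Arrays.asList("fg-1", "fg-2");
    };
    fireIfEnabled(false, view);
    System.out.println("lookups with callback off: " + lookups.get());
    fireIfEnabled(true, view);
    System.out.println("lookups with callback on: " + lookups.get());
  }
}
```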

}
commitCallback.call(new HoodieWriteCommitCallbackMessage(
commitTime, config.getTableName(), config.getBasePath(),
stats, Option.of(commitActionType), extraMetadata, resolvePrevFilePaths(stats, fsView),

Contributor

Not sure what the prev file paths are used for; is it for debugging?

@codope codope Jun 13, 2026

Member Author

This was needed for an internal feature. It lets callback consumers (e.g. testing frameworks, lineage/audit sinks) see which base file each update replaced without rebuilding a FileSystemView themselves.
cc @prashantwason

* <p>Best-effort: catches and logs any exception from the user-supplied callback so a
* misbehaving observer cannot fail the commit.
*/
protected void fireCommitCallback(String commitTime,

Contributor

Consider renaming this to mayFireCommitCallback or fireCommitCallbackIfNecessary.

Signed-off-by: codope <sagarsumit09@gmail.com>
@codecov-commenter


Codecov Report

❌ Patch coverage is 72.72727% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.25%. Comparing base (97c03f7) to head (22eeafb).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../java/org/apache/hudi/client/BaseHoodieClient.java | 65.71% | 11 Missing and 1 partial ⚠️ |
| ...lback/common/HoodieWriteCommitCallbackMessage.java | 70.00% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #18988   +/-   ##
=========================================
  Coverage     68.25%   68.25%           
- Complexity    29509    29520   +11     
=========================================
  Files          2544     2544           
  Lines        142744   142800   +56     
  Branches      17816    17820    +4     
=========================================
+ Hits          97433    97473   +40     
- Misses        37304    37320   +16     
  Partials       8007     8007           

| Flag | Coverage Δ |
|---|---|
| common-and-other-modules | 44.79% <65.45%> (+0.01%) ⬆️ |
| hadoop-mr-java-client | 44.67% <16.36%> (-0.03%) ⬇️ |
| spark-client-hadoop-common | 48.09% <16.36%> (-0.03%) ⬇️ |
| spark-java-tests | 48.75% <16.36%> (-0.02%) ⬇️ |
| spark-scala-tests | 44.76% <20.00%> (-0.02%) ⬇️ |
| utilities | 37.22% <16.36%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...ache/hudi/client/BaseHoodieTableServiceClient.java | 75.26% <100.00%> (+0.14%) ⬆️ |
| .../org/apache/hudi/client/BaseHoodieWriteClient.java | 78.53% <100.00%> (+0.83%) ⬆️ |
| ...lback/common/HoodieWriteCommitCallbackMessage.java | 70.00% <70.00%> (-30.00%) ⬇️ |
| .../java/org/apache/hudi/client/BaseHoodieClient.java | 86.75% <65.71%> (-4.01%) ⬇️ |

... and 16 files with indirect coverage changes


@hudi-bot

Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

);
}
log.info("Compacted successfully on commit {}", compactionCommitTime);
fireCommitCallbackIfNecessary(compactionCommitTime, HoodieTimeline.COMMIT_ACTION,

Contributor

completeLogCompaction (LOG_COMPACTION_ACTION) is also a table-service completion with write stats and a timeline commit, but it does not call fireCommitCallbackIfNecessary while compaction and clustering do. Intentional scope, or should log compaction fire the callback too? If intentional, a short note here on why it is excluded would keep the set explicit.

* group from a cached {@link BaseFileOnlyView}, so callback implementations receive the
* read/write file pairing without rebuilding a file-system view.
*/
public class TestBaseHoodieClient {

Contributor

No test asserts the PR's headline behavior - that the callback fires on compaction and clustering completion. resolvePrevFilePaths and the message contract are covered, but completeCompaction/completeClustering firing is not (codecov flags fireCommitCallbackIfNecessary as uncovered). Consider a test that registers a recording callback and asserts it fires once per table-service commit with the expected action type (commit / replacecommit).
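
One possible shape for the recording callback suggested above. This is only a sketch: `CommitCallback`, `RecordingCallback`, and `FakeTableServiceClient` are hypothetical stand-ins, and wiring a recorder into the real completeCompaction/completeClustering paths (for example through hoodie.write.commit.callback.class) is not shown.

```java
import java.util.ArrayList;
import java.util.List;

class RecordingCallbackSketch {

  /** Hypothetical callback shape; the real interface receives a message object. */
  interface CommitCallback {
    void call(String commitTime, String actionType);
  }

  /** Records every invocation so a test can assert on count, order, and action type. */
  static final class RecordingCallback implements CommitCallback {
    final List<String> events = new ArrayList<>();

    @Override
    public void call(String commitTime, String actionType) {
      events.add(commitTime + ":" + actionType);
    }
  }

  /** Hypothetical fake of a table-service client, present only to exercise the recorder. */
  static final class FakeTableServiceClient {
    private final CommitCallback callback;

    FakeTableServiceClient(CommitCallback callback) {
      this.callback = callback;
    }

    void completeCompaction(String commitTime) {
      callback.call(commitTime, "commit");
    }

    void completeClustering(String commitTime) {
      callback.call(commitTime, "replacecommit");
    }
  }

  public static void main(String[] args) {
    RecordingCallback recorder = new RecordingCallback();
    FakeTableServiceClient client = new FakeTableServiceClient(recorder);
    client.completeCompaction("003");
    client.completeClustering("004");
    System.out.println(recorder.events);
  }
}
```

A test against the real client would keep the recorder and the assertions and replace the fake with an actual compaction and clustering run.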

@hudi-agent hudi-agent left a comment

Contributor

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR enriches the write commit callback message with previous-file paths and extends callback firing to compaction and clustering completions. A few correctness concerns worth checking in the inline comments — particularly around the commitActionType passed from postWrite (which uses the table's base action type rather than the operation's actual action), the action type for clustering on newer table versions, and a subtle timing change where the callback now fires before mayBeCleanAndArchive/runTableServicesInline instead of after. Please take a look at the inline comments, and this should be ready for a Hudi committer or PMC member to take it from here. A few naming/comment and minor consistency nits below; the main one is a misleading Javadoc phrase on extraContext that implies it's optional when it isn't.

  try {
-   postCommit(hoodieTable, result.getCommitMetadata().get(), instantTime, Option.empty());
+   postCommit(hoodieTable, result.getCommitMetadata().get(), instantTime,
+       hoodieTable.getMetaClient().getCommitActionType(), Option.empty());

Contributor

🤖 Could this report the wrong action type for auto-committed replacecommit operations? hoodieTable.getMetaClient().getCommitActionType() returns commit/deltacommit (operation-agnostic), but SparkPartitionTTLActionExecutor goes through SparkAutoCommitExecutor and commits with REPLACE_COMMIT_ACTION — the callback would advertise commit/deltacommit for that path. Using CommitUtils.getCommitActionType(getOperationType(), table.getMetaClient().getTableType()) would match the actual committed action.

- AI-generated; verify before applying. React 👍/👎 to flag quality.
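
The distinction raised in this comment, deriving the action from the operation rather than only from the table type, can be modelled roughly as below. This is a simplified sketch of the idea, not the implementation of CommitUtils.getCommitActionType: the enums are hypothetical stand-ins, and the set of operations mapped to replacecommit here is an assumption.

```java
class ActionTypeSketch {

  enum TableType { COPY_ON_WRITE, MERGE_ON_READ }

  /** Hypothetical subset of write operations, enough to show the branching. */
  enum Operation { UPSERT, INSERT, INSERT_OVERWRITE, INSERT_OVERWRITE_TABLE, DELETE_PARTITION }

  /** Table-type-only resolution: operation-agnostic, so it can never yield replacecommit. */
  static String baseActionType(TableType tableType) {
    return tableType == TableType.MERGE_ON_READ ? "deltacommit" : "commit";
  }

  /** Operation-aware resolution: replace-style operations report replacecommit. */
  static String commitActionType(Operation operation, TableType tableType) {
    switch (operation) {
      case INSERT_OVERWRITE:
      case INSERT_OVERWRITE_TABLE:
      case DELETE_PARTITION:
        return "replacecommit";
      default:
        return baseActionType(tableType);
    }
  }

  public static void main(String[] args) {
    System.out.println(baseActionType(TableType.COPY_ON_WRITE));
    System.out.println(commitActionType(Operation.INSERT_OVERWRITE, TableType.COPY_ON_WRITE));
  }
}
```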

heartbeatClient.stop(clusteringCommitTime);
}
log.info("Clustering successfully on commit {} for table {}", clusteringCommitTime, table.getConfig().getBasePath());
fireCommitCallbackIfNecessary(clusteringCommitTime, HoodieTimeline.REPLACE_COMMIT_ACTION,

Contributor

🤖 For table-version 8+ clustering uses HoodieTimeline.CLUSTERING_ACTION as the completed instant action (not REPLACE_COMMIT_ACTION) — the metrics line just above already uses CLUSTERING_ACTION. Could you read the action from clusteringInstant.getAction() so consumers see the same action that's actually on the timeline?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

HoodieTimer postCommitTimer = HoodieTimer.start();
try {
- postCommit(table, metadata, instantTime, extraMetadata);
+ postCommit(table, metadata, instantTime, commitActionType, extraMetadata);

Contributor

🤖 Is the new ordering intentional? Pre-PR the callback fired after mayBeCleanAndArchive and runTableServicesInline (and was skipped if they threw with canIgnorePostCommitFailures=false). Now it fires inside postCommit, before them — so a consumer can receive a successful-commit callback even when post-commit cleanup throws and commitStats ultimately propagates the exception. The PR description mentions extra callbacks but doesn't call out this ordering shift.

- AI-generated; verify before applying. React 👍/👎 to flag quality.

);
}
log.info("Compacted successfully on commit {}", compactionCommitTime);
fireCommitCallbackIfNecessary(compactionCommitTime, HoodieTimeline.COMMIT_ACTION,

Contributor

🤖 Worth noting: HoodieFlinkTableServiceClient overrides both completeCompaction and completeClustering and doesn't delegate to super or call fireCommitCallbackIfNecessary, so Flink users won't see the new table-service callback. Should the Flink overrides also fire the callback, or is the PR intentionally scoped to Spark/Java?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

/**
* Free-form context that producers can attach for downstream callback consumers.
* The OSS write client populates this as empty; specialized callsites or wrappers
* may populate it with whatever context their callbacks need. Mirrors the

Contributor

🤖 nit: the phrase "Mirrors the optional shape of extraMetadata" is misleading — extraContext is Map<String, String>, not Option<Map<String, String>>, so the shapes are different. Could you drop or rephrase that sentence?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

*/
public static class PrevFilePaths implements Serializable {
private static final long serialVersionUID = 1L;
public final String prevBaseFilePath;

Contributor

🤖 nit: the outer class uses Lombok @Getter for field access, but PrevFilePaths exposes bare public final fields — callers end up using two different access patterns on the same message object. Would it be worth adding @Getter here (and making the fields private) for consistency?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

continue;
}
String prevPath = prev.get().getPath();
String bootstrapPath = prev.get().getBootstrapBaseFile().isPresent()

Contributor

🤖 nit: prev.get() and getBootstrapBaseFile() are each called twice — could you cache them in locals, e.g. HoodieBaseFile prevFile = prev.get() and Option<BaseFile> bootstrapBase = prevFile.getBootstrapBaseFile()?

- AI-generated; verify before applying. React 👍/👎 to flag quality.
