RATIS-2393 Add Span Context to RaftRpcRequestProto #1341

Open
taklwu wants to merge 9 commits into apache:master from taklwu:RATIS-2393
Conversation

@taklwu taklwu commented Feb 4, 2026

What changes were proposed in this pull request?

We're adding a new map to RaftRpcRequestProto that will be used for the upcoming OpenTelemetry integration.

See the usage in the PoC: https://github.com/taklwu/ratis/blob/opentelemetry0129/ratis-common/src/main/java/org/apache/ratis/trace/TraceUtils.java#L235-L251

Another official reference for how this map is going to be injected into and extracted from the HTTP / RPC headers
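
To make the inject / extract idea concrete, here is a minimal sketch of carrying a span context in a plain string map. It assumes the map holds a W3C Trace Context "traceparent" entry (version, trace id, span id, flags, joined by "-"); whether this PR stores exactly that entry is an assumption. SimpleSpanContext and TraceContextMapDemo are hypothetical names and are not part of Ratis or OpenTelemetry.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a span context; not the Ratis or OpenTelemetry type. */
final class SimpleSpanContext {
  final String traceId;  // 32 lowercase hex characters
  final String spanId;   // 16 lowercase hex characters
  final boolean sampled;

  SimpleSpanContext(String traceId, String spanId, boolean sampled) {
    this.traceId = traceId;
    this.spanId = spanId;
    this.sampled = sampled;
  }
}

final class TraceContextMapDemo {
  /** Key defined by the W3C Trace Context format. */
  static final String TRACEPARENT = "traceparent";

  /** Client side: write the context into the map that travels with the request. */
  static void inject(SimpleSpanContext ctx, Map<String, String> carrier) {
    final String flags = ctx.sampled ? "01" : "00";
    carrier.put(TRACEPARENT, "00-" + ctx.traceId + "-" + ctx.spanId + "-" + flags);
  }

  /** Server side: read the context back; returns null if the entry is absent or malformed. */
  static SimpleSpanContext extract(Map<String, String> carrier) {
    final String value = carrier.get(TRACEPARENT);
    if (value == null) {
      return null;
    }
    final String[] parts = value.split("-");
    if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
      return null;
    }
    // Simplification: only the exact flags value "01" is treated as sampled.
    return new SimpleSpanContext(parts[1], parts[2], "01".equals(parts[3]));
  }

  public static void main(String[] args) {
    final Map<String, String> carrier = new HashMap<>();
    inject(new SimpleSpanContext(
        "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true), carrier);
    final SimpleSpanContext ctx = extract(carrier);
    System.out.println(carrier.get(TRACEPARENT));
    System.out.println(ctx.traceId + " " + ctx.spanId + " " + ctx.sampled);
  }
}
```

In a real integration the map would be filled and read by the tracing library's propagator rather than by hand; the sketch only shows why a string-to-string map on the request is enough to carry the context across the RPC boundary.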

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-2393

How was this patch tested?

Ran mvn clean package -DskipTests

Author

taklwu commented Feb 4, 2026

@szetszwo Hi Nicholas, should I open PRs directly against the master branch?

In addition, the test failure doesn't seem to be related to my changes:

[INFO] Running org.apache.ratis.netty.TestLeaderElectionWithNetty
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
Error:  Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 74.14 s <<< FAILURE! -- in org.apache.ratis.netty.TestLeaderElectionWithNetty
Error:  org.apache.ratis.netty.TestLeaderElectionWithNetty.testChangeListenerToFollower -- Time elapsed: 1.509 s <<< FAILURE!
org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
	at org.apache.ratis.server.impl.LeaderElectionTests.testChangeListenerToFollower(LeaderElectionTests.java:562)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)


Contributor

@szetszwo szetszwo left a comment

@taklwu , thanks for working on this! Please see the comment inlined.

BTW, this change is very small. Let's combine it with the usage of the new SpanContextProto, such as RATIS-2395 or some part of it. Otherwise, it is hard to determine whether this change is good or not.

@OneSizeFitsQuorum
Contributor

@taklwu Please add the license header to fix the failed CI!

Author

taklwu commented Mar 3, 2026

@OneSizeFitsQuorum Good catch, I have updated the license.

Contributor

@szetszwo szetszwo left a comment

@taklwu , thanks for the update!

Please see the comments inlined and also https://issues.apache.org/jira/secure/attachment/13081062/1341_review.patch

taklwu added 7 commits March 5, 2026 08:38
 - First commit of using OpenTelemetry
 - use junit5 and default opentelemetry extension in test case.
 - Add test to mock client level span and inject when request is being sent
 - only include the server extract
1. use traceAsyncMethod
2. add ClientInvocationId
3. and other places
Author

taklwu commented Mar 5, 2026

Sorry that I rebased on top of master and messed up the review history, but the last commit is the only change addressing the recent comments by @szetszwo.

Contributor

@szetszwo szetszwo left a comment

@taklwu , thanks for the update! Please see the comments inlined.

/**
 * Get the cause of the {@link Throwable} if it is a {@link CompletionException}.
 */
public static Throwable unwrapCompletionException(Throwable error) {
Contributor

We already have a similar method

Let's reuse it and move the other methods to JavaUtils.

Contributor

What do you think about this comment?

Author

Reusing the existing method is good. Sorry, I didn't know unwrapCompletionException was already there.
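
For readers following this thread, a minimal sketch of what such an unwrapping helper typically does is shown below. It follows the Javadoc quoted above (return the cause if the error is a CompletionException); the class name UnwrapDemo is hypothetical, and the existing Ratis method may differ in its details.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class UnwrapDemo {
  /** Return the cause if the error is a CompletionException with a cause; otherwise the error itself. */
  static Throwable unwrapCompletionException(Throwable error) {
    return error instanceof CompletionException && error.getCause() != null
        ? error.getCause() : error;
  }

  public static void main(String[] args) {
    final CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
      throw new IllegalStateException("boom");
    });
    try {
      future.join();
    } catch (CompletionException e) {
      // join() wraps the original exception; unwrap it to get the real cause.
      System.out.println(unwrapCompletionException(e));
    }
  }
}
```

Keeping a single helper like this in one utility class avoids two slightly different unwrapping rules in the code base, which is the point of the review comment.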

/**
 * Construct {@link Span} instances originating from the client request.
 */
public class ClientRequestSpanBuilder implements Supplier<Span> {
Contributor

@szetszwo szetszwo Mar 6, 2026

In general, we should use lambdas rather than implement functional interfaces. See also https://www.oracle.com/technical-resources/articles/java/lambda.html
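
The difference between the two styles can be sketched as follows. FakeSpan is a hypothetical stand-in for the real Span type so that the example compiles without any tracing dependency; SpanBuilderClass and spanSupplier are illustrative names, not code from this PR.

```java
import java.util.function.Supplier;

final class LambdaSupplierDemo {
  /** Hypothetical stand-in for a span; not the OpenTelemetry type. */
  static final class FakeSpan {
    final String name;

    FakeSpan(String name) {
      this.name = name;
    }
  }

  /** Style under review: a dedicated class that implements the functional interface. */
  static final class SpanBuilderClass implements Supplier<FakeSpan> {
    private final String name;

    SpanBuilderClass(String name) {
      this.name = name;
    }

    @Override
    public FakeSpan get() {
      return new FakeSpan(name);
    }
  }

  /** Suggested style: a factory method that returns a lambda. */
  static Supplier<FakeSpan> spanSupplier(String name) {
    return () -> new FakeSpan(name);
  }

  public static void main(String[] args) {
    final Supplier<FakeSpan> fromClass = new SpanBuilderClass("client-request");
    final Supplier<FakeSpan> fromLambda = spanSupplier("client-request");
    System.out.println(fromClass.get().name);
    System.out.println(fromLambda.get().name);
  }
}
```

Both suppliers behave the same for callers; the lambda version simply avoids a named class whose only job is to hold one method.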

Author

@taklwu taklwu Mar 6, 2026

I will give it a try; it may take a bit.

The idea came from the HBase implementation, where a few span builders were created.

*/
package org.apache.ratis.server.impl;

import io.opentelemetry.api.trace.Span;
Contributor

@szetszwo szetszwo Mar 6, 2026

Sorry that I may not have been clear in my last comment -- we should import io.opentelemetry only in ratis-common. Then, it will be easier to make the tracing dependency pluggable in the future.

We often see that a library may introduce incompatible changes (such as Dropwizard Metrics v3 vs v4). Since Ratis itself is also a library, we don't want to force our consumer applications to use a particular dependency (e.g. Ratis supports both Dropwizard Metrics v3 and v4 by making it pluggable). Another reason is that a dependency may have CVEs (e.g. the infamous Log4Shell case). Applications may choose to disable it.

BTW, we should add a conf to enable/disable tracing.
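
One possible shape for this, as a rough sketch only: the other modules depend on a small facade, and a factory picks a no-op or a real implementation based on a configuration flag. Every name here (TracingFacade, NoopTracing, RecordingTracing, TracingFactory, and the key example.tracing.enabled) is hypothetical, java.util.Properties stands in for the real configuration class, and RecordingTracing stands in for an OpenTelemetry-backed implementation that would live in ratis-common.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.function.Supplier;

/** Hypothetical facade that other modules would use instead of importing io.opentelemetry. */
interface TracingFacade {
  <T> T trace(String spanName, Supplier<T> action);
}

/** Used when tracing is disabled: runs the action and records nothing. */
final class NoopTracing implements TracingFacade {
  @Override
  public <T> T trace(String spanName, Supplier<T> action) {
    return action.get();
  }
}

/** Stand-in for a real tracing backend: it only remembers the span names. */
final class RecordingTracing implements TracingFacade {
  final List<String> spans = new ArrayList<>();

  @Override
  public <T> T trace(String spanName, Supplier<T> action) {
    spans.add(spanName);
    return action.get();
  }
}

final class TracingFactory {
  /** Hypothetical configuration key; tracing is off unless it is set to true. */
  static final String ENABLED_KEY = "example.tracing.enabled";

  static TracingFacade create(Properties conf) {
    final boolean enabled = Boolean.parseBoolean(conf.getProperty(ENABLED_KEY, "false"));
    return enabled ? new RecordingTracing() : new NoopTracing();
  }

  public static void main(String[] args) {
    final Properties conf = new Properties();
    conf.setProperty(ENABLED_KEY, "true");
    final TracingFacade tracing = create(conf);
    final String result = tracing.trace("client-request", () -> "done");
    System.out.println(result + " " + tracing.getClass().getSimpleName());
  }
}
```

With this layout, swapping or removing the tracing library would only touch the implementation behind the facade, and applications that do not want tracing pay only for a no-op call.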
