KAFKA-18613: Add StreamsGroupHeartbeat handler in the group coordinator #19114
base: trunk
...oordinator/src/test/java/org/apache/kafka/coordinator/group/streams/StreamsGroupBuilder.java
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/streams/StreamsGroup.java
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Basic streams group heartbeat handling.
- No support for static membership
- No support for configurations (using constants instead)
- No support for regular expressions
b442946 to fccd6e1
Thanks for the PR, @lucasbru !
I did a first pass over the production code. I haven't looked at GroupMetadataManagerTest yet.
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/streams/StreamsGroup.java
Outdated
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/streams/TasksTuple.java
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/streams/TasksTuple.java
@cadonna Thanks for the comments! Ready for re-review.
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Thanks for the updates @lucasbru !
I reviewed half of the tests in GroupMetadataManagerTest.
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
 * @throws InvalidRequestException if the request is not valid.
 * @throws UnsupportedAssignorException if the assignor is not supported.
 */
private static void throwIfStreamsGroupHeartbeatRequestIsInvalid(
I am missing the following check from the KIP:
"Each element of ActiveTasks, StandbyTasks and WarmupTasks has to be a valid task ID in the topology initialized for the group ID."
Clearly, you cannot put it in this method, but I could also not find it in the heartbeat handler.
Do you still plan to add this check?
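For reference, the KIP check quoted above could look roughly like the following sketch. It assumes a simplified topology model in which each subtopology ID maps to its number of tasks, and task IDs are the partitions 0..numTasks-1. TaskIdValidator and validateOwnedTasks are hypothetical names, not the coordinator's API, and IllegalArgumentException stands in for Kafka's InvalidRequestException.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified topology model: subtopology ID -> number of tasks.
// In this sketch, the task IDs of a subtopology are the partitions 0..numTasks-1.
final class TaskIdValidator {

    private TaskIdValidator() { }

    // Throws if any reported task is not part of the given topology.
    // IllegalArgumentException stands in for Kafka's InvalidRequestException.
    static void validateOwnedTasks(String role,
                                   Map<String, Set<Integer>> reportedTasks,
                                   Map<String, Integer> tasksPerSubtopology) {
        for (Map.Entry<String, Set<Integer>> entry : reportedTasks.entrySet()) {
            Integer numTasks = tasksPerSubtopology.get(entry.getKey());
            if (numTasks == null) {
                throw new IllegalArgumentException(
                    role + " tasks reference unknown subtopology " + entry.getKey());
            }
            for (int partition : entry.getValue()) {
                if (partition < 0 || partition >= numTasks) {
                    throw new IllegalArgumentException(role + " task " + entry.getKey()
                        + "_" + partition + " is not part of the topology");
                }
            }
        }
    }
}
```

In a handler, this would be called once each for ActiveTasks, StandbyTasks and WarmupTasks, against the topology of the current topology epoch.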
I think I wouldn't add this now. It seems to contradict the story about topology updates (e.g., removing a subtopology). We'd add a check now and remove it again in the next version, no? Also, this PR seems too big already ;). We could consider creating a small ticket, but I don't think we should add it now.
I agree on the "not adding now part". I do not completely understand the "topology updating" part. We can still perform this check for the current topology epoch, no?
Maybe, but we have to think this through. Even if we let clients crash only when they use the same topology epoch, scaling down the number of input partitions of a stateless application would become impossible. I am wondering whether it would be better to just ignore tasks that we do not expect. After all, the client is just saying what it thinks it owns. If we just ignored those tasks, the next target assignment would not include them, so the client would automatically be instructed to revoke whatever tasks it thinks it currently has.
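The "ignore unexpected tasks" alternative could be sketched as below, under the same simplified assumption that the topology maps each subtopology ID to a task count. OwnedTaskFilter and retainKnownTasks are hypothetical names for illustration only. The scenario in the usage check is a stateless application scaled down from four to two input partitions: the client still reports task 3, which is silently dropped instead of failing the heartbeat.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: keep only the reported tasks that exist in the topology
// (subtopology ID -> number of tasks) and silently drop everything else.
final class OwnedTaskFilter {

    private OwnedTaskFilter() { }

    static Map<String, Set<Integer>> retainKnownTasks(Map<String, Set<Integer>> reported,
                                                      Map<String, Integer> tasksPerSubtopology) {
        Map<String, Set<Integer>> known = new HashMap<>();
        reported.forEach((subtopologyId, partitions) -> {
            Integer numTasks = tasksPerSubtopology.get(subtopologyId);
            if (numTasks == null) {
                return; // the whole subtopology no longer exists: drop all its tasks
            }
            Set<Integer> kept = new TreeSet<>();
            for (int partition : partitions) {
                if (partition >= 0 && partition < numTasks) {
                    kept.add(partition);
                }
            }
            if (!kept.isEmpty()) {
                known.put(subtopologyId, kept);
            }
        });
        return known;
    }
}
```

Because dropped tasks never enter the group state, they cannot appear in the next target assignment, so normal reconciliation tells the client to revoke them.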
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
Outdated
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
Outdated
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
Outdated
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
Outdated
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
Outdated
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
@cadonna Thanks for the comments! Ready for re-review.
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
Outdated
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
Outdated
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
Outdated
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
Outdated
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
Outdated
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
Thanks for the PR, @lucasbru! I have a few comments, but otherwise this LGTM.
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java
Outdated
...p-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupMetadataManagerTest.java
Outdated
@bbejeck Thanks for the comments! Ready for re-review.
Thanks for the updates, @lucasbru !
I reviewed also the second part of the tests.
Once you have addressed my comments, the PR is ready for merge.
.withTargetAssignment(memberId, TaskAssignmentTestUtil.mkTasksTuple(TaskRole.ACTIVE,
    TaskAssignmentTestUtil.mkTasks(subtopology1, 0, 1, 2, 3, 4, 5)))
.withTargetAssignmentEpoch(10)
.withTopology(StreamsTopology.fromHeartbeatRequest(topology))
This test would be easier to understand if you added the following here:
.withPartitionMetadata(Map.of(
fooTopicName, new org.apache.kafka.coordinator.group.streams.TopicMetadata(fooTopicId, fooTopicName, 3),
barTopicName, new org.apache.kafka.coordinator.group.streams.TopicMetadata(barTopicId, barTopicName, 3)
))
or even add a variable changedPartitionCount = 3
and then:
.withPartitionMetadata(Map.of(
fooTopicName, new org.apache.kafka.coordinator.group.streams.TopicMetadata(fooTopicId, fooTopicName, changedPartitionCount),
barTopicName, new org.apache.kafka.coordinator.group.streams.TopicMetadata(barTopicId, barTopicName, 3)
))
It took me quite some time to understand that the partition metadata changed from no metadata to some metadata.
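The behaviour the test exercises, a group epoch bump caused by changed partition metadata, can be pictured with the toy model below. TopicMeta is a hypothetical stand-in for org.apache.kafka.coordinator.group.streams.TopicMetadata that keeps only the three constructor arguments visible in the suggestion (topic ID, name, partition count), and nextGroupEpoch is a hypothetical rule, not the coordinator's code. The named changedPartitionCount in the check mirrors the reviewer's readability suggestion.

```java
import java.util.Map;

// Hypothetical stand-in for the streams TopicMetadata (topic ID, topic name, partition count).
record TopicMeta(String topicId, String topicName, int numPartitions) { }

final class PartitionMetadataCheck {

    private PartitionMetadataCheck() { }

    // Hypothetical rule: bump the group epoch by one whenever the stored partition metadata
    // differs from the current metadata, including the "no metadata -> some metadata" case.
    static int nextGroupEpoch(int groupEpoch,
                              Map<String, TopicMeta> storedMetadata,
                              Map<String, TopicMeta> currentMetadata) {
        return storedMetadata.equals(currentMetadata) ? groupEpoch : groupEpoch + 1;
    }
}
```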
} | ||
|
||
@Test | ||
public void testStreamsLeavingMemberBumpsGroupEpoch() { |
Maybe a better name would be testStreamsLeavingMemberRemovesMemberAndBumpsGroupEpoch
TaskAssignmentTestUtil.mkTasks(subtopology1, 3, 4, 5),
TaskAssignmentTestUtil.mkTasks(subtopology2, 2)))
.withTargetAssignmentEpoch(10))
.build();
IMO, you should also add the partition metadata to the group. The test relies on the order of the checks: first member changes, then partition metadata changes. If the production code changes that order, this test would fail at this specific location although it should not. That is better than the other way around, i.e., the test not failing although it should. Nevertheless, making the test more robust to production code changes saves some debugging pain.
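One way to picture this point is a toy model of a heartbeat that runs two checks in a fixed order and emits one "record" per detected change. Everything here (HeartbeatChecksSketch, recordsFor, the record names) is hypothetical and not the coordinator's actual logic. Without seeded metadata, the expected output contains a metadata update whose position depends on the check order; with metadata seeded to match, only the membership change remains, so the expectation no longer depends on that order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical): checks run in a fixed order and each detected change emits a record.
final class HeartbeatChecksSketch {

    private HeartbeatChecksSketch() { }

    static List<String> recordsFor(boolean memberLeft,
                                   Map<String, Integer> storedPartitionCounts,
                                   Map<String, Integer> currentPartitionCounts) {
        List<String> records = new ArrayList<>();
        // Check 1: membership changes.
        if (memberLeft) {
            records.add("MemberRemoved");
        }
        // Check 2: partition metadata changes (topic name -> partition count).
        if (!storedPartitionCounts.equals(currentPartitionCounts)) {
            records.add("PartitionMetadataUpdated");
        }
        // Any detected change bumps the group epoch.
        if (!records.isEmpty()) {
            records.add("GroupEpochBumped");
        }
        return records;
    }
}
```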
// Member 1 joins the streams group. The request fails because the
// target assignment computation failed.
assertThrows(UnknownServerException.class, () ->
Could you please also verify the exception message?
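In the actual test, JUnit 5's assertThrows already returns the thrown exception, so verifying the message amounts to capturing that return value and asserting on getMessage(). The self-contained sketch below mimics that pattern with a hypothetical ExceptionAssertions helper (not part of JUnit or Kafka); the message "assignment failed" is a placeholder, not the coordinator's real error text.

```java
// Minimal, stdlib-only stand-in for JUnit 5's assertThrows, which returns the thrown
// exception so that its message can be checked afterwards.
final class ExceptionAssertions {

    private ExceptionAssertions() { }

    @FunctionalInterface
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    // Runs the action, checks the exception type and returns the exception.
    static <T extends Throwable> T expectThrows(Class<T> expectedType, ThrowingRunnable action) {
        try {
            action.run();
        } catch (Throwable thrown) {
            if (expectedType.isInstance(thrown)) {
                return expectedType.cast(thrown);
            }
            throw new AssertionError("Expected " + expectedType.getName() + " but got " + thrown, thrown);
        }
        throw new AssertionError("Expected " + expectedType.getName() + " to be thrown");
    }

    // Fails unless the exception carries exactly the expected message.
    static void expectMessage(Throwable thrown, String expectedMessage) {
        if (!expectedMessage.equals(thrown.getMessage())) {
            throw new AssertionError("Expected message <" + expectedMessage
                + "> but was <" + thrown.getMessage() + ">");
        }
    }
}
```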
context.rollback();

// However, the next heartbeat should detect the divergence based on the epoch and trigger
// a metadata refr
nit: change "// a metadata refr" to "// a metadata refresh."
);

// Advance time past the revocation timeout.
List<ExpiredTimeout<Void, CoordinatorRecord>> timeouts = context.sleep(10000 + 1);
Could you please use a variable revocationTimeout instead of 10000?
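With a named variable, the test line would read context.sleep(revocationTimeout + 1), which makes the intent (advance just past the timeout) explicit. The toy model below illustrates that arithmetic. RevocationTimeoutSketch is hypothetical, and the strictly-greater-than comparison is an assumption of this sketch, not a documented property of Kafka's mock timer.

```java
// Toy model (hypothetical) of a revocation deadline on a mock clock. This sketch assumes the
// timeout fires only once strictly more than revocationTimeoutMs has elapsed, which is why a
// test would sleep for revocationTimeout + 1.
final class RevocationTimeoutSketch {

    private final long startMs;
    private final long revocationTimeoutMs;

    RevocationTimeoutSketch(long startMs, long revocationTimeoutMs) {
        this.startMs = startMs;
        this.revocationTimeoutMs = revocationTimeoutMs;
    }

    boolean isExpired(long nowMs) {
        return nowMs - startMs > revocationTimeoutMs;
    }
}
```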
Thanks for the update, @lucasbru! LGTM modulo addressing Bruno's outstanding comments.
Basic streams group heartbeat handling. The main part of this PR is the unit tests, which make sure that we behave, for the most part, like a consumer group.