-
Notifications
You must be signed in to change notification settings - Fork 21
Separate shuffler's communication into new interface #437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
@madsbk @wence- @nirandaperera I think I have addressed everything, please let me know if I missed something. |
cpp/include/rapidsmpf/communicator/metadata_payload_exchange.hpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, @pentschev. Besides a small suggestion, the only remaining comment I have is about organization.
I think we should move this to a subdir, something like:
rapidsmpf/communicator/utils/metadata_payload_exchange/core.hpp
rapidsmpf/communicator/utils/metadata_payload_exchange/tag.hpp
rapidsmpf/communicator/utils/metadata_payload_exchange/active_messaging.hpp
Done in 2eeb521, I left it out of |
|
Anything else @wence- @nirandaperera ? |
| std::vector<std::unique_ptr<MetadataPayloadExchange::Message>> received_messages; | ||
| for (int iter = 0; iter < 10 && received_messages.empty(); ++iter) { | ||
| auto messages = comm_interface->recv(); | ||
| received_messages.insert( | ||
| received_messages.end(), | ||
| std::make_move_iterator(messages.begin()), | ||
| std::make_move_iterator(messages.end()) | ||
| ); | ||
|
|
||
| if (!received_messages.empty()) | ||
| break; | ||
| std::this_thread::sleep_for(std::chrono::milliseconds(10)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test could fail if everything is very slow, right? We're not actually waiting for completion.
Also, nit, use std::ranges::move(messages, std::back_inserter(received_messages))
| std::vector<std::unique_ptr<MetadataPayloadExchange::Message>> received_messages; | ||
| for (int iter = 0; iter < 50 && received_messages.empty(); ++iter) { | ||
| auto messages = comm_interface->recv(); | ||
| received_messages.insert( | ||
| received_messages.end(), | ||
| std::make_move_iterator(messages.begin()), | ||
| std::make_move_iterator(messages.end()) | ||
| ); | ||
|
|
||
| if (!received_messages.empty()) | ||
| break; | ||
| std::this_thread::sleep_for(std::chrono::milliseconds(10)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again here this loop makes me worried, since it could randomly fail if the other side is very slow.
| void wait_for_communication_complete() { | ||
| for (int iter = 0; iter < 100 && !comm_interface->is_idle(); ++iter) { | ||
| comm_interface->recv(); | ||
| std::this_thread::sleep_for(std::chrono::milliseconds(10)); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can spuriously exit without communication being completed.
This suggests that we need a way to actually progress until everything is done, but we currently don't have that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not possible in general because we can't know on the receive side that someone has sent us a message until we get it...
| if (comm->rank() == peer_rank) { | ||
| EXPECT_EQ(received_messages.size(), 1); | ||
| auto& msg = received_messages[0]; | ||
| EXPECT_EQ(msg->peer_rank(), 0); | ||
| EXPECT_EQ(msg->metadata(), test_metadata); | ||
| EXPECT_EQ(msg->data(), nullptr); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This never fires, right? Because peer_rank is defined as (comm->rank() + 1) % nranks so it is never true that comm->rank() == peer_rank.
| std::vector<std::unique_ptr<MetadataPayloadExchange::Message>> received_messages; | ||
| for (int iter = 0; iter < 10 && received_messages.empty(); ++iter) { | ||
| auto messages = comm_interface->recv(); | ||
| received_messages.insert( | ||
| received_messages.end(), | ||
| std::make_move_iterator(messages.begin()), | ||
| std::make_move_iterator(messages.end()) | ||
| ); | ||
|
|
||
| if (!received_messages.empty()) | ||
| break; | ||
| std::this_thread::sleep_for(std::chrono::milliseconds(10)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this test, rank 0 sends to rank 1, and everyone else "does nothing".
But the test is not really following that flow.
I think we want something like:
if (comm->nranks() != 2) {
GTEST_SKIP() << "Only for two ranks";
}
if (comm->rank() == 0) {
send(message);
} else {
auto received = ...;
while (received.empty()) {
std::ranges::move(comm_interface->recv(), std::back_inserter(received));
std::this_thread::yield();
}
EXPECT_EQ(received.size(), 1);
EXPECT_EQ(msg->peer_rank(), 0);
...
}
But now I am worried about rank-0 finishing the test with stuff still in flight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, wait, rank-0 must also call recv repeatedly because that's the only thing that actually progresses the underlying tag communication.
This is very counter intuitive.
| completed_messages.insert( | ||
| completed_messages.end(), | ||
| std::make_move_iterator(completed_data.begin()), | ||
| std::make_move_iterator(completed_data.end()) | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::ranges::move(completed_data, std::back_inserter(completed_messages));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, reading through the usage in the tests gives me some pause for thought.
It seems like the sender must call recv repeatedly to progress sends (even if they're never receiving anything). Do I have this right?
|
@wence- I have pushed multiple changes that do what you requested in your latest comments and what we've discussed offline. Here's a summary of the changes:
I hope this now addresses all the concerns, and more importantly does it all correctly. One more thing: I just want to say I will start pushing against changes in the Tag communication flow in this PR, my initial idea was to start simple with porting over the |
Move communication operations from the
Shuffler::ProgressThreadinto a newCommunicationInterfaceabstract class to handle all communication, with the use of a classMessage. This approach has the benefit of making a more distinct and easier to understand separation of the communication routine from the shuffler making the progress operator considerably smaller (will be done in follow-up PR), simultaneously allowing extensions to the communication interface disjoint from the shuffler, for example, to allow implementing an Active Message-based communicator for the shuffler. It also generalizes the concept of a "message", instead of transferring chunks all theCommunicationInterfaceknows about are messages represented by theMessageclass that combines metadata, payload and src/dst rank.Currently implement only a
TagCommunicationInterfacethat implements exactly the behavior that is implemented in the shuffler progress, additionally removing the need for the ack message. Extending to support Active Messages will also be handled in a separate PR.Removes also ack messages, closes #475, closes #535 .