Select ibv device that has an active port_state #456
base: main
@@ -17,8 +17,32 @@ namespace ibv {
Reactor::Reactor(IbvLib ibvLib, IbvDeviceList deviceList)
    : ibvLib_(std::move(ibvLib)) {
  bool found = false;
  TP_DCHECK_GE(deviceList.size(), 1);
  ctx_ = createIbvContext(getIbvLib(), deviceList[0]);

  // If deviceList contains multiple ibv devices, select the first device
  // whose port_state is active, instead of always selecting the first
  // device in deviceList.
  for (int i = 0; i < deviceList.size(); i++) {
    IbvContext tp_ctx_;
Review comment: Our naming convention is
    IbvLib::port_attr portAttr;
    std::memset(&portAttr, 0, sizeof(portAttr));
    tp_ctx_ = createIbvContext(getIbvLib(), deviceList[i]);
    TP_CHECK_IBV_INT(getIbvLib().query_port(tp_ctx_.get(), kPortNum, &portAttr));
    if (portAttr.state == IbvLib::port_state::PORT_ACTIVE) {
      ctx_ = std::move(tp_ctx_);
      found = true;
      break;
    } else {
      TP_VLOG(8) << "IbvDevice " << deviceList[i].name << " port "
                 << unsigned(kPortNum) << " state is "
                 << getIbvLib().port_state_str(portAttr.state)
                 << ", so skipping this device";
    }
  }

  TP_THROW_ASSERT_IF(found == false) << "Unable to find available ibv device";
Review comment: If we can't find any usable devices, we shouldn't consider it an error (and crash the program); instead, we should just disable the ibv transport. The logic to do so happens here: tensorpipe/tensorpipe/transport/ibv/context_impl.cc Lines 58 to 62 in bb1473a
Could you move your code to that file? You will probably need to change the constructor of the Reactor class so that it takes an IbvContext object instead of an IbvDeviceList.
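A minimal sketch of what the reviewer's suggestion could look like, written against simplified stand-in types so that it compiles on its own. Everything in it is hypothetical: `PortState`, `FakeDevice`, `FakeContext`, `FakeReactor`, `FakeTransportContext` and `openFirstActiveDevice` are illustrative names, not tensorpipe or libibverbs APIs, and real code would call `query_port` on each opened context instead of reading a stored state. Device selection moves out of the reactor into the context setup, the reactor is constructed from an already-opened context, and finding no active port leaves the transport marked as not viable instead of throwing.

```cpp
#include <cstddef>
#include <iostream>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real tensorpipe types wrap libibverbs objects.
enum class PortState { Down, Init, Active };

struct FakeDevice {
  std::string name;
  PortState portState;  // state of the port the transport would use
};

struct FakeContext {
  std::string deviceName;
};

// Returns a context for the first device whose port is active, or nothing.
// std::size_t avoids comparing a signed index with devices.size().
std::optional<FakeContext> openFirstActiveDevice(
    const std::vector<FakeDevice>& devices) {
  for (std::size_t i = 0; i < devices.size(); ++i) {
    if (devices[i].portState == PortState::Active) {
      return FakeContext{devices[i].name};
    }
    // Stands in for the TP_VLOG message that reports a skipped device.
    std::cerr << "skipping " << devices[i].name << ": port not active\n";
  }
  return std::nullopt;
}

// The reactor receives an already-opened context instead of a device list.
class FakeReactor {
 public:
  explicit FakeReactor(FakeContext ctx) : ctx_(std::move(ctx)) {}
  const std::string& deviceName() const { return ctx_.deviceName; }

 private:
  FakeContext ctx_;
};

// Context-level setup: no active device means the transport is not viable,
// rather than an assertion that aborts the program.
struct FakeTransportContext {
  bool viable = false;
  std::optional<FakeReactor> reactor;

  explicit FakeTransportContext(const std::vector<FakeDevice>& devices) {
    std::optional<FakeContext> ctx = openFirstActiveDevice(devices);
    if (!ctx.has_value()) {
      return;  // leave viable == false so the transport is simply disabled
    }
    reactor.emplace(std::move(*ctx));
    viable = true;
  }
};
```

For example (device names made up), constructing `FakeTransportContext` from `{{"mlx5_0", PortState::Down}, {"mlx5_1", PortState::Active}}` yields a viable context whose reactor uses `mlx5_1`, while a list with no active port yields `viable == false` and no reactor.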

  pd_ = createIbvProtectionDomain(getIbvLib(), ctx_);
  cq_ = createIbvCompletionQueue(
      getIbvLib(),

Review comment: Nit: could you keep this list sorted alphabetically?