We recently noticed that when rdma_close is called on a Windows listener session, it blocks until rdma_close is called on the Linux connector session.
This scenario was tested using LabVIEW, but I believe it should be reproducible with the C API as well.
Windows LabVIEW VI:
RDMA Listen->RDMA Accept->RDMA Configure Buffers->RDMA Acquire Received Buffer->Delete DVR->RDMA Close
Linux LabVIEW VI (NI Linux RT is used in this example):
RDMA Connect->RDMA Configure Buffers->RDMA Acquire Send Buffer->RDMA Set Used Send Buffer Size->Delete DVR->Time delay of 10s->RDMA Close
The expectation is that once the DVR is deleted, Windows will proceed to close the RDMA connection. What we observe instead is that Windows hangs at RDMA Close until RDMA Close is called on Linux (after the 10s delay).
The same scenario doesn't reproduce with two Windows systems, with two Linux systems, or with a Windows system connecting to a listening Linux system.
After some debugging, we found that Windows is stuck in RdmaConnectedSession::Destroy during RDMA Close. Specifically, it is waiting for EventHandlerThread to exit when connector->Disconnect is called, and for some reason the thread only exits once RDMA Close is called on Linux. I'm not sure what's going on here, nor do I fully understand the purpose of EventHandlerThread.
I have tried to work around this by time-bounding connector->Disconnect with HandleHROverlappedWithTimeout, using a timeout of 1000 ms. It seems to work, but I'm not sure whether doing so will break any of the existing logic.
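To make the idea of the workaround concrete, here is a minimal, generic sketch of a time-bounded wait for a thread-exit notification. It is not the library's code: `ExitSignal`, `NotifyExited` and `WaitForExit` are hypothetical names made up for illustration, and it uses only the C++ standard library rather than HandleHROverlappedWithTimeout, whose signature is not shown here.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for "the event handler thread has exited".
// Not part of the RDMA library; for illustration only.
class ExitSignal {
public:
    // Called by the worker thread as its last step.
    void NotifyExited() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            exited_ = true;
        }
        cv_.notify_all();
    }

    // Waits at most `timeout` for the worker to report its exit.
    // Returns true if the exit was observed, false if the wait timed out.
    bool WaitForExit(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return exited_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool exited_ = false;
};
```

One thing to keep in mind with this pattern in general: when the wait times out, the worker thread is still running, so anything it touches has to stay alive until it really exits. That may be the part most likely to interact badly with existing teardown logic.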