Listener's session in Windows waits until Connector's session in Linux is closed #21

Open
nikhalim opened this issue Aug 8, 2023 · 3 comments · May be fixed by #23

Comments

@nikhalim
Collaborator

nikhalim commented Aug 8, 2023

We recently noticed that when rdma_close is called on a listener session on Windows, it blocks until rdma_close is called on the connector session on Linux.

This scenario was tested using LabVIEW, but I believe it should be reproducible with the C API as well.

Windows LabVIEW VI:

  • RDMA Listen->RDMA Accept->RDMA Configure Buffers->RDMA Acquire Received Buffer->Delete DVR->RDMA Close

Linux LabVIEW VI (NI Linux RT is used in this example):

  • RDMA Connect->RDMA Configure Buffers->RDMA Acquire Send Buffer->RDMA Set Used Send Buffer Size->Delete DVR->Time delay of 10s->RDMA Close

The expectation is that once the DVR is deleted, Windows will proceed to close the RDMA connection. What we observe instead is that Windows hangs at RDMA Close until RDMA Close is called on Linux (after the 10s delay).
The problem doesn't reproduce with two Windows systems, with two Linux systems, or with a Windows system connecting to a Linux system (where Linux is the listener).

After some debugging, we found that Windows is stuck in RdmaConnectedSession::Destroy during RDMA Close. Specifically, it is waiting for EventHandlerThread to exit when connector->Disconnect is called, and for some reason the thread only exits once RDMA Close is called on Linux. I'm not sure what's going on here, and I don't fully understand the purpose of EventHandlerThread.
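
To make the suspected blocking pattern concrete, here is a minimal standalone model of it. This is only a sketch and not the library's code: `ModelSession`, `SignalPeerDisconnected` and `MeasureDestroy` are hypothetical names, and it assumes the handler thread exits only after an event caused by the remote close arrives.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a session that owns an event-handler thread.
// Assumption: the handler only leaves its loop after a "peer disconnected" event.
class ModelSession {
public:
    ModelSession() : worker_([this] { EventLoop(); }) {}

    ~ModelSession() {
        // Never destroy a joinable std::thread: release the worker and join it.
        if (worker_.joinable()) {
            SignalPeerDisconnected();
            worker_.join();
        }
    }

    // Models the event that arrives when the remote side closes its session.
    void SignalPeerDisconnected() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            peerDisconnected_ = true;
        }
        cv_.notify_all();
    }

    // Models the close path: it blocks until the handler thread has exited.
    void Destroy() {
        assert(worker_.joinable());
        worker_.join();
    }

private:
    void EventLoop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return peerDisconnected_; });
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    bool peerDisconnected_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};

// Measures how long Destroy() blocks when the "remote" side closes after a delay.
std::chrono::milliseconds MeasureDestroy(std::chrono::milliseconds peerCloseDelay) {
    ModelSession session;
    const auto start = std::chrono::steady_clock::now();
    std::thread peer([&session, peerCloseDelay] {
        std::this_thread::sleep_for(peerCloseDelay);  // e.g. the 10s delay in the Linux VI
        session.SignalPeerDisconnected();
    });
    session.Destroy();  // returns only after the peer thread has signalled
    peer.join();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
}
```

In this model, `MeasureDestroy(delay)` never returns less than `delay`, which matches the symptom: the local close cannot finish before the remote close happens.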

I have tried to work around this by time-bounding connector->Disconnect with HandleHROverlappedWithTimeout and a timeout of 1000ms. It seems to work, but I'm not sure whether doing so will break any of the existing logic.
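
The general idea of a time-bounded wait can be sketched in portable C++ as follows. This is not the actual change and it does not use HandleHROverlappedWithTimeout; `WaitForExitWithTimeout` and `DemoBoundedWait` are hypothetical helpers, and the "remote close" is modelled with a `std::promise`.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: returns true if the worker reported its exit within the timeout.
bool WaitForExitWithTimeout(const std::future<void>& exited,
                            std::chrono::milliseconds timeout) {
    return exited.wait_for(timeout) == std::future_status::ready;
}

struct BoundedWaitResult {
    bool exitedBeforeRelease;  // did the worker exit within the timeout?
    bool exitedAfterRelease;   // did it exit once the "remote close" happened?
};

BoundedWaitResult DemoBoundedWait(std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);

    std::promise<void> release;  // models the remote side closing its session
    std::promise<void> exited;   // fulfilled by the worker just before it returns
    std::future<void> releaseFuture = release.get_future();
    std::future<void> exitedFuture = exited.get_future();

    std::thread worker([&releaseFuture, &exited] {
        releaseFuture.wait();  // blocks until the "remote close" arrives
        exited.set_value();
    });

    BoundedWaitResult result{};
    // The bounded wait gives up after `timeout` instead of blocking indefinitely.
    result.exitedBeforeRelease = WaitForExitWithTimeout(exitedFuture, timeout);

    // The sketch still has to release and join the worker afterwards. What the
    // real code should do with a thread that is still running after the timeout
    // is exactly the open question raised above.
    release.set_value();
    worker.join();
    result.exitedAfterRelease =
        WaitForExitWithTimeout(exitedFuture, std::chrono::milliseconds(0));
    return result;
}
```

Calling `DemoBoundedWait(std::chrono::milliseconds(50))` shows the wait timing out while the worker is still blocked, and the worker exiting only after it has been released. The timeout unblocks the caller, but the thread itself is still alive at that point, so anything it touches must remain valid.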

@nikhalim
Collaborator Author

nikhalim commented Aug 8, 2023

FYI @ericgross2

@nikhalim nikhalim linked a pull request Aug 11, 2023 that will close this issue
@ericgross2
Contributor

Have you tried taking a Wireshark/tcpdump trace of this sequence?

@nikhalim
Collaborator Author

I'm unable to see any trace (or at least any meaningful trace). I used Wireshark on Windows and tcpdump on Linux.

Here are the VIs used to reproduce the problem. The top VI creates a listener session on Windows and the bottom VI creates a connector session on Linux.
image
