submit_and_wait(1) does not return w/ nop and recvmsg_multishot #1364

Closed
TwoClocks opened this issue Mar 11, 2025 · 5 comments

@TwoClocks

I'm having all kinds of issues trying to get recvmsg_multishot to work on some multicast addresses over localhost. Sometimes I get data on some sockets, but not others. Sometimes I don't get anything. Sometimes it works fine. I'm sure I'm doing something wrong, but I'm not sure what.

But the oddest thing is that submit() does not return even with a NOP SQE in the list.

Here is the SQE when I submit_and_wait:

os.linux.io_uring_sqe.io_uring_sqe{ .opcode = os.linux.IORING_OP.RECVMSG, .flags = 37, .ioprio = 2, .fd = 4, .off = 0, .addr = 140100692133480, .len = 1, .rw_flags = 0, .user_data = 1369234387412764160, .buf_index = 43520, .personality = 0, .splice_fd_in = 0, .addr3 = 0, .resv = 0 }
    -- flags:0b100101 ioprio:0b10
os.linux.io_uring_sqe.io_uring_sqe{ .opcode = os.linux.IORING_OP.RECVMSG, .flags = 37, .ioprio = 2, .fd = 5, .off = 0, .addr = 140100692135016, .len = 1, .rw_flags = 0, .user_data = 1369234387412765696, .buf_index = 43521, .personality = 0, .splice_fd_in = 0, .addr3 = 0, .resv = 0 }
    -- flags:0b100101 ioprio:0b10
os.linux.io_uring_sqe.io_uring_sqe{ .opcode = os.linux.IORING_OP.NOP, .flags = 0, .ioprio = 0, .fd = 0, .off = 0, .addr = 0, .len = 0, .rw_flags = 0, .user_data = 864831229147275264, .buf_index = 0, .personality = 0, .splice_fd_in = 0, .addr3 = 0, .resv = 0 }
    -- flags:0b0 ioprio:0b0	

Even if my setup of the recvs is wrong, I would expect this NOP to complete right away. Instead, submit_and_wait(1) never returns. The behavior is the same whether or not I set up the ring with IORING_SETUP_SQPOLL.

If I reorder the requests, or prepend other requests in the same submission, the behavior changes, but at least submit() returns.

Running a 6.12.16 kernel.

What's the best way to debug/trace how the kernel is processing this queue?

@axboe
Owner

axboe commented Mar 12, 2025

Please provide a complete code example that shows it. I can almost certainly guarantee that this is a coding issue, it's not an io_uring issue, as it's a pretty basic usage.

@axboe
Owner

axboe commented Mar 12, 2025

For tracing, you can do:

echo 1 > /sys/kernel/debug/tracing/events/io_uring/enable

run your code, and then

cat /sys/kernel/debug/tracing/trace

@TwoClocks
Author

Thanks!

Using the trace was clutch. Is that in the man page someplace?
Found my bug. I was setting a bad flag and linking requests that shouldn't be linked (similar name, different flag field).

I'm still a bit surprised that the submit did not return w/ a valid NOP. The kernel trace said it completed the NOP fine.

@TwoClocks
Author

Oh! I see. My bug was this:
I was setting the flags with IORING_RECVSEND_FIXED_BUF when I meant to use IOSQE_FIXED_FILE (auto-complete), which caused the NOP to be linked to a recv request. It all makes sense now.

@axboe
Owner

axboe commented Mar 12, 2025

> Using the trace was clutch. Is that in the man page someplace?

Don't think it is... Should probably be, but not quite sure where to put it.
