IO_URING better than DPDK backend? #2622
On Fri, Jan 17, 2025 at 9:59 AM GeorgKreuzmayr wrote:

> Hey everyone,
> I am currently benchmarking the Seastar user-space TCP stack and was surprised by the results, where the IO_URING backend performed better than the DPDK backend.
>
> Result:
> IO_URING Backend: 23 Gbit/s per direction
> DPDK Backend: ~5-10 Gbit/s per direction
>
> Setup:
> c7gn.16xlarge instance type
> AWS cluster placement group
> Full-duplex communication
> 16 TCP connections
> 1 Shard (CPU core)
> Seastar commit: 871079a
> Ubuntu 24.04
>
> Also, I found that when increasing the number of connections, at some point the DPDK backend started breaking completely.
> You can find the example that I benchmarked in this repository: <https://github.com/wagjamin/seastar-experiments/tree/minimal-example>.
>
> Are these performance numbers expected?
> Is there any known issue with many TCP connections using the DPDK backend?
16 connections isn't much.
I wasn't aware we supported io_uring out of the box that easily.
In general, our DPDK backend is older and not that well maintained. In the past it did offer superior results, but that was before io_uring was developed.
To make a fair test, you should use a more recent DPDK app (but they usually don't come with a TCP stack).
Can you verify that in both cases only a single hardware thread is used? There is a good chance that the kernel uses more threads, for IRQ processing or other work.
Another option is to use larger message sizes, where zero copy will be more effective.
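
For reference, a minimal sketch of how such a comparison is typically launched. The option names below (`--smp`, `--reactor-backend`, `--network-stack`, `--dpdk-pmd`) are recalled from Seastar's documentation and may differ between versions, and the program is a simplified stand-in, not the code from the linked repository:

```cpp
// Simplified stand-in for a throughput test binary (assumption: not the actual
// code from the seastar-experiments repo). The point is that the backend is
// chosen at run time, e.g. (flag names may vary by Seastar version):
//   ./tcp_bench --smp 1 --reactor-backend io_uring            # kernel TCP stack
//   ./tcp_bench --smp 1 --network-stack native --dpdk-pmd     # Seastar TCP over DPDK
#include <seastar/core/app-template.hh>
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>
#include <iostream>

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] {
        // With --smp 1 the reactor runs on a single shard; watching the machine
        // with top/htop during the run shows whether the kernel adds work on
        // other cores (softirq/IRQ handling) in the io_uring case.
        std::cout << "running on " << seastar::smp::count << " shard(s)\n";
        return seastar::make_ready_future<>();
    });
}
```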
Thank you for getting back to me on this!

> 16 connections isn't much.

I agree, but as I mentioned above, the whole app was breaking when I used more than that with the DPDK backend.

> I wasn't aware we supported io_uring out of the box that easily.

The IO_URING backend seems to be the default. From what I can understand, you probably use something like AF_XDP sockets to send and receive raw Ethernet frames, similar to what you can do with DPDK.

> Can you verify that in both cases only a single hardware thread is used?

Yes.

> Another option is to use larger message sizes, where zero copy will be more effective.

I don't think it is possible to use "zero-copy" TCP, as the memory I am passing to your TCP stack is not pinned, which is required for the NIC to DMA it onto the wire.
Seastar can pin the memory (in ScyllaDB it is mostly pinned too, depending on configuration), and it can do zero copy, but it's not worth it with small packets.
A potential advantage would be to test multiple cores, which will work nicely but will saturate the link. Anyway, it's been too long since we were looking into every bit here, so my feedback is very limited.
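
To illustrate the larger-message suggestion, a rough sketch follows; the `send_bulk` helper and the 128 KiB chunk size are made up for the example, not taken from the benchmark:

```cpp
// Sketch: send fewer, larger messages so per-message overhead is amortized.
// send_bulk and the 128 KiB chunk size are example choices only.
#include <cstddef>
#include <cstring>
#include <seastar/core/do_with.hh>
#include <seastar/core/future.hh>
#include <seastar/core/iostream.hh>
#include <seastar/core/loop.hh>
#include <seastar/core/temporary_buffer.hh>
#include <seastar/net/api.hh>

seastar::future<> send_bulk(seastar::connected_socket& sock, size_t total_bytes) {
    return seastar::do_with(sock.output(), size_t{0},
        [total_bytes](seastar::output_stream<char>& out, size_t& sent) {
            return seastar::do_until(
                [&sent, total_bytes] { return sent >= total_bytes; },
                [&out, &sent] {
                    constexpr size_t chunk = 128 * 1024;            // one large write per iteration
                    seastar::temporary_buffer<char> buf(chunk);
                    std::memset(buf.get_write(), 'x', buf.size());  // placeholder payload
                    sent += buf.size();
                    // Moving the buffer hands ownership to the stack, which can
                    // skip an extra copy where that is supported.
                    return out.write(std::move(buf));
                }).then([&out] { return out.flush(); });
        });
}
```

Whether this actually removes copies depends on the stack and its configuration; the point is only to reduce per-message cost relative to many small writes.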