[WIP] new network driver: slirpnetstack (experimental) #101
cc @majek Travis will give us some iperf benchmark data.
Force-pushed: 5ecd508 to 29fdb14
seems cloudflare/slirpnetstack#1 needs to be solved
Force-pushed: fc17d80 to dc56a71
DNS doesn't work? @majek
(host)$ rootlesskit --net=slirpnetstack --copy-up=/etc bash
(rootlesskit) # cat /etc/resolv.conf
nameserver 8.8.8.8
(rootlesskit) # nslookup www.google.com
Server: 8.8.8.8
Address: 8.8.8.8#53
Non-authoritative answer:
Name: www.google.com
Address: 172.217.175.100
Name: www.google.com
Address: 2404:6800:4004:80b::2004
(rootlesskit) # telnet www.google.com 80
telnet: could not resolve www.google.com/80: Name or service not known
Force-pushed: dc56a71 to 14c2274
@AkihiroSuda yeah, I didn't get that right yet. I just committed a hack that may work around the major problem, but I haven't tested it properly yet. Let me know if it fixes the immediate problem.
Force-pushed: 14c2274 to 11034db
thanks, seems fine
https://travis-ci.org/rootless-containers/rootlesskit/builds/640423841
Note: slirp4netns and slirpnetstack cannot be compared fairly in this benchmark, because slirpnetstack lacks the host-loopback address 10.0.2.2, so the slirpnetstack benchmark connects to the host eth0 address 172.17.0.2 instead. But even considering that, slirpnetstack seems slow?
p.s. On my laptop with slirp4netns, iperf3 throughput against the host loopback and against the host eth0 were almost same (with several
bhasker from gvisor team here. So I ran this locally and I think I understand why this is performing worse, and there are a few simple fixes to slirpnetstack:
a) NewForwarder is being called with a window of 30k, which means it's advertising a Window Scale of 1, resulting in every packet causing a ZeroWindow event/update.
With these fixes I see the following:
iperf3 -c 100.117.29.130 -t 240
iperf3 -c 100.117.29.130 -t 240 -R
The latter is slower because our checksum calculation is really slow. It's using 30% of CPU when I run sudo perf top -p
rootless-containers/rootlesskit#101
a) NewForwarder is being called with a window of 30k, which means it's advertising a Window Scale of 1, resulting in every packet causing a ZeroWindow event/update.
b) RXChecksumOffload should be set to true, as there is no need to do checksum verification when reading packets from tap. Ideally we should not need to do TX checksumming either, but enabling that causes Linux to drop TCP packets without valid checksums. Also, our calculateChecksum code is slow and could use some loop unrolling to make it much faster.
c) slirpnetstack is enabling the sniffer on the link endpoint, which means it's trying to log every packet. This will slow things down dramatically. It should be made configurable and only enabled when debugging issues.
d) Setting ModerateRecvBuf to true won't really work, as the gonet API is not configured to invoke endpoint.ModerateRecvBuf after Read(). The way auto-tuning works is that gvisor invokes the API after it copies bytes to user space. Right now, setting it to true will not do anything unless the API is called. That said, there is no real reason to use auto-tuning, since it's all on-host connections and a buffer of a couple of MB is more than enough to hit 10 Gbit/s.
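Point b) suggests speeding up the checksum code with loop unrolling. As a rough illustration only (this is not gVisor's calculateChecksum; `checksum_simple` and `checksum_unrolled` are hypothetical names), the RFC 1071 Internet checksum can be computed one 16-bit word per iteration, or with several words consumed per iteration, and both forms must agree. In Python the unrolled variant only shows the shape of the idea; any real gain would come in a compiled language such as Go.

```python
import struct


def _fold(total: int) -> int:
    """Fold carries back into the low 16 bits (ones' complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum_simple(data: bytes) -> int:
    """RFC 1071 checksum, one 16-bit big-endian word per iteration."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return ~_fold(total) & 0xFFFF


def checksum_unrolled(data: bytes) -> int:
    """Same checksum, but four 16-bit words are summed per iteration."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    n = len(data) - len(data) % 8  # longest prefix that is a multiple of 8 bytes
    for i in range(0, n, 8):
        a, b, c, d = struct.unpack_from("!4H", data, i)
        total += a + b + c + d
    for i in range(n, len(data), 2):  # leftover words, one at a time
        total += (data[i] << 8) | data[i + 1]
    return ~_fold(total) & 0xFFFF
```

Because the carries are folded only at the end, the order and grouping of the additions does not change the result, which is what makes unrolling safe here.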
Thanks @hbhasker; implemented in cloudflare/slirpnetstack@14ee235
See https://github.com/majek/slirpnetstack Fix rootless-containers#100 Signed-off-by: Akihiro Suda <[email protected]>
Force-pushed: 11034db to dc4ba13
The PR should be almost ready to merge, but the slirpnetstack CLI spec seems to be about to change? cloudflare/slirpnetstack#4
@AkihiroSuda it would be nice if we agreed on what the CLI should look like. It seems the requirements for containers and VMs are pretty similar. I have moved and updated the slirp-helper spec to a wiki: https://gitlab.freedesktop.org/slirp/libslirp/-/wikis/Slirp-Helper. Not sure it's the best way to discuss the spec though; any suggestions? Feel free to edit the wiki in the meantime. Or maybe we don't need a spec, and just follow whatever slirpnetstack defines? (Note: the slirp-helper spec was written precisely to make it easy to interchange the helper implementation in libvirt...)
I don't really get what --ready-fd is for. The network data should be processed only after the helper/configuration is ready. --exit-fd, why not kill the process? There is also a --exit-with-parent in the spec which may help.
yeah :) besides the NS-specific options, there are not that many slirp4netns options anyway. But I wonder about the JSON API, why not use DBus?
It simplifies how the lifecycle is handled in Podman. The other end is injected into the conmon process (the shim process for the container). When conmon exits, slirp4netns is terminated as well, without conmon needing to know how to handle slirp4netns.
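The lifecycle trick described above can be sketched as follows: the helper is handed the read end of a pipe and treats EOF on it as the signal to exit, so it stops once every process holding the write end is gone. This is a minimal sketch under assumptions, not the actual slirp4netns or conmon code; `wait_for_exit_fd` and `demo` are hypothetical names, and a thread stands in for the helper process.

```python
import os
import threading


def wait_for_exit_fd(exit_fd: int, stopped: threading.Event) -> None:
    """Block until the peer end of the pipe is closed, then signal shutdown."""
    # os.read returns b"" (EOF) only when no write end remains open.
    while os.read(exit_fd, 1):
        pass  # ignore any stray bytes; only EOF matters
    stopped.set()


def demo() -> bool:
    read_end, write_end = os.pipe()
    stopped = threading.Event()
    helper = threading.Thread(target=wait_for_exit_fd, args=(read_end, stopped))
    helper.start()
    # While the write end is open, the "helper" keeps running.
    still_running = not stopped.wait(timeout=0.2)
    os.close(write_end)  # stands in for the shim process exiting
    helper.join(timeout=5)
    os.close(read_end)
    return still_running and stopped.is_set()
```

The holder of the write end never has to send anything or know about the helper; the kernel closes the descriptor when that process exits.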
slirp4netns adopted a JSON API because we didn't want to introduce extra dependencies.
Fundamentally, DBus, the protocol, doesn't need extra dependencies compared to JSON. For example, for glib/gio apps, DBus facilities are there, while JSON would be an extra library... In practice though, the most convenient option is to use the bus, which requires a message bus process. But given that a DBus bus is present on 99% of Linux systems, it shouldn't be a problem for slirp (and it can work on macOS or even Windows).
Regarding shell scriptability, there are plenty of choices: busctl, gdbus and dbus-send. The introspection capability makes it very convenient too, with bash completion etc. For other languages, like Python, there are various convenient high-level APIs.
JSON isn't a good machine serialization format, and has issues with numbers. It is also pretty limited. DBus comes with a better machine format, types, security, introspection and tools in general for IPC. It's unfortunate that QEMU picked JSON. I don't think we should repeat that.
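The remark about JSON and numbers most likely refers to the fact that JSON does not define an integer width, and many parsers decode every number as an IEEE 754 double. A minimal Python sketch of the effect, where `parse_int=float` is used to imitate such a parser (Python's own decoder keeps integers exact by default) and `roundtrip_is_lossy` is a hypothetical name:

```python
import json

big = 2**53 + 1  # 9007199254740993, not representable as a double

text = json.dumps({"id": big})

exact = json.loads(text)["id"]                   # Python keeps ints exact
lossy = json.loads(text, parse_int=float)["id"]  # imitate a double-only parser


def roundtrip_is_lossy() -> bool:
    """True if the double-only parse silently changed the value."""
    return exact == big and int(lossy) != big
```

A typed wire format such as DBus declares the integer width in the message signature, so this kind of silent rounding does not depend on the receiving parser.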
How will it work with dind?
What do you mean?
Docker-in-Docker and its variants like Podman-in-Podman. These environments don't have init and execute the container engine directly as PID 1.
There are several options I can think of:
How do you connect to the JSON socket today, and what connects to it?
slirp4netns itself listens on a UNIX socket. The socket path is specified by the caller process such as RootlessKit (used by Docker) and Podman.
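For illustration, a caller talking to such a socket might look like the sketch below. The request shape (`execute` / `arguments`, here `add_hostfwd`) is modeled on the slirp4netns API documentation and should be checked against it; `send_request` is a hypothetical helper, and the half-close used to mark the end of the request is a framing choice of this sketch, not something the thread specifies.

```python
import json
import socket


def send_request(sock: socket.socket, request: dict) -> dict:
    """Send one JSON request, half-close, and read the JSON reply until EOF."""
    sock.sendall(json.dumps(request).encode())
    sock.shutdown(socket.SHUT_WR)  # tell the peer the request is complete
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return json.loads(b"".join(chunks))


# Assumed request shape; verify against the slirp4netns documentation.
ADD_HOSTFWD = {
    "execute": "add_hostfwd",
    "arguments": {
        "proto": "tcp",
        "host_addr": "0.0.0.0",
        "host_port": 8080,
        "guest_addr": "10.0.2.100",
        "guest_port": 80,
    },
}
```

With a real helper, the caller would create a `socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)`, connect it to the path it passed to the helper at startup, and hand it to `send_request`.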
How do you typically connect to that socket path, in particular when using dind? From outside the container? It sounds like it would be fairly easy to start a dbus-daemon and use the bus socket instead for all IPC. You would have a single socket path for all external processes. dbus-broker is 250 kB, and typically uses 1-2 MB of RAM. Hardly a large dependency compared to the multi-MB Go processes.
I just found out podman is using varlink. This is perhaps another option.
The socket is only connected to from RootlessKit and Podman, which determine the socket path themselves.
The varlink API is already being superseded by the Docker-compatible REST API.
I see, so you don't need to have an IPC with the helper. There is a "builtin" port redirection that does setup over unix sockets, and skips the slirp/userspace TCP/IP stack by running a small splice/proxy. It is not as generic/powerful as slirp's arbitrary stream redirection, but that's probably not necessary for containers in general. As long as --dbus remains optional, I suppose you don't care then?
Ok, I think we can stick with DBus then, as I believe it is more convenient to work with from shell, Python or Go (godbus is pretty nice). For --exit-fd, I can see that it makes things a bit easier on the management side, and it's easy to add. But it looks more like a hack to me. If I read it right, podman intentionally "leaks" the opened rootlessSlirpSyncW fd, which is silently passed to conmon on fork/exec. Right? I see that conmon.pid is written out in some userdata/ dir. Why not also track other helpers this way?
I don't care about it for rootless containers. But if we want the slirp-helper spec to be adopted by virtual machine platforms (QEMU, VirtualBox, Bochs, Basilisk II, SheepShaver, SIMH...) as well, we need to make sure the spec can be easily implemented on non-Linux platforms such as macOS and Windows.
Given that the bus is explicitly optional, and DBus p2p must be supported, a simple stream (TCP, named pipe or whatever) is enough to communicate with the helper on any platform. But admittedly, choosing DBus is based on a preference for Linux systems. Also, nothing prevents extending the spec with additional IPC, if necessary.
dbus can't cross over the user namespace border? 😭
$ unshare -r dbus-monitor --address unix:path=/run/user/1001/bus
Failed to register connection to bus at unix:path=/run/user/1001/bus: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
It's a plain unix socket. There are some default security/credentials restrictions that may apply here. What are you trying to achieve? You want a container to access the host bus? That sounds wrong. Fwiw, I think flatpak has a lot of thinking around how host and container bus can communicate.
See https://github.com/majek/slirpnetstack
Fix #100