Route traffic into rootlesskit netns #173
Comments
I attempted to do this myself by binding the rootlesskit netns into a named one for |
How is this related to port drivers? https://github.com/rootless-containers/rootlesskit#port-drivers https://github.com/rootless-containers/usernetes#expose-netns-ports-to-the-host |
It'd be an alternative to the current helpers we have in
The issue with this is that it means we have to deconflict ports in both the rootlesskit netns, and outside in the main netns. We already have IP networking for both the CNI subnet and the ClusterIP subnet available inside the rootlesskit netns, and if I nsenter it, I can curl all of the pods and services using their assigned IP addresses. It would be nice if we could simply route traffic to/from those ranges without having to enter the main rootlesskit netns. Specifically, it would be useful and roughly representative of a cloud cluster if ClusterIPs and LoadBalancer IPs (e.g. allocated by something like metallb) could be easily routed to by a naive user. I know that ClusterIPs would normally not be accessible by a user, but allowing that for u7s would make it much easier to work with since it would let them avoid running metallb or some similar infrastructural service. I'd see this working something like:
I would see the actual implementation looking something like the following steps:
I didn't get the routing working when I tried it. It's been a little while since I worked with veths and bridging so it's probably my fault. I also think it would be sensible to be able to call out of the throwaway netns to the host/internet; I've not described that in the steps above. It's also worth noting that the throwaway netns is purely so the user can do this rootless as well. Having a rootful version which bridges the main system netns to the cluster might be interesting for convenience. So does this sound workable? I'm happy to make a PR if I can get it working, but might need some tips on why I can't route like I expected.
Edit: So I've just realised that obviously the throwaway netns won't work nicely as I laid out above because we can't push one of the veth ends into a sibling netns. I suppose we could set up the veths in the rootlesskit namespace, then push one end into a throwaway child netns of the RK one, then push the user into that throwaway netns? For now I've mocked this up creating a veth in the main system netns with root, then pushing one end into each of the throwaway and rootlesskit namespaces. Setting a /30 on each end of the veth and adding a route to 10.0.0.0/24 via the IP assigned to the veth in the rootlesskit namespace allows me to reach services by their assigned ClusterIP :)
Edit 2: Working flow:
# Terminal 1
(ns/host) $ export RK_PID="$(cat "${XDG_RUNTIME_DIR}/usernetes/rootlesskit/child_pid")"
(ns/host) $ nsenter -U --preserve-credential -n -t "${RK_PID}"
(ns/cluster) $ ip link add rk-veth type veth peer name usr-veth
(ns/cluster) $ ip link set up rk-veth
(ns/cluster) $ ip link set up usr-veth
(ns/cluster) $ ip addr add 10.20.30.1/30 dev rk-veth
(ns/cluster) $ ip addr add 10.20.30.2/30 dev usr-veth
# Terminal 2
(ns/host) $ nsenter -U --preserve-credential -n -t "${RK_PID}"
(ns/cluster) $ unshare -U -n --map-root-user
(ns/bridge) $ echo $$
<CHILD_PID>
# Terminal 3 - I think this shouldn't need to be rootful but I couldn't do it in the RK namespace
(ns/host) $ sudo mkdir -p /run/netns; sudo touch /run/netns/usr-bridge
(ns/host) $ sudo mount -o bind /proc/<CHILD_PID>/ns/net /run/netns/usr-bridge
# Back to terminal 1
(ns/cluster) $ ip link set usr-veth netns usr-bridge
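# Back to terminal 2
# Moving a veth end into another netns brings it down and drops its address, so configure it again here
(ns/bridge) $ ip link set up usr-veth
(ns/bridge) $ ip addr add 10.20.30.2/30 dev usr-veth
(ns/bridge) $ ip route add 10.0.0.0/24 via 10.20.30.1 dev usr-veth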
(ns/bridge) $ curl -vv <some_service_ip>
hello world!
We can then enter the bridged namespace from the host at will in other terminals and things work as expected. When I tried last time I bridged rk-veth to cni0, so I wasn't actually doing IP routing, which explains why I was getting confused. By just adding a route via the IP on the rk-veth from the usr-veth side, the kernel's IP forwarding stack takes care of it for me. |
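The routed approach described above depends on IPv4 forwarding being enabled inside the rootlesskit netns. It is probably already on there, since the cluster networking needs it, but that is an assumption worth checking if the route is added and traffic still does not flow. A minimal sketch of such a check follows; the function name `forwarding_enabled` is hypothetical and not part of usernetes or rootlesskit, and the only real interface it relies on is the standard Linux procfs file.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of usernetes or rootlesskit.
# forwarding_enabled [PATH]
# Succeeds (exit 0) if the given sysctl file contains "1".
# PATH defaults to the standard Linux location for the IPv4 forwarding switch.
forwarding_enabled() {
  local path="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ "$(cat "${path}" 2>/dev/null)" = "1" ]
}
```

Run inside the cluster netns (for example after the `nsenter` from terminal 1), `forwarding_enabled && echo "forwarding on"` reports whether the kernel will route between usr-veth and the cluster interfaces.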
This could be substituted by running |
I actually just avoided this by realising I could just specify the target PID instead. Didn't know that was a thing. Here is a working script:
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
SCRIPT_PATH="$(realpath -e "${BASH_SOURCE[0]}")"
SCRIPT_DIR="${SCRIPT_PATH%/*}"
UNSHARE_CMD=("unshare" "-U" "--map-root-user" "-m" "-n")
NSENTER_CMD=(
  "nsenter" "-U" "--preserve-credential" "-m" "-n" "--wd=${PWD}" "-t"
)
if [ -z "${_BR_REEXEC_OUTER:+SET}" ]; then
  # First pass: re-exec this script inside the rootlesskit namespaces
  _RK_PID="$(cat "${XDG_RUNTIME_DIR}/usernetes/rootlesskit/child_pid")"
  if [ -z "${_RK_PID}" ]; then
    echo "No rootlesskit PID found"
    exit 127
  fi
  "${NSENTER_CMD[@]}" "${_RK_PID}" env _BR_REEXEC_OUTER=true "${SCRIPT_PATH}"
  exit $?
else
  # Second pass: running inside the rootlesskit namespaces
  ip link add rk-veth type veth peer name usr-veth
  ip link set up rk-veth
  ip addr add 10.20.30.1/30 dev rk-veth
  # Anchor the bridge netns with a long-running process
  "${UNSHARE_CMD[@]}" sleep infinity &
  SLEEP_PID=$!
  # Set up a teardown function
  declare -a TEARDOWN_PIDS=( "${SLEEP_PID}" )
  function stop () {
    if [ "${#TEARDOWN_PIDS[@]}" -gt 0 ]; then
      kill "${TEARDOWN_PIDS[@]}"
    fi
  }
  trap exit SIGINT SIGTERM
  trap stop EXIT ERR
  # Pause to ensure that unshare has re-execed
  # (-L follows the ns symlink so the namespace inodes themselves are compared)
  while [ \
    "$(stat -L -c "%i" /proc/self/ns/net)" == \
    "$(stat -L -c "%i" "/proc/${SLEEP_PID}/ns/net")" \
  ]; do sleep 0.1; done
  # Run a slirp in the bridge netns
  slirp4netns --configure --mtu=65520 --disable-host-loopback \
    "${SLEEP_PID}" tap0 &
  TEARDOWN_PIDS+=( "$!" )
  # Send the user end of the veth into the bridge netns
  ip link set usr-veth netns "${SLEEP_PID}"
  # Run some network setup commands
  cmds=(
    "ip link set up usr-veth;"
    "ip addr add 10.20.30.2/30 dev usr-veth;"
    "ip route add 10.0.0.0/24 via 10.20.30.1 dev usr-veth;"
  )
  "${NSENTER_CMD[@]}" "${SLEEP_PID}" "${SHELL:-bash}" -c "${cmds[*]}"
  # Point DNS in the bridge netns at the cluster resolver
  "${NSENTER_CMD[@]}" "${SLEEP_PID}" cat >/etc/bridge.resolv.conf <<EOF
nameserver 10.0.0.53
search cluster.local
EOF
  "${NSENTER_CMD[@]}" "${SLEEP_PID}" mount -o ro,bind /etc/bridge.resolv.conf /etc/resolv.conf
  # Finally run a shell for the user
  "${NSENTER_CMD[@]}" "${SLEEP_PID}" "${SHELL:-bash}"
  exit $?
fi
There are a few things to change in there if it were to be runnable more than once at a time, e.g. dynamic veth names and IPs, but it should be a solid base. I'm going to move on to a few other things for the day now since I can now see how well this will fit into the rest of my workflow. If you could let me know what you think about getting something like this merged in, I'd appreciate it! I'll keep an eye on this issue :) Edit: Added a slirp and cluster DNS |
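One way the "dynamic veth names and IPs" point could be approached is to derive everything from a small instance index. The sketch below is only an illustration under stated assumptions: the function name `bridge_params` is hypothetical, and it assumes the whole 10.20.30.0/24 range used in the script is free to be carved into /30 subnets, one per running bridge. Index 0 reproduces the hard-coded names' addresses from the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive per-instance veth names and /30 addresses
# from an integer index so that several bridges could coexist.
# Assumes 10.20.30.0/24 is available, which yields 64 /30 subnets (0..63).
#
# bridge_params INDEX
# Prints: RK_VETH USR_VETH RK_ADDR USR_ADDR (addresses without prefix length)
bridge_params() {
  local raw="${1:-}"
  if ! [[ "${raw}" =~ ^[0-9]+$ ]]; then
    echo "index must be a non-negative integer" >&2
    return 1
  fi
  # Force base 10 so an index such as "08" is not read as octal
  local index=$(( 10#${raw} ))
  if (( index > 63 )); then
    echo "index must be in the range 0..63" >&2
    return 1
  fi
  # Each /30 holds a network address, two hosts and a broadcast address
  local base=$(( index * 4 ))
  echo "rk-veth${index} usr-veth${index} 10.20.30.$(( base + 1 )) 10.20.30.$(( base + 2 ))"
}
```

A caller could then do `read -r RK_VETH USR_VETH RK_ADDR USR_ADDR < <(bridge_params 1)` and substitute those variables for the fixed names and addresses. How a free index gets picked (a lock file, scanning existing links, a CLI flag) is left open here.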
Why do you need slirp in slirp? |
Hmm, you're right. That second slirp should probably run in the top level namespace, rather than in the cluster namespace. The intention was to avoid routing external traffic (i.e. not to the cluster related routes we explicitly add) from the child namespace via the main rootlesskit ns, but obviously by running the slirp in the rootlesskit ns, that's exactly what we're doing. Oops! It seems like it might be a pain to get the PID of the sleep anchoring the child namespace up to the top level script though, so a cheaper way would be to add a default route via the veth as well - but I think the second slirp is a better way, if we can manage it. |
I'm using u7s to mock up a bare-metal cluster deployment and it would be nice to be able to easily enter an environment where I'm able to route traffic to IPs inside a rootlesskit netns. In my head, this would probably involve a throwaway netns with a veth to pass traffic to the sibling (target) netns (somehow bridged as well?), and a set of routes from the CLI (e.g. cluster IP ranges from u7s, plus extra ranges specified by a user). It would make sense for this throwaway netns to also be capable of routing out to the host/onward and making use of host DNS by default.
Is there currently a nice way of doing something like this? If not, would it be straightforward to implement?
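To make the "set of routes from the CLI" idea concrete, here is a minimal sketch that turns a list of IPv4 ranges into `ip route add` commands. It prints the commands rather than running them, so they can be reviewed or piped into a shell inside the throwaway netns. The function name `route_cmds` is hypothetical; the gateway 10.20.30.1 and device usr-veth in the usage note are borrowed from the working flow in the comments, and 192.168.50.0/24 is a made-up stand-in for a user-supplied range.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one `ip route add` command per IPv4 CIDR.
#
# route_cmds GATEWAY DEVICE CIDR...
route_cmds() {
  local gateway="$1" device="$2"
  shift 2
  local cidr
  for cidr in "$@"; do
    # Shape check only: four dotted groups and a prefix length.
    # It does not validate octet ranges or prefix bounds.
    if ! [[ "${cidr}" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$ ]]; then
      echo "skipping malformed CIDR: ${cidr}" >&2
      continue
    fi
    echo "ip route add ${cidr} via ${gateway} dev ${device}"
  done
}
```

For example, `route_cmds 10.20.30.1 usr-veth 10.0.0.0/24 192.168.50.0/24` prints two commands, one for the ClusterIP range and one for the extra range.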