---
layout: post
title: "CI & new features"
---

The [previous post]({% post_url 2024-03-04-backports %}) mentioned that February
was still full of various "maintenance" tasks, mainly around the backports and
the preparation of the future Linux v6.9. The beginning of March was similar,
but then more time was finally available to fix issues and prepare new
features. Read on to find out more about what happened in March!

<!--more-->

## The future v6.9 and backports

Linux v6.8 was released on March 10th. As mentioned in my
[previous post]({% post_url 2024-03-04-backports %}), we had until that date to
propose new features and refactoring for the `net-next` tree, before it closed
for new submissions. We took this opportunity to send one last feature for the
future v6.9 (`TCP_NOTSENT_LOWAT` socket option support from Paolo) one week
before, and a batch of selftests refactoring initiated by Geliang a few days
before the deadline. We usually don't like to rush things just before the
closure, but sending big refactoring early generally reduces the maintenance
cost, compared to carrying it only in our tree for a while.
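
For readers unfamiliar with this option: `TCP_NOTSENT_LOWAT` limits how much
unsent data can sit in the send queue before the socket stops being reported as
writable, which helps latency-sensitive apps. Below is a minimal Python sketch,
assuming a Linux host, of how an app could try to set it on an MPTCP socket. The
helper name `try_notsent_lowat()` and the 16 KiB threshold are illustrative
choices; on kernels where MPTCP does not handle this option yet (before v6.9,
per the schedule above), `setsockopt()` is expected to fail, and the sketch
simply reports that.

```python
import socket

# Fallback values are the Linux ones, used if this Python build lacks the names.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)


def try_notsent_lowat(sock: socket.socket, lowat: int = 16 * 1024) -> bool:
    """Try to cap unsent data in the send queue; True if the kernel accepted it."""
    try:
        # MPTCP sockets take TCP-level options at the IPPROTO_TCP (SOL_TCP) level.
        sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, lowat)
    except OSError:
        # Assumed: e.g. EOPNOTSUPP when the MPTCP code does not support it yet.
        return False
    return True


if __name__ == "__main__":
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        print("MPTCP sockets are not available on this host")
    else:
        with s:
            print("TCP_NOTSENT_LOWAT accepted:", try_notsent_lowat(s))
```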

In parallel, I was also helping the stable team by
[backporting]({% post_url 2024-03-04-backports %}) even more patches that could
not be applied without conflicts to the stable versions. Pretty much the same as
what was done in February; not that interesting, then :)

## CI: a big step forward

With more available time, I was finally able to work on long-awaited tasks
related to the CI:

- Using [runners with KVM support](https://github.com/multipath-tcp/mptcp_net-next/issues/474).
- Validating [MPTCP BPF tests](https://github.com/multipath-tcp/mptcp_net-next/issues/406).
- Switching to [`virtme-ng`](https://github.com/multipath-tcp/mptcp_net-next/issues/472).
- Tracking regressions by [publishing test results](https://github.com/multipath-tcp/mptcp_net-next/issues/473).

### GitHub Actions and KVM support

Back in [December]({% post_url 2024-01-01-Angel-Project %}), when the switch to
GitHub Actions started, it was not possible to enable KVM support with public
runners. That was the main reason for choosing [Cirrus CI](https://cirrus-ci.org/)
a few years ago, and for keeping it for the tests with the debug kernel config a
few months ago. As described in [that post]({% post_url 2024-01-01-Angel-Project %}),
our workflow was impacted by Cirrus CI's monthly limit, which was the reason
behind this partial switch to GitHub Actions. Moving only the tests with a
non-debug kernel config was not enough; we were still impacted: the monthly
limit was reached on January 31st, and again on February 16th. Another solution
was then required.

I then looked at adding a self-hosted runner, and I managed to
[successfully](https://github.com/matttbe/mptcp_net-next/actions/runs/8194936484)
execute the tests on one: a refurbished mini PC at home. I then realised that was
not enough: KVM was still not used, because the Docker image was not executed
with enough permissions (`--privileged`, or `--cap-add` + `mount`).

I knew from a [GitHub blog post from last year](https://github.blog/changelog/2023-02-23-hardware-accelerated-android-virtualization-on-actions-windows-and-linux-larger-hosted-runners/)
that KVM support was available on larger hosted runners, so I tried to find a
way to use it with our "Docker container actions", as done in
[reactivecircus/android-emulator-runner](https://github.com/reactivecircus/android-emulator-runner).
Then I found out that since [January this year](https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/),
KVM support is also available on the public Linux GitHub runners! So there is no
need to host and maintain a runner at home with a limited Internet connection!
Plus, there is no need to restrict these tests to patches sent to our mailing
list: people can get results from the CI simply by pushing code to their GitHub
fork!

So I:

- [Enabled KVM support](https://github.com/multipath-tcp/mptcp_net-next/commit/677b5ecd223ca1a39e993dfd0138f32420521d26)
  with a "workaround" (Docker is launched manually).
- [Added the 'debug' mode support](https://github.com/multipath-tcp/mptcp_net-next/commit/6c0b56e647b611e902ffacb958eb7443009f0ef2).
- [Removed Cirrus CI support](https://github.com/multipath-tcp/mptcp_net-next/commit/cc356e6ad19f66c50a97e7829e7031bbb5b7f199).
- (And did other [clean-ups](https://github.com/multipath-tcp/mptcp_net-next/commits/t/DO-NOT-MERGE-mptcp-add-CI-support/.github/workflows?author=matttbe&since=2024-03-01&until=2024-03-31)
  while at it.)
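
To give an idea of what such a "workaround" can look like, here is a minimal
Python sketch, not the actual workflow code: it checks that `/dev/kvm` is usable
by the current user, then builds a `docker run` command line, adding
`--privileged` (one of the options mentioned above) only when KVM is available.
The image name, the mount layout, and the helper names `kvm_usable()` and
`docker_run_argv()` are hypothetical.

```python
import os
import shlex


def kvm_usable(dev: str = "/dev/kvm") -> bool:
    """True if the KVM device exists and the current user can open it read-write."""
    return os.path.exists(dev) and os.access(dev, os.R_OK | os.W_OK)


def docker_run_argv(image: str, workdir: str, kvm: bool, args=()) -> list:
    """Build a `docker run` command line (hypothetical layout, not the real CI one)."""
    argv = ["docker", "run", "--rm",
            "-v", f"{workdir}:{workdir}",  # share the kernel source tree
            "-w", workdir]                 # and start inside it
    if kvm:
        # Give the container enough permissions to use KVM, as discussed above.
        argv.append("--privileged")
    argv.append(image)
    argv.extend(args)  # arguments passed to the image's entrypoint
    return argv


if __name__ == "__main__":
    # Hypothetical image name; the command is only printed, not executed.
    argv = docker_run_argv("example/virtme-image:latest", os.getcwd(), kvm_usable())
    print(shlex.join(argv))
```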

With KVM support, CPU usage is reduced and no longer near the 100% limit, so our
tests are more stable. Dropping Cirrus CI support, along with a bunch of mostly
duplicated code, also helps with long-term maintenance.

### BPF Tests

MPTCP BPF tests have been in the Linux kernel since 2022 (they were already in
our tree in August 2020, but the development got interrupted). Back then, the
tests were limited to the available features: reading fields from an MPTCP
socket, and checking whether a TCP socket is an MPTCP subflow. With this, it is
possible to monitor MPTCP connections, and even interact with them, e.g. by
changing socket options per subflow. Later, the
[`mptcpify`](https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=ddba122428a7)
BPF program was added to force the creation of MPTCP sockets instead of TCP
ones.
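
The decision made by `mptcpify` can be modelled in userspace. The Python sketch
below only illustrates that logic; it is not the BPF program itself, which runs
in the kernel during the `socket()` call. IPv4/IPv6 TCP stream sockets get their
protocol switched to `IPPROTO_MPTCP` (262 on Linux). The `is_mptcp()` helper
shows one way to check the result afterwards, by reading `SO_PROTOCOL`. Both
function names are hypothetical.

```python
import socket

IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)  # Linux value
SOCK_TYPE_MASK = 0xF  # strips flags such as SOCK_NONBLOCK / SOCK_CLOEXEC


def mptcpify(family: int, sock_type: int, protocol: int) -> int:
    """Return the protocol a socket() call would end up with under this model."""
    is_inet = family in (socket.AF_INET, socket.AF_INET6)
    is_stream = (sock_type & SOCK_TYPE_MASK) == socket.SOCK_STREAM
    is_tcp = protocol in (0, socket.IPPROTO_TCP)
    return IPPROTO_MPTCP if (is_inet and is_stream and is_tcp) else protocol


def is_mptcp(sock: socket.socket) -> bool:
    """Ask the kernel which protocol this socket really uses (Linux only)."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_PROTOCOL) == IPPROTO_MPTCP


if __name__ == "__main__":
    print(mptcpify(socket.AF_INET, socket.SOCK_STREAM, 0))  # switched to MPTCP
    print(mptcpify(socket.AF_INET, socket.SOCK_DGRAM, 0))   # UDP left untouched
```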

Until recently, these tests (and the ones for the work-in-progress MPTCP BPF
packet schedulers) were not validated by our CI, so we didn't track regressions
in this area. With the help of Geliang, our CI scripts have been
[adapted](https://github.com/search?q=repo%3Amultipath-tcp%2Fmptcp-upstream-virtme-docker+bpf&type=commits)
to run these tests. I also recently added
["matrix" support](https://github.com/multipath-tcp/mptcp_net-next/commit/71a9e1d223e484148778e2549adbf18a6abecf8a)
to our GitHub Actions workflow, to run these tests, which require more kernel
config options, on a dedicated runner.
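
In GitHub Actions, a `strategy.matrix` turns one job definition into several
jobs, one per combination of values, and `exclude` entries drop combinations
(even on a partial match). The Python sketch below models that expansion,
ignoring `include` which has more subtle rules, to show how separate runners end
up with different settings. The axis names and values (`mode`, `tests`, etc.)
are hypothetical, not the ones used in our workflow.

```python
from itertools import product


def expand_matrix(axes, exclude=()):
    """Cartesian product of the axes, minus combinations matching an exclude entry."""
    keys = list(axes)
    jobs = []
    for values in product(*(axes[k] for k in keys)):
        combo = dict(zip(keys, values))
        # An exclude entry removes every combination that matches all of its keys.
        if any(all(combo.get(k) == v for k, v in ex.items()) for ex in exclude):
            continue
        jobs.append(combo)
    return jobs


if __name__ == "__main__":
    # Hypothetical axes: kernel config mode x test suite.
    axes = {"mode": ["normal", "debug"], "tests": ["selftests", "bpftests"]}
    for job in expand_matrix(axes, exclude=[{"mode": "debug", "tests": "bpftests"}]):
        print(job)
```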

### Virtme-ng

[Virtme](https://github.com/amluto/virtme/) is very useful to quickly run a VM
with a custom kernel, using the file system of the host (or, in our case, the
one of a container with all the required dependencies). We have been using it
since 2019, and we were happy with it.

Starting in 2020, the Virtme project seems to have become unmaintained. In
December 2022, we had to [patch it](https://github.com/amluto/virtme/pull/82) to
support kernels >= 6.2. More recently, another
[patch](https://github.com/amluto/virtme/pull/81) was required to support
QEMU >= 7.2. Andrea Righi started to gather various fixes on
[his side](https://github.com/arighi/virtme/), before creating the
[`virtme-ng` project](https://github.com/arighi/virtme-ng/) in 2023.

`virtme-ng` brings interesting features, presented in this nice
[LWN article](https://lwn.net/Articles/951313/). Switching to it should reduce
the boot time, and greatly reduce the I/O thanks to
[`virtiofs`](https://virtio-fs.gitlab.io/). So that's what we did
[recently](https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/commit/0c54a948e22669d265b4ef083080e0f0af3ffe6f).
It should also help with long-term maintenance.

### Tracking regressions

Since we use a public CI, results are simply published on an IRC channel
([#mptcp-ci](https://web.libera.chat/?#mptcp-ci)). This does not make it easy to
track regressions.

The [Publish Test Results](https://github.com/marketplace/actions/publish-test-results)
GitHub Action has been added, but it doesn't keep a long history of results.

A new ["Flakes"](https://ci-results.mptcp.dev/flakes.html) page has then been
created to help us track unstable tests. It is similar to
[Netdev's Flakes](https://netdev.bots.linux.dev/flakes.html) page (with
[dark scheme support](https://github.com/linux-netdev/nipa/pull/17) :) ).
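
The idea behind such a page can be reduced to a simple rule: over recent runs of
the same code, a test that sometimes passes and sometimes fails is flaky, while
one that always fails is more likely a real regression. Here is a minimal Python
sketch of that classification, assuming per-run results are available as
`{test_name: "pass" | "fail" | "skip"}` dictionaries. This input format and the
`find_flakes()` helper are hypothetical; this is not how the Flakes page is
implemented.

```python
from collections import defaultdict


def find_flakes(runs, min_runs=2):
    """Return {test: failure ratio} for tests that both passed and failed.

    `runs` is a list of dicts mapping a test name to "pass", "fail" or "skip".
    Skipped results are ignored; tests that always fail are left out, as they
    point to a regression rather than to instability.
    """
    history = defaultdict(list)
    for run in runs:
        for name, result in run.items():
            if result != "skip":
                history[name].append(result)

    flakes = {}
    for name, results in history.items():
        fails = results.count("fail")
        if len(results) >= min_runs and 0 < fails < len(results):
            flakes[name] = fails / len(results)
    # Most unstable tests first.
    return dict(sorted(flakes.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Made-up results for generic test names.
    runs = [
        {"test_a": "pass", "test_b": "fail", "test_c": "pass"},
        {"test_a": "fail", "test_b": "fail", "test_c": "pass"},
        {"test_a": "pass", "test_b": "fail", "test_c": "skip"},
    ]
    print(find_flakes(runs))
```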

It is a shame such a service is not better integrated into GitHub Actions. In a
perfect world where all tests are stable, it would not be needed. But here,
when hosts need to talk to each other, packets can be delayed for various
reasons, causing retransmissions, etc. It is not easy to predict everything. The
[curl](https://curl.se/) project is using
[TestClutch](https://github.com/dfandrich/testclutch/), but it is an external
service to deploy, and it doesn't support the TAP format yet.
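
Kselftests print their results in TAP (Test Anything Protocol), so any tool
tracking flaky tests needs to read it. As an illustration of what such support
involves, here is a minimal Python sketch of a parser for top-level TAP result
lines (`ok`/`not ok`, optional test number, description, and `# SKIP`/`# TODO`
directives). It simplifies the TAP specification: plan lines, diagnostics, YAML
blocks and indented subtests are ignored. `parse_tap()` is a hypothetical helper.

```python
def parse_tap(text):
    """Return (number, description, status) tuples for top-level TAP results.

    status is "pass", "fail", "skip" or "todo". Plan lines ("1..N"), diagnostics
    ("# ..."), YAML blocks and indented subtests are ignored.
    """
    results = []
    for line in text.splitlines():
        line = line.rstrip()
        if line == "not ok" or line.startswith("not ok "):
            status, rest = "fail", line[len("not ok"):]
        elif line == "ok" or line.startswith("ok "):
            status, rest = "pass", line[len("ok"):]
        else:
            continue

        # Split "3 - name # SKIP reason" into description and directive.
        desc, _, directive = rest.partition("#")
        desc = desc.strip()
        number = None
        head, _, tail = desc.partition(" ")
        if head.isdigit():
            number, desc = int(head), tail.strip()
        if desc.startswith("- "):
            desc = desc[2:]

        directive = directive.strip().upper()
        if directive.startswith("SKIP"):
            status = "skip"
        elif directive.startswith("TODO"):
            status = "todo"
        results.append((number, desc, status))
    return results


if __name__ == "__main__":
    sample = "TAP version 13\n1..2\nok 1 - test_a\nnot ok 2 - test_b # SKIP no tool\n"
    print(parse_tap(sample))
```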

## What's next?

A big effort has started to rewrite the [mptcp.dev](https://www.mptcp.dev)
website. When working on adding native MPTCP support to apps like
[lighttpd](https://github.com/lighttpd/lighttpd1.4/pull/132) and
[curl](https://github.com/curl/curl/pull/13278), it became clear that a website
gathering all the info required to understand MPTCP, set it up, and add its
support to apps was missing. (*Note: our website was updated on the 18th of
April; it looked like
[this](https://github.com/multipath-tcp/mptcp.dev/blob/531801e/README.md)
before.*)
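
For context, the core of native MPTCP support in an app is typically to create
the listening or connecting socket with `IPPROTO_MPTCP` instead of
`IPPROTO_TCP`, and to fall back to plain TCP when the kernel cannot provide
MPTCP. Here is a minimal Python sketch of that pattern. Which `errno` values
mean "MPTCP unavailable" is an assumption here (`EPROTONOSUPPORT`, `ENOPROTOOPT`
and `EINVAL` are the ones handled), and `stream_socket()` is a hypothetical
helper, not code taken from lighttpd or curl.

```python
import errno
import socket

IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)  # Linux value

# Assumed to mean "MPTCP is not available here" (not compiled in, disabled via
# sysctl, or an older kernel); any other error is re-raised.
NO_MPTCP_ERRNOS = {errno.EPROTONOSUPPORT, errno.ENOPROTOOPT, errno.EINVAL}


def stream_socket(family=socket.AF_INET):
    """Create an MPTCP stream socket, falling back to TCP if MPTCP is unavailable."""
    try:
        return socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError as exc:
        if exc.errno not in NO_MPTCP_ERRNOS:
            raise
    return socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)


if __name__ == "__main__":
    with stream_socket() as s:
        kind = "MPTCP" if s.proto == IPPROTO_MPTCP else "TCP"
        print("created a", kind, "socket")
```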

Publishing documentation in the official kernel docs will also help end users
and app developers.

In terms of development, the next priority is adding the
[missing features](https://github.com/golang/go/issues/56539#issuecomment-1940486340)
needed to have MPTCP enabled by default in Go.

## Team work

As always, it is important to note that what I presented here is mostly what I
was working on. But I'm not alone in this project. For example, Geliang
continued to do some clean-ups in the KSelftests, looked at the MPTCP support
in [iPerf3](https://github.com/esnet/iperf/pull/1661), and started to look at
adding "last time" counters to `MPTCP_INFO`. Mat and Paolo helped with the
reviews, and Christoph looked at running fuzzing tests on top of the latest
RHEL kernel.

A great community!