Commit 1965dc6

post: new article about the last CI changes
And more, around the maintenance part. Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
1 parent 8604766 commit 1965dc6

File tree

1 file changed

+177
-0
lines changed

_posts/2024-04-08-CI-new-feat.md

Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
1+
---
2+
layout: post
3+
title: "CI & new features"
4+
---
5+
6+
The [previous post]({% post_url 2024-03-04-backports %}) mentioned that February
7+
was still full of various "maintenance" tasks, mainly around the backports, and
8+
the preparation of the future Linux 6.9. The beginning of March was similar to
9+
that, then more time was finally available to look at fixing issues, and
10+
preparing new features. Read on to find out more about what happened in March!
11+
12+
<!--more-->
13+
14+
## The future v6.9 and backports
15+
16+
Linux v6.8 was released on March 10th. As mentioned in my [previous post]({%
17+
post_url 2024-03-04-backports %}), we had until this date to suggest new
18+
features and refactoring for inclusion in the `net-next` tree before it closed
19+
to new submissions. We took this opportunity to send a last feature for the
20+
future v6.9 (`TCP_NOTSENT_LOWAT` socket option support from Paolo) one week
21+
before, and a bunch of refactoring in the selftests initiated by Geliang, a few
22+
days before the limit. We usually don't like to rush things just before the
23+
closure, but sending big refactoring early generally reduces the maintenance
24+
cost, compared to carrying it only in our tree for a while.
25+
26+
This was done while, in parallel, I was also helping the stable team by
27+
[backporting]({% post_url 2024-03-04-backports %}) even more patches that could
28+
not be applied without conflicts in stable versions. Pretty much the same as
29+
what was done in February, so not that interesting :)
30+
31+
32+
## CI: a big step forward
33+
34+
Having more time available allowed me to work on the long-awaited tasks
35+
linked to the CI:
36+
- Using [runners with KVM support](https://github.com/multipath-tcp/mptcp_net-next/issues/474).
37+
- Validating [MPTCP BPF tests](https://github.com/multipath-tcp/mptcp_net-next/issues/406).
38+
- Switching to [`virtme-ng`](https://github.com/multipath-tcp/mptcp_net-next/issues/472).
39+
- Tracking regressions by [publishing test results](https://github.com/multipath-tcp/mptcp_net-next/issues/473).
40+
41+
### GitHub Actions and KVM support
42+
43+
Back in [December]({% post_url 2024-01-01-Angel-Project %}), when the switch to
44+
GitHub Actions started, it was not possible to enable KVM support with public
45+
runners. That was the main reason behind choosing [Cirrus CI](https://cirrus-ci.org/)
46+
a few years ago, and keeping it for the tests with the debug kernel config a few
47+
months ago. As described in [that post]({% post_url 2024-01-01-Angel-Project %}),
48+
our workflow was impacted by Cirrus CI's monthly limit, which was the reason
49+
behind this partial switch to GitHub Actions. Moving only the tests with a
50+
non-debug kernel config was not enough, as we were still affected: the
51+
monthly limit was reached on the 31st of January, and again on the 16th of February.
52+
Another solution was then required.
53+
54+
I then looked at adding a self-hosted runner. I managed to
55+
[successfully](https://github.com/matttbe/mptcp_net-next/actions/runs/8194936484)
56+
execute the tests on a self-hosted runner, a refurbished mini PC at
57+
home. I then realised that was not enough: KVM was still not used, because the
58+
Docker container was not executed with enough permissions (`--privileged`, or
59+
`--cap-add` + `mount`).
60+
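One way to check whether KVM is usable from inside a container is to test access to the `/dev/kvm` device. The snippet below is only a minimal sketch of that check: `kvm_usable` is a hypothetical helper name, not something taken from our CI scripts.

```python
import os


def kvm_usable(path="/dev/kvm"):
    """Return True if the KVM device node exists and is readable and writable.

    Hypothetical helper for illustration: without the right container
    permissions, the device is typically missing or not accessible.
    """
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    if kvm_usable():
        print("KVM looks usable")
    else:
        print("KVM is not usable: no hardware acceleration for the VM")
```

Running it once on the host and once inside the container shows whether the device has been passed through.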
61+
I knew from a [GitHub blog post from last year](https://github.blog/changelog/2023-02-23-hardware-accelerated-android-virtualization-on-actions-windows-and-linux-larger-hosted-runners/)
62+
that it was possible to have KVM support, so I tried to find a way to use it
63+
with our "Docker container actions", like they do in
64+
[reactivecircus/android-emulator-runner](https://github.com/reactivecircus/android-emulator-runner).
65+
Then I found out that since [January this year](https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/),
66+
it is possible to have KVM support with the Linux public GitHub runners! So no
67+
need to host and maintain that at home with a limited Internet connection! Plus
68+
it means there is no need to restrict these tests to patches sent on our mailing
69+
list: people can get results from the CI simply by pushing code to their GitHub
70+
fork!
71+
72+
So I:
73+
- [Enabled KVM support](https://github.com/multipath-tcp/mptcp_net-next/commit/677b5ecd223ca1a39e993dfd0138f32420521d26)
74+
with a "workaround" (Docker is launched manually)
75+
- [Added the 'debug' mode support](https://github.com/multipath-tcp/mptcp_net-next/commit/6c0b56e647b611e902ffacb958eb7443009f0ef2)
76+
- [Removed Cirrus-CI support](https://github.com/multipath-tcp/mptcp_net-next/commit/cc356e6ad19f66c50a97e7829e7031bbb5b7f199)
77+
- (And did other [clean-ups](https://github.com/multipath-tcp/mptcp_net-next/commits/t/DO-NOT-MERGE-mptcp-add-CI-support/.github/workflows?author=matttbe&since=2024-03-01&until=2024-03-31)
78+
while at it)
79+
80+
With KVM support, the CPU usage is reduced and no longer near the 100% limit, so
81+
our tests are more stable. Dropping Cirrus-CI support, along with a bunch of
82+
pretty much duplicated code, also helps maintenance in the long term.
83+
84+
### BPF Tests
85+
MPTCP BPF tests have been present in the Linux kernel since 2022 (they were already in
86+
our tree in August 2020, but the development got interrupted). Back then, the
87+
tests were limited to the available features: being able to read fields from an
88+
MPTCP socket and checking if a TCP socket is an MPTCP subflow. With this, it is
89+
possible to monitor MPTCP connections, and even interact with them, e.g. by
90+
changing socket options per subflow. Later, the
91+
[`mptcpify`](https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=ddba122428a7)
92+
BPF program was added to force the creation of MPTCP sockets instead of TCP
93+
ones.
94+
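For comparison, this is roughly what an application has to do when it opts in to MPTCP by itself, which is the change `mptcpify` makes unnecessary. It is a minimal Python sketch, assuming Linux: `create_stream_socket` is a hypothetical helper name, and the value 262 is the Linux `IPPROTO_MPTCP` constant, used here only when Python's `socket` module does not expose it.

```python
import socket

# socket.IPPROTO_MPTCP exists in Python >= 3.10 on Linux; 262 is the Linux value.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def create_stream_socket(family=socket.AF_INET):
    """Try to create an MPTCP socket, and fall back to plain TCP.

    Hypothetical helper: the fallback covers kernels without MPTCP
    support, or systems where MPTCP is disabled.
    """
    try:
        return socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        return socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)


if __name__ == "__main__":
    sock = create_stream_socket()
    kind = "MPTCP" if sock.proto == IPPROTO_MPTCP else "TCP"
    print(f"Created a {kind} socket")
    sock.close()
```

With `mptcpify`, the application keeps asking for a TCP socket and the protocol is switched on the kernel side instead.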
95+
Until recently, these tests -- and the ones for the work-in-progress MPTCP BPF
96+
packet schedulers -- were not validated by our CI. We didn't track regressions
97+
in this area. With the help of Geliang, our CI scripts have been
98+
[adapted](https://github.com/search?q=repo%3Amultipath-tcp%2Fmptcp-upstream-virtme-docker+bpf&type=commits)
99+
to run these tests. Recently, I added
100+
["matrix" support](https://github.com/multipath-tcp/mptcp_net-next/commit/71a9e1d223e484148778e2549adbf18a6abecf8a)
101+
on GitHub Actions to run these tests, which require more kernel config
102+
options, in a dedicated runner.
103+
104+
### Virtme NG
105+
[Virtme](https://github.com/amluto/virtme/) is very useful to quickly run a VM
106+
with a custom kernel, and using the file system of the host (or in our case, the
107+
one of a container containing all required dependencies). We have been using it
108+
since 2019, and we were happy with it.
109+
110+
In 2020, it looks like the Virtme project became unmaintained. In
111+
December 2022, we had to [patch it](https://github.com/amluto/virtme/pull/82) to
112+
support kernels >= 6.2. More recently, another
113+
[patch](https://github.com/amluto/virtme/pull/81) was required to support QEMU >=
114+
7.2. Andrea Righi started to gather different fixes on
115+
[his side](https://github.com/arighi/virtme/), before creating the
116+
[`virtme-ng` project](https://github.com/arighi/virtme-ng/) in 2023.
117+
118+
`virtme-ng` brings interesting features, presented in this nice
119+
[LWN article](https://lwn.net/Articles/951313/). Switching to it would reduce
120+
the boot time, and greatly reduce the I/O thanks to
121+
[`virtiofs`](https://virtio-fs.gitlab.io/). So that's what we did
122+
[recently](https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/commit/0c54a948e22669d265b4ef083080e0f0af3ffe6f). It should also help us with
123+
long-term maintenance.
124+
125+
### Tracking regressions
126+
127+
Since we use a public CI, results are simply published on an IRC channel
128+
([#mptcp-ci](https://web.libera.chat/?#mptcp-ci)). This does not make it easy to
129+
track regressions.
130+
131+
The [Publish Test Results](https://github.com/marketplace/actions/publish-test-results)
132+
GitHub Action has been added, but it doesn't keep a long history of results.
133+
134+
A new ["Flakes"](https://ci-results.mptcp.dev/flakes.html) page has then been created
135+
to help us to track unstable tests. It is similar to
136+
[Netdev's Flakes](https://netdev.bots.linux.dev/flakes.html) page (with
137+
[dark scheme support](https://github.com/linux-netdev/nipa/pull/17) :) ).
138+
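To illustrate the idea only (this is not how the Flakes page is implemented), a test can be considered unstable when it both passes and fails across several runs. The sketch below assumes results are available as TAP text with the usual `ok N - name` / `not ok N - name` lines. `parse_tap` and `find_flaky` are hypothetical helper names, and TAP directives such as `# SKIP` are ignored for simplicity.

```python
import re
from collections import defaultdict

# Matches TAP result lines such as "ok 1 - name" or "not ok 2 - name # comment".
TAP_LINE = re.compile(r"^(ok|not ok)\s+\d+\s*-?\s*(.*?)\s*(?:#.*)?$")


def parse_tap(text):
    """Map each test name to True (passed) or False (failed)."""
    results = {}
    for line in text.splitlines():
        match = TAP_LINE.match(line.strip())
        if match:
            results[match.group(2)] = match.group(1) == "ok"
    return results


def find_flaky(runs):
    """Return the sorted names of tests that both passed and failed."""
    outcomes = defaultdict(set)
    for run in runs:
        for name, passed in parse_tap(run).items():
            outcomes[name].add(passed)
    return sorted(name for name, seen in outcomes.items() if len(seen) == 2)


if __name__ == "__main__":
    run_a = "TAP version 13\n1..2\nok 1 - mptcp_connect\nok 2 - mptcp_join\n"
    run_b = "TAP version 13\n1..2\nok 1 - mptcp_connect\nnot ok 2 - mptcp_join\n"
    print(find_flaky([run_a, run_b]))
```

A real service also needs to keep a history per branch and per kernel config, which is what a short-lived CI summary does not provide.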
139+
It is a shame such a service is not better integrated in GitHub Actions. In a
140+
perfect world where tests are all stable, it should not be needed. But here,
141+
when hosts need to talk to each other, packets can be delayed for some reason,
142+
causing retransmissions, etc. It is not easy to predict everything. The
143+
[cURL](https://curl.se/) project is using
144+
[TestClutch](https://github.com/dfandrich/testclutch/), but it is an external
145+
service to deploy, and it doesn't support the TAP format yet.
146+
147+
## What's next?
148+
149+
Significant work has started to rewrite the [mptcp.dev](https://www.mptcp.dev) website.
150+
When working on adding native MPTCP support to apps like
151+
[lighttpd](https://github.com/lighttpd/lighttpd1.4/pull/132) and
152+
[curl](https://github.com/curl/curl/pull/13278), it became clear that a website
153+
gathering all the info needed to understand MPTCP, to set it up, and to add
154+
support for it in apps was missing. (*Note: our website was updated on the 18th of
155+
April; it looked like
156+
[this](https://github.com/multipath-tcp/mptcp.dev/blob/531801e/README.md)
157+
before.*)
158+
159+
Publishing a doc in the official kernel documentation will also help end-users
160+
and app developers.
161+
162+
In terms of development, the next priority is adding
163+
[missing features](https://github.com/golang/go/issues/56539#issuecomment-1940486340)
164+
to have MPTCP enabled by default in Go.
165+
166+
167+
## Team work
168+
169+
As always, it is important to note that what I presented here so far is mostly
170+
what I was working on. But I'm not alone in this project. For example, Geliang
171+
continued to do some clean-ups in the kselftests, looked at the MPTCP
172+
support in [iperf3](https://github.com/esnet/iperf/pull/1661), and started to
173+
look at adding "last time" counters in `MPTCP_INFO`. Mat and Paolo helped with
174+
the reviews, and Christoph looked at running fuzzing tests on top of the latest
175+
RHEL kernel.
176+
177+
A great community!