I was invited by Daniel Stenberg to work with him on curl improvements sponsored by the Sovereign Tech Fund, an initiative of the German government to strengthen digital infrastructure and open source in the public interests. Daniel blogged about it.
Via this blog I try to give some updates on my ongoing work in this project, not least for transparency. This is deeply technical gobbledygook.
German news magazine Zeit has an article about the Sovereign Tech Fund and curl, of course in German. It gets most things right. And, importantly, it touches on the question if it is the government's job to strengthen otherwise unfincanced open source projects.
Unsurprisingly, I believe it is a good investment, if it chooses the projects based on good criteria. One such would be that a software in in wide use by the public. In my experience, a government is not well suited for seeding new projects. That would attract scammers and most certainly waste a lot of money. Financial investors are better at handling that.
We merged a large Pull Request yesterday (+6799,-4566 lines of code) that builds on the previously mentioned connection filters. What does it contain?
We rewrote the Happy Eyeballing (HE) code to make it more modular. HE is involved in making connections to a remote host. When you tell a HTTP library to talk to curl.se
, for example, there is more going on than most people are aware of. You know that you can only connect to IP addresses. So what is the IP address of curl.se
. Well, let's ask dig
:
> dig curl.se
curl.se. 3487 IN A 151.101.129.91
curl.se. 3487 IN A 151.101.193.91
curl.se. 3487 IN A 151.101.1.91
curl.se. 3487 IN A 151.101.65.91
> dig curl.se -t AAAA
curl.se. 3476 IN AAAA 2a04:4e42::347
curl.se. 3476 IN AAAA 2a04:4e42:200::347
curl.se. 3476 IN AAAA 2a04:4e42:400::347
curl.se. 3476 IN AAAA 2a04:4e42:600::347
curl.se. 3476 IN AAAA 2a04:4e42:800::347
curl.se. 3476 IN AAAA 2a04:4e42:a00::347
curl.se. 3476 IN AAAA 2a04:4e42:c00::347
curl.se. 3476 IN AAAA 2a04:4e42:e00::347
Oh! There are 12 IP addresses, 4 for IPv4 and 8 for IPv6. Trying just one after the other, the naive approach, is no good. The likelihood of one working will differ between address families. If some box between you and curl.se
screws up IPv4, you'll waste time trying those first four addresses.
Also, packets may take completely different paths through the internet, depending on the address family. It can happen that both will work, but one is much faster than the other. So curl
(and also your browser btw.) will make 2 attempts in parallel, one on IPv4 and one on IPv6. The one who connects first is the winner and the other one is shut down again.
Happy Eyeballing has been in curl for a long time. So why rewrite it?
Curl needs this strategy also for QUIC connections (the base protocol for HTTP/3) and QUIC works slightly different. The rewrite allows the Happy Eyeballing code to no longer care if a connection is made via TCP, UDP or QUIC. And the modular design means we can use testing stubs here too, for asserting that the timeout/attempts in HE work as needed.
In the near future, we will add what I would call ALPN Eyeballing. When you want a HTTP connection using either http/1.1
, h2
or h3
. Curl will then start one TCP+TLS and one QUIC attempt in parallel and select a winner. And both the TCP and the QUIC attempt will use Happy Eyeballing potentially. So curl might have 4 parallel IP connects ongoing at a time.
This was beyond what the former implementation could do, but now we can implement that.
The HTTP/2 implementation using nghttp2 was moved into a connection filter. Same for all HTTP/3 backends: ngtcp2, quiche and msh3. This means that curl's HTTP protocol handler cares less (read: has less complexity) about the specifics of a particular HTTP version. The installed http filter takes care of that.
This also introduces multiplexing, e.g. the handling of more that one request on a connection, for HTTP/3. For a new request, curl queries its existing connections if they have room for another one. A HTTP/1.1 connection can only do one request at a time, but HTTP/2 and 3 can do more, depending on what the server allows.
The connection handling now queries that at its filters. Dynamic querying this is important, since this request limit may change during the lifetime of a connection. For example, a server shutting down gracefully, will tell its client that it no longer accepts new requests. The number of ongoing requests becomes the new limit, going down to 0 with each finished request.
The HTTP/3 implementations are still experimental and for a reason. While we now do multiplexing on them, there is more work to be done to handle edge cases, general fairness, graceful shutdown, etc. Which will occupy is in the coming weeks.
One area where work has already started is the expansion of curl's test suite. Curl has a large base of ~1600 test cases, covering all protocols. These allow us to do the refactoring of its internals with high confidence. They run in various combinations, so the number alone is a little deceiving.
To make them rely on little else and run almost on any platform, the tests use self-written stub servers and some generally available facilities such as stunnel
. One thing they do lack is support for real parallel processing. Which is important for HTTP/2 and HTTP/3.
So, we bring in Apache httpd and nghttpx as servers to run a new, separate test suite with. Other servers might follow. We start with these for the following reasons:
- These tests focus on curl's parallel processing. They do not need to run on all platforms. On many linux distributions the
apache2-dev
package gives us all we need, ready made. - Its capability to dynamically load 3rd party modules allows us to inject our code into the server. We will trigger faulty behaviour and delays this way. We need to verify how curl handles those.
- I know Apache httpd quite well, having written Apache's HTTP/2 implementation and other parts.
- nghttpx has an excellent HTTP/3 implementation and its author is highly skilled and responsive.
We have a skeleton of the test suite running. We can check how hundreds of requests are handled and how connections are used. We already see some areas where we need to improve curl's implementation.
I am highly confident that by expanding this we will be able to move HTTP/3 out of its experimental status in the near future.
Wishing all of you a good start into 2023!