Skip to content

Fields/Headers take a list of Vec<(String, String)> to create, but probably should take Vec<(String, List<String>)> #12

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
brendandburns opened this issue Mar 11, 2023 · 14 comments

Comments

@brendandburns
Copy link
Contributor

The new-entries function takes a list of (String, String) but the fields-get function returns a Vec(String)

Headers can have multiple values per the HTTP spec, so I think that new-entries should probably take Vec<(String, List<String>)>

@lukewagner
Copy link
Member

I wondered about this as well. One reason why perhaps not is: in the underlying protocols, iiuc, the fields are just a sequence of key/value pairs and thus if we cluster them by key, we're losing the protocol order which I could hypothetically imagine affecting some applications in corner cases. Also, for at least some common HTTP libraries (1,2), list-of-string-pairs is offered as the lower-level ground truth. Also, when you want to perform a higher-level mapping of a key to 0..N values, then the method you want is fields-get (allowing the host impl to use a backing hash-table (e.g.)). Thus, I think list<tuple<string,list<string>>> sits in a sort of no-man's land between the lower-level sequence (that I think is commonly used in the representation) and the higher-level map.

But that's just my understanding based on a partial knowledge of this space, so I'm happy to hear other thoughts too.

@pimterry
Copy link

the fields are just a sequence of key/value pairs and thus if we cluster them by key, we're losing the protocol order which I could hypothetically imagine affecting some applications in corner cases

This is notable because quite a few services (notably Cloudflare) use the details of this header ordering & header key case (Connection vs connection) for HTTP fingerprinting to detect types of client, and in many configurations they'll automatically block HTTP traffic based on that alone as a bot detection heuristic. Affects HTTP proxies, web crawlers, scraping tools, uncommon web browser implementations, etc etc - big chunks of the web become inaccessible via HTTP if you can't precisely control raw header order & casing.

Not something that most normal HTTP API clients need to care about much, but imo it's definitely worth having low-level tuple<string, string>[] APIs available everywhere to support these cases (for example Node.js has a headers map and rawHeaders array available on all HTTP messages, and writeHead(status, headers) accepts headers in both KV map & string array formats).

@brendandburns
Copy link
Contributor Author

@lukewagner my biggest feedback is that there is inconsistency between the way that you create a new headers object (string, string) and the value that is returned when you call get (vec). And then of course there's also an 'append' method.

I don't feel strongly, multi-header is definitely an edge case, but we should be consistent at least.

@lukewagner
Copy link
Member

Sorry, I think I'm missing your point. As I'm reading it, new-fields takes a list<tuple<string,string>> and fields-entries returns the same type, so we're symmetric in the case of bulk construction/consumption. The other methods are for finer-grained queries/mutations that don't require touching the complete list of fields.

@pchickey
Copy link
Contributor

I believe his argument is that new-fields should take a list<tuple<string,list<string>>>, a design change I am also in favor of

@lukewagner
Copy link
Member

Wouldn't that also lead to the same loss in control over ordering as if fields-entries returned a list<tuple<string,list<string>>>? There's also a related question of whether the caller of new-fields would be allowed to pass the same key multiple times in the outer list, and whether that's enforced and, if not, what happens (including in chaining scenarios). (As a side note: it's unfortunate we don't have a map type, which is complicated for a lot of the same reasons.)

@brendandburns
Copy link
Contributor Author

I took a random look at a common HTTP library (https://square.github.io/okhttp/4.x/okhttp/okhttp3/-headers/) and it looks like the way they solved this is that they have both a field-entry (which returns the first value) field-entries (which returns the list of values) as a way of making the API more explicit.

@lukewagner
Copy link
Member

Starting with what we have proposed now:

  • fields-entries: func(fields: fields) -> list<tuple<string,string>>
  • fields-get: func(fields: fields, name: string) -> list<string>

I'm guessing you're not asking for a variation on the first one that just returns the first value (hard to imagine that one being useful), but, rather, a variation on the second that just returns the first value for a given name (or none), so perhaps that suggests a name like:

  • fields-get-first: func(fields: fields, name: string) -> option<string>

@PiotrSikora
Copy link
Collaborator

I took a random look at a common HTTP library (https://square.github.io/okhttp/4.x/okhttp/okhttp3/-headers/) and it looks like the way they solved this is that they have both a field-entry (which returns the first value) field-entries (which returns the list of values) as a way of making the API more explicit.

Usually, such API is a result of fixing the issue of returning only the first value, while retaining backward compatibility at the API level, i.e. it's not a proper solution for a new API.

@brendandburns
Copy link
Contributor Author

I think that the fundamental issue is that most people's model of HTTP headers is a hash-map. This isn't correct, but it is the general mental model.

If we're going to push people towards the "list of tuples" mental model, then we need to be consistent, so I would prefer

  • fields-entries: func(fields: fields) -> list<tuple<string, string>>
  • fields-get: func(fields: fields, name: name, ix: int32) -> option<string>
  • fields-count: func(fields: fields) -> int32

I think the challenge with fields-get: func(fields: fields, name: string) -> list<string> paints a picture where it really looks like the structure is a hashmap.

Additionally given that the common (by far) case is a single (string, string) tuple for each header name, returning a list makes the programmer do extra work that is dissonant.

@brendandburns
Copy link
Contributor Author

fwiw golang also separates Header::Get (get the first value) from Header::Values (get all values)
https://pkg.go.dev/net/http#Header

On the other hand, dotnet only returns the list:
https://learn.microsoft.com/en-us/dotnet/api/system.net.http.headers.httpheaders.getvalues?view=net-8.0#system-net-http-headers-httpheaders-getvalues(system-string)

So maybe it doesn't matter :)

@dicej
Copy link
Collaborator

dicej commented Jun 8, 2023

Seems to me that new-fields should mirror fields-entries and accept a list<tuple<string,list<u8>>>.

@lukewagner
Copy link
Member

Ah yes, I think this was an omission in #29. I'll file a follow-up fix.

@lukewagner
Copy link
Member

#42

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants