SPDX-FileCopyrightText | SPDX-License-Identifier | title | author | footer | description | keywords | color | class | style | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
© 2024 Menacit AB <[email protected]> |
CC-BY-SA-4.0 |
HTTP explained |
Joel Rangsmo <[email protected]> |
© Course authors (CC BY-SA 4.0) |
Introduction to the wonderful world of HTTP |
|
#ffffff |
|
section.center {
text-align: center;
}
|
Our modern world runs on the Hypertext Transfer Protocol, but how does it really work?
Why is HTTPS a thing?
Where do "proxies" and "load balancers" come into the picture?
After this presentation, you should feel comfortable answering these questions! :-D
(Relatively) simple protocol for client (AKA "user agent") to server communication.
Introduced together with HTML in 1989 to serve files over the network.
Three major protocol versions exist, with the latest being released in 2022.
Basic serving of static files.
A client request for http://example.com/animals/horse.html would simply load the contents of /var/www/html/animals/horse.html from the server's file system and transfer it to the client.
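A minimal sketch of that mapping, assuming the document root /var/www/html from the example. The function name `resolve_static_path` is hypothetical, and the path traversal check is an addition that the slide does not describe but that a real server would need.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Assumed document root, taken from the example above.
DOCUMENT_ROOT = PurePosixPath("/var/www/html")


def resolve_static_path(url: str) -> PurePosixPath:
    """Map the path component of a URL onto a file below DOCUMENT_ROOT."""
    request_path = unquote(urlsplit(url).path)
    # Drop empty segments and "." so "/a//b/./c" behaves like "/a/b/c".
    parts = [p for p in request_path.split("/") if p not in ("", ".")]
    # Refuse attempts to climb out of the document root.
    if ".." in parts:
        raise ValueError("path traversal attempt")
    return DOCUMENT_ROOT.joinpath(*parts)


print(resolve_static_path("http://example.com/animals/horse.html"))
```

The server would then read that file and send its bytes back as the response body.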
People wanted to use the web to provide interactive applications, such as online shopping malls.
Lowered the bar for adoption significantly, as users didn't have to install/update additional software on their computers.
Instead of just serving static files from disk, the server would generate dynamic responses on-the-fly.
A client request for http://example.com/weather.cgi?city=Gnarp may result in the following response being generated and returned by the server:
<html>
<head>
<title>Weather now in Gnarp</title>
</head>
<body>
<p>
The current (19:06) temperature
in <b>Gnarp</b> is
19 degrees celsius.<br>
It is raining! :-(
</p>
</body>
</html>
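A rough sketch of how a server-side program could generate a page like the one above. The names `lookup_weather` and `render_weather_page` are hypothetical, and the weather data is hard-coded instead of coming from a real source. The city is HTML-escaped before it is placed in the page, since it comes straight from the client.

```python
from datetime import datetime
from html import escape
from urllib.parse import parse_qs, urlsplit


def lookup_weather(city: str) -> tuple[int, bool]:
    """Hypothetical stand-in for a real lookup: (temperature, is_raining)."""
    return 19, True


def render_weather_page(url: str, now: datetime) -> str:
    """Build an HTML response from the "city" query parameter of the URL."""
    query = parse_qs(urlsplit(url).query)
    city = query.get("city", ["unknown"])[0]
    temperature, raining = lookup_weather(city)
    safe_city = escape(city)  # never echo client input into HTML unescaped
    weather = "It is raining! :-(" if raining else "It is not raining! :-)"
    return (
        "<html><head>"
        f"<title>Weather now in {safe_city}</title>"
        "</head><body><p>"
        f"The current ({now:%H:%M}) temperature in <b>{safe_city}</b> is "
        f"{temperature} degrees Celsius.<br>{weather}"
        "</p></body></html>"
    )


print(render_weather_page("http://example.com/weather.cgi?city=Gnarp", datetime.now()))
```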
These days HTTP isn't only used to serve HTML data to web browsers, but for a wide variety of client-server communication needs.
Liked by developers for its simplicity and widespread support in programming languages/toolkits.
If it's so damn simple, can't you just get to it?!
Waow, chill - I shall!
Just one more thing...
Applications are typically given Uniform Resource Locators to know where they should send requests.
http://www.example.com/cocktails.txt tells the client to use the HTTP protocol, connect to the host address "www.example.com" and request the server path "/cocktails.txt".
irc://chat.example.com/the_corner_bar tells the client to use the Internet Relay Chat protocol, connect to the host address "chat.example.com" and join a chat room named "the_corner_bar".
Not so complicated, right?
httpː//bob:s3cret@t.example.com:1234 ↴ /about+us/faq?lan=en&s=Q%26A#q:Refund
httpː//bob:s3cret@t.example.com:1234 ↴ /about+us/faq?lan=en&s=Q%26A#q:Refund
Protocol, also known as "scheme".
Commonly "http" or "https".
httpː//bob:s3cret@t.example.com:1234 ↴ /about+us/faq?lan=en&s=Q%26A#q:Refund
Optional username and password for
authentication, separated by colon.
May be omitted. Putting credentials in
URLs is not considered best practice.
httpː//bob:s3cret@t.example.com:1234 ↴ /about+us/faq?lan=en&s=Q%26A#q:Refund
Target server network address.
Either host name, commonly resolved
by client using DNS, or IP address.
httpː//bob:s3cret@t.example.com:1234 ↴ /about+us/faq?lan=en&s=Q%26A#q:Refund
Target port for connection to server.
If omitted, the default port is used:
HTTP version 1 and 2: 80/TCP
HTTPS: 443/TCP
HTTP version 3: 443/UDP
httpː//bob:s3cret@t.example.com:1234 ↴ /about+us/faq?lan=en&s=Q%26A#q:Refund
Data path that the client should request*
from the server.
The plus character is often decoded as a
space (strictly a query string convention).
Other characters with special meaning in
a URL path may be "percent-encoded":
%20 = Space, %2F = /, %26 = &, %25 = %...
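The encoding rules can be tried out with Python's standard library, as a small illustration. Note that `urllib.parse` keeps two decoders apart: `unquote` only handles percent-encoding, while `unquote_plus` additionally treats a plus as a space, the way form-encoded query strings do.

```python
from urllib.parse import quote, unquote, unquote_plus

# Encoding: safe="" means that even "/" is percent-encoded.
print(quote("Q&A", safe=""))     # Q%26A
print(quote("about us"))         # about%20us
print(quote("100%/2", safe=""))  # 100%25%2F2

# Decoding: only unquote_plus turns "+" into a space.
print(unquote("Q%26A"))          # Q&A
print(unquote("about+us"))       # about+us
print(unquote_plus("about+us"))  # about us
```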
httpː//bob:s3cret@t.example.com:1234 ↴ /about+us/faq?lan=en&s=Q%26A#q:Refund
Base path.
Similar to a file system path.
Doesn't require file extension,
like ".html" or ".jpeg", other methods
exist for communicating response format.
httpː//bob:s3cret@t.example.com:1234 ↴ /about+us/faq?lan=en&s=Q%26A#q:Refund
Optional "query string".
Key-value pairs, separated by
ampersand (or less commonly semicolon).
Commonly used to pass data to server
as input for generation of
dynamic responses.
httpː//bob:s3cret@t.example.com:1234 ↴ /about+us/faq?lan=en&s=Q%26A#q:Refund
Optional "fragment".
Part of the URL that is never actually
included in requests to the server, but
may be interpreted by the client application.
Commonly used for highlighting text,
passing client-side secrets, etc.
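To see all the parts at once, the example URL can be fed to `urllib.parse.urlsplit`. This is only a sketch: the helper name `describe_url` is made up for illustration, and the URL is written on one line with an ordinary colon after the scheme.

```python
from urllib.parse import parse_qs, urlsplit

URL = "http://bob:s3cret@t.example.com:1234/about+us/faq?lan=en&s=Q%26A#q:Refund"


def describe_url(url: str) -> dict:
    """Split a URL into the parts walked through above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parse_qs(parts.query),  # percent-decoded key-value pairs
        "fragment": parts.fragment,      # stays on the client
    }


for name, value in describe_url(URL).items():
    print(f"{name:>8}: {value}")
```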
Properly (and consistently) grokking URLs seems tricky for both humans and computers.
Where will we end up using httpː//chat.fb.com:1814/[email protected] , httpː//googIe.com or httpː//facebοοk.com ?
Loosely defined/interpreted standards have resulted in many security issues.
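A few of these traps can be spotted by asking a parser where a URL really points. The sketch below is illustrative only: `url_warnings` is a hypothetical helper, the first test URL is a made-up example of text before an "@", and the checks are nowhere near a complete defence.

```python
from urllib.parse import urlsplit


def url_warnings(url: str) -> list[str]:
    """Return human-readable warnings for a few well-known URL tricks."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    warnings = []
    if parts.username is not None:
        # Everything before "@" is userinfo, not the host.
        warnings.append(f"text before '@' is ignored, real host is {host}")
    if not host.isascii():
        # Look-alike letters from other alphabets (homographs).
        warnings.append("host contains non-ASCII characters")
    return warnings


# Made-up example: looks like chat.fb.com, connects to example.net.
print(url_warnings("http://chat.fb.com:1814@example.net/"))
# Greek omicrons instead of Latin "o", written as escapes to make them visible.
print(url_warnings("http://faceb\u03bf\u03bfk.com"))
# A capital "I" posing as a lowercase "l": the parser lowercases the host.
print(urlsplit("http://googIe.com").hostname)
```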
With that out of the way, let's examine the HTTP protocol!
HTTP version 1 is a text-based protocol.
Makes it simple to learn, debug and implement.
A request is sent by the client, resulting in a response being returned by the server.
<METHOD> <PATH> HTTP/1.1
Host: <TARGET HOST NAME OR IP ADDRESS>
<OPTIONAL HEADER NAME>: <HEADER VALUE>
<OPTIONAL BODY>
GET /cocktails.txt HTTP/1.1
Host: www.example.com
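Since the request is plain text, it can be assembled with ordinary string handling. A minimal sketch, with `build_request` as a hypothetical helper name. On the wire, HTTP/1.1 lines end with CRLF and an empty line separates the headers from the optional body.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Assemble the bytes of an HTTP/1.1 request."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


print(build_request("GET", "/cocktails.txt", "www.example.com").decode("ascii"))
```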
DELETE /api/user/42 HTTP/1.1
Host: management.example.com
Authorization: Basic Ym9iOnMzY3JldA==
Ym9iOnMzY3JldA== is "bob:s3cret" encoded using Base64.
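The header value is easy to reproduce, and just as easy to reverse, which is a reminder that Base64 is an encoding and not encryption. The helper names below are made up for this sketch.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an "Authorization: Basic ..." header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """Recover (username, password) from a Basic authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


print(basic_auth_header("bob", "s3cret"))             # Basic Ym9iOnMzY3JldA==
print(decode_basic_auth("Basic Ym9iOnMzY3JldA=="))    # ('bob', 's3cret')
```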
POST /guest_book.php HTTP/1.1
Host: social.example.com
Content-Type: application/json
Content-Length: 51
{
"author": "adam",
"message": "Hello Eve!!!"
}
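The Content-Length header must state the size of the body in bytes, so it is best computed rather than typed by hand. A sketch with a hypothetical `build_json_post` helper. It assumes the body is serialized with two-space indentation and LF line breaks, which gives the 51 bytes of the example above.

```python
import json


def build_json_post(path: str, host: str, payload: dict) -> bytes:
    """Assemble a POST request carrying a JSON body."""
    body = json.dumps(payload, indent=2).encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"  # counted in bytes, not characters
        "\r\n"
    )
    return head.encode("ascii") + body


request = build_json_post(
    "/guest_book.php",
    "social.example.com",
    {"author": "adam", "message": "Hello Eve!!!"},
)
print(request.decode("utf-8"))
```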
HTTP/1.1 <STATUS CODE> <STATUS MESSAGE>
<OPTIONAL HEADER NAME>: <HEADER VALUE>
<OPTIONAL BODY>
HTTP/1.1 204
- Informational (100 – 199)
- Successful (200 – 299)
- Redirection (300 – 399)
- Client error (400 – 499)
- Server error (500 – 599)
- 200: Successful: OK
- 204: Successful: No content
- 301: Redirection: Moved permanently
- 400: Client error: Bad request
- 401: Client error: Unauthorized
- 404: Client error: Not found
- 500: Server error: Internal server error
- 503: Server error: Service unavailable
...and of course 418:
The HTTP 418 ("I'm a teapot") status response code indicates that the server refuses to brew coffee because it is, permanently, a teapot.
— MDN web docs
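The class of a status code follows from its first digit, which a few lines of Python can show. `status_class` and `describe` are hypothetical helpers. The standard phrases come from `http.HTTPStatus` in the standard library.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Return the class of a status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


def describe(code: int) -> str:
    """Format a registered status code as 'code: class: phrase'."""
    return f"{code}: {status_class(code)}: {HTTPStatus(code).phrase}"


for code in (200, 204, 301, 400, 401, 404, 500, 502, 503):
    print(describe(code))
```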
HTTP/1.1 500 Wooops
X-Server: Example HTTPD v0.2
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 68
Top three cocktails:
1. Caipirinha
2. White Russian
3. Bloody Mary
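Reading a response is the mirror image of writing a request: split at the first empty line, then pick apart the status line and the headers. A simplified sketch with a hypothetical `parse_response` function. It assumes the whole response is already in memory and ignores details such as repeated headers and chunked bodies.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into (code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, _, rest = status_line.partition(" ")
    code, _, reason = rest.partition(" ")  # the reason may be empty
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return int(code), reason, headers, body


raw = b"HTTP/1.1 500 Wooops\r\nX-Server: Example HTTPD v0.2\r\n\r\n"
print(parse_response(raw))
```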
Doesn't seem too tricky.
Let's hack together our own client and server using Netcat!
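For those who prefer Python over Netcat, roughly the same experiment can be done with raw sockets. This is a toy sketch and not a usable web server: the function names are made up, the server answers exactly one connection with a fixed response, and it assumes the whole request arrives in a single read.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection and reply with a fixed response."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(4096)  # read (and ignore) the request
        body = b"Hello from a hand-made server!\n"
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        conn.sendall(head.encode("ascii") + body)


def fetch(host: str, port: int, path: str = "/") -> bytes:
    """Send a GET request by hand and return the raw response bytes."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while chunk := sock.recv(4096):  # read until the server closes
            chunks.append(chunk)
    return b"".join(chunks)


def demo() -> bytes:
    """Run server and client against each other on the loopback interface."""
    with socket.create_server(("127.0.0.1", 0)) as listener:  # port 0 = any free port
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        response = fetch("127.0.0.1", port)
        worker.join()
    return response


if __name__ == "__main__":
    print(demo().decode("ascii"))
```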
HTTP is a "clear-text" protocol.
Communication can be intercepted (and modified) anywhere between the server and the client.
HTTPS was created to wrap HTTP in a layer of encryption.
Relies on both symmetric and asymmetric cryptography.
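In practice the encryption layer is handled by a TLS library. The sketch below uses Python's `ssl` module. The function names are hypothetical and `tls_details` needs network access, so it is defined but not called here. During the handshake, asymmetric cryptography is used to authenticate the server and agree on keys, and the agreed symmetric cipher then protects the actual HTTP traffic.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client-side TLS context with certificate checks enabled."""
    # The defaults verify the server certificate and its host name.
    return ssl.create_default_context()


def tls_details(host: str, port: int = 443) -> tuple:
    """Connect to a server and report the negotiated TLS version and cipher."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            # cipher() returns (name, protocol, secret bits) of the symmetric cipher.
            return tls_sock.version(), tls_sock.cipher()


context = make_client_context()
print(context.verify_mode, context.check_hostname)
```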
Let's jump into Menacit's "Practical cryptography course"!
An HTTP proxy is a piece of software acting as both a server and a client at the same time.
Can be used to filter, redirect and manipulate HTTP requests from clients.
A forward proxy is commonly used to restrict egress communication on a network or provide some client anonymity.
A reverse proxy is commonly used to restrict or redirect client requests (ingress) to one or more servers.
A load balancer forwards traffic to multiple servers, distributing the load.
Can monitor status of servers and exclude them as targets if they become unhealthy.
All* HTTP load balancers are reverse proxies, but not all reverse proxies are load balancers.
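The core of a load balancer, picking the next healthy server, fits in a small class. A sketch using round-robin selection, which is only one of several possible strategies. The class and method names are made up, and a real load balancer would also run the health checks and forward the requests itself.

```python
class RoundRobinBalancer:
    """Hand out backends in turn, skipping those marked unhealthy."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0

    def mark_unhealthy(self, backend):
        self._healthy.discard(backend)

    def mark_healthy(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self):
        """Return the next healthy backend, trying each one at most once."""
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if backend in self._healthy:
                return backend
        raise RuntimeError("no healthy backends available")


balancer = RoundRobinBalancer(["app1:8080", "app2:8080", "app3:8080"])
print([balancer.pick() for _ in range(4)])
balancer.mark_unhealthy("app2:8080")
print([balancer.pick() for _ in range(3)])
```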
HTTP/2 was introduced back in 2015, the first major change since 1997.
Still uses the same verbs, status codes, header/body concepts - but no longer a simple text-based protocol.
Features like multiplexing, server-side push and header compression provide better performance/lower latency.
Huge resource savings for large web-site operators.
HTTP/3 was standardized in 2022, support still being implemented in client/server/proxy software.
Abandons TCP in favor of the UDP-based transport protocol "QUIC".
Mandatory TLS-like encryption and further performance improvements.
For copy-pasteable
speaker notes, example code
and similar goodies, see:
%RESOURCES_DOMAIN%/http.zip.
Was anything unclear?
Got ideas for improvements?
Don't fancy the animals in the slides?
Create an issue or submit a pull request to the repository on GitHub!