topics include
- principles of network applications
- web and http
- e-mail, smtp, imap
- the domain name system dns
- p2p applications
- video streaming and content distribution networks
- socket programming with udp and tcp
conceptual and implementation aspects of application layer protocols
- transport layer service models
- client server paradigm
- peer to peer paradigm
learn about protocols by examining popular application layer protocols and infrastructure
- http - hyper text transfer protocol
- smtp - simple mail transfer protocol
- imap - internet message access protocol
- dns - domain name system
- cdns - content delivery network services
programming network applications
- socket api
example network apps
- social networking
- web
- text message
- multi user network games
- streaming stored video
- p2p file sharing
- voice over ip
- real time video conferencing
- internet search
- remote login
write programs that
- run on different end systems
- communicate over network
- e.g. web server software communicates with browser software
no need to write software for network core devices
- network core devices do not run user applications
- applications on end systems allow for rapid app development and propagation
server
- always on host
- permanent ip address
- often in data centers for scaling
clients
- contact, communicate with server
- may be intermittently connected
- may have dynamic ip addresses
- do not communicate directly with each other
- examples: http, imap, ftp
p2p architecture
- no always-on server
- arbitrary end systems directly communicate
- peers request service from other peers, provide service in return to other peers
- self scalability - new peers bring new service capacity, as well as new service demands
- peers are intermittently connected and change ip addresses
- complex management
- example: p2p file sharing such as bittorrent
process: program running within a host
- within same host, two processes communicate using inter-process communication defined by the operating system
- processes in different hosts communicate by exchanging messages
clients, servers
- client process: process that initiates communication
- server process: process that waits to be contacted
- note: applications with p2p architectures have both client processes & server processes
- process sends/receives messages to/from socket
- socket analogous to door
- sending process shoves messages out the door
- sending process relies on transport infrastructure on other side of door to deliver message to socket at receiving process
- two sockets involved: one on each side
addressing processes
- to receive messages, a process must have an identifier
- host device has a unique 32-bit ip address
- question: does the ip address of the host on which a process runs suffice for identifying the process?
- answer: no, many processes can be running on the same host
- identifier includes both the ip address and the port number associated with the process on the host
- example port numbers
- http server: 80
- mail server: 25
- example: to send http messages to the gaia.cs.umass.edu web server
- ip address: 128.119.245.12
- port number: 80
an application layer protocol defines
- types of messages exchanged
- e.g. request, response
- message syntax
- what fields in messages & how fields are delineated
- message semantics
- meaning of information in fields
- rules
- for when and how processes send & respond to messages
open protocols
- defined in requests for comments (rfcs), everyone has access to protocol definition
- allow for interoperability
- e.g. https, smtp
proprietary protocols
- e.g. skype, zoom
data integrity
- some apps (e.g. file transfer, web transactions) require 100% reliable data transfer
- other apps (e.g. audio) can tolerate some loss
timing
- some apps (e.g. internet telephony, interactive games) require low delay to be effective
throughput
- some apps (e.g. multimedia) require minimum amount of throughput to be effective
- other apps "elastic apps" make use of whatever throughput they get
security
- encryption, data integrity
application | data loss | throughput | time sensitive? |
---|---|---|---|
file transfer/download | no loss | elastic | no |
e-mail | no loss | elastic | no |
web documents | no loss | elastic | no |
real-time audio/video | loss-tolerant | audio: 5 kbps - 1 mbps, video: 10 kbps - 5 mbps | yes, 10's of msec |
streaming audio/video | loss-tolerant | audio: 5 kbps - 1 mbps, video: 10 kbps - 5 mbps | yes, few sec |
interactive games | loss-tolerant | kbps+ | yes, 10's of msec |
text messaging | no loss | elastic | yes and no |
tcp service
- reliable transport between sending and receiving process
- flow control: sender won't overwhelm receiver
- congestion control: throttle sender when the network is overloaded
- connection-oriented: setup required between client and server processes
- does not provide: timing, minimum throughput guarantee, security
udp service
- unreliable data transfer between sending and receiving process
- does not provide: reliability, flow control, congestion control, timing, throughput guarantee, security, or connection setup
application | application layer protocol | transport protocol |
---|---|---|
file transfer/download | ftp | tcp |
e-mail | smtp | tcp |
web documents | http | tcp |
internet telephony | sip, rtp, or proprietary | tcp or udp |
streaming audio/video | http, dash | tcp |
interactive games | wow, fps, proprietary | udp or tcp |
vanilla tcp & udp sockets
- no encryption
- cleartext passwords sent into socket traverse the internet in cleartext
transport layer security tls
- provides encrypted tcp connections
- data integrity
- end-point authentication
tls implemented in application layer
- apps use tls libraries, which in turn use tcp
- cleartext sent into socket traverses the internet encrypted
web and http
- web page consists of objects, each of which can be stored on different web servers
- object can be html file, jpeg image, java applet, audio file, ...
- web page consists of base html-file which includes several referenced objects, each addressable by a url
- e.g. www.someschool.edu/somedept/pic.gif
- url format: host name followed by path name
http: hypertext transfer protocol
- web application layer protocol
- client/server model
- client: browser that requests, receives, (using http protocol) and "displays" web objects
- server: web server sends (using http protocol) objects in response to requests
http uses tcp
- client initiates tcp connection, creates a socket to server on port number 80
- server accepts tcp connection from client
- http messages (application-layer protocol messages) exchanged between browser (http client) and web server (http server)
- tcp connection closed
http is stateless
- server maintains no information about past client requests
protocols that maintain state are complex
- past history (state) must be maintained
- if server/client crashes, their views of state may be inconsistent, must be reconciled
non-persistent http
- tcp connection opened
- at most one object sent over tcp connection
- tcp connection closed
downloading multiple objects requires multiple connections
persistent http
- tcp connection opened to a server
- multiple objects can be sent over a single tcp connection between client, and that server
- tcp connection closed
user enters url: www.school.edu/department/home.index
- http client initiates tcp connection to http server (process) at www.school.edu on port 80
- http server at host www.school.edu, waiting for tcp connection at port 80, accepts connection, notifying client
- http client sends http request message (containing url) into tcp connection socket; message indicates that client wants object department/home.index
- http server receives request message, forms response message containing requested object, and sends message into its socket
- http server closes tcp connection
- http client receives response message containing html file, displays html; parsing html file, finds 10 referenced jpeg objects
- the steps above are repeated for each of the 10 jpeg objects
rtt
- time for a small packet to travel from client to server and back
http response time per object
- one rtt to initiate tcp connection
- one rtt for http request and first few bytes of http response to return
- object/file transmission time
non-persistent http response time = 2 rtt + file transmission time
non-persistent http issues
- requires 2 rtts per object
- os overhead for each tcp connection
- browsers often open multiple parallel tcp connections to fetch referenced objects in parallel
persistent http
- server leaves connection open after sending response
- subsequent http messages between same client/server sent over open connection
- client sends requests as soon as it encounters a referenced object
- as little as one rtt for all the referenced objects cutting the response time in half
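a quick back-of-the-envelope sketch in python of the two response-time models above, for a base html file plus 10 referenced objects (the rtt and per-object transmission time are assumed numbers, and tcp slow start and parallel connections are ignored):

# assumed numbers for illustration only
RTT = 0.100    # round-trip time, seconds
XMIT = 0.010   # transmission time per object, seconds

def non_persistent(n_objects):
    # 2 rtt (connection setup + request/response) plus transmission time, per object
    return n_objects * (2 * RTT + XMIT)

def persistent(n_objects):
    # one connection setup, then roughly 1 rtt + transmission time per object
    return RTT + n_objects * (RTT + XMIT)

print(non_persistent(11))   # base file + 10 referenced objects
print(persistent(11))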
- two types of http messages: request, response
- http request message
- ascii (human readable format)
GET /index.html HTTP/1.1\r\n
Host: www-net.cs.umass.edu\r\n
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X
10.15; rv:80.0) Gecko/20100101 Firefox/80.0 \r\n
Accept: text/html, application/xhtml+xml\r\n
Accept-Language: en-us, en; q=0.5\r\n
Accept-Encoding: gzip, deflate\r\n
Connection: keep-alive\r\n
\r\n
request line (GET, POST, HEAD commands)
GET /index.html HTTP/1.1\r\n
carriage return character: \r
line-feed character: \n
header lines
Host: www-net.cs.umass.edu\r\n
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X
10.15; rv:80.0) Gecko/20100101 Firefox/80.0 \r\n
Accept: text/html, application/xhtml+xml\r\n
Accept-Language: en-us, en; q=0.5\r\n
Accept-Encoding: gzip, deflate\r\n
Connection: keep-alive\r\n
carriage return + line feed on a line by itself indicates end of header lines
\r\n
general format of an http request message
request line:   method | sp | url | sp | version | cr | lf
header lines:   header field name | : | value | cr | lf
                (one per header, repeated)
end of headers: cr | lf
body:           entire body
POST method
- web page often includes form input
- user input sent from client to server in entity body of HTTP POST request message
GET method
- can also be used for sending data to server
- include data in url field of HTTP GET request message, following a '?'
- e.g. www.somesite.com/animalsearch?monkey&banana
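a minimal python sketch of a GET request carrying data after the '?' in the url, sent over a raw tcp socket (the host and path are the illustrative ones above; a real site may redirect to https):

from socket import socket, AF_INET, SOCK_STREAM

host = 'www.somesite.com'                    # illustrative host from above
path = '/animalsearch?monkey&banana'         # data in the url, following '?'
request = ('GET ' + path + ' HTTP/1.1\r\n'
           'Host: ' + host + '\r\n'
           'Connection: close\r\n'
           '\r\n')

s = socket(AF_INET, SOCK_STREAM)
s.connect((host, 80))                        # http server listens on port 80
s.sendall(request.encode())

response = b''
while True:
    chunk = s.recv(2048)
    if not chunk:
        break
    response += chunk
s.close()
print(response.decode(errors='replace'))     # status line, headers, then body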
HEAD method
- requests only the headers that would be returned if the specified url were requested with an HTTP GET method
PUT method
- uploads new file (object) to server
- completely replaces file that exists at specified url with content in entity body of the HTTP PUT request message
HTTP/1.1 200 OK
Date: Tue, 08 Sep 2020 00:53:20 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips
PHP/7.4.9 mod_perl/2.0.11 Perl/v5.16.3
Last-Modified: Tue, 01 Mar 2016 18:57:50 GMT
ETag: "a5b-52d015789ee9e"
Accept-Ranges: bytes
Content-Length: 2651
Content-Type: text/html; charset=UTF-8
\r\n
data data data data data ...
status line (protocol status code status phrase)
HTTP/1.1 200 OK
header lines
Date: Tue, 08 Sep 2020 00:53:20 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips
PHP/7.4.9 mod_perl/2.0.11 Perl/v5.16.3
Last-Modified: Tue, 01 Mar 2016 18:57:50 GMT
ETag: "a5b-52d015789ee9e"
Accept-Ranges: bytes
Content-Length: 2651
Content-Type: text/html; charset=UTF-8
\r\n
data data data data data ...
data, e.g., requested HTML file
...
status code appears in 1st line in server-to-client response message
- 200 OK request succeeded, requested object later in this message
- 301 moved permanently requested object moved, new location specified later in this message in location: field
- 400 bad request request message not understood by server
- 404 not found requested document not found on this server
- 505 http version not supported
websites and client browser use cookies to maintain some state between transactions
four components:
- cookie header line (Set-Cookie) in http response message
- cookie header line (Cookie) in next http request message (example exchange below)
- cookie file kept on user's host, managed by user's browser
- back-end database at web site
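illustrative exchange (the cookie name/value id=1678 and the cart path are made up): on first contact the server picks an identifier for the client, stores it in its back-end database, and returns it in a Set-Cookie response header; the browser saves it in its cookie file and sends it back in a Cookie header on later requests to the same site

HTTP/1.1 200 OK
Set-Cookie: id=1678
...
GET /cart HTTP/1.1
Host: www.amazon.com
Cookie: id=1678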
what can cookies be used for:
- authorization
- shopping carts
- recommendations
- user session state (web email)
challenge: how do you keep state?
- at protocol endpoints: maintain state at sender/receiver over multiple transactions
- in messages: cookies in http messages carry state
cookies and privacy
- cookies permit sites to learn a lot about you on their site
- third party persistent cookies (tracking cookies) allow common identity (cookie value) to be tracked across multiple web sites
cookies can be used to
- track user behavior on given website (first party cookies)
- track user behavior across multiple websites (third party cookies) without user ever choosing to visit tracker site
- tracking may be invisible to user
- rather than displaying an ad that triggers an HTTP GET to the tracker, the page could use an invisible link
third party tracking via cookies:
- disabled by default in firefox, safari browsers
- to be disabled in chrome browser by the end of 2024
goal: satisfy client requests without involving origin server
- user configures browser to point to a local web cache
- browser sends all http requests to cache
- if object in cache: cache returns object to client
- else cache requests object from origin server, caches received object, then returns object to client
- web cache acts as both client and server
- server for original requesting client
- client to origin server
- server tells cache about object's allowable caching in response header
cache-control: max-age=<seconds>
cache-control: no-cache
why web caching
- reduces response time for client request
- cache is closer to client
- reduces traffic on an institution's access link
- internet is dense with caches
- enables "poor" content providers to more effectively deliver content
scenario
- access link rate: 1.54 mbps
- rtt from institutional router to server: 2 sec
- web object size: 100k bits
- avg request rate, browsers to origin servers: 15 requests/sec
- average data rate to browsers: 1.50 mbps
performance
- access link utilization = 0.97 (so large queueing delays at the access link)
- lan utilization: 0.0015
- end-end delay
- = internet delay + access link delay + lan delay
- = 2 sec + minutes + μsecs
scenario: a faster (154 mbps) access link
- access link rate: 154 mbps
- rtt from institutional router to server: 2 sec
- web object size: 100k bits
- average request rate browser to origin servers: 15/sec
- average data rate to browsers: 1.50 mbps
performance
- access link utilization = 0.0097
- lan utilization = 0.0015
- end-end delay
- = internet delay + access link delay + lan delay
- = 2 sec + msecs + μsecs
cost: faster access link is expensive
scenario: install a local web cache (access link still 1.54 mbps); suppose cache hit rate is 0.4
- 40% of requests served by cache, with low (msec) delay
- 60% of requests satisfied at origin
- rate to browsers over access link
- = 0.6 * 1.50 mbps
- = 0.9 mbps
- access link utilization
- = 0.9 / 1.54
- = 0.58, meaning low (msec) queueing delay at access link
- average end-end delay
- = 0.6 * (delay from origin servers) + 0.4 * (delay when satisfied at cache)
- = 0.6 * (2.01) + 0.4 * (~msecs)
- = ~1.2 seconds
lower average end-end delay than with the 154 mbps link, and cheaper too!
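the same arithmetic as a quick python sketch (numbers taken from the 1.54 mbps scenario above; the cache-hit delay is approximated as zero):

access_link_rate = 1.54e6      # bps
object_size      = 100e3       # bits per web object
request_rate     = 15          # requests/sec, browsers to origin servers
internet_delay   = 2.0         # sec, institutional router <-> origin servers
hit_rate         = 0.4         # fraction of requests served by the local cache

data_rate = request_rate * object_size        # 1.50 mbps of demand from browsers
print('no cache, access link utilization:', data_rate / access_link_rate)                    # ~0.97
print('with cache, access link utilization:', (1 - hit_rate) * data_rate / access_link_rate) # ~0.58

# misses pay the internet delay plus a small access-link delay; hits take ~msecs (treated as 0)
avg_delay = (1 - hit_rate) * (internet_delay + 0.01) + hit_rate * 0.0
print('average end-end delay (sec):', avg_delay)   # ~1.2 sec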
conditional GET
goal: don't send object if browser has an up-to-date cached version
- no object transmission delay or use of network resources
- client: specifies date of its browser-cached copy in http request
If-Modified-Since: <date>
- server: response contains no object if browser-cached copy is up to date
HTTP/1.0 304 Not Modified
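illustrative conditional GET exchange (hostname, object, and date are made up): the client echoes the date of its cached copy in an If-Modified-Since header; if the object has not changed, the server answers 304 with no body

client request:
GET /fruit/kiwi.gif HTTP/1.1
Host: www.example.edu
If-Modified-Since: Wed, 09 Sep 2020 09:23:24 GMT

server response (object unchanged):
HTTP/1.1 304 Not Modified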
key goal: decreased delay in multi-object http requests
http/1.1: introduced multiple, pipelined GETs over a single tcp connection
- server responds in order (fcfs: first-come-first-served scheduling) to GET requests
- with fcfs, a small object may have to wait for transmission behind large objects (head-of-line (HOL) blocking)
- loss recovery (retransmitting lost tcp segments) stalls object transmission
key goal: decreased delay in multi-object http requests
http/2: increased flexibility at server in sending objects to clients
- methods, status codes, most header fields unchanged from http 1.1
- transmission order of requested objects based on client-specified object priority
- push unrequested objects to client
- divide objects into frames, schedule frames to mitigate hol blocking
http/1.1 example: client requests 1 large object (e.g. video file) and 3 smaller objects
- objects delivered in order requested: $O_2, O_3, O_4$ wait behind $O_1$
http/2 example: objects divided into frames, frame transmission interleaved
- objects $O_2, O_3, O_4$ delivered quickly, $O_1$ slightly delayed
http/2 over tcp connection means
- recovery from packet loss still stalls all object transmissions
- as in http 1.1, browsers have incentive to open multiple parallel tcp connections to reduce stalling, increase overall throughput
- no security over vanilla tcp connection
- http/3: adds security, per object error and congestion control (more pipelining) over udp
three major components
- user agents
- mail servers
- simple mail transfer protocol: smtp
user agent
- aka mail reader
- composing, editing, reading mail messages
- e.g. outlook, iphone mail client
- outgoing, incoming messages stored on server
mail servers
- mailbox contains incoming messages for user
- message queue of outgoing to be sent mail messages
smtp protocol between mail servers to send email messages
- client: sending mail server
- server : receiving mail server
- alice uses her user agent to compose an email message to bob
- alice's user agent sends message to her mail server using smtp; message is placed in a message queue
- client side of smtp at mail server opens tcp connection with bob's mail server
- smtp client sends alice's message over the tcp connection
- bob's mail server places the message in bob's mailbox
- bob invokes his user agent to read the message
- uses tcp to reliably transfer an email message from client (the sending mail server, which initiates the connection) to server, port 25
- direct transfer: sending server acting like a client to receiving server
- three phases of transfer
- smtp handshaking
- smtp transfer of messages
- smtp closure
- command/response interaction (like http)
- commands: ascii text
- response: status code and phrase
client --- initiate tcp connection ---> server
client <--- tcp connection initiated --- server
client <--- 220 --- server
client --- HELO ---> server
client <--- 250 hello --- server
client <--- smtp transfers ---> server
S: 220 hamburger.edu
C: HELO crepes.fr
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <[email protected]>
S: 250 [email protected]... Sender ok
C: RCPT TO: <[email protected]>
S: 250 [email protected] ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 hamburger.edu closing connection
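a minimal python sketch of the same three phases using the standard smtplib module (the server name and domains are the illustrative ones from the dialog above; the alice/bob addresses are hypothetical, and real servers typically require tls and authentication):

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg['From'] = 'alice@crepes.fr'          # hypothetical sender
msg['To'] = 'bob@hamburger.edu'          # hypothetical recipient
msg['Subject'] = 'condiments'
msg.set_content('Do you like ketchup?\nHow about pickles?')

# handshaking (EHLO/HELO) happens on connect; QUIT is sent when the with-block exits
with smtplib.SMTP('hamburger.edu', 25) as server:
    server.send_message(msg)             # MAIL FROM, RCPT TO, DATA under the hood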
comparison with http
- http: client pull, the client pulls a file from the server
- smtp: client push, the sending server pushes the message to the receiving server
- both have ascii command/response interaction, status codes
- http: each object encapsulated in its own response message
- smtp: multiple objects sent in multipart message
- smtp uses persistent connections
- smtp requires message (header & body) to be in 7-bit ascii
- smtp server uses crlf.crlf (a "." on a line by itself) to determine end of message
smtp: protocol for exchanging email messages, defined in rfc 5321 (like rfc 7231 defines http)
rfc 2822 defines syntax for an email message itself like html defines syntax for web documents
header lines, e.g.
- to
- from
- subject
body: the "message" itself, ascii characters only
- smtp: delivery storage of email messages to receiver's server
- mail access protocol: retrieval from server
- imap: internet message access protocol: messages stored on server, imap provides retrieval, deletion, folders of stored messages on server
- http: gmail, hotmail, yahoo mail, etc. provides web-based interface on top of smtp to send, imap or pop to retrieve email messages
internet hosts, routers
- ip address (32 bit) used for addressing datagrams
- name, e.g.
www.wichita.edu
used by humans
domain name system dns
- distributed database implemented in a hierarchy of many name servers
- application layer protocol: hosts, dns servers communicate to resolve names (address/ name translation)
- note: core internet function, implemented as application layer protocol
- complexity at network's edge
question: how does one map between ip address and name, and vice versa?
dns services
- hostname to ip address translation
- host aliasing
- canonical, alias names
- mail server aliasing
- load distribution
- replicated web servers: many ip addresses correspond to one name
question: why not centralize dns?
- single point of failure
- traffic volume
- distant centralized database
- maintenance
answer: it doesn't scale
- comcast dns servers alone: 600 billion dns queries/day
- akamai dns servers alone: 2.2 trillion dns queries/day
root dns servers
/ | \
.com servers .org servers .edu servers
/ \ \ / \
yahoo.com amazon.com pbs.org nyu.edu umass.edu
client wants ip address for www.amazon.com; 1st approximation:
- client queries root server to find the .com dns server
- client queries the .com dns server to get the amazon.com dns server
- client queries the amazon.com dns server to get the ip address for www.amazon.com
dns root name servers
- official, contact-of-last-resort by name servers that can not resolve a name
- incredibly important internet function
- internet couldn't function without it
- dnssec - provides security authentication, message integrity
- icann internet corporation for assigned names and numbers manages root dns domain
- 13 logical root name servers worldwide, each server replicated many times (~200 servers in the us)
top-level domain (tld) servers
- responsible for .com, .net, .edu, .aero, .museum, and all top-level country domains, e.g. .cn, .uk
- network solutions: authoritative registry for the .com and .net tlds
dns: distributed database storing resource records (rr)
- rr format: (name, value, type, ttl)
type=A
- name is hostname
- value is ip address
type=NS
- name is domain (e.g. foo.com)
- value is hostname of authoritative name server for this domain
type=CNAME
- name is alias name for some "canonical" (real) name
- e.g. www.ibm.com is really servereast.backup2.ibm.com
- value is canonical name
type=MX
- value is name of smtp mail server associated with name
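a quick python sketch of name-to-address translation through the local dns resolver (roughly a type=A lookup; hostnames are the ones used earlier in these notes):

import socket

print(socket.gethostbyname('gaia.cs.umass.edu'))           # hostname -> ip address

# getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples
for entry in socket.getaddrinfo('www.wichita.edu', 80, type=socket.SOCK_STREAM):
    print(entry[4])                                         # (ip address, port) pairs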
video encoding
- cbr (constant bit rate): video encoding rate is fixed
- vbr (variable bit rate): video encoding rate changes as amount of spatial, temporal coding changes
- examples of standards: mpeg1, mpeg2, mpeg4
- mpeg1 (cd-rom): 1.5 mbps
- mpeg2 (dvd): 3-6 mbps
- mpeg4 (often used in internet): 64 kbps - 12 mbps
main challenges:
- server-to-client bandwidth will vary over time, with changing network congestion levels (in house, in the access network, in the network core, at the video server)
- packet loss and delay due to congestion will delay playout, or result in poor video quality
streaming stored video, the steps involved:
- video recorded (e.g. 30 frames/sec)
- video sent (incurring network delay, assumed fixed in the simple case)
- video received, played out at client (30 frames/sec)
streaming: at this time, client playing out early part of a video, while the server is still sending later part of the video.
- continuous playout constraint: during client video playout, playout timing must match original timing.
- but network delays are variable, so there will need to be a client-side buffer to match continuous playout constraint
- other challenges:
- client interactivity: pause, fast-forward, rewind, jump through video
- video packets may be lost, or retransmitted
- client-side buffering and playout delay: compensate for network-added delay, delay jitter
dash dynamic, adaptive streaming over http
server
- divides video file into multiple chunks
- each chunk is encoded at multiple different rates
- different rate encodings stored in different files
- files are replicated in various cdn (content distribution networks) nodes
- manifest file provides urls for the different chunks, mapping each chunk (a time range of the video) to the urls of its available encodings
client
- periodically estimates server to client bandwidth
- consulting manifest, requests one chunk at a time
- chooses maximum coding rate sustainable given current bandwidth
- can choose different coding rates at different points in time (depending on available bandwidth at time), and from different servers
- intelligence at client: the client determines (see the rate-selection sketch below)
- when to request a chunk (so that buffer starvation, or overflow does not occur)
- what encoding rate to request (higher quality when more bandwidth is available)
- where to request a chunk (can request from url server that is close to client or has high available bandwidth)
streaming video = encoding + dash + playout buffering
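a toy python sketch of the client-side rate decision described above (the encoding rates, safety margin, and bandwidth estimates are made-up numbers; no real manifest format is implied):

encoding_rates = [200_000, 500_000, 1_000_000, 3_000_000, 5_000_000]   # bps, per chunk

def choose_rate(estimated_bandwidth_bps, safety=0.8):
    # pick the highest encoding rate sustainable at the estimated bandwidth,
    # leaving some headroom for bandwidth variation
    usable = estimated_bandwidth_bps * safety
    sustainable = [r for r in encoding_rates if r <= usable]
    return max(sustainable) if sustainable else min(encoding_rates)

for bw in [400_000, 2_500_000, 6_000_000]:    # periodic client bandwidth estimates
    print(bw, '->', choose_rate(bw))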
challenge: how to stream content (selected from millions of videos) to hundreds of thousands of simultaneous users?
- option 1: single, large "mega-server"... doesn't scale
- single point of failure
- point of network congestion
- long and possibly congested path to distant clients
- option 2: store/serve multiple copies of videos at multiple geographically distributed sites (cdn)
- enter deep: push cdn servers deep into many access networks, close to users (e.g. akamai: 240,000 servers deployed in more than 120 countries)
- bring home: smaller number (10s) of larger clusters in pops near access networks (used by limelight)
- netflix: stores copies of content at its worldwide openconnect cdn nodes
- subscriber requests content, service provider returns manifest
- using manifest, client retrieves content at highest supportable rate
- may choose different rate or copy if network path is congested
- ott ("over the top"): content delivered as a service over the internet's basic host-to-host communication
- ott challenges: coping with a congested internet from the edge
- what content to play in which cdn node?
- from which cdn node to retrieve content? at which rate?
goal: learn how to build client/server applications that communicate using sockets
socket: door between application process and end-end transport protocol.
two socket types for two transport services
- udp user datagram protocol: unreliable datagram
- tcp transmission control protocol: reliable, byte stream-oriented
application example:
- client reads a line of characters (data) from its keyboard and sends data to server
- server receives the data and converts characters to uppercase
- server sends modified data to client
- client receives modified data and displays line on its screen
udp: no connection between client and server
- no handshaking before sending data
- sender explicitly attaches ip destination address and port # to each packet
- receiver extracts sender ip address and port # from received packet
udp: transmitted data may be lost or received out of order
application viewpoint: udp provides unreliable transfer of groups of bytes ("datagrams") between client and server processes
# include python's socket library
from socket import *
serverName = 'hostname'
serverPort = 12000
# create udp socket
clientSocket = socket(AF_INET, SOCK_DGRAM)
# get user keyboard input
message = input('Input lowercase sentence:')
# attach server name, port to message; sent into socket
clientSocket.sendto(message.encode(), (serverName, serverPort))
# read reply data (bytes) from socket
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
# print out received string and close socket
print(modifiedMessage.decode())
clientSocket.close()
from socket import *
serverPort = 12000
# create udp socket
serverSocket = socket(AF_INET, SOCK_DGRAM)
# bind socket to local port number 12000
serverSocket.bind(("", serverPort))
print('The server is ready to receive')
# loop forever
while True:
    # receive udp datagram along with the client's (ip address, port)
    message, clientAddress = serverSocket.recvfrom(2048)
    # convert the received bytes to an uppercase string
    modifiedMessage = message.decode().upper()
    # send the modified message back to this client's address
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)
client must contact server
- server process must first be running
- server must have created socket that welcomes client's contact
client contacts server by
- creating tcp socket, specifying ip address, port number of server process
- when client creates socket: client tcp establishes connection to server tcp
- when contacted by client, server tcp creates a new socket for the server process to communicate with that particular client
- allows server to talk with multiple clients
- client source port # and ip address used to distinguish clients
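a sketch of tcp versions of the same uppercase application, mirroring the udp code above ('hostname' is a placeholder):

# tcp client
from socket import *
serverName = 'hostname'                         # placeholder, as in the udp example
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_STREAM)     # tcp socket
clientSocket.connect((serverName, serverPort))  # triggers tcp connection setup
sentence = input('Input lowercase sentence:')
clientSocket.send(sentence.encode())            # no address attached: connection already set up
modifiedSentence = clientSocket.recv(1024)
print('From server:', modifiedSentence.decode())
clientSocket.close()

# tcp server
from socket import *
serverPort = 12000
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
serverSocket.listen(1)                          # welcoming socket
print('The server is ready to receive')
while True:
    # accept() returns a new socket dedicated to this particular client
    connectionSocket, addr = serverSocket.accept()
    sentence = connectionSocket.recv(1024).decode()
    connectionSocket.send(sentence.upper().encode())
    connectionSocket.close()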