The Transport Layer: TCP and UDP
Go to web.speakup.info or
download speakup app
Join room
46045
Solution
Answer D
Formally, a server is a role at the transport layer, where the program
waits for requests to come.
In contrast, a client initiates communication to a server.
Reminder from 1st lecture: Port Numbers
• assigned by OS to identify processes within a host
• servers’ port numbers must be well-known to clients (e.g. 53 for DNS)
• src and dest port numbers are inside transport-layer header
[Figure: a DNS client process on host A (IP addr=A) uses ephemeral port 1267, dynamically assigned by the OS, to send a UDP message over the IP network to the DNS server process on host B (IP addr=B), which uses port 53, the default port number for any DNS server. The packet carries IP SA=A, DA=B, prot=UDP, then UDP source port=1267, destination port=53, then the data.]
[Figure: layout of an IP packet carrying a UDP datagram: IP header, then the UDP datagram made of UDP Source Port, UDP Dest Port, UDP Message Length, UDP Checksum and the UDP payload (data).]
The picture shows two processes (= network application programs), pa and pb, that are
communicating. Each of them is associated locally with a port, as shown in the figure.
The example shows a packet sent by the name resolver process at host A, to the domain name
server (DNS) process at host B. The UDP header contains the source and destination ports.
The destination port number is used to contact the name server process at B;
the source port is not used directly; it will be used in the response from B to A.
The UDP header also contains a checksum of the UDP data plus the IP addresses and packet
length. Checksum computation is not performed by all systems.
Ports are 16-bit unsigned integers. They are defined statically or dynamically. Typically, a server
uses a port number defined statically.
Standard services use well-known default port numbers; e.g., all DNS servers use port 53 (look
at /etc/services).
Ports that are allocated dynamically are called ephemeral. They are usually above 1024. If you
write your own client-server application on a multiprogramming machine, you need to define your
own server port number and code it into your application.
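As a small illustration of ephemeral ports, the sketch below asks the OS for one by binding a UDP socket to port 0. The helper name ephemeral_udp_port and the use of the loopback address 127.0.0.1 are choices made for this example only.

```python
import socket

DNS_PORT = 53  # well-known server port for DNS, as listed in /etc/services


def ephemeral_udp_port() -> int:
    """Bind a UDP socket to port 0 and return the port chosen by the OS."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # Port 0 means "let the OS assign an ephemeral port".
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


port = ephemeral_udp_port()
print("ephemeral port chosen by the OS:", port)
print("fits in 16 bits:", 0 < port <= 0xFFFF)
```

A client normally never binds explicitly: the OS performs the same assignment the first time the socket sends data.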
1. UDP is message-oriented, and unreliable
• UDP delivers the exact message (a.k.a. “datagram”) or nothing
• Consecutive messages may arrive out of order
• Messages may be lost
(the application layer should handle both cases, if necessary)
• One message, up to 65,535 bytes
• If UDP message is too large to fit into a single IP packet (i.e. larger than MTU),
then IP layer fragments it
- at the IP layer of the source, info about fragments added inside IP header
- not visible to the transport layer
- if a fragment/piece is lost then the entire message is considered lost
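The fragmentation rule above can be turned into a short calculation. This is a sketch under stated assumptions: IPv4 with a 20-byte header and no options, an 8-byte UDP header, an MTU of 1500 bytes by default, and independent fragment losses with probability p. The function names are made up for this example.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def fragment_count(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 packets needed to carry one UDP message."""
    ip_payload = UDP_HEADER + udp_payload
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


def message_loss_probability(fragments: int, p: float) -> float:
    """The message is lost as soon as any one of its fragments is lost."""
    return 1 - (1 - p) ** fragments


print(fragment_count(1472))  # 1: fits exactly in one 1500-byte packet
print(fragment_count(3000))  # 3
print(message_loss_probability(fragment_count(3000), 0.01))
```

With 1% loss per fragment, a message split into 3 fragments is lost with probability close to 3%, which is why large UDP messages are fragile.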
How is UDP implemented in practice?
Via a socket library = programming interface
- sockets in Unix are similar to files for read/write
[Figure: client and server each create a socket with s=socket.socket().]
Dual-stack sockets
[Figure: one socket on a host with addresses 2020:baba::b0b0 and 11.22.33.44 receives data from 2001:face:b00c::1 and from 1.2.3.4.]
• Dual-stack operation is the default in Linux; in Windows it must be enabled for every socket (with setsockopt).
• An IPv4 socket cannot be dual-stack. Why?
Solution
It is possible to map IPv4 addresses to a subset of the IPv6 space
because IPv6 addresses are much longer in bits. The converse is not
possible: there are more IPv6 addresses than IPv4 addresses.
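The mapping mentioned in the solution can be tried with Python's ipaddress module. The sketch uses the IPv4-mapped range ::ffff:0:0/96; the helper names are invented for this example, and make_dual_stack_udp_socket is only defined, not called, because it needs a host with IPv6 enabled.

```python
import ipaddress
import socket


def map_ipv4_to_ipv6(v4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the IPv4-mapped range ::ffff:0:0/96."""
    return ipaddress.IPv6Address("::ffff:" + v4)


def make_dual_stack_udp_socket() -> socket.socket:
    """IPv6 UDP socket that also accepts IPv4 traffic (as mapped addresses)."""
    s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    # Clearing IPV6_V6ONLY turns dual-stack operation on explicitly.
    s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    return s


mapped = map_ipv4_to_ipv6("1.2.3.4")
print(mapped.ipv4_mapped)  # 1.2.3.4
# A native IPv6 address has no IPv4 counterpart.
print(ipaddress.IPv6Address("2001:face:b00c::1").ipv4_mapped)  # None
```

The second print shows the converse failing: an arbitrary IPv6 address cannot be squeezed into 32 bits.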
UDP datagrams are delivered to sockets based on dest IP address and port number:
• Socket 5 is bound to local address 2001:baba::b0b0 and port 32456; receives all data to 2001:baba::b0b0 udp port 32456
• Socket 3 is bound to local address 11.22.33.44 and port 32456; receives all data to 11.22.33.44 udp port 32456
• Socket 4 is bound to local address 11.22.33.44 and port 32654; receives all data to 11.22.33.44 udp port 32654
[Figure: above UDP and IP, one IPv6 socket (id=5, port=32456, address=2001:baba::b0b0) and two IPv4 sockets (id=3, port=32456 and id=4, port=32654, address=11.22.33.44), each with its own send (S) and receive (R) buffers.]
A socket is bound to a single port and one or multiple IP addresses of the local host
User’s browser sends DNS query to DNS server, over UDP.
What happens if the query or the answer is lost?
- TCP knows the allowable maximum segment size (MSS) and segments data
accordingly → avoids fragmentation at the IP layer
TCP Basic Operation 1: SEQ and ACK
[Figure: time-sequence diagram between A and B. (1) A sends seq 8001:8500; B delivers bytes 8001:8500 and (2) returns ack 8501. A sends (3) seq 8501:9000, (4) seq 9001:9500 and (5) seq 9501:10000. B answers with cumulative acks: (7) ack 8501. After a timeout, A retransmits (8) seq 8501:9000; B delivers bytes 8501:9000 and returns (9) ack 9001. A retransmits (10) seq 9001:9500; B can then deliver bytes 9001:10000.]
The previous slide shows A in the role of sender and B of receiver.
• The application at A sends data in blocks of 500 bytes at a slow pace. So, TCP initially sends 500-byte
segments.
• However, the maximum segment size in this example is 1000 bytes. So, TCP may also merge 2 blocks
of data in one segment if this data happens to be available at the send buffer of the socket.
• Packets 3, 4 and 7 are lost.
• B returns an acknowledgement in the ACK field. The ACK field is cumulative, so ACK 8501 means: B is
acknowledging all bytes up to (excluding) number 8501. I.e. the ACK field refers to the next byte
expected from the other side.
• At line 8, the timer that was set at line 3 expires (A has not received any acknowledgement for the
bytes in the packet sent at line 3 and experiences a timeout). A re-sends data that is detected as lost,
i.e. bytes 8501:9000. When receiving packet 8, B delivers all bytes from 8501 to 9000 in order.
• When receiving packet 10, B can deliver bytes 9001:10000 because packet 5 was received and kept
by B in the receive buffer.
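The receiver behaviour described above (cumulative ACK plus a buffer for out-of-order data) can be sketched in a few lines. This is a toy model, not a TCP implementation: ranges are half-open (end = last byte + 1), and it assumes retransmissions reuse the original segment boundaries. The class and attribute names are invented for this example.

```python
class Receiver:
    """Toy TCP receiver: cumulative ACK plus a re-sequencing buffer."""

    def __init__(self, next_expected: int):
        self.next_expected = next_expected  # value of the ACK field
        self.buffer = {}                    # first byte -> end (exclusive)
        self.delivered = []                 # ranges handed to the application

    def receive(self, first: int, end: int) -> int:
        """Accept bytes first..end-1 and return the ACK to send back."""
        if end > self.next_expected:        # ignore pure duplicates
            self.buffer[first] = max(end, self.buffer.get(first, 0))
        # Deliver everything that is now contiguous with next_expected.
        while self.next_expected in self.buffer:
            seg_end = self.buffer.pop(self.next_expected)
            self.delivered.append((self.next_expected, seg_end))
            self.next_expected = seg_end
        return self.next_expected


b = Receiver(8001)
print(b.receive(8001, 8501))   # packet 1  -> ack 8501
print(b.receive(9501, 10001))  # packet 5  -> ack 8501 (kept, not delivered)
print(b.receive(8501, 9001))   # packet 8  -> ack 9001
print(b.receive(9001, 9501))   # packet 10 -> ack 10001
```

The last call releases two ranges at once, because the data of packet 5 was already waiting in the buffer.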
TCP Basic Operation 2: SACK and optimized segmentation (if possible)
[Figure: time-sequence diagram between A and B, with TcpMaxDupACKs set to 1 at A. (1) A sends seq 8001:8500; B delivers bytes 8001:8500 and (2) returns ack 8501. A sends (3) seq 8501:9000, (4) seq 9001:9500 and (5) seq 9501:10000. B returns a cumulative + selective ack: (7) ack 8501 sack (9501:10001). A retransmits (8) seq 8501:9500 in one segment (2 data blocks are merged, because here MSS = 1000); B delivers bytes 8501:10000 and returns (9) ack 10001. A sends (10) seq 10001:10500; B delivers bytes 10001:10500.]
In addition to the ACK field, most TCP implementations also use the SACK field (Selective
Acknowledgement).
• At line 7, B acknowledges all bytes up to 8501 and in the range 9501:10001. Since the set of
acknowledged bytes is not contiguous, the SACK option is used. It contains up to 3 blocks that are
acknowledged in addition to the range described by the ACK field.
• At line 8, A detects that the bytes 8501:9501 were lost and re-sends them ASAP without waiting for a
timeout, because in this example host A uses TcpMaxDupACKs = 1 (we will discuss TcpMaxDupACKs
later). What is important to notice is that at line 8, since the maximum segment size is 1000 bytes, only
one packet is sent. This is what the slide’s title means by “optimized segmentation”.
• When receiving packet 8, B can deliver bytes 8501:10001 because packet 5 was received and kept in
the receive buffer.
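To make the ACK and SACK fields concrete, the sketch below computes both from the set of byte ranges a receiver holds. Ranges are half-open (end = last byte + 1), as in the sack notation of the slide. The function names are invented for this example, and the limit on the number of SACK blocks is ignored.

```python
def ack_and_sack(next_expected, received):
    """Return (ack, sack_blocks) for a list of received half-open ranges."""
    ack = next_expected
    blocks = []
    for first, end in sorted(received):
        if first <= ack:
            ack = max(ack, end)            # extends the in-order prefix
        elif blocks and first <= blocks[-1][1]:
            # Touches or overlaps the previous out-of-order block: merge.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((first, end))    # new out-of-order block
    return ack, blocks


def first_hole(ack, blocks):
    """Bytes the sender should retransmit first, or None if nothing is missing."""
    return (ack, blocks[0][0]) if blocks else None


# State of B at line 7: packets 1 and 5 received, packets 3 and 4 missing.
ack, sack = ack_and_sack(8001, [(8001, 8501), (9501, 10001)])
print(ack, sack)              # 8501 [(9501, 10001)]
print(first_hole(ack, sack))  # (8501, 9501): 1000 bytes, one segment if MSS = 1000
```

The hole is exactly 1000 bytes long, which is why A can repair it with the single merged segment sent at line 8.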
TCP receiver uses a receive buffer = re-sequencing buffer to
store incoming packets before delivering them to application
Why was it invented?
• Application may not be ready to consume/read data
• Packets may need re-sequencing; out-of-order data is stored but is not visible to application
[Figure: receive buffer. Bytes 8001:8500 have been received in order and can be read by the app; bytes 9501:10000 are stored out of order and are invisible to the app (cannot be read).]
TCP uses a sliding window
Why?
• The receive buffer may overflow if one piece of data “hangs”
- multiple losses affect the same packet,
- so, multiple out-of-order packets fill the buffer
[Figure: the sender transmits P0, P1, P2, ... and receives acks A1, A2; P1 is lost more than once, so the packets P2 ... Pn+1 accumulate in the receive buffer while P1 is still missing.]
How does the sliding window work?
Suppose:
Window size = 4000 B;
each segment = 1000 B
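Solution
[Figure: time-sequence diagram showing the sender's retransmission buffer and the receiver's resequencing buffer. The sender transmits S=0 (lost), then S=1, S=2 and S=3, which are stored in the resequencing buffer and acknowledged with A=-1, SACK=1, then A=-1, SACK=1-2, then A=-1, SACK=1-3 (time 𝑡1). The sender retransmits S=0; the receiver delivers 0 ... 3 and returns A=3 (time 𝑡2). The sender then transmits S=4, and the receiver delivers 4.]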
Answer B.
The window size is 4000 B, i.e. 4 packets here.
At time 𝑡1, packets -1, 1, 2 and 3 are acked. The window is packets [0 ; 3]. Packet 4 is outside the window
and cannot be sent. It has to wait until the loss of packet 0 is repaired (at time 𝑡2).
Sender also needs a buffer (“retransmission buffer”); its size is the window size.
Segments are removed from the resequencing/receive buffer when they are finally in-order and application
reads them.
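The sender-side rule used in this solution can be written as a one-line test. The sketch counts in packets rather than bytes, as the example does; the names may_send and lowest_unacked are invented for this illustration.

```python
SEGMENT = 1000             # bytes per packet, as in the example
WINDOW = 4000 // SEGMENT   # window of 4 packets


def may_send(packet: int, lowest_unacked: int, window: int = WINDOW) -> bool:
    """A packet may be sent only if it lies inside the current window."""
    return lowest_unacked <= packet < lowest_unacked + window


# Time t1: packet 0 is still unacked, so the window is packets [0 ; 3].
print(may_send(3, lowest_unacked=0))  # True
print(may_send(4, lowest_unacked=0))  # False: packet 4 must wait
# Time t2: A=3 has arrived, the window slides to [4 ; 7].
print(may_send(4, lowest_unacked=4))  # True
```

Selective acks for packets 1 to 3 do not move the window: only the lowest unacknowledged packet does.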
A fixed-size window cannot prevent receive-buffer overflow
• In-order data still remains in the receive buffer until it is consumed by the application (typically using a socket “read” or “receive”)
[Figure: sliding window with a window size advertised by the receiver. Sender's view: packets 0 to 12, with the window moving as S=1 ... S=7 are sent and as the acks arrive (ack = 2, window = 4; ack = 4, window = 2; ack = 6, window = 0; ack = 6, window = 4). Receiver's view: a receive buffer of 4000 bytes showing free spaces and data acked but not yet consumed; each s.read() by the application frees buffer space, after which a larger window is advertised (ack = -1, window = 2; ack = 0, window = 2, then ack = 0, window = 4 after s.read(); ack = 2, window = 4; ack = 4, window = 2; ack = 6, window = 0, then ack = 6, window = 4 after s.read()). 1 unit of data = 1 packet = 1000 bytes.]
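The window values in the figure follow one rule: the receiver offers whatever space is still free in its receive buffer. A minimal model of that rule, assuming only in-order data and ignoring out-of-order segments; the class and method names are invented for this example.

```python
class ReceiveBuffer:
    """Toy receive buffer that advertises its free space as the window."""

    def __init__(self, size: int):
        self.size = size
        self.unread = 0  # bytes acked but not yet consumed by the application

    def window(self) -> int:
        return self.size - self.unread

    def on_in_order_data(self, nbytes: int) -> None:
        if nbytes > self.window():
            raise ValueError("sender exceeded the advertised window")
        self.unread += nbytes

    def read(self, nbytes: int) -> int:
        """Application read: consumes data and frees buffer space."""
        n = min(nbytes, self.unread)
        self.unread -= n
        return n


buf = ReceiveBuffer(4000)
for _ in range(4):
    buf.on_in_order_data(1000)
print(buf.window())  # 0: the sender must stop
buf.read(4000)
print(buf.window())  # 4000: the window reopens after the application reads
```

A window of 0 is what stops the sender when the application is slow, which a fixed-size window cannot do.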
TCP Basic Operation, Putting Things Together
[Figure: time-sequence diagram between A and B; the segments are listed below in order, with the events at B in brackets.]
1  A→B  8001:8500(500) ack 101 win 6000  [bytes ...:8500 are available and consumed]
2  B→A  101:200(100) ack 8501 win 4000
3  A→B  8501:9000(500) ack 201 win 14247
4  A→B  9001:9500(500) ack 201 win 14247
5  A→B  9501:10000(500) ack 201 win 14247
6  B→A  (0) ack 8501 sack 9001:9501 win 3500
7  B→A  201:250(50) ack 8501 sack 9001:10001 win 3000
8  A→B  8501:9000(500) ack 251 win 14247, retransmission after timeout  [bytes 8501:10000 are available]
9  B→A  251:400(150) ack 10001 win 2500
10 B→A  (0) ack 10001 win 4000  [app consumes bytes 8501:10000]
11 A→B  10001:10500(500) ack 401 win 14247  [bytes 10001:10500 are available]
The picture shows a sample exchange of messages. Every packet carries the
sequence number for the bytes in the packet; in the reverse direction, packets contain
the acknowledgements for the bytes already received in sequence. The connection is
bidirectional, with acknowledgements and sequence numbers for each direction. So
here A and B are both senders and receivers.
Acknowledgements are not sent in separate packets (“piggybacking”), but are in the
TCP header. Every segment thus contains a sequence number (for itself), plus an ack
number (for the reverse direction). The following notation is used:
firstByte ":" lastByte+1 "(" segmentDataLength ") ack" ackNumber+1 "win"
offeredWindowSize. Note the +1 with ack and lastByte numbers.
At line 8, A retransmits the lost data. When packet 8 is received, the application is not
yet ready to read the data.
Later, the application reads (and consumes) the data 8501:10001. This frees some
buffer space on the receiving side of B; the window can now be increased to 4000. At
line 10, B sends an empty TCP segment with the new value of the window.
Note that numbers on the figure are rounded for simplicity. In real examples we are
more likely to see non-round numbers (between 0 and 2^32 - 1). The initial sequence
number is not 0, but is chosen at random.
If there’s no loss or reordering, and on a link with capacity $c$ bytes/second, the min window
size required for sending at the capacity is…
A. $W_{min} = RTT \times c$
B. $W_{min} = \frac{c}{RTT}$
C. $W_{min} = \frac{RTT}{c}$
D. None of the above
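Connection Release
[Figure: time-sequence diagram of a connection release. Active close: the first host sends FIN, seq=u and enters fin_wait_1. The other host enters close_wait and replies ack=u+1; the first host enters fin_wait_2. On application close, the other host sends FIN seq=v and enters last_ack. The first host enters time_wait and replies ack=v+1. Both sides end in closed.]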
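The release states in the figure can be written as a small transition table. This is a sketch: the event names (app_close, recv_fin, recv_ack, timeout) and the starting state "established" are labels chosen for this example, and the table covers only the simple sequence shown, not simultaneous closes or lost packets.

```python
# (current state, event) -> next state; the comments show what is sent.
TRANSITIONS = {
    ("established", "app_close"): "fin_wait_1",  # active close: sends FIN, seq=u
    ("fin_wait_1", "recv_ack"): "fin_wait_2",    # received ack=u+1
    ("fin_wait_2", "recv_fin"): "time_wait",     # received FIN seq=v, sends ack=v+1
    ("time_wait", "timeout"): "closed",
    ("established", "recv_fin"): "close_wait",   # passive side: sends ack=u+1
    ("close_wait", "app_close"): "last_ack",     # application close: sends FIN, seq=v
    ("last_ack", "recv_ack"): "closed",          # received ack=v+1
}


def run(state: str, events: list) -> str:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state


active = run("established", ["app_close", "recv_ack", "recv_fin", "timeout"])
passive = run("established", ["recv_fin", "app_close", "recv_ack"])
print(active, passive)  # closed closed
```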
Before transmitting useful data, TCP requires a connection setup phase:
- used to agree on seq numbers and make sure buffers and window are initially empty
There are many more subtleties (e.g. how to handle connection termination, lost or
duplicated packets during connection setup, etc.) [see Textbook sections 4.3.1 and 4.3.2]
Recall: TCP connections involve only two hosts; routers in between are not involved.
[Figure: TCP segment format, drawn 32 bits wide, after the IP header (20 or 40 B + options). The ack number field indicates the next expected seq num.]
flags   meaning
NS      used for explicit congestion notification
CWR     used for explicit congestion notification
ECN     used for explicit congestion notification
urg     urgent ptr is valid
ack     ack field is valid
psh     this seg requests a push from the other host
rst     reset the connection
syn     connection setup
fin     sender has reached end of byte stream
The previous slide shows the TCP segment format.
• SYN and FIN are used to indicate connection setup and close. Each one uses one sequence number.
• Options may include the Selective ack (SACK) field, or the Maximum Segment Size (MSS), which is negotiated
during the SYN-SYNACK phase; the negotiation results in the smallest of the proposed values being selected.
• The checksum is mandatory.
• The NS, CWR and ECN bits are used for congestion control [see lecture on congestion control].
• The push bit can be used by the upper layer using TCP; it forces TCP on the sending side to create a segment
immediately. If it is not set, TCP may pack together several SDUs (=data passed to TCP by the upper layer) into one
PDU (= segment). On the receiving side, the push bit forces TCP to deliver the data immediately. If it is not set, TCP
may pack together several PDUs into one SDU. This is because of the stream orientation of TCP. TCP accepts and
delivers contiguous sets of bytes, without any structure visible to TCP. The push bit is used by Telnet after every end
of line.
• The urgent bit indicates that there is urgent data, pointed to by the urgent pointer (the urgent data need not be in
the segment). The receiving TCP must inform the application that there is urgent data. Otherwise, the segments do
not receive any special treatment. This is used by Telnet to send interrupt type commands.
• RST is used to indicate a RESET command. Its reception causes the connection to be aborted.
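As an illustration of the flag bits, the sketch below builds a 20-byte TCP header without options and decodes its flags byte. The port and sequence values are arbitrary; "ece" is the bit listed as ECN in the table above, and the NS bit is left out of this sketch.

```python
import struct

# Bit masks inside the flags byte (offset 13 of the TCP header).
FLAG_BITS = [
    ("fin", 0x01), ("syn", 0x02), ("rst", 0x04), ("psh", 0x08),
    ("ack", 0x10), ("urg", 0x20), ("ece", 0x40), ("cwr", 0x80),
]


def decode_flags(header: bytes) -> set:
    """Return the names of the flags that are set in a TCP header."""
    flags_byte = header[13]
    return {name for name, bit in FLAG_BITS if flags_byte & bit}


# src port, dst port, seq, ack, data offset (5 words), flags, window,
# checksum (left at 0 here), urgent pointer.
header = struct.pack("!HHIIBBHHH", 1267, 5003, 8001, 101, 5 << 4, 0x12, 6000, 0, 0)

print(len(header))                       # 20
print(struct.unpack("!HH", header[:4]))  # (1267, 5003)
print(sorted(decode_flags(header)))      # ['ack', 'syn']
```

The value 0x12 is what a SYN ACK carries: both the syn and the ack bits are set.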
TCP Sockets
[Figure: client and server S exchange SYN, SYN ACK and ACK; t=1 marks the arrival of the SYN ACK at the client, t=2 the arrival of the ACK at the server.]
At t=1, client can use the connection to send or receive data on this socket.
A New Socket is Created by accept()
At t=2, on server side, a new socket (conn) is created; it will be used by server to send or receive data.
client: s=socket.socket(); s.connect(S,5003); s.send(…); s.close()
server S: s1=socket.socket(); s1.bind(5003); s1.listen(); conn=s1.accept(); conn.recv(); conn.close()
This example is simplistic: client sends one message to server and quits; server handles one client at a time.
The figures on the previous 2 slides show a toy client and server. The client sends a string
of chars to the server, which reads and displays it.
• socket(AF_INET,…) creates an IPv4 socket and returns a socket object if successful;
socket(AF_INET6,…) creates an IPv6 socket
• bind(5003) associates the local port number 5003 with the socket; the server must
bind, the client need not bind, a temporary port number is allocated by the OS
• connect(S,5003) associates the remote IP address of S and its port number with the
socket and sends a SYN packet
• send() sends a block of data to the remote destination
• listen() declares the size of the buffer used for storing incoming SYN packets;
• accept() blocks until a SYN packet is received for this local port number. It creates a
new socket (in pink) and returns the file descriptor to be used to interact with this new
socket
• recv() blocks until one block of data is ready to be consumed on this socket. You
must tell in the argument how many bytes at most you want to read. It returns a block of
bytes; in Python, an empty block means the connection was closed by the other end, and an
exception is raised if the connection was reset.
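The calls listed above can be assembled into a runnable toy exchange on the loopback interface. This is a sketch with a few deliberate departures from the slides: the server binds to port 0 so that the OS picks a free port (instead of 5003), sendall() is used instead of send() so that the whole message is written, and the server runs in a thread of the same program. The function names are invented for this example.

```python
import socket
import threading


def serve_once(listener: socket.socket, received: list) -> None:
    """Accept one connection, read until the client closes, keep the data."""
    conn, peer = listener.accept()   # new socket for this connection
    with conn:
        chunks = []
        while True:
            block = conn.recv(1024)  # at most 1024 bytes per call
            if not block:            # empty block: the client closed
                break
            chunks.append(block)
    received.append(b"".join(chunks))


def run_toy_exchange(message: bytes) -> bytes:
    received = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s1:
        s1.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
        s1.listen()
        port = s1.getsockname()[1]
        server = threading.Thread(target=serve_once, args=(s1, received))
        server.start()
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.connect(("127.0.0.1", port))  # triggers SYN, SYN ACK, ACK
            s.sendall(message)
        # Leaving the inner "with" closes s; the server then sees an empty block.
        server.join()
    return received[0]


print(run_toy_exchange(b"hello"))  # b'hello'
```

The server loops on recv() because TCP may hand over the message in several blocks.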
A more practical server
TCP Server uses parallel execution threads to handle several TCP connections + to listen to incoming connections
A TCP connection is identified by: src IP addr, src port, dest IP addr, dest port
[Figure: server S creates s1=socket.socket() and calls s1.bind(5003); each client creates s=socket.socket(), calls s.send(…) and then s.close(); on the server, each connection has its own thread running conn.recv() and then conn.close(), in parallel with the others.]
How the Operating System views TCP Sockets
[Figure: application programs above the sockets id=3, id=4, id=5, id=6 and id=7; some sockets receive connection requests and the others carry the data of established connections. The sockets use port=32456 on IPv6 address=2001:620:618:1a6:3:80b2:9754:1 (IPv6 packets) and on IPv4 address=128.178.151.84 (IPv4 packets).]
MSS and segmentation
TCP, not the application, chooses how to segment data
TCP segments should not be fragmented at source
Modern OSs use TCP Segmentation Offloading (TSO): segmentation is performed at the
network interface card (NIC) with hardware assistance (reduces CPU consumption of TCP)
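A minimal sketch of MSS-based segmentation, assuming IPv4 and TCP headers of 20 bytes each (no options) when deriving the MSS from the MTU. The function names are invented for this example; a real TCP also decides when to send, which is not modelled here.

```python
def mss_for_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP payload that fits in one IP packet without fragmentation."""
    return mtu - ip_header - tcp_header


def segment(data: bytes, mss: int) -> list:
    """Cut the bytes waiting in the send buffer into segments of at most MSS bytes."""
    return [data[i:i + mss] for i in range(0, len(data), mss)]


print(mss_for_mtu(1500))  # 1460

# Two 500-byte blocks written by the application can leave as one segment.
send_buffer = b"a" * 500 + b"b" * 500
print([len(s) for s in segment(send_buffer, 1000)])  # [1000]
print([len(s) for s in segment(b"x" * 2500, 1000)])  # [1000, 1000, 500]
```

The segment boundaries depend only on what is in the send buffer and on the MSS, not on how the application split its writes.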
Recap: TCP offers a streaming service
Sender side:
• data accumulates in send buffer until TCP decides to create a segment
Receiver side:
• data accumulates in receive buffer until put in order and application reads it
No boundaries between bytes: several small messages written by A’s app may
be received by B as a single segment,
and conversely, a single message written by A’s app may be received by B as
multiple segments;
➡ so, apps need to group bytes to messages (if needed)
A side effect is head of the line blocking: If one packet sent by A is lost, all data
following this packet is delayed until the loss is repaired
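One common way for an application to group bytes into messages is to prefix every message with its length. The sketch below assumes a 4-byte big-endian length prefix; this framing and the names encode and MessageDecoder are choices made for this example, not something TCP provides.

```python
import struct


def encode(message: bytes) -> bytes:
    """Prefix a message with its length (4 bytes, big-endian)."""
    return struct.pack("!I", len(message)) + message


class MessageDecoder:
    """Rebuild application messages from a TCP byte stream."""

    def __init__(self):
        self.pending = b""  # bytes received but not yet forming a full message

    def feed(self, chunk: bytes) -> list:
        """Add the bytes returned by one recv() and return complete messages."""
        self.pending += chunk
        messages = []
        while len(self.pending) >= 4:
            (length,) = struct.unpack("!I", self.pending[:4])
            if len(self.pending) < 4 + length:
                break  # the rest of this message has not arrived yet
            messages.append(self.pending[4:4 + length])
            self.pending = self.pending[4 + length:]
        return messages


stream = encode(b"hi") + encode(b"world")

# Case 1: both messages arrive in a single block.
print(MessageDecoder().feed(stream))  # [b'hi', b'world']

# Case 2: the same bytes arrive cut at arbitrary places.
d = MessageDecoder()
print(d.feed(stream[:3]))   # []
print(d.feed(stream[3:7]))  # [b'hi']
print(d.feed(stream[7:]))   # [b'world']
```

The decoder gives the same messages however the stream is cut, which is the property the application needs on top of TCP.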
For which types of apps may TCP’s streaming
service be an issue? (multiple answers are fine)
A. an app using http/1, where we have one TCP connection per object
B. an app using http/2, where we have one TCP connection per website
C. a real time video streaming application that sends a new packet every msec
D. None
E. I do not know
Solution
Answer: B and C. For http/2 with one single connection, head-of-the-line
blocking can occur: if one packet is lost in the transfer of one
object of the page, the entire page download is delayed until the loss
is repaired.
Head-of-the-line blocking may also occur for a real-time streaming app
and is probably even worse: with TCP, the loss of one packet delays all
subsequent packets until the loss is repaired, whereas the live
application might prefer to skip the lost packet and receive the most
recent one. Such an app should use UDP.
Why both TCP and UDP?
Most applications use TCP rather than UDP, as this avoids re-inventing error
recovery in every application
But some applications do not need error recovery in the way TCP does it (i.e. by
packet retransmission)
For example: Voice applications / Sensor data streaming
Q. Why?
A. Delay is important for interactive voice, while packet retransmission may introduce too much
delay in some cases.
Sensor data streaming may send a new packet every few msecs; it is better to receive the latest packet
than to repeat a lost one.
For example: an application that sends just one message, like name resolution
(DNS).
Q. Why?
A. TCP sends several packets of overhead before one single useful data message. Such an
application is better served by a Stop and Go protocol at the application layer.
For example: multicast (TCP does not support multicast IP addresses)
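A Stop and Go protocol at the application layer, as suggested above for one-message exchanges such as DNS, can be sketched as "send, wait for the reply, resend after a timeout". The code below is transport-agnostic: send and receive are callables supplied by the caller (hypothetical names), and the lossy channel is simulated so that the example runs without a network. With a real UDP socket, receive would typically wrap recvfrom() with a timeout set through settimeout().

```python
def stop_and_go(send, receive, request, max_tries=3):
    """Send a request and wait for the reply, resending after each timeout.

    send(request) transmits the request; receive() returns the reply,
    or None if the timeout expired. Returns (reply, number_of_tries).
    """
    for attempt in range(1, max_tries + 1):
        send(request)
        reply = receive()
        if reply is not None:
            return reply, attempt
    raise TimeoutError("no reply after %d tries" % max_tries)


class LossyChannel:
    """Simulated network that loses the first `losses` requests."""

    def __init__(self, losses: int):
        self.losses = losses
        self.reply = None

    def send(self, request):
        if self.losses > 0:
            self.losses -= 1  # request (or its answer) is lost
            self.reply = None
        else:
            self.reply = b"answer to " + request

    def receive(self):
        return self.reply     # None models an expired timeout


channel = LossyChannel(losses=1)
print(stop_and_go(channel.send, channel.receive, b"query"))
# (b'answer to query', 2)
```

One lost packet costs one timeout and one extra request, with no connection setup before the first query.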