Advanced Computer Networks
TCP Congestion Control
Thanks to Kamil Sarac
What is congestion?
Increase in network load results in decrease
of useful work done
Different sources compete for resources inside
network
Why is it a problem?
Sources are unaware of current state of resource
Sources are unaware of each other
In many situations, this will result in decrease in
throughput (congestion collapse)
[Figure: Source 1 (10-Mbps Ethernet) and Source 2 (100-Mbps FDDI) feed a router that reaches the destination over a 1.5-Mbps T1 link]
Issues
How to deal with congestion?
pre-allocate resources so as to avoid
congestion (avoidance)
control congestion if (and when) it occurs
(control)
Two points of implementation
hosts at the edges of the network (transport
protocol)
routers inside the network (queuing discipline)
Underlying service model
best-effort data delivery
TCP Congestion Control
Idea
assumes best-effort network (FIFO or FQ routers)
each source determines network capacity for itself
uses implicit feedback
ACKs pace transmission (self-clocking)
Challenge
determining the available capacity in the first place
adjusting to changes in the available capacity
TCP Congestion Control
TCP sender is in one of two states:
slow start OR congestion avoidance
Three components of implementation
Original TCP (TCP Tahoe)
1. Slow Start
2. Additive Increase Multiplicative Decrease (AIMD)
3. Fast Retransmit
TCP Reno
4. Fast Recovery
TCP Vegas
Introduces Congestion Avoidance
TCP Congestion Control
Objective: adjust to changes in the available
capacity
New state variables per connection:
CongestionWindow and (slow start) threshold
CongestionWindow limits how much data source has in transit
MaxWin = MIN(CongestionWindow, AdvertisedWindow)
EffWin = MaxWin - (LastByteSent - LastByteAcked)
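The two window formulas can be sketched as a small function. This is a minimal illustration, not part of the slides: the function name `effective_window` and the clamp to zero (for the case where the window shrinks below the data already in flight) are assumptions.

```python
def effective_window(cwnd, advertised_window, last_byte_sent, last_byte_acked):
    """Return how many more bytes the sender may put in transit right now."""
    # MaxWin = MIN(CongestionWindow, AdvertisedWindow)
    max_win = min(cwnd, advertised_window)
    # Bytes sent but not yet acknowledged
    in_flight = last_byte_sent - last_byte_acked
    # EffWin = MaxWin - in_flight; clamping at 0 is an assumption of this sketch
    return max(0, max_win - in_flight)
```

For example, with cwnd = 8000, AdvertisedWindow = 16000, and 3000 bytes in flight, the sender may send 5000 more bytes.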
Slow Start
Initial value: Set cwnd = 1
Note: Unit is a segment size. TCP actually is based on bytes
and increments by 1 MSS (maximum segment size)
The receiver sends an acknowledgement (ACK) for
each packet
Note: Generally, a TCP receiver sends an ACK for every
other segment.
Each time an ACK is received by the sender, the
congestion window is increased by 1 segment:
cwnd = cwnd + 1
If an ACK acknowledges two segments, cwnd is still
increased by only 1 segment.
Even if ACK acknowledges a segment that is smaller than MSS
bytes long, cwnd is increased by 1.
Does Slow Start increment slowly? Not really.
In fact, the increase of cwnd is exponential (why?)
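One way to see the exponential growth: every segment sent in a round trip returns an ACK, and every ACK adds one segment to cwnd, so cwnd doubles each round trip. A minimal sketch, assuming one ACK per segment (no delayed ACKs) and no loss; the name `slow_start_rounds` is hypothetical.

```python
def slow_start_rounds(rounds, initial_cwnd=1):
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _round in range(rounds):
        acks = cwnd                # assumed: one ACK per segment sent this round
        for _ack in range(acks):
            cwnd += 1              # cwnd = cwnd + 1 for each ACK received
        history.append(cwnd)
    return history


print(slow_start_rounds(4))  # [1, 2, 4, 8, 16]
```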
Slow Start Example
The congestion window size grows very rapidly
For every ACK, we increase cwnd by 1 irrespective of the number of segments ACKed
TCP slows down the increase of cwnd when cwnd > ssthresh
[Figure: sender/receiver timeline. cwnd = 1: segment 1 is sent and ACKed. cwnd = 2: segments 2 and 3 are sent and ACKed. cwnd = 4: segments 4 to 7 are sent and ACKed. cwnd = 8.]
Congestion Avoidance via AIMD
Congestion avoidance phase is started if cwnd
has reached the slow-start threshold value
If cwnd >= ssthresh then each time an ACK is
received, increment cwnd as follows:
cwnd = cwnd + 1/cwnd
So cwnd is increased by one only if all cwnd
segments have been acknowledged.
Example of Slow Start/Congestion Avoidance
Assume that ssthresh = 8
[Figure: cwnd (in segments) versus round-trip times. cwnd grows 1, 2, 4, 8 during slow start until it reaches ssthresh, then 9, 10 during congestion avoidance.]
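The cwnd values in this example can be reproduced with a per-round-trip approximation: double cwnd while it is below ssthresh, then add one segment per round trip. This is a sketch under stated assumptions (no loss, whole round trips, growth capped at ssthresh during slow start); the name `cwnd_per_rtt` is hypothetical.

```python
def cwnd_per_rtt(ssthresh, rtts, initial_cwnd=1):
    """Return cwnd (in segments) after each round trip, starting from initial_cwnd."""
    cwnd = initial_cwnd
    trace = [cwnd]
    for _ in range(rtts):
        if cwnd < ssthresh:
            # Slow start: roughly doubles per RTT (cap at ssthresh is an assumption)
            cwnd = min(2 * cwnd, ssthresh)
        else:
            # Congestion avoidance: about one segment per RTT
            cwnd += 1
        trace.append(cwnd)
    return trace


print(cwnd_per_rtt(ssthresh=8, rtts=5))  # [1, 2, 4, 8, 9, 10]
```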
Responses to Congestion
So, TCP assumes there is congestion if it detects a
packet loss
A TCP sender can detect lost packets via:
Expiration of a retransmission timer
Receipt of a duplicate ACK (why?)
TCP interprets a Timeout as a binary congestion signal.
When a timeout occurs, the sender performs:
ssthresh is set to half the current size of the congestion
window:
ssthresh = cwnd / 2
cwnd is then reset to one:
cwnd = 1
and slow-start is entered
Summary of TCP congestion control
Initially:
    cwnd = 1;
    ssthresh = advertised window size;
New Ack received:
    if (cwnd < ssthresh)
        /* Slow Start */
        cwnd = cwnd + 1;
    else
        /* Cong. Avoidance */
        cwnd = cwnd + 1/cwnd;
Timeout:
    /* Multiplicative decrease */
    ssthresh = cwnd/2;
    cwnd = 1;
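The summary above can be written as a small runnable state holder. A minimal sketch, assuming cwnd is tracked in segments as a float; the class name `CongestionState` and its method names are hypothetical, and real stacks add details (byte counting, minimum ssthresh) that are omitted here.

```python
class CongestionState:
    """Per-connection cwnd/ssthresh, following the slide summary (units: segments)."""

    def __init__(self, advertised_window):
        self.cwnd = 1.0
        self.ssthresh = float(advertised_window)

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0               # slow start
        else:
            self.cwnd += 1.0 / self.cwnd   # congestion avoidance

    def on_timeout(self):
        # Multiplicative decrease: compute ssthresh before resetting cwnd
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = 1.0


state = CongestionState(advertised_window=8)
for _ in range(8):
    state.on_new_ack()
print(state.cwnd)                   # 8.125
state.on_timeout()
print(state.cwnd, state.ssthresh)   # 1.0 4.0625
```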
Fast Retransmit
If three or more duplicate ACKs are received in a row, the TCP
sender believes that a segment has been lost.
Then TCP performs a retransmission of what seems to be the missing
segment, without waiting for a timeout to happen.
Enter slow start:
ssthresh = cwnd/2
cwnd = 1
[Figure: sender/receiver timeline with 1K segments. SeqNo=0 is ACKed with AckNo=1024. SeqNo=1024, 2048, and 3072 follow, and the receiver keeps repeating AckNo=1024. The sender retransmits SeqNo=1024, receives AckNo=4096, and continues with SeqNo=4096.]
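Duplicate-ACK detection can be sketched as a counter over the incoming AckNo stream. This sketch assumes a "duplicate" is any ACK that repeats the previous AckNo, so three duplicates means four identical ACKs in a row; the function name `find_fast_retransmits` is hypothetical.

```python
def find_fast_retransmits(ack_numbers, dup_threshold=3):
    """Return the AckNo values at which a fast retransmit would be triggered."""
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in ack_numbers:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                # Resend the segment that starts at byte `ack`
                retransmits.append(ack)
        else:
            # New data acknowledged: restart the duplicate count
            last_ack = ack
            dup_count = 0
    return retransmits


print(find_fast_retransmits([1024, 1024, 1024, 1024, 4096]))  # [1024]
```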
Flavors of TCP Congestion Control
TCP Tahoe (1988, 4.3BSD-Tahoe)
Slow Start
Congestion Avoidance
Fast Retransmit
TCP Reno (1990, 4.3BSD-Reno)
Fast Recovery
New Reno (1996)
SACK (1996)
RED (Floyd and Jacobson 1993)
TCP Reno
Duplicate ACKs:
Fast retransmit
Fast recovery
Fast Recovery avoids slow start
Timeout:
Retransmit
Slow Start
TCP Reno improves upon TCP Tahoe when a single
packet is dropped in a round-trip time.
Fast Recovery
Fast recovery avoids slow start after a fast retransmit
Intuition: Duplicate ACKs indicate that data is getting through
After three duplicate ACKs:
Retransmit lost packet
On packet loss detected by 3 dup ACKs:
ssthresh = cwnd/2
cwnd = ssthresh
enter congestion avoidance
[Figure: sender/receiver timeline with 1K segments. SeqNo=0 is ACKed with AckNo=1024. SeqNo=1024, 2048, and 3072 follow, and the receiver keeps repeating AckNo=1024. The sender retransmits SeqNo=1024, sends SeqNo=4096, and receives AckNo=4096.]
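The difference between the Tahoe and Reno reactions to three duplicate ACKs can be put side by side. A minimal sketch using only the update rules stated on these slides; the function name `on_triple_dup_ack` and the string labels are hypothetical.

```python
def on_triple_dup_ack(cwnd, variant):
    """Return (new_cwnd, new_ssthresh) after loss is detected by 3 dup ACKs."""
    ssthresh = cwnd / 2
    if variant == "reno":
        # Fast recovery: continue in congestion avoidance from ssthresh
        return ssthresh, ssthresh
    if variant == "tahoe":
        # No fast recovery: fall back to slow start
        return 1, ssthresh
    raise ValueError(f"unknown variant: {variant}")


print(on_triple_dup_ack(16, "tahoe"))  # (1, 8.0)
print(on_triple_dup_ack(16, "reno"))   # (8.0, 8.0)
```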
TCP Tahoe and TCP Reno (for single segment losses)
[Figure: two plots of cwnd versus time, one for Tahoe and one for Reno]
TCP New Reno
When multiple packets are dropped, Reno has problems
Partial ACK:
Occurs when multiple packets are lost
A partial ACK acknowledges some, but not all, packets that are
outstanding at the start of a fast recovery
In Reno, a partial ACK takes the sender out of fast recovery
Sender then has to wait until timeout occurs
New Reno:
Partial ACK does not take sender out of fast recovery
Partial ACK causes retransmission of the segment
following the acknowledged segment
New Reno can deal with multiple lost segments without
going to slow start
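The New Reno rule for partial ACKs can be sketched as a single decision. This assumes the sender remembers `recovery_point`, the next sequence number it was about to send when fast recovery began, and that AckNo is the next byte the receiver expects; both names and the function `newreno_on_ack` are hypothetical.

```python
def newreno_on_ack(ack_no, recovery_point):
    """Return (still_in_fast_recovery, seq_to_retransmit_or_None)."""
    if ack_no >= recovery_point:
        # Full ACK: everything outstanding at the start of recovery is covered
        return False, None
    # Partial ACK: stay in fast recovery and resend the segment
    # that follows the acknowledged data (it starts at byte ack_no)
    return True, ack_no


print(newreno_on_ack(2048, recovery_point=5120))  # (True, 2048)
print(newreno_on_ack(5120, recovery_point=5120))  # (False, None)
```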
SACK
SACK = Selective acknowledgment
Issue: Reno and New Reno retransmit at most 1 lost
packet per round trip time
Selective acknowledgments: The receiver can
acknowledge non-contiguous blocks of data (SACK 0-1023, 1024-2047)
Multiple blocks can be sent in a single segment.
TCP SACK:
Enters fast recovery upon 3 duplicate ACKs
Sender keeps track of SACKs and infers if segments are lost.
Sender retransmits the next segment from the list of
segments that are deemed lost.
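How a sender might infer missing data from SACK blocks can be sketched as a scan for gaps above the cumulative ACK. This is only an illustration: the function name `missing_ranges` and the half-open `(start, end)` block representation are assumptions, and a real sender applies further rules before declaring a gap lost.

```python
def missing_ranges(cum_ack, sack_blocks):
    """Return byte ranges [start, end) above cum_ack not covered by any SACK block."""
    holes = []
    next_expected = cum_ack
    for start, end in sorted(sack_blocks):
        if start > next_expected:
            # Gap between what is acknowledged so far and this SACK block
            holes.append((next_expected, start))
        next_expected = max(next_expected, end)
    return holes


# Cumulative ACK at 1024; receiver also holds bytes 2048-3071 and 4096-5119
print(missing_ranges(1024, [(2048, 3072), (4096, 5120)]))
# [(1024, 2048), (3072, 4096)]
```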
Congestion Avoidance
TCP's strategy
control congestion once it happens
repeatedly increase load in an effort to find the point at
which congestion occurs and then back off
Alternative strategy
predict when congestion is about to happen
reduce rate before packets start being discarded
call this congestion avoidance, instead of congestion
control
Two possibilities
host-centric: TCP Vegas
router-centric: DECbit and RED Gateways
Congestion Avoidance in TCP (TCP Vegas)
Idea: source watches for some sign that the router's queue is
building up and congestion will happen; e.g.,
RTT grows
sending rate flattens
[Figure: three plots against Time (seconds), 0.5 to 8.5: congestion window (KB), sending rate (KBps), and queue size in router (buffer at bottleneck router)]
Algorithm
Let BaseRTT be the minimum of all measured RTTs
(commonly the RTT of the first packet)
If not overflowing the connection, then
ExpectRate = CongestionWindow/BaseRTT
Source calculates sending rate (ActualRate) once
per RTT
Source compares ActualRate with ExpectRate
Diff = ExpectRate - ActualRate
if Diff < a
    increase CongestionWindow linearly
else if Diff > b
    decrease CongestionWindow linearly
else
    leave CongestionWindow unchanged
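The once-per-RTT Vegas decision can be sketched as follows. Assumptions of this sketch: cwnd is in segments, rates are in segments per second, the thresholds satisfy a < b, and "linearly" is taken to mean a step of one segment; the function name `vegas_adjust` is hypothetical.

```python
def vegas_adjust(cwnd, base_rtt, actual_rate, a, b):
    """Return the new cwnd after one Vegas comparison (run once per RTT)."""
    expect_rate = cwnd / base_rtt      # ExpectRate = CongestionWindow / BaseRTT
    diff = expect_rate - actual_rate   # Diff = ExpectRate - ActualRate
    if diff < a:
        return cwnd + 1                # little queuing seen: increase linearly
    if diff > b:
        return cwnd - 1                # queue is building up: decrease linearly
    return cwnd                        # between a and b: leave unchanged


# cwnd = 10 segments, BaseRTT = 0.5 s, so ExpectRate = 20 segments/s
print(vegas_adjust(10, 0.5, actual_rate=18, a=3, b=6))  # 11
print(vegas_adjust(10, 0.5, actual_rate=15, a=3, b=6))  # 10
print(vegas_adjust(10, 0.5, actual_rate=10, a=3, b=6))  # 9
```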