Congestion Control
Outline
Queuing Discipline
Reacting to Congestion
Avoiding Congestion
Spring 2003 CS 461 1
Issues
• Two sides of the same coin
– pre-allocate resources so as to avoid congestion
– control congestion if (and when) it occurs
[Figure: Source 1 (10-Mbps Ethernet) and Source 2 (100-Mbps FDDI) feed a router that reaches the destination over a 1.5-Mbps T1 link]
• Two points of implementation
– hosts at the edges of the network (transport protocol)
– routers inside the network (queuing discipline)
• Underlying service model
– best-effort (assume for now)
– multiple qualities of service (later)
Framework
• Connectionless flows
– sequence of packets sent between source/destination pair
– maintain soft state at the routers
[Figure: flows from Sources 1, 2, and 3 pass through a chain of routers to Destinations 1 and 2]
• Taxonomy
– router-centric versus host-centric
– reservation-based versus feedback-based
– window-based versus rate-based
Evaluation
• Fairness
• Power (ratio of throughput to delay)
[Figure: power (throughput/delay) plotted against load, peaking at the optimal load]
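As a rough illustration of why power has a peak, the sketch below evaluates throughput/delay under an assumed M/M/1 queueing model, where delay is 1/(capacity - load). The model, the function name `power`, and the normalized capacity of 1.0 are assumptions for illustration and are not taken from the slides.

```python
def power(load, capacity=1.0):
    """Power = throughput / delay, under an ASSUMED M/M/1 delay model."""
    if not 0 <= load < capacity:
        raise ValueError("load must satisfy 0 <= load < capacity")
    throughput = load                  # all offered load is delivered
    delay = 1.0 / (capacity - load)    # M/M/1 mean delay (assumption)
    return throughput / delay


# Scan a few load levels and pick the one with the highest power.
loads = [i / 10 for i in range(10)]    # 0.0, 0.1, ..., 0.9
optimal_load = max(loads, key=power)
```

Under this assumed model the power curve is load * (capacity - load), so it peaks at half the capacity and falls off on either side.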
Queuing Discipline
• First-In-First-Out (FIFO)
– does not discriminate between traffic sources
• Fair Queuing (FQ)
– explicitly segregates traffic based on flows
– ensures no flow captures more than its share of capacity
– variation: weighted fair queuing (WFQ)
• Problem?
[Figure: four per-flow queues (Flow 1 to Flow 4) drained by round-robin service]
FQ Algorithm
• Suppose a clock ticks each time a bit is transmitted
• Let Pi denote the length of packet i
• Let Si denote the time when the router starts to transmit packet i
• Let Fi denote the time when the router finishes transmitting packet i
• Fi = Si + Pi
• When does the router start transmitting packet i?
– if packet i arrives before the router has finished packet i-1 from this flow, then immediately after the last bit of i-1 (Fi-1)
– if there are no current packets for this flow, then start transmitting when packet i arrives (call this Ai)
• Thus: Fi = MAX(Fi-1, Ai) + Pi
FQ Algorithm (cont)
• For multiple flows
– calculate Fi for each packet that arrives on each flow
– treat all Fi’s as timestamps
– next packet to transmit is one with lowest timestamp
• Not perfect: can’t preempt current packet
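• Example
[Figure: (a) packets with timestamps F = 5, F = 8, and F = 10 queued on Flow 1 and Flow 2 ahead of the output; (b) a packet with F = 2 arrives on Flow 1 while Flow 2 is already transmitting a packet with F = 10]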
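The timestamp rule Fi = MAX(Fi-1, Ai) + Pi can be sketched in a few lines of Python. This is a minimal sketch, not a router implementation: the class name `FairQueue`, its methods, and the use of a heap are assumptions for illustration, and arrival times are assumed to be expressed in the same bit-time clock ticks as packet lengths.

```python
import heapq
from collections import defaultdict


class FairQueue:
    """Illustrative fair-queuing scheduler based on finish timestamps."""

    def __init__(self):
        self.last_finish = defaultdict(int)  # Fi-1 for each flow
        self.heap = []                       # (finish, seq, flow, length)
        self.seq = 0                         # tie-breaker for equal timestamps

    def enqueue(self, flow, arrival, length):
        # Fi = MAX(Fi-1, Ai) + Pi
        finish = max(self.last_finish[flow], arrival) + length
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, length))
        self.seq += 1
        return finish

    def dequeue(self):
        # The next packet to transmit is the one with the lowest timestamp.
        finish, _, flow, length = heapq.heappop(self.heap)
        return flow, finish
```

Enqueuing packets of length 5 and 3 on one flow and a packet of length 10 on another (all arriving at tick 0) yields finish times 5, 8, and 10, so the two short packets are transmitted before the long one.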
TCP Congestion Control
• Idea
– assumes best-effort network (FIFO or FQ routers)
– each source determines network capacity for itself
– uses implicit feedback
– ACKs pace transmission (self-clocking)
• Challenge
– determining the available capacity in the first place
– adjusting to changes in the available capacity
Additive Increase/Multiplicative Decrease (AIMD)
• Objective: adjust to changes in the available capacity
• New state variable per connection: CongestionWindow
– limits how much data source has in transit
MaxWin = MIN(CongestionWindow, AdvertisedWindow)
EffWin = MaxWin - (LastByteSent - LastByteAcked)
• Idea:
– increase CongestionWindow when congestion goes down
– decrease CongestionWindow when congestion goes up
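The MaxWin and EffWin formulas can be written out as a small function. This is a sketch only: the function and parameter names are illustrative, all quantities are assumed to be in bytes, and clamping the result at zero is an added assumption for the case where the window has shrunk below the amount of data already in flight.

```python
def effective_window(congestion_window, advertised_window,
                     last_byte_sent, last_byte_acked):
    """Bytes the source may still send right now."""
    # MaxWin = MIN(CongestionWindow, AdvertisedWindow)
    max_win = min(congestion_window, advertised_window)
    # Data sent but not yet acknowledged.
    in_flight = last_byte_sent - last_byte_acked
    # EffWin = MaxWin - (LastByteSent - LastByteAcked), clamped at 0 (assumption).
    return max(max_win - in_flight, 0)
```

For example, with a congestion window of 8000 bytes, an advertised window of 16000 bytes, and 3000 bytes in flight, the source may send 5000 more bytes.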
AIMD (cont)
• Question: how does the source determine whether or
not the network is congested?
• Answer: a timeout occurs
– timeout signals that a packet was lost
– packets are seldom lost due to transmission error
– lost packet implies congestion
AIMD (cont)
• Algorithm
– increment CongestionWindow by one packet per RTT (linear increase)
– divide CongestionWindow by two whenever a timeout occurs (multiplicative decrease)
[Figure: packet timeline between Source and Destination during additive increase]
• In practice: increment a little for each ACK
Increment = (MSS * MSS)/CongestionWindow
CongestionWindow += Increment
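A minimal sketch of the two AIMD rules, with CongestionWindow kept in bytes. The MSS value of 1460 bytes, the function names, and the floor of one MSS after a timeout are assumptions for illustration and do not come from the slides.

```python
MSS = 1460  # maximum segment size in bytes (assumed value)


def on_ack(congestion_window):
    """Additive increase: grow by a fraction of MSS for each ACK."""
    increment = (MSS * MSS) / congestion_window
    return congestion_window + increment


def on_timeout(congestion_window):
    """Multiplicative decrease: halve the window, never below one MSS (assumption)."""
    return max(congestion_window / 2, MSS)
```

With a window of 10 segments, a full window's worth of ACKs (10 of them) grows the window by slightly less than one MSS, which approximates the one-packet-per-RTT increase.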
AIMD (cont)
• Trace: sawtooth behavior
[Figure: CongestionWindow (KB, 10 to 70) versus time (seconds, 1.0 to 10.0), showing the sawtooth pattern]
Slow Start
• Objective: determine the available capacity in the first place
• Idea:
– begin with CongestionWindow = 1 packet
– double CongestionWindow each RTT (increment by 1 packet for each ACK)
[Figure: packet timeline between Source and Destination during slow start]
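The doubling follows directly from the per-ACK increment, as this small simulation suggests. It is a sketch under simplifying assumptions: no packets are lost, every packet is acknowledged individually within one RTT, and the function name `slow_start_rounds` is made up for illustration.

```python
def slow_start_rounds(rounds, initial_window=1):
    """Return CongestionWindow (in packets) at the start of each RTT."""
    cwnd = initial_window
    history = [cwnd]
    for _ in range(rounds):
        acks_this_rtt = cwnd          # one ACK per packet sent (assumption)
        for _ in range(acks_this_rtt):
            cwnd += 1                 # increment by 1 packet for each ACK
        history.append(cwnd)
    return history
```

Starting from one packet, four round trips give windows of 1, 2, 4, 8, and 16 packets.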
Slow Start (cont)
• Exponential growth, but slower than all at once
• Used…
– when first starting connection
– when connection goes dead waiting for timeout
• Trace
[Figure: CongestionWindow (KB, 10 to 70) versus time (seconds, 1.0 to 9.0) during slow start]
• Problem: lose up to half a CongestionWindow’s worth of data
Fast Retransmit and Fast Recovery
• Problem: coarse-grain TCP timeouts lead to idle periods
• Fast retransmit: use duplicate ACKs to trigger retransmission
[Figure: Sender/Receiver timeline. The sender transmits packets 1 to 6; the receiver returns ACK 1, ACK 2, and then three more duplicates of ACK 2; the sender retransmits packet 3 and receives ACK 6]
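The duplicate-ACK trigger can be sketched as a counter over the incoming ACK stream. The threshold of three duplicate ACKs is the value used by standard TCP; the function name and the use of whole packet numbers as ACK values (as in the timeline above, rather than byte sequence numbers) are simplifying assumptions.

```python
DUP_ACK_THRESHOLD = 3  # duplicates needed to trigger a retransmission


def fast_retransmit(acks):
    """Return the packet numbers retransmitted for a stream of ACKs."""
    last_ack = None
    duplicates = 0
    retransmitted = []
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == DUP_ACK_THRESHOLD:
                # The packet after the last one acknowledged is presumed lost.
                retransmitted.append(ack + 1)
        else:
            last_ack = ack
            duplicates = 0
    return retransmitted
```

Fed the ACK sequence 1, 2, 2, 2, 2, 6, the function reports a single retransmission of packet 3, triggered by the third duplicate of ACK 2.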
Results
[Figure: CongestionWindow (KB, 10 to 70) versus time (seconds, 1.0 to 7.0) with fast retransmit]
• Fast recovery
– skip the slow start phase
– go directly to half the last successful CongestionWindow (ssthresh)
Congestion Avoidance
• TCP’s strategy
– control congestion once it happens
– repeatedly increase load in an effort to find the point at which
congestion occurs, and then back off
• Alternative strategy
– predict when congestion is about to happen
– reduce rate before packets start being discarded
– call this congestion avoidance, instead of congestion control
• Two possibilities
– router-centric: DECbit and RED Gateways
– host-centric: TCP Vegas
DECbit
• Add binary congestion bit to each packet header
• Router
– monitors average queue length over last busy+idle cycle
– set congestion bit if average queue length > 1
– attempts to balance throughput against delay
[Figure: queue length versus time; the averaging interval covers the previous cycle and the current cycle up to the current time]
End Hosts
• Destination echoes bit back to source
• Source records how many packets resulted in set bit
• If less than 50% of last window’s worth had bit set
– increase CongestionWindow by 1 packet
• If 50% or more of last window’s worth had bit set
– decrease CongestionWindow to 0.875 times its previous value
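The source-side DECbit policy can be sketched as a single function applied once per window. The function name, the representation of the echoed bits as a list of booleans, and leaving the window unchanged when no feedback is available are assumptions for illustration.

```python
def decbit_adjust(congestion_window, congestion_bits):
    """Adjust the window from the bits echoed for the last window's packets."""
    if not congestion_bits:
        return congestion_window          # no feedback yet (assumption)
    fraction_set = sum(congestion_bits) / len(congestion_bits)
    if fraction_set < 0.5:
        return congestion_window + 1      # additive increase by 1 packet
    return congestion_window * 0.875      # multiplicative decrease
```

With a window of 8 packets, 3 set bits out of 8 raise the window to 9, while 4 set bits out of 8 lower it to 7.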
Random Early Detection (RED)
• Notification is implicit
– just drop the packet (TCP will time out)
– could make explicit by marking the packet
• Early random drop
– rather than wait for queue to become full, drop each
arriving packet with some drop probability whenever
the queue length exceeds some drop level
RED Details
• Compute average queue length
AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen
0 < Weight < 1 (usually 0.002)
SampleLen is queue length each time a packet arrives
[Figure: router queue with MinThreshold, MaxThreshold, and AvgLen marked]
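The running average is an exponentially weighted moving average, which can be sketched as follows. The function name `update_avg_len` is illustrative; the default weight of 0.002 is the usual value given above.

```python
def update_avg_len(avg_len, sample_len, weight=0.002):
    """AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen."""
    return (1 - weight) * avg_len + weight * sample_len
```

Because the weight is small, the average reacts slowly: starting from an empty queue, a single sample of 100 packets moves the average to only about 0.2, and it takes on the order of a thousand such samples to get most of the way to 100. This filters out short bursts.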
RED Details (cont)
• Two queue length thresholds
if AvgLen <= MinThreshold then
enqueue the packet
if MinThreshold < AvgLen < MaxThreshold then
calculate probability P
drop arriving packet with probability P
if MaxThreshold <= AvgLen then
drop arriving packet
RED Details (cont)
• Computing probability P
TempP = MaxP * (AvgLen - MinThreshold)/(MaxThreshold - MinThreshold)
P = TempP/(1 - count * TempP)
• Drop Probability Curve
[Figure: P(drop) versus AvgLen; the curve rises to MaxP between MinThresh and MaxThresh and reaches 1.0 at MaxThresh]
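The threshold test and the probability formulas combine into one drop decision, sketched below. The slides do not define count; here it is taken to be the number of packets enqueued since the last drop while AvgLen was between the thresholds, as in the usual description of RED. The function names, the injectable `rng` parameter, and the guard that treats count * TempP >= 1 as a certain drop are assumptions for illustration.

```python
import random


def red_drop_probability(avg_len, min_th, max_th, max_p, count):
    """P for MinThreshold < AvgLen < MaxThreshold."""
    temp_p = max_p * (avg_len - min_th) / (max_th - min_th)
    if count * temp_p >= 1:
        return 1.0                        # guard against a zero denominator (assumption)
    return temp_p / (1 - count * temp_p)


def red_should_drop(avg_len, min_th, max_th, max_p, count, rng=random.random):
    """Decide whether to drop the arriving packet."""
    if avg_len <= min_th:
        return False                      # enqueue the packet
    if avg_len >= max_th:
        return True                       # drop arriving packet
    p = red_drop_probability(avg_len, min_th, max_th, max_p, count)
    return rng() < p                      # drop with probability P
```

With thresholds of 10 and 20, MaxP = 0.02, and AvgLen = 15, TempP is 0.01; P is 0.01 right after a drop (count = 0) and grows to 0.02 once 50 packets have been enqueued, which spreads drops out more evenly over time.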
Tuning RED
• Probability of dropping a particular flow’s packet(s) is roughly proportional
to the share of the bandwidth that flow is currently getting
• MaxP is typically set to 0.02, meaning that when the average queue size is
halfway between the two thresholds, the gateway drops roughly one out of
50 packets.
• If traffic is bursty, then MinThreshold should be sufficiently large to
allow link utilization to be maintained at an acceptably high level
• Difference between two thresholds should be larger than the typical increase
in the calculated average queue length in one RTT; setting MaxThreshold
to twice MinThreshold is reasonable for traffic on today’s Internet
• Penalty Box for Offenders
TCP Vegas
• Idea: source watches for some sign that router’s queue is building up and congestion will happen soon; e.g.,
– RTT grows
– sending rate flattens
[Figure: three traces versus time (seconds, 0.5 to 8.5): CongestionWindow (KB), sending rate (KBps), and queue size in router]
Algorithm
• Let BaseRTT be the minimum of all measured RTTs (commonly the
RTT of the first packet)
• If not overflowing the connection, then
ExpectedRate = CongestionWindow/BaseRTT
• Source calculates sending rate (ActualRate) once per RTT
• Source compares ActualRate with ExpectedRate
Diff = ExpectedRate - ActualRate
if Diff < α
increase CongestionWindow linearly
else if Diff > β
decrease CongestionWindow linearly
else
leave CongestionWindow unchanged
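The comparison can be sketched as a once-per-RTT adjustment. Since the thresholds are usually stated in packets while Diff is a rate, this sketch multiplies the rate difference by BaseRTT to express it in packets before comparing; that conversion, the function name, the step size of one packet, and the default thresholds of 1 and 3 packets are assumptions for illustration.

```python
def vegas_adjust(congestion_window, base_rtt, actual_rate,
                 alpha=1.0, beta=3.0):
    """Return the new window (packets); rates are in packets per second."""
    expected_rate = congestion_window / base_rtt
    # Convert the rate difference to packets queued in the network (assumption).
    diff = (expected_rate - actual_rate) * base_rtt
    if diff < alpha:
        return congestion_window + 1      # increase linearly
    if diff > beta:
        return congestion_window - 1      # decrease linearly
    return congestion_window              # leave unchanged
```

With a window of 20 packets and a BaseRTT of 0.1 s, the expected rate is 200 packets/s. An actual rate of 195 packets/s gives a difference of 0.5 packets and the window grows; 180 packets/s gives 2 packets and the window stays put; 150 packets/s gives 5 packets and the window shrinks.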
Algorithm (cont)
• Parameters
– α = 1 packet
– β = 3 packets
[Figure: CongestionWindow (KB, 10 to 70) and CAM KBps (40 to 240) versus time (seconds, 0.5 to 8.0)]
• Even faster retransmit
– keep fine-grained timestamps for each packet
– check for timeout on first duplicate ACK