UNIT IV-Transport Layer
Port Number
Each process is assigned a unique 16-bit port number on that host.
Processes are identified by (host, port) pair.
Processes can be classified as either client or server.
o Client process usually initiates exchange of information with the server
o Server process is identified by a well-known port number (0 – 1023).
o Client process is assigned an ephemeral port number (49152 – 65,535) by the OS.
o Some well-known UDP ports are DNS (53), DHCP (67, 68), TFTP (69) and SNMP (161).
Process-to-Process Communication
Like UDP, TCP provides process-to-process communication. A TCP connection is identified by a
4-tuple (SrcPort, SrcIPAddr, DstPort, DstIPAddr).
Some well-known port numbers used by TCP are FTP (21), Telnet (23), SMTP (25) and HTTP (80).
Segment Format
TCP is a byte-oriented protocol, i.e. the sender writes bytes into a TCP connection and the
receiver reads bytes out of the TCP connection.
TCP groups a number of bytes together into a packet called a segment and adds a header onto
each segment. The segment is encapsulated in an IP datagram and transmitted.
SrcPort and DstPort fields identify the source and destination ports.
SequenceNum field contains the sequence number of the first byte of data carried in that segment.
Acknowledgment field contains the number of the next byte the receiver expects to receive.
HdrLen field specifies the number of 4-byte words in the TCP header.
Flags field contains six control bits or flags. They are set to indicate:
o URG—indicates that the segment contains urgent data.
o ACK—the value of acknowledgment field is valid.
o PSH—indicates sender has invoked the push operation.
o RESET—signifies that receiver wants to abort the connection.
o SYN—synchronize sequence numbers during connection establishment.
o FIN—terminates the connection
AdvertisedWindow field defines the receiver window and acts as flow control.
Checksum field is computed over the TCP header, the TCP data, and pseudoheader.
UrgPtr field indicates where the non-urgent data contained in the segment begins.
Optional information (max. 40 bytes) can be contained in the header.
Connection Establishment
The connection establishment in TCP is called three-way handshaking as shown below:
1. The client (active participant) sends a segment to the server (passive participant) stating the
initial sequence number it is to use (Flags = SYN, SequenceNum = x)
2. The server responds with a single segment that both acknowledges the client’s sequence
number (Flags = ACK, Ack = x + 1) and states its own beginning sequence number (Flags =
SYN, SequenceNum = y).
3. Finally, the client responds with a segment that acknowledges the server’s sequence number
(Flags = ACK, Ack = y + 1).
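The three steps above can be sketched in Python. This is an illustration of the sequence-number arithmetic only, not an implementation of real TCP; the dictionary field names and the randomly chosen initial sequence numbers are assumptions for the sketch.

```python
# Sketch of TCP's three-way handshake sequence-number exchange.
import random

def three_way_handshake():
    x = random.randint(0, 2**32 - 1)   # client's initial sequence number
    y = random.randint(0, 2**32 - 1)   # server's initial sequence number

    # 1. Client -> Server: SYN, SequenceNum = x
    syn = {"flags": {"SYN"}, "seq": x}

    # 2. Server -> Client: SYN + ACK, SequenceNum = y, Ack = x + 1
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": y, "ack": syn["seq"] + 1}

    # 3. Client -> Server: ACK, Ack = y + 1
    ack = {"flags": {"ACK"}, "ack": syn_ack["seq"] + 1}

    return syn, syn_ack, ack

syn, syn_ack, ack = three_way_handshake()
assert syn_ack["ack"] == syn["seq"] + 1   # server acknowledged x + 1
assert ack["ack"] == syn_ack["seq"] + 1   # client acknowledged y + 1
```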
Connection Termination
Three-way Handshaking—Most implementations follow three-way handshaking as shown.
1. The client TCP after receiving a Close command from the client process sends a FIN
segment. A FIN segment can include the last chunk of data.
2. The server TCP responds with FIN + ACK segment to inform its closing.
3. The client TCP finally sends an ACK segment.
Four-way Half-Close—In TCP, one end can stop sending data while still receiving data,
known as half-close. For instance, a client can submit its data to the server for processing and
half-close its connection. At a later time, the client receives the processed data from the server.
1. The client TCP half-closes the connection by sending a FIN segment.
2. The server TCP accepts the half-close by sending the ACK segment. The data transfer
from the client to the server stops.
3. The server can send data to the client and acknowledgement can come from the client.
4. When the server has sent all the processed data, it sends a FIN segment to the client.
5. The FIN segment is acknowledged by the client.
Write short notes on urgent data in TCP?
TCP is a stream-oriented protocol, i.e., each byte of data has a position in the stream.
At times an application may need to send urgent data, i.e., sending process wants a piece of
data to be read out of order by the receiving process. For example, to abort the process by issuing
Ctrl + C keystroke.
The above scenario is handled by setting the URG bit.
The sending TCP inserts the urgent data at the beginning of the segment's data.
The urgent pointer field in the header defines where the normal (non-urgent) data starts.
When the receiving TCP receives a segment with the URG bit set, it delivers urgent data out
of order to the receiving application.
Flow Control
The capacities of the send and receive buffers are MaxSendBuffer and MaxRcvBuffer respectively.
The sending TCP prevents overflowing of its buffer by maintaining
LastByteWritten - LastByteAcked ≤ MaxSendBuffer
The receiving TCP avoids overflowing its receive buffer by maintaining
LastByteRcvd - LastByteRead ≤ MaxRcvBuffer
The receiver throttles the sender by advertising a window that is no larger than the amount of
free space that it can buffer as
AdvertisedWindow = MaxRcvBuffer - ((NextByteExpected - 1) - LastByteRead)
When data arrives, the receiver acknowledges it as long as preceding bytes have arrived.
o LastByteRcvd moves to its right (incremented), and the advertised window shrinks
The advertised window expands when the data is read by the application
o If data is read as fast as it arrives, then AdvertisedWindow = MaxRcvBuffer
o If it is read slowly, the AdvertisedWindow eventually shrinks to 0.
The sending TCP adheres to the advertised window by computing effective window, that limits
how much data it should send as
EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
When an acknowledgement arrives for x bytes, LastByteAcked is incremented by x and the
buffer space is freed accordingly.
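The two window formulas above can be checked with a small sketch; the buffer size and byte counts here are made-up example values.

```python
# Sketch of TCP's sliding-window bookkeeping. All values are byte counts.
MAX_RCV_BUFFER = 16384   # assumed receive-buffer size for this example

def advertised_window(next_byte_expected, last_byte_read):
    """Free receive-buffer space the receiver advertises to the sender."""
    buffered = (next_byte_expected - 1) - last_byte_read
    return MAX_RCV_BUFFER - buffered

def effective_window(advertised, last_byte_sent, last_byte_acked):
    """How much new data the sender may still transmit."""
    in_flight = last_byte_sent - last_byte_acked
    return advertised - in_flight

# Receiver has buffered bytes 1..4096 but the application has read nothing:
adv = advertised_window(next_byte_expected=4097, last_byte_read=0)
print(adv)                                                           # 12288

# Sender has 2048 unacknowledged bytes in flight:
print(effective_window(adv, last_byte_sent=6144, last_byte_acked=4096))  # 10240
```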
AdvertisedWindow
The TCP AdvertisedWindow field is 16 bits long, half the size of the 32-bit SequenceNum field.
The 32-bit SequenceNum must be large enough that a sequence number does not wrap around
within the lifetime of a segment on the network.
The 16-bit AdvertisedWindow must be large enough to allow the sender to keep the pipe
full, i.e., it must accommodate the delay × bandwidth product of the connection.
It is not big enough in the case of a T3 connection, but this is taken care of by the TCP
window-scaling extension.
Original Algorithm
TCP estimates SampleRTT by computing the duration between sending of a packet and arrival
of its ACK.
TCP then computes EstimatedRTT as a weighted average between the previous and current
estimate as
EstimatedRTT = α × EstimatedRTT + (1 - α) × SampleRTT
where α is the smoothing factor and its value is in the range 0.8–0.9
Timeout is twice the EstimatedRTT
TimeOut = 2 × EstimatedRTT
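The estimator above is an exponentially weighted moving average; a minimal sketch, assuming α = 0.85 and made-up RTT samples in milliseconds:

```python
# One step of the original TCP RTT estimator.
ALPHA = 0.85   # smoothing factor, within the 0.8-0.9 range mentioned above

def update_rtt(estimated_rtt, sample_rtt, alpha=ALPHA):
    """Return the new (EstimatedRTT, TimeOut) pair."""
    estimated_rtt = alpha * estimated_rtt + (1 - alpha) * sample_rtt
    return estimated_rtt, 2 * estimated_rtt      # TimeOut = 2 x EstimatedRTT

est = 100.0                              # assumed starting estimate (ms)
for sample in (110.0, 95.0, 130.0):      # made-up samples
    est, timeout = update_rtt(est, sample)
print(round(est, 1), round(timeout, 1))  # 104.9 209.9
```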
Karn/Partridge Algorithm
The flaw discovered in the original algorithm after years of use concerns retransmitted segments:
o it is ambiguous whether an ACK should be associated with the original segment or its retransmission
o If the ACK is associated with the original transmission, SampleRTT may become too large
o If the ACK is associated with the retransmission, SampleRTT may become too small
Karn/Partridge proposed a solution to the above:
o TCP does not take RTT samples for segments that have been retransmitted; only
unambiguous samples are used.
o Each time TCP retransmits, it sets the next timeout to be twice the last timeout
(exponential backoff).
o Loss of segments is mostly due to congestion, and hence the TCP source does not react
aggressively to a timeout.
Jacobson/Karels Algorithm
The main problem with original algorithm is that variance of the sample RTTs is not taken
into account.
o if variation among samples is small, then EstimatedRTT can be trusted
o otherwise timeout should not be tightly coupled with the EstimatedRTT
In this new approach, the sender measures a new SampleRTT as before.
The Deviation amongst RTTs is computed as follows:
Difference = SampleRTT - EstimatedRTT
EstimatedRTT = EstimatedRTT + (δ × Difference)
Deviation = Deviation + δ × ( |Difference| - Deviation )
where δ is a fraction between 0 and 1
TCP now computes TimeOut as a function of both EstimatedRTT and Deviation as listed:
TimeOut = µ × EstimatedRTT + φ × Deviation
where µ = 1 and φ = 4 usually
When variance is small, difference between TimeOut and EstimatedRTT is negligible.
When variance is larger, Deviation plays a greater role in deciding TimeOut.
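One update step of the Jacobson/Karels computation can be sketched as follows, using the commonly cited values δ = 0.125, µ = 1 and φ = 4; the starting estimate and sample are made-up.

```python
# Jacobson/Karels RTT estimation: track both the mean and the deviation.
DELTA, MU, PHI = 0.125, 1, 4   # typical parameter choices

def jacobson_karels(est, dev, sample, delta=DELTA, mu=MU, phi=PHI):
    """Return updated (EstimatedRTT, Deviation, TimeOut)."""
    diff = sample - est
    est = est + delta * diff                  # EstimatedRTT update
    dev = dev + delta * (abs(diff) - dev)     # Deviation update
    timeout = mu * est + phi * dev            # TimeOut from both terms
    return est, dev, timeout

est, dev, timeout = jacobson_karels(est=100.0, dev=0.0, sample=140.0)
print(est, dev, timeout)   # 105.0 5.0 125.0
```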
Nagle’s Algorithm
Nagle's algorithm suggests what the sending TCP should do when there is data to send
and the window size is less than one MSS. The algorithm is listed below:
When the application produces data to send
if both the available data and the window ≥ MSS
send a full segment
else
if there is unACKed data in flight
buffer the new data until an ACK arrives
else
send all the new data now
It’s always OK to send a full segment if the window allows.
It’s also OK to immediately send a small amount of data if there are currently no
segments in transit, but if there is anything in flight, the sender must wait for an ACK before
transmitting the next segment.
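The decision rule above transcribes directly into Python; the MSS value and the example byte counts are illustrative.

```python
# Nagle's algorithm as a pure decision function. Sizes are in bytes.
MSS = 1460   # assumed maximum segment size

def nagle_decision(available_data, window, unacked_in_flight):
    """Return what the sender should do with newly produced data."""
    if available_data >= MSS and window >= MSS:
        return "send full segment"            # always OK to send a full segment
    if unacked_in_flight:
        return "buffer until ACK arrives"     # something in flight: wait
    return "send all new data now"            # nothing in flight: send even if small

print(nagle_decision(2000, 4000, unacked_in_flight=True))    # send full segment
print(nagle_decision(200, 4000, unacked_in_flight=True))     # buffer until ACK arrives
print(nagle_decision(200, 4000, unacked_in_flight=False))    # send all new data now
```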
Slow Start
Slow start increases the congestion window exponentially, rather than linearly. It is usually
used from cold start.
The source starts by setting CongestionWindow to one packet.
o When ACK arrives, TCP adds 1 to CongestionWindow and sends two packets.
o Upon receiving two ACKs, TCP increments CongestionWindow by 2 and sends four
packets.
o Thus TCP doubles the number of packets every RTT as shown.
Slow start provides exponential growth and is designed to avoid sending a full window's
worth of data as one burst.
Initially TCP has no idea about the available bandwidth, hence it increases CongestionWindow rapidly
until there is a packet loss.
When a packet is lost:
o TCP immediately decreases CongestionWindow by half (multiplicative decrease).
o It stores the current value of CongestionWindow as CongestionThreshold and resets
CongestionWindow to one packet.
o The CongestionWindow is incremented one packet for each ACK arrived until it
reaches CongestionThreshold and thereafter one packet per RTT.
In initial stages, TCP loses more packets because it attempts to learn the available bandwidth
quickly through exponential increase
An alternate strategy to slow start is known as packet pair.
o Send pairs of packets back-to-back and then observe the timing of their ACKs.
o The spacing between the ACKs is taken as a measure of congestion in the network.
In the above example, initial slow start causes CongestionWindow to grow to 34 KB. The trace
then flattens at 2 sec due to loss of packets. CongestionThreshold is set to 17 KB (34/2) and
CongestionWindow to 1 packet. Thereafter additive increase is followed.
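The slow start and loss behaviour described above can be shown with a toy per-RTT simulation; the loss point and window sizes (in packets) are assumed example values, and doubling per RTT stands in for the per-ACK increment.

```python
# Toy simulation: slow start, multiplicative decrease on loss, then
# exponential growth up to the threshold followed by additive increase.

def simulate(rtts, loss_at_rtt):
    cwnd, threshold = 1, float("inf")
    history = []
    for rtt in range(rtts):
        history.append(cwnd)
        if rtt == loss_at_rtt:                 # packet loss detected
            threshold = max(cwnd // 2, 1)      # CongestionThreshold = cwnd / 2
            cwnd = 1                           # restart from one packet
        elif cwnd < threshold:
            cwnd *= 2                          # slow start: double per RTT
        else:
            cwnd += 1                          # additive increase per RTT
    return history

print(simulate(rtts=10, loss_at_rtt=5))   # [1, 2, 4, 8, 16, 32, 1, 2, 4, 8]
```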
Fast Retransmit and Fast Recovery
Fast retransmit is a heuristic that triggers the retransmission of a dropped packet sooner than
the regular timeout mechanism. It does not replace regular timeouts.
When a packet arrives out of order, the receiving TCP resends the same acknowledgment
(duplicate ACK) it sent the last time.
The sending TCP waits for three duplicate ACKs, to confirm that the packet is lost, before
retransmitting the lost packet. This is known as fast retransmit and it signals congestion.
Instead of setting CongestionWindow to one packet, this method uses the ACKs that are still
in the pipe to clock the sending of packets. This mechanism is called fast recovery.
The fast recovery mechanism removes the slow start phase and follows additive increase.
Fast retransmit/recovery results in roughly a 20% increase in throughput.
The following example shows transmission of packets in which the third packet gets lost. The
sender on receiving three duplicate ACKs (ACK 2) retransmits the third packet as shown below.
On receiving the lost packet, the receiver sends a cumulative acknowledgment for the
highest-numbered packet received in order.
In this strategy:
Slow start is only used at the beginning of a connection and when the regular timeout
occurs.
At other times, the congestion window follows a pure additive increase/multiplicative
decrease pattern
TCP's fast retransmit can detect up to three dropped packets per window.
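The duplicate-ACK behaviour and the three-duplicate-ACK trigger can be sketched as below. Packet numbers stand in for sequence numbers here; a real TCP acknowledges bytes, not packets.

```python
# Sketch of cumulative ACKs at the receiver and fast retransmit at the sender.

def receiver_acks(arriving_packets):
    """Receiver ACKs the highest in-order packet seen so far (cumulative ACK)."""
    expected, acks, buffered = 1, [], set()
    for pkt in arriving_packets:
        buffered.add(pkt)
        while expected in buffered:
            expected += 1
        acks.append(expected - 1)   # a duplicate ACK if nothing new is in order
    return acks

def fast_retransmit_trigger(acks):
    """Return the packet retransmitted after 3 duplicate ACKs, or None."""
    dup = 0
    for prev, cur in zip(acks, acks[1:]):
        dup = dup + 1 if cur == prev else 0
        if dup == 3:
            return cur + 1          # first unacknowledged packet
    return None

acks = receiver_acks([1, 2, 4, 5, 6, 7])   # packet 3 is lost
print(acks)                                # [1, 2, 2, 2, 2, 2]
print(fast_retransmit_trigger(acks))       # 3
```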
Explain in detail about TCP congestion avoidance algorithms.
Congestion avoidance refers to mechanisms that prevent congestion before it actually
occurs.
TCP increases the load on the network and, when congestion is likely to occur, decreases
the load.
In contrast, standard TCP creates packet loss in order to determine the available bandwidth of
the connection.
The three congestion-avoidance mechanisms are:
o DECbit
o Random Early Detection (RED)
o Source-based congestion avoidance
DECbit
DECbit was developed for use on the Digital Network Architecture (DNA).
In DECbit, each router monitors the load it is experiencing and explicitly notifies the end
nodes when congestion is about to occur, by setting a binary congestion bit called the DECbit in
packets that flow through it.
The destination host copies the DECbit into the ACK and sends back to the source.
Eventually the source reduces its transmission rate and congestion is avoided.
Algorithm
A single congestion bit is added to the packet header.
A router sets this bit in a packet if its average queue length is greater than or equal to 1
when the packet arrives.
The average queue length is measured over a time interval that spans the last busy +
last idle cycle + current busy cycle as shown below.
Router calculates average queue length by dividing the curve area by time interval
The source counts how many ACKs have the DECbit set for the previous window's worth of
packets it has sent.
o If less than 50% of the ACKs had the DECbit set, then the source increases its congestion window
by 1 packet.
o Otherwise, the source decreases the congestion window to 87.5% of its previous value
(multiplies it by 0.875).
The “increase by 1, decrease to 0.875 times” rule is its additive increase/multiplicative decrease
strategy.
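The source's half of DECbit can be sketched as a single decision function; the window and ACK counts below are made-up example values.

```python
# DECbit source policy: additive increase / multiplicative decrease based
# on the fraction of ACKs (for the last window) that carry the DECbit.

def adjust_window(congestion_window, decbits_set, acks_total):
    """Return the new congestion window (in packets)."""
    if decbits_set / acks_total < 0.5:
        return congestion_window + 1       # increase by 1 packet
    return congestion_window * 0.875       # decrease to 0.875 of previous value

print(adjust_window(16, decbits_set=3, acks_total=10))   # 17
print(adjust_window(16, decbits_set=6, acks_total=10))   # 14.0
```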
Random Early Detection (RED)
Each router monitors its average queue length AvgLen, computed as a weighted running
average of the instantaneous queue length.
RED has two queue length thresholds MinThreshold and MaxThreshold. When a packet
arrives at the gateway, RED compares the current AvgLen with these thresholds and decides
whether to queue or drop the packet as follows:
if AvgLen ≤ MinThreshold
queue the packet
if MinThreshold < AvgLen < MaxThreshold
calculate probability P
drop the arriving packet with probability P
if AvgLen ≥ MaxThreshold
drop the arriving packet
o The probability of drop increases slowly when AvgLen is between the two thresholds, reaching
MaxP at the upper threshold, at which point it jumps to unity as shown.
(Figure: RED thresholds and the drop probability function.)
o P is a function of both AvgLen and how long it has been since the last packet was dropped. It is
computed as
TempP = MaxP × (AvgLen - MinThreshold)/(MaxThreshold - MinThreshold)
P = TempP/(1 - count × TempP)
Because RED drops packets randomly, the probability that RED decides to drop a flow’s
packet(s) is roughly proportional to the share of the bandwidth for that flow.
MaxThreshold is set to twice of MinThreshold as it works well for the Internet traffic.
There should be enough free buffer space above MaxThreshold to absorb bursty traffic.
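The decision rule and the probability computation above can be sketched as follows. The threshold and MaxP values are illustrative, and AvgLen is passed in directly rather than computed as a running average.

```python
# Sketch of RED's queue-or-drop decision for an arriving packet.
import random

MIN_TH, MAX_TH, MAX_P = 5.0, 10.0, 0.02   # MaxThreshold = 2 x MinThreshold

def red_decision(avg_len, count, rng=random.random):
    """Return 'queue' or 'drop'. count = packets since the last drop."""
    if avg_len <= MIN_TH:
        return "queue"
    if avg_len >= MAX_TH:
        return "drop"
    # Between the thresholds: drop with probability P.
    temp_p = MAX_P * (avg_len - MIN_TH) / (MAX_TH - MIN_TH)
    p = temp_p / (1 - count * temp_p)
    return "drop" if rng() < p else "queue"

print(red_decision(3.0, count=0))    # queue (below MinThreshold)
print(red_decision(12.0, count=0))   # drop  (above MaxThreshold)
```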
Source-Based Congestion Avoidance
The source looks for signs of congestion in the network; for example, a considerable increase
in the RTT indicates queuing at a router.
Some mechanisms are:
1. Every two round-trip delays, it checks to see if the current RTT is greater than the average of
the minimum and maximum RTTs.
a. If it is, then the algorithm decreases the congestion window by one-eighth.
b. Otherwise the normal increase as in TCP.
2. The window is adjusted once every two round-trip delays based on the product
(CurrentWindow - OldWindow) × (CurrentRTT - OldRTT)
a. If the result is positive, the source decreases the window size by one-eighth
b. Otherwise, the source increases the window by one maximum packet size.
3. Every RTT, it increases the window size by one packet and compares the throughput achieved
to the throughput when the window was one packet smaller.
a. If the difference is less than one-half the throughput achieved when only one packet was in
transit, it decreases the window by one packet.
TCP Vegas
In standard TCP, it was observed that throughput increases as congestion window increases,
but not beyond the available bandwidth.
Any further increase in the window size only results in packets taking up buffer space at the
bottleneck router
TCP Vegas uses this idea to measure and control the right amount of extra data in transit.
If a source is sending too much extra data, it will cause long delays and possibly lead to
congestion.
TCP Vegas’s congestion-avoidance actions are based on changes in the estimated amount of
extra data in the network.
A flow’s BaseRTT is set to the minimum of all measured RTTs; in practice it is usually the
RTT of the first packet sent.
The expected throughput is given by ExpectedRate = CongestionWindow/BaseRTT
The sending rate, ActualRate, is computed by dividing the number of bytes transmitted during
an RTT by that RTT.
The difference between two rates is computed, say Diff = ExpectedRate – ActualRate
Two thresholds α and β are defined such that α < β
o When Diff < α, the congestion window is linearly increased during the next RTT
o When Diff > β, the congestion window is linearly decreased during the next RTT
o When α < Diff < β, the congestion window is unchanged
When the actual and expected throughput differ significantly, the congestion window is
reduced, as the difference indicates congestion in the network.
When the actual throughput is close to the expected throughput, the congestion window is
increased to utilize the available bandwidth.
The overall goal is to keep between α and β extra bytes in the network. The expected & actual
throughput with thresholds α and β (shaded region) is shown below
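The Vegas per-RTT adjustment can be sketched as below; the α and β values and the rate units (window in KB, RTT in seconds) are illustrative assumptions.

```python
# Sketch of TCP Vegas's window adjustment from Diff = Expected - Actual.
ALPHA, BETA = 1.0, 3.0   # assumed thresholds, with alpha < beta

def vegas_adjust(cwnd, base_rtt, bytes_sent, measured_rtt):
    """Return the congestion window to use for the next RTT."""
    expected_rate = cwnd / base_rtt            # ExpectedRate
    actual_rate = bytes_sent / measured_rtt    # ActualRate
    diff = expected_rate - actual_rate
    if diff < ALPHA:
        return cwnd + 1    # too little extra data in flight: linear increase
    if diff > BETA:
        return cwnd - 1    # too much extra data in flight: linear decrease
    return cwnd            # within [alpha, beta]: leave window unchanged

print(vegas_adjust(cwnd=20, base_rtt=0.1, bytes_sent=20, measured_rtt=0.1))    # 21
print(vegas_adjust(cwnd=20, base_rtt=0.1, bytes_sent=19, measured_rtt=0.1))    # 19
print(vegas_adjust(cwnd=20, base_rtt=0.1, bytes_sent=19.8, measured_rtt=0.1))  # 20
```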
Flowspec
The set of information given to the network for a given flow is called a flowspec. It has two
parts:
o Tspec defines the traffic characterization of the flow
o Rspec defines resources that the flow needs to reserve (buffer, bandwidth, etc.)
TSpec
The bandwidth requirement of most real-time applications varies constantly.
The average rate alone cannot describe a flow, since variable-bit-rate applications exceed
their average rate at times. This leads to queuing and subsequent delay/loss of packets.
Token Bucket
The solution to manage varying bandwidth is to use token bucket filter that can describe
bandwidth characteristics of a source/flow.
The two parameters used are token rate r and a bucket depth B
A token is required to send a byte of data.
A source can accumulate tokens at rate r per second, but no more than B tokens.
A burst can momentarily exceed r bytes per second by drawing on accumulated tokens, but no
more than B bytes of such excess are permitted; sustained sending above rate r must therefore be
spread over a longer interval.
The token bucket provides information that is used by admission control algorithm to
determine whether or not to consider the new request for service.
The following example shows two flows with equal average rates but different token bucket
descriptions.
Flow A generates data at a steady rate of 1 Mbps, which is described using a token
bucket filter with rate r = 1 Mbps and a bucket depth B = 1 byte.
Flow B sends at a rate of 0.5 Mbps for 2 seconds and then at 2 Mbps for 1 second,
which is described using a token bucket filter with rate r = 1 Mbps and a bucket depth
B = 1 Mb. The additional depth allows it to accumulate tokens while it sends at 0.5
Mbps (0.5 Mbps × 2 s = 1 Mb) and to spend them during the 2 Mbps burst.
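A minimal token bucket sketch, with 1 token standing for 1 byte. Note this coarse tick-based model caps accumulation at the bucket depth, so it only approximates the continuous behaviour; the rate/depth values model flow B from the example (1 Mbps = 125,000 bytes/s, 1 Mb = 125,000 bytes).

```python
# Sketch of a token bucket filter with rate r (bytes/sec) and depth B (bytes).

class TokenBucket:
    def __init__(self, rate, depth):
        self.rate, self.depth = rate, depth
        self.tokens = depth                 # start with a full bucket

    def tick(self, seconds):
        """Accumulate tokens at rate r, capped at the bucket depth B."""
        self.tokens = min(self.depth, self.tokens + self.rate * seconds)

    def send(self, nbytes):
        """A byte may be sent only if a token is available for it."""
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False                        # burst exceeds the bucket: refused

bucket = TokenBucket(rate=125_000, depth=125_000)
bucket.tokens = 0               # assume an empty bucket at t = 0
bucket.tick(2.0)                # 2 s of unused capacity fills the bucket
print(bucket.tokens)            # 125000 (1 Mb accumulated)
print(bucket.send(125_000))     # True: a 1 Mb burst above rate r is allowed
print(bucket.send(1))           # False: the bucket is now empty
```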
Admission Control
When a flow requests a level of service, admission control examines TSpec and RSpec of the
flow.
It checks to see whether the desired service can be provided with currently available resources,
without causing any worse service to previously admitted flows.
o If it can provide the service, the flow is admitted otherwise denied.
The decision to allow/deny a service can be heuristic such as "currently delays are within
bounds, therefore another service can be admitted."
Admission control is closely related to policy. For example, a network administrator may allow
the CEO to make reservations and forbid requests from other employees.
Reservation Protocol (RSVP)
The Resource Reservation Protocol (RSVP) is a signaling protocol to help IP create a flow and
make a resource reservation.
RSVP provides resource reservations for all kinds of traffic including multimedia which uses
multicasting. RSVP supports both unicast and multicast flows.
RSVP is a robust protocol that relies on soft state in the routers.
o Soft state, unlike hard state (as in ATM VCs), times out after a short period if it is not
refreshed; it does not need to be explicitly deleted.
o The default refresh interval is 30 seconds.
Since multicasting involves a far larger number of receivers than senders, RSVP follows a
receiver-oriented approach that makes receivers keep track of their own requirements.
RSVP Messages
To make a reservation, the receiver needs to know:
o What traffic the sender is likely to send so as to make an appropriate reservation, i.e., TSpec.
o Secondly, what path the packets will travel.
The sender sends a PATH message to all receivers (downstream) containing TSpec.
A PATH message stores necessary information for the receivers on the way. PATH messages
are sent about every 30 seconds.
The receiver sends a reservation request as a RESV message back to the sender (upstream),
containing sender's TSpec and receiver requirement RSpec.
Each router on the path looks at the RESV request, tries to allocate the necessary resources to
satisfy it, and passes the request on to the next router.
o If allocation is not feasible, the router sends an error message to the receiver
If there is any failure in the link a new path is discovered between sender and the receiver. The
RESV message follows the new path thereafter.
A router reserves resources as long as it keeps receiving RESV messages; otherwise the
resources are released.
If a router does not support RSVP, then best-effort delivery is followed.
Reservation Merging
In RSVP, the resources are not reserved for each receiver in a flow, but merged.
When a RESV message travels from receiver up the multicast tree, it is likely to come across a
router where reservations have already been made for some other flow.
If the new resource requirements can be met using existing allocations, then new allocations
need not be made.
o For example, receiver A has already made a request for a guaranteed delay of less than 100 ms.
If B comes with a new request for a delay of less than 200 ms, then no new reservations are
made.
o Another example shows router R3 merging requests from Rc1, Rc2 and Rc3 before making
bandwidth reservation.
A router that merges multiple requests into one reservation is known as a merge point.
Merging is needed because different receivers may require different levels of quality.
Reservation merging meets the needs of all receivers downstream of the merge point.
Packet Classifying and Scheduling
Packet classification refers to the process of associating each packet with corresponding
reservation.
o This is done by examining the fields source address, destination address, protocol
number, source port and destination port in the packet header.
Scheduling refers to the process of managing packets in queues to ensure that they get the
requested service.
o Weighted fair queuing or a combination of queuing disciplines can be used.
Differentiated Services
The 6-bit DSCP field can be used to define 64 PHBs (per-hop behaviors) that could be applied
to a packet.
The three PHBs defined are default PHB (DE PHB), expedited forwarding PHB (EF PHB)
and assured forwarding PHB (AF PHB).
The DE PHB is the same as best-effort delivery and is backward compatible with the TOS field.
Assured Forwarding
The AF PHB is based on RED with In and Out (RIO) algorithm.
In RIO, the drop probability increases as the average queue length increases.
The following example shows RIO with two classes named in and out.
The out curve has a lower MinThreshold than in curve, therefore under low levels of
congestion, only packets marked out will be discarded.
If the average queue length exceeds Minin, packets marked in are also dropped.
The terms in and out are explained with the example "Customer X is allowed to send up to y
Mbps of assured traffic".
o If the customer sends packets less than y Mbps then packets are marked in.
o When the customer exceeds y Mbps, the excess packets are marked out.
Thus the combination of a profile meter at the edge router and RIO in all routers assures (but
does not guarantee) the customer that packets within the profile will be delivered.
RIO does not change the delivery order of in and out packets.
If weighted fair queuing is used, then the weight for the premium queue is chosen based on
the expected load of premium packets. The fraction of link bandwidth the premium queue
receives is:
Bpremium = Wpremium / (Wpremium + Wbest-effort)
o For example, if weight of premium queue is 1 and best-effort is 4, then only 20% of the
link is reserved for premium packets.
Available Bit Rate (ABR)
ABR allows a source to increase or decrease its allotted rate as conditions dictate.
ABR class delivers cells at a minimum rate. If more network capacity is available, this
minimum rate can be exceeded.
ABR is suitable for applications that are bursty in nature.