0% found this document useful (0 votes)
95 views544 pages

Computer Networks

This document provides an introduction to networking concepts, focusing on the Internet's structure, protocols, and components. It covers topics such as the network edge, core, performance metrics, and security, while also explaining the differences between packet switching and circuit switching. The slides are made available for free use with proper attribution to the authors and their work.

Uploaded by

emperorjnx
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
95 views544 pages

Computer Networks

This document provides an introduction to networking concepts, focusing on the Internet's structure, protocols, and components. It covers topics such as the network edge, core, performance metrics, and security, while also explaining the differences between packet switching and circuit switching. The slides are made available for free use with proper attribution to the authors and their work.

Uploaded by

emperorjnx
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 544

Chapter 1

Introduction

A note on the use of these ppt slides:


We’re making these slides freely available to all (faculty, students, readers). Computer
They’re in PowerPoint form so you see the animations; and can add, modify,
and delete slides (including this one) and slide content to suit your needs. Networking: A Top
They obviously represent a lot of work on our part. In return for use, we only
ask the following: Down Approach
 If you use these slides (e.g., in a class) that you mention their source
(after all, we’d like people to use our book!)
6th edition
 If you post any slides on a www site, that you note that they are adapted Jim Kurose, Keith Ross
from (or perhaps identical to) our slides, and note our copyright of this
material.
Addison-Wesley
March 2012
Thanks and enjoy! JFK/KWR

All material copyright 1996-2012


J.F Kurose and K.W. Ross, All Rights Reserved

Introduction 1-1
Chapter 1: introduction
our goal: overview:
 get “feel” and  what’s the Internet?
terminology  what’s a protocol?
 network edge; hosts, access net,
 more depth, detail
physical media
later in course  network core: packet/circuit
 approach: switching, Internet structure
 use Internet as  performance: loss, delay,
throughput
example
 security
 protocol layers, service models
 history

Introduction 1-2
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
 end systems, access networks, links
1.3 network core
 packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-3
What’s the Internet: “nuts and bolts” view
 millions of connected
PC mobile network
server computing devices:
wireless  hosts = end systems global ISP
laptop
smartphone  running network apps
home
 communication links network
regional ISP
wireless  fiber, copper, radio,
links satellite
wired
links  transmission rate:
bandwidth

 Packetswitches: forward
router packets (chunks of data) institutional
network
 routers and switches
Introduction 1-4
“Fun” internet appliances

Web-enabled toaster +
weather forecaster

IP picture frame
http://www.ceiva.com/

Tweet-a-watt:
monitor energy use

Slingbox: watch,
control cable TV remotely
Internet
refrigerator Internet phones
Introduction 1-5
What’s the Internet: “nuts and bolts” view
mobile network
 Internet: “network of networks”
 Interconnected ISPs
global ISP
 protocols control sending,
receiving of msgs
 e.g., TCP, IP, HTTP, Skype, 802.11 home
network
 Internet standards regional ISP
 RFC: Request for comments
 IETF: Internet Engineering Task
Force

institutional
network

Introduction 1-6
What’s the Internet: a service view
mobile network
 Infrastructure that provides
services to applications: global ISP

 Web, VoIP, email, games, e-


commerce, social nets, … home
 provides programming network
regional ISP
interface to apps
 hooks that allow sending
and receiving app programs
to “connect” to Internet
 provides service options,
analogous to postal service
institutional
network

Introduction 1-7
What’s a protocol?
human protocols: network protocols:
 “what’s the time?”  machines rather than
 “I have a question” humans
 introductions  all communication activity
in Internet governed by
protocols
… specific msgs sent
… specific actions taken
when msgs received, or protocols define format, order
other events
of msgs sent and received
among network entities,
and actions taken on msg
transmission, receipt
Introduction 1-8
What’s a protocol?
a human protocol and a computer network protocol:

Hi TCP connection
request
Hi TCP connection
response
Got the
time? Get http://www.awl.com/kurose-ross
2:00
<file>
time

Q: other human protocols?


Introduction 1-9
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
 end systems, access networks, links
1.3 network core
 packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-10
A closer look at network structure:
 network edge: mobile network

 hosts: clients and servers


global ISP
 servers often in data
centers
home
 access networks, physical network
regional ISP
media: wired, wireless
communication links

 network core:
 interconnected routers
 network of networks institutional
network

Introduction 1-11
Access networks and physical media

Q: How to connect end


systems to edge router?
 residential access nets
 institutional access
networks (school,
company)
 mobile access networks
keep in mind:
 bandwidth (bits per second)
of access network?
 shared or dedicated?

Introduction 1-12
Access net: digital subscriber line (DSL)
central office telephone
network

DSL splitter
modem DSLAM

ISP
voice, data transmitted
at different frequencies over DSL access
dedicated line to central office multiplexer

 use existing telephone line to central office DSLAM


 data over DSL phone line goes to Internet
 voice over DSL phone line goes to telephone net
 < 2.5 Mbps upstream transmission rate (typically < 1 Mbps)
 < 24 Mbps downstream transmission rate (typically < 10 Mbps)
Introduction 1-13
Access net: cable network
cable headend

cable splitter
modem

C
O
V V V V V V N
I I I I I I D D T
D D D D D D A A R
E E E E E E T T O
O O O O O O A A L

1 2 3 4 5 6 7 8 9

Channels

frequency division multiplexing: different channels transmitted


in different frequency bands
Introduction 1-14
Access net: cable network
cable headend

cable splitter cable modem


modem CMTS termination system

data, TV transmitted at different


frequencies over shared cable ISP
distribution network

 HFC: hybrid fiber coax


 asymmetric: up to 30Mbps downstream transmission rate, 2
Mbps upstream transmission rate
 network of cable, fiber attaches homes to ISP router
 homes share access network to cable headend
 unlike DSL, which has dedicated access to central office
Introduction 1-15
Access net: home network
wireless
devices

to/from headend or
central office
often combined
in single box

cable or DSL modem

wireless access router, firewall, NAT


point (54 Mbps)
wired Ethernet (100 Mbps)

Introduction 1-16
Enterprise access networks (Ethernet)

institutional link to
ISP (Internet)
institutional router

Ethernet institutional mail,


switch web servers

 typically used in companies, universities, etc


 10 Mbps, 100Mbps, 1Gbps, 10Gbps transmission rates
 today, end systems typically connect into Ethernet switch

Introduction 1-17
Wireless access networks
 shared wireless access network connects end system to router
 via base station aka “access point”

wireless LANs: wide-area wireless access


 within building (100 ft)  provided by telco (cellular)
 802.11b/g (WiFi): 11, 54 Mbps operator, 10’s km
transmission rate  between 1 and 10 Mbps
 3G, 4G: LTE

to Internet

to Internet

Introduction 1-18
Host: sends packets of data
host sending function:
 takes application message
 breaks into smaller two packets,
chunks, known as packets, L bits each
of length L bits
 transmits packet into
access network at 2 1
transmission rate R R: link transmission rate
 link transmission rate, host
aka link capacity, aka
link bandwidth

packet time needed to L (bits)


transmission = transmit L-bit =
delay packet into link R (bits/sec)
1-19
Physical media
 bit: propagates between
transmitter/receiver pairs
 physical link: what lies twisted pair (TP)
between transmitter &  two insulated copper
receiver wires
 guided media:  Category 5: 100 Mbps, 1
Gpbs Ethernet
 signals propagate in solid  Category 6: 10Gbps
media: copper, fiber, coax
 unguided media:
 signals propagate freely,
e.g., radio

Introduction 1-20
Physical media: coax, fiber
coaxial cable: fiber optic cable:
 two concentric copper  glass fiber carrying light
conductors pulses, each pulse a bit
 bidirectional  high-speed operation:
 broadband:  high-speed point-to-point
 multiple channels on cable transmission (e.g., 10’s-100’s
Gpbs transmission rate)
 HFC
 low error rate:
 repeaters spaced far apart
 immune to electromagnetic
noise

Introduction 1-21
Physical media: radio
 signal carried in radio link types:
electromagnetic spectrum  terrestrial microwave
 no physical “wire”  e.g. up to 45 Mbps channels
 bidirectional  LAN (e.g., WiFi)
 propagation environment  11Mbps, 54 Mbps
effects:  wide-area (e.g., cellular)
 reflection  3G cellular: ~ few Mbps
 obstruction by objects  satellite
 interference  Kbps to 45Mbps channel (or
multiple smaller channels)
 270 msec end-end delay
 geosynchronous versus low
altitude

Introduction 1-22
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
 end systems, access networks, links
1.3 network core
 packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-23
The network core
 mesh of interconnected
routers
 packet-switching: hosts
break application-layer
messages into packets
 forward packets from one
router to the next, across
links on path from source
to destination
 each packet transmitted at
full link capacity

Introduction 1-24
Packet-switching: store-and-forward

L bits
per packet

3 2 1
source destination
R bps R bps

 takes L/R seconds to one-hop numerical example:


transmit (push out) L-bit
packet into link at R bps  L = 7.5 Mbits
 store and forward: entire  R = 1.5 Mbps
packet must arrive at router  one-hop transmission
before it can be transmitted delay = 5 sec
on next link
 end-end delay = 2L/R (assuming
zero propagation delay) more on delay shortly …
Introduction 1-25
Packet Switching: queueing delay, loss

R = 100 Mb/s C
A
D
R = 1.5 Mb/s
B
queue of packets E
waiting for output link

queuing and loss:


 If arrival rate (in bits) to link exceeds transmission rate of
link for a period of time:
 packets will queue, wait to be transmitted on link
 packets can be dropped (lost) if memory (buffer) fills up

Introduction 1-26
Two key network-core functions
routing: determines source- forwarding: move packets from
destination route taken by router’s input to appropriate
packets router output
 routing algorithms

routing algorithm

local forwarding table


header value output link
0100 3 1
0101 2
0111 2 3 2
1001 1

dest address in arriving


packet’s header
Network Layer 4-27
Alternative core: circuit switching
end-end resources allocated
to, reserved for “call”
between source & dest:
 In diagram, each link has four
circuits.
 call gets 2nd circuit in top
link and 1st circuit in right
link.
 dedicated resources: no sharing
 circuit-like (guaranteed)
performance
 circuit segment idle if not used
by call (no sharing)
 Commonly used in traditional
telephone networks
Introduction 1-28
Circuit switching: FDM versus TDM
Example:
FDM
4 users

frequency

time
TDM

frequency

time
Introduction 1-29
Packet switching versus circuit switching
packet switching allows more users to use network!

example:
 1 Mb/s link
 each user: N
users
• 100 kb/s when “active”
• active 10% of time 1 Mbps link

 circuit-switching:
 10 users
 packet switching: Q: how did we get value 0.0004?
 with 35 users, probability >
10 active at same time is less Q: what happens if > 35 users ?
than .0004 *

* Check out the online interactive exercises for more examples Introduction 1-30
Packet switching versus circuit switching
is packet switching a “slam dunk winner?”
 great for bursty data
 resource sharing
 simpler, no call setup
 excessive congestion possible: packet delay and loss
 protocols needed for reliable data transfer, congestion
control
 Q: How to provide circuit-like behavior?
 bandwidth guarantees needed for audio/video apps
 still an unsolved problem (chapter 7)

Q: human analogies of reserved resources (circuit switching)


versus on-demand allocation (packet-switching)?
Introduction 1-31
Internet structure: network of networks
 End systems connect to Internet via access ISPs (Internet
Service Providers)
 Residential, company and university ISPs
 Access ISPs in turn must be interconnected.
 So that any two hosts can send packets to each other
 Resulting network of networks is very complex
 Evolution was driven by economics and national policies
 Let’s take a stepwise approach to describe current Internet
structure
Internet structure: network of networks
Question: given millions of access ISPs, how to connect them
together?
access access
net net
access
net
access
access net
net
access
access net
net

access access
net net

access
net
access
net

access
net
access
net
access access
net access net
net
Internet structure: network of networks
Option: connect each access ISP to every other access ISP?

access access
net net
access
net
access
access net
net
access
access net
net

connecting each access ISP


access
to each other directly doesn’t access
net
scale: O(N2) connections. net

access
net
access
net

access
net
access
net
access access
net access net
net
Internet structure: network of networks
Option: connect each access ISP to a global transit ISP? Customer
and provider ISPs have economic agreement.
access access
net net
access
net
access
access net
net
access
access net
net

global
access
net
ISP access
net

access
net
access
net

access
net
access
net
access access
net access net
net
Internet structure: network of networks
But if one global ISP is viable business, there will be competitors
….
access access
net net
access
net
access
access net
net
access
access net
net
ISP A

access access
net ISP B net

access
ISP C
net
access
net

access
net
access
net
access access
net access net
net
Internet structure: network of networks
But if one global ISP is viable business, there will be competitors
…. which must be interconnected
access access
Internet exchange point
net net
access
net
access
access net
net

access
IXP access
net
net
ISP A

access IXP access


net ISP B net

access
ISP C
net
access
net

access peering link


net
access
net
access access
net access net
net
Internet structure: network of networks
… and regional networks may arise to connect access nets to
ISPS
access access
net net
access
net
access
access net
net

access
IXP access
net
net
ISP A

access IXP access


net ISP B net

access
ISP C
net
access
net

access
net regional net
access
net
access access
net access net
net
Internet structure: network of networks
… and content provider networks (e.g., Google, Microsoft,
Akamai ) may run their own network, to bring services, content
close to end users
access access
net net
access
net
access
access net
net

access
IXP access
net
net
ISP A
Content provider network
access IXP access
net ISP B net

access
ISP B
net
access
net

access
net regional net
access
net
access access
net access net
net
Internet structure: network of networks

Tier 1 ISP Tier 1 ISP Google

IXP IXP IXP

Regional ISP Regional ISP

access access access access access access access access


ISP ISP ISP ISP ISP ISP ISP ISP

 at center: small # of well-connected large networks


 “tier-1” commercial ISPs (e.g., Level 3, Sprint, AT&T, NTT), national &
international coverage
 content provider network (e.g, Google): private network that connects
it data centers to Internet, often bypassing tier-1, regional ISPs Introduction 1-40
Tier-1 ISP: e.g., Sprint
POP: point-of-presence

to/from backbone

peering
… … …

to/from customers

Introduction 1-41
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
 end systems, access networks, links
1.3 network core
 packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-42
How do loss and delay occur?
packets queue in router buffers
 packet arrival rate to link (temporarily) exceeds output link
capacity
 packets queue, wait for turn
packet being transmitted (delay)

B
packets queueing (delay)
free (available) buffers: arriving packets
dropped (loss) if no free buffers

Introduction 1-43
Four sources of packet delay
transmission
A propagation

B
nodal
processing queueing

dnodal = dproc + dqueue + dtrans + dprop

dproc: nodal processing dqueue: queueing delay


 check bit errors  time waiting at output link
 determine output link for transmission
 typically < msec  depends on congestion
level of router
Introduction 1-44
Four sources of packet delay
transmission
A propagation

B
nodal
processing queueing

dnodal = dproc + dqueue + dtrans + dprop

dtrans: transmission delay: dprop: propagation delay:


 L: packet length (bits)  d: length of physical link
 R: link bandwidth (bps)  s: propagation speed in medium
 dtrans = L/R (~2x108 m/sec)
dtrans and dprop  dprop = d/s
very different
* Check out the Java applet for an interactive animation on trans vs. prop delay Introduction 1-45
Caravan analogy
100 km 100 km
ten-car toll toll
caravan booth booth

 cars “propagate” at  time to “push” entire


100 km/hr caravan through toll
 toll booth takes 12 sec to booth onto highway =
service car (bit transmission 12*10 = 120 sec
time)  time for last car to
 car~bit; caravan ~ packet propagate from 1st to
 Q: How long until caravan is 2nd toll both:
lined up before 2nd toll 100km/(100km/hr)= 1
booth? hr
 A: 62 minutes
Introduction 1-46
Caravan analogy (more)
100 km 100 km
ten-car toll toll
caravan booth booth

 suppose cars now “propagate” at 1000 km/hr


 and suppose toll booth now takes one min to service a car
 Q: Will cars arrive to 2nd booth before all cars serviced at first
booth?
 A: Yes! after 7 min, 1st car arrives at second booth; three
cars still at 1st booth.

Introduction 1-47
Queueing delay (revisited)

average queueing
 R: link bandwidth (bps)

delay
 L: packet length (bits)
 a: average packet arrival
rate
traffic intensity
= La/R
 La/R ~ 0: avg. queueing delay small La/R ~ 0

 La/R -> 1: avg. queueing delay large


 La/R > 1: more “work” arriving
than can be serviced, average delay infinite!

* Check out the Java applet for an interactive animation on queuing and loss La/R -> 1
Introduction 1-48
“Real” Internet delays and routes
 what do “real” Internet delay & loss look like?
 traceroute program: provides delay
measurement from source to router along end-
end Internet path towards destination. For all i:
 sends three packets that will reach router i on path
towards destination
 router i will return packets to sender
 sender times interval between transmission and reply.

3 probes 3 probes

3 probes

Introduction 1-49
“Real” Internet delays, routes
traceroute: gaia.cs.umass.edu to www.eurecom.fr
3 delay measurements from
gaia.cs.umass.edu to cs-gw.cs.umass.edu
1 cs-gw (128.119.240.254) 1 ms 1 ms 2 ms
2 border1-rt-fa5-1-0.gw.umass.edu (128.119.3.145) 1 ms 1 ms 2 ms
3 cht-vbns.gw.umass.edu (128.119.3.130) 6 ms 5 ms 5 ms
4 jn1-at1-0-0-19.wor.vbns.net (204.147.132.129) 16 ms 11 ms 13 ms
5 jn1-so7-0-0-0.wae.vbns.net (204.147.136.136) 21 ms 18 ms 18 ms
6 abilene-vbns.abilene.ucaid.edu (198.32.11.9) 22 ms 18 ms 22 ms
7 nycm-wash.abilene.ucaid.edu (198.32.8.46) 22 ms 22 ms 22 ms trans-oceanic
8 62.40.103.253 (62.40.103.253) 104 ms 109 ms 106 ms
9 de2-1.de1.de.geant.net (62.40.96.129) 109 ms 102 ms 104 ms link
10 de.fr1.fr.geant.net (62.40.96.50) 113 ms 121 ms 114 ms
11 renater-gw.fr1.fr.geant.net (62.40.103.54) 112 ms 114 ms 112 ms
12 nio-n2.cssi.renater.fr (193.51.206.13) 111 ms 114 ms 116 ms
13 nice.cssi.renater.fr (195.220.98.102) 123 ms 125 ms 124 ms
14 r3t2-nice.cssi.renater.fr (195.220.98.110) 126 ms 126 ms 124 ms
15 eurecom-valbonne.r3t2.ft.net (193.48.50.54) 135 ms 128 ms 133 ms
16 194.214.211.25 (194.214.211.25) 126 ms 128 ms 126 ms
17 * * *
18 * * * * means no response (probe lost, router not replying)
19 fantasia.eurecom.fr (193.55.113.142) 132 ms 128 ms 136 ms

* Do some traceroutes from exotic countries at www.traceroute.org


Introduction 1-50
Packet loss
 queue (aka buffer) preceding link in buffer has finite
capacity
 packet arriving to full queue dropped (aka lost)
 lost packet may be retransmitted by previous node,
by source end system, or not at all

buffer
(waiting area) packet being transmitted
A

B
packet arriving to
full buffer is lost
* Check out the Java applet for an interactive animation on queuing and loss Introduction 1-51
Throughput
 throughput: rate (bits/time unit) at which bits
transferred between sender/receiver
 instantaneous: rate at given point in time
 average: rate over longer period of time

server,
server with bits
sends linkpipe
capacity
that can carry linkpipe
capacity
that can carry
file of into
(fluid) F bitspipe Rs bits/sec
fluid at rate Rc bits/sec
fluid at rate
to send to client Rs bits/sec) Rc bits/sec)

Introduction 1-52
Throughput (more)
 Rs < Rc What is average end-end throughput?

Rs bits/sec Rc bits/sec

 Rs > Rc What is average end-end throughput?

Rs bits/sec Rc bits/sec

bottleneck link
link on end-end path that constrains end-end throughput
Introduction 1-53
Throughput: Internet scenario

 per-connection end-
end throughput: Rs
min(Rc,Rs,R/10) Rs Rs
 in practice: Rc or Rs
is often bottleneck
R

Rc Rc

Rc

10 connections (fairly) share


backbone bottleneck link R bits/sec
Introduction 1-54
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
 end systems, access networks, links
1.3 network core
 packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-55
Protocol “layers”
Networks are complex,
with many “pieces”:
 hosts Question:
 routers is there any hope of
 links of various organizing structure of
media network?
 applications
 protocols …. or at least our
 hardware, discussion of networks?
software

Introduction 1-56
Organization of air travel
ticket (purchase) ticket (complain)

baggage (check) baggage (claim)

gates (load) gates (unload)

runway takeoff runway landing

airplane routing airplane routing


airplane routing

 a series of steps

Introduction 1-57
Layering of airline functionality

ticket (purchase) ticket (complain) ticket

baggage (check) baggage (claim baggage

gates (load) gates (unload) gate

runway (takeoff) runway (land) takeoff/landing

airplane routing airplane routing airplane routing airplane routing airplane routing

departure intermediate air-traffic arrival


airport control centers airport

layers: each layer implements a service


 via its own internal-layer actions
 relying on services provided by layer below

Introduction 1-58
Why layering?
dealing with complex systems:
 explicit structure allows identification,
relationship of complex system’s pieces
 layered reference model for discussion
 modularization eases maintenance, updating of
system
 change of implementation of layer’s service
transparent to rest of system
 e.g., change in gate procedure doesn’t affect rest of
system
 layering considered harmful?

Introduction 1-59
Internet protocol stack
 application: supporting network
applications
 FTP, SMTP, HTTP application
 transport: process-process data
transfer transport
 TCP, UDP
network
 network: routing of datagrams
from source to destination
link
 IP, routing protocols
 link: data transfer between physical
neighboring network elements
 Ethernet, 802.111 (WiFi), PPP
 physical: bits “on the wire”
Introduction 1-60
ISO/OSI reference model
 presentation: allow applications
to interpret meaning of data, application
e.g., encryption, compression,
machine-specific conventions presentation
 session: synchronization, session
checkpointing, recovery of data transport
exchange
network
 Internet stack “missing” these
layers! link
 these services, if needed, must be physical
implemented in application
 needed?

Introduction 1-61
message M
source
application
Encapsulation
segment Ht M transport
datagram Hn Ht M network
frame Hl Hn Ht M link
physical
link
physical

switch

destination Hn Ht M network
M application Hl Hn Ht M link Hn Ht M
Ht M transport physical
Hn Ht M network
Hl Hn Ht M link router
physical

Introduction 1-62
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
 end systems, access networks, links
1.3 network core
 packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-63
Network security
 field of network security:
 how bad guys can attack computer networks
 how we can defend networks against attacks
 how to design architectures that are immune to
attacks
 Internet not originally designed with (much)
security in mind
 original vision: “a group of mutually trusting users
attached to a transparent network” 
 Internet protocol designers playing “catch-up”
 security considerations in all layers!

Introduction 1-64
Bad guys: put malware into hosts via Internet
 malware can get in host from:
 virus: self-replicating infection by receiving/executing
object (e.g., e-mail attachment)
 worm: self-replicating infection by passively receiving
object that gets itself executed
 spyware malware can record keystrokes, web
sites visited, upload info to collection site
 infected host can be enrolled in botnet, used for
spam. DDoS attacks

Introduction 1-65
Bad guys: attack server, network infrastructure
Denial of Service (DoS): attackers make resources
(server, bandwidth) unavailable to legitimate traffic
by overwhelming resource with bogus traffic

1. select target
2. break into hosts around
the network (see botnet)
3. send packets to target from
compromised hosts
target

Introduction 1-66
Bad guys can sniff packets
packet “sniffing”:
 broadcast media (shared ethernet, wireless)
 promiscuous network interface reads/records all packets
(e.g., including passwords!) passing by

A C

src:B dest:A payload


B

 wireshark software used for end-of-chapter labs is a


(free) packet-sniffer
Introduction 1-67
Bad guys can use fake addresses
IP spoofing: send packet with false source address
A C

src:B dest:A payload

… lots more on security (throughout, Chapter 8)

Introduction 1-68
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
 end systems, access networks, links
1.3 network core
 packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-69
Internet history
1961-1972: Early packet-switching principles
 1961: Kleinrock -  1972:
queueing theory shows  ARPAnet public demo
effectiveness of packet-  NCP (Network Control
switching Protocol) first host-host
 1964: Baran - packet- protocol
switching in military nets  first e-mail program
 1967: ARPAnet  ARPAnet has 15 nodes
conceived by Advanced
Research Projects
Agency
 1969: first ARPAnet
node operational

Introduction 1-70
Internet history
1972-1980: Internetworking, new and proprietary nets

 1970: ALOHAnet satellite


network in Hawaii Cerf and Kahn’s
 1974: Cerf and Kahn - internetworking principles:
architecture for interconnecting  minimalism, autonomy - no
networks internal changes required to
 1976: Ethernet at Xerox PARC interconnect networks
 best effort service model
 late70’s: proprietary
architectures: DECnet, SNA,  stateless routers
XNA  decentralized control
 late 70’s: switching fixed length define today’s Internet
packets (ATM precursor) architecture
 1979: ARPAnet has 200 nodes

Introduction 1-71
Internet history
1980-1990: new protocols, a proliferation of networks

 1983: deployment of  new national networks:


TCP/IP Csnet, BITnet, NSFnet,
 1982: smtp e-mail Minitel
protocol defined  100,000 hosts connected
 1983: DNS defined for to confederation of
name-to-IP-address networks
translation
 1985: ftp protocol defined
 1988: TCP congestion
control

Introduction 1-72
Internet history
1990, 2000’s: commercialization, the Web, new apps
 early 1990’s: ARPAnet late 1990’s – 2000’s:
decommissioned  more killer apps: instant
 1991: NSF lifts restrictions on messaging, P2P file sharing
commercial use of NSFnet  network security to
(decommissioned, 1995) forefront
 early 1990s: Web  est. 50 million host, 100
 hypertext [Bush 1945, million+ users
Nelson 1960’s]  backbone links running at
 HTML, HTTP: Berners-Lee Gbps
 1994: Mosaic, later Netscape
 late 1990’s:
commercialization of the Web

Introduction 1-73
Internet history
2005-present
 ~750 million hosts
 Smartphones and tablets
 Aggressive deployment of broadband access
 Increasing ubiquity of high-speed wireless access
 Emergence of online social networks:
 Facebook: soon one billion users
 Service providers (Google, Microsoft) create their own
networks
 Bypass Internet, providing “instantaneous” access
to search, emai, etc.
 E-commerce, universities, enterprises running their
services in “cloud” (eg, Amazon EC2)

Introduction 1-74
Introduction: summary
covered a “ton” of material! you now have:
 Internet overview  context, overview, “feel”
 what’s a protocol? of networking
 network edge, core, access  more depth, detail to
network follow!
 packet-switching versus
circuit-switching
 Internet structure
 performance: loss, delay,
throughput
 layering, service models
 security
 history

Introduction 1-75
Chapter 2
Application Layer

A note on the use of these ppt slides:


We’re making these slides freely available to all (faculty, students, readers). Computer
They’re in PowerPoint form so you see the animations; and can add, modify,
and delete slides (including this one) and slide content to suit your needs. Networking: A Top
They obviously represent a lot of work on our part. In return for use, we only
ask the following: Down Approach
❖ If you use these slides (e.g., in a class) that you mention their source
(after all, we’d like people to use our book!)
6th edition
❖ If you post any slides on a www site, that you note that they are adapted Jim Kurose, Keith Ross
from (or perhaps identical to) our slides, and note our copyright of this
material.
Addison-Wesley
March 2012
Thanks and enjoy! JFK/KWR

All material copyright 1996-2012


J.F Kurose and K.W. Ross, All Rights Reserved

Application Layer 2-1


Chapter 2: outline
2.1 principles of network 2.6 P2P applications
applications 2.7 socket programming
2.2 Web and HTTP with UDP and TCP
2.3 FTP
2.4 electronic mail
▪ SMTP, POP3, IMAP
2.5 DNS

Application Layer 2-2


Chapter 2: application layer
our goals: ❖ learn about protocols by
❖ conceptual, examining popular
implementation aspects application-level
of network application protocols
protocols ▪ HTTP
▪ transport-layer ▪ FTP
service models ▪ SMTP / POP3 / IMAP
▪ client-server ▪ DNS
paradigm ❖ creating network
▪ peer-to-peer applications
paradigm ▪ socket API

Application Layer 2-3


Some network apps
❖ e-mail ❖ voice over IP (e.g., Skype)
❖ web ❖ real-time video
❖ text messaging conferencing
❖ remote login ❖ social networking
❖ P2P file sharing ❖ search
❖ multi-user network games ❖ …
❖ streaming stored video ❖ …
(YouTube, Hulu, Netflix)

Application Layer 2-4


Creating a network app application
transport
network
data link

write programs that: physical

❖ run on (different) end systems


❖ communicate over network
❖ e.g., web server software
communicates with browser
software
no need to write software for application

network-core devices transport


network
data link application
❖ network-core devices do not physical transport
network
run user applications data link
physical

❖ applications on end systems


allows for rapid app
development, propagation

Application Layer 2-5


Application architectures
possible structure of applications:
❖ client-server
❖ peer-to-peer (P2P)

Application Layer 2-6


Client-server architecture
server:
❖ always-on host
❖ permanent IP address
❖ data centers for scaling

clients:
❖ communicate with server
client/server ❖ may be intermittently
connected
❖ may have dynamic IP
addresses
❖ do not communicate directly
with each other
Application Layer 2-7
P2P architecture
❖ no always-on server peer-peer
❖ arbitrary end systems
directly communicate
❖ peers request service from
other peers, provide service
in return to other peers
▪ self scalability – new
peers bring new service
capacity, as well as new
service demands
❖ peers are intermittently
connected and change IP
addresses
▪ complex management

Application Layer 2-8


Processes communicating
process: program running clients, servers
within a host client process: process that
❖ within same host, two initiates communication
processes communicate server process: process that
using inter-process waits to be contacted
communication (defined by
OS)
❖ processes in different hosts
communicate by exchanging ❖ aside: applications with P2P
messages architectures have client
processes & server
processes

Application Layer 2-9


Sockets
❖ process sends/receives messages to/from its socket
❖ socket analogous to door
▪ sending process shoves message out door
▪ sending process relies on transport infrastructure on
other side of door to deliver message to socket at
receiving process

application application
socket controlled by
process process app developer

transport transport
network network controlled
link by OS
link Internet
physical physical

Application Layer 2-10


Addressing processes
❖ to receive messages, ❖ identifier includes both IP
process must have identifier address and port numbers
❖ host device has unique 32- associated with process on
bit IP address host.
❖ Q: does IP address of host ❖ example port numbers:
on which process runs ▪ HTTP server: 80
suffice for identifying the ▪ mail server: 25
process? ❖ to send HTTP message to
▪ A: no, many processes gaia.cs.umass.edu web
can be running on same server:
host ▪ IP address: 128.119.245.12
▪ port number: 80
❖ more shortly…

Application Layer 2-11


App-layer protocol defines
❖ types of messages open protocols:
exchanged, ❖ defined in RFCs
▪ e.g., request, response ❖ allows for interoperability
❖ message syntax: ❖ e.g., HTTP, SMTP
▪ what fields in messages proprietary protocols:
& how fields are
❖ e.g., Skype
delineated
❖ message semantics
▪ meaning of information
in fields
❖ rules for when and how
processes send & respond
to messages

Application Layer 2-12


What transport service does an app need?
data integrity throughput
❖ some apps (e.g., file transfer, ❖ some apps (e.g.,
web transactions) require multimedia) require
100% reliable data transfer minimum amount of
❖ other apps (e.g., audio) can
throughput to be
tolerate some loss “effective”
❖ other apps (“elastic apps”)

timing make use of whatever


throughput they get
❖ some apps (e.g., Internet
telephony, interactive security
games) require low delay
to be “effective” ❖ encryption, data integrity,

Application Layer 2-13


Transport service requirements: common apps

application data loss throughput time sensitive

file transfer no loss elastic no


e-mail no loss elastic no
Web documents no loss elastic no
real-time audio/video loss-tolerant audio: 5kbps-1Mbps yes, 100’s
video:10kbps-5Mbps msec
stored audio/video loss-tolerant same as above
interactive games loss-tolerant few kbps up yes, few secs
text messaging no loss elastic yes, 100’s
msec
yes and no

Application Layer 2-14


Internet transport protocols services

TCP service: UDP service:


❖ reliable transport between ❖ unreliable data transfer
sending and receiving between sending and
process receiving process
❖ flow control: sender won’t ❖ does not provide:
overwhelm receiver
reliability, flow control,
❖ congestion control: throttle congestion control,
sender when network
overloaded timing, throughput
❖ does not provide: timing, guarantee, security,
minimum throughput orconnection setup,
guarantee, security
❖ connection-oriented: setup Q: why bother? Why is
required between client and there a UDP?
server processes

Application Layer 2-15


Internet apps: application, transport protocols

application underlying
application layer protocol transport protocol

e-mail SMTP [RFC 2821] TCP


remote terminal access Telnet [RFC 854] TCP
Web HTTP [RFC 2616] TCP
file transfer FTP [RFC 959] TCP
streaming multimedia HTTP (e.g., YouTube), TCP or UDP
RTP [RFC 1889]
Internet telephony SIP, RTP, proprietary
(e.g., Skype) TCP or UDP

Application Layer 2-16


Securing TCP

TCP & UDP SSL is at app layer


❖ no encryption ❖ Apps use SSL libraries,
❖ cleartext passwds sent which “talk” to TCP
into socket traverse SSL socket API
Internet in cleartext ❖ cleartext passwds sent
SSL into socket traverse
❖ provides encrypted Internet encrypted
TCP connection ❖ See Chapter 7
❖ data integrity
❖ end-point
authentication
Application Layer 2-17
Chapter 2: outline
2.1 principles of network 2.6 P2P applications
applications 2.7 socket programming
▪ app architectures with UDP and TCP
▪ app requirements
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
▪ SMTP, POP3, IMAP
2.5 DNS

Application Layer 2-18


Web and HTTP
First, a review…
❖ web page consists of objects
❖ object can be HTML file, JPEG image, Java applet,
audio file,…
❖ web page consists of base HTML-file which
includes several referenced objects
❖ each object is addressable by a URL, e.g.,
www.someschool.edu/someDept/pic.gif

host name path name

Application Layer 2-19


HTTP overview
HTTP: hypertext
transfer protocol
❖ Web’s application layer
protocol PC running
❖ client/server model Firefox browser

▪ client: browser that


requests, receives,
(using HTTP protocol) server
and “displays” Web running
objects Apache Web
▪ server: Web server server
sends (using HTTP
protocol) objects in iphone running
response to requests Safari browser

Application Layer 2-20


HTTP overview (continued)
uses TCP: HTTP is “stateless”
❖ client initiates TCP ❖ server maintains no
connection (creates information about
socket) to server, port 80 past client requests
❖ server accepts TCP
connection from client aside
protocols that maintain
❖ HTTP messages “state” are complex!
(application-layer protocol
❖ past history (state) must be
messages) exchanged maintained
between browser (HTTP ❖ if server/client crashes, their
client) and Web server views of “state” may be
(HTTP server) inconsistent, must be
❖ TCP connection closed reconciled

Application Layer 2-21


HTTP connections
non-persistent HTTP persistent HTTP
❖ at most one object ❖ multiple objects can
sent over TCP be sent over single
connection TCP connection
▪ connection then between client, server
closed
❖ downloading multiple
objects required
multiple connections

Application Layer 2-22


Non-persistent HTTP
suppose user enters URL: (contains text,
www.someSchool.edu/someDepartment/home.index references to 10
jpeg images)
1a. HTTP client initiates TCP
connection to HTTP server
(process) at 1b. HTTP server at host
www.someSchool.edu on port www.someSchool.edu waiting
80 for TCP connection at port 80.
“accepts” connection, notifying
2. HTTP client sends HTTP request client
message (containing URL) into
TCP connection socket. 3. HTTP server receives request
Message indicates that client message, forms response
wants object message containing requested
someDepartment/home.index object, and sends message into
its socket
time
Application Layer 2-23
Non-persistent HTTP (cont.)
4. HTTP server closes TCP
connection.
5. HTTP client receives response
message containing html file,
displays html. Parsing html file,
finds 10 referenced jpeg objects

time
6. Steps 1-5 repeated for each of
10 jpeg objects

Application Layer 2-24


Non-persistent HTTP: response time

RTT (definition): time for a


small packet to travel from
client to server and back
HTTP response time: initiate TCP
❖ one RTT to initiate TCP
connection

connection RTT

❖ one RTT for HTTP request


request
file
and first few bytes of HTTP RTT
time to
response to return transmit
file
❖ file transmission time file
received
❖ non-persistent HTTP
response time =
time time
2RTT+ file transmission
time

Application Layer 2-25


Persistent HTTP

non-persistent HTTP issues: persistent HTTP:


❖ requires 2 RTTs per object ❖ server leaves connection
❖ OS overhead for each TCP open after sending
connection response
❖ browsers often open ❖ subsequent HTTP
parallel TCP connections messages between same
to fetch referenced objects client/server sent over
open connection
❖ client sends requests as
soon as it encounters a
referenced object
❖ as little as one RTT for all
the referenced objects

Application Layer 2-26


HTTP request message

❖ two types of HTTP messages: request, response


❖ HTTP request message:
▪ ASCII (human-readable format)
carriage return character
line-feed character
request line
(GET, POST, GET /index.html HTTP/1.1\r\n
HEAD commands) Host: www-net.cs.umass.edu\r\n
User-Agent: Firefox/3.6.10\r\n
Accept: text/html,application/xhtml+xml\r\n
header Accept-Language: en-us,en;q=0.5\r\n
lines Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
carriage return, Keep-Alive: 115\r\n
line feed at start Connection: keep-alive\r\n
\r\n
of line indicates
end of header lines
Application Layer 2-27
HTTP request message: general format

method sp URL sp version cr lf request


line
header field name value cr lf
header
~
~ ~
~ lines

header field name value cr lf


cr lf

~
~ entity body ~
~ body

Application Layer 2-28


Uploading form input
POST method:
❖ web page often includes
form input
❖ input is uploaded to
server in entity body

URL method:
❖ uses GET method
❖ input is uploaded in URL
field of request line:
www.somesite.com/animalsearch?monkeys&banana

Application Layer 2-29


Method types
HTTP/1.0: HTTP/1.1:
❖ GET ❖ GET, POST, HEAD
❖ POST ❖ PUT
❖ HEAD ▪ uploads file in entity
▪ asks server to leave body to path specified
requested object out in URL field
of response ❖ DELETE
▪ deletes file specified in
the URL field

Application Layer 2-30


HTTP response message
status line
(protocol
status code HTTP/1.1 200 OK\r\n
status phrase) Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n
Server: Apache/2.0.52 (CentOS)\r\n
Last-Modified: Tue, 30 Oct 2007 17:00:02
GMT\r\n
header ETag: "17dc6-a5c-bf716880"\r\n
Accept-Ranges: bytes\r\n
lines Content-Length: 2652\r\n
Keep-Alive: timeout=10, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: text/html; charset=ISO-8859-
1\r\n
\r\n
data, e.g., data data data data data ...
requested
HTML file
Application Layer 2-31
HTTP response status codes
❖ status code appears in 1st line in server-to-
client response message.
❖ some sample codes:
200 OK
▪ request succeeded, requested object later in this msg
301 Moved Permanently
▪ requested object moved, new location specified later in this msg
(Location:)
400 Bad Request
▪ request msg not understood by server
404 Not Found
▪ requested document not found on this server
505 HTTP Version Not Supported
Application Layer 2-32
Trying out HTTP (client side) for yourself
1. Telnet to your favorite Web server:

telnet cis.poly.edu 80 opens TCP connection to port 80


(default HTTP server port) at cis.poly.edu.
anything typed in sent
to port 80 at cis.poly.edu

2. type in a GET HTTP request:


GET /~ross/ HTTP/1.1 by typing this in (hit carriage
Host: cis.poly.edu return twice), you send
this minimal (but complete)
GET request to HTTP server

3. look at response message sent by HTTP server!


(or use Wireshark to look at captured HTTP request/response)
Application Layer 2-33
User-server state: cookies
example:
many Web sites use cookies ❖ Susan always access Internet
four components: from PC
1) cookie header line of ❖ visits specific e-commerce
HTTP response site for first time
message ❖ when initial HTTP requests
2) cookie header line in arrives at site, site creates:
next HTTP request ▪ unique ID
message ▪ entry in backend
3) cookie file kept on database for ID
user’s host, managed
by user’s browser
4) back-end database at
Web site
Application Layer 2-34
Cookies: keeping “state” (cont.)
client server

ebay 8734
usual http request msg Amazon server
cookie file creates ID
usual http response
1678 for user create backend
ebay 8734
set-cookie: 1678 entry database
amazon 1678
usual http request msg
cookie: 1678 cookie- access
specific
usual http response msg action

one week later:


access
ebay 8734 usual http request msg
amazon 1678 cookie: 1678 cookie-
specific
usual http response msg action
Application Layer 2-35
Cookies (continued)
aside
what cookies can be used cookies and privacy:
for: ❖ cookies permit sites to
❖ authorization learn a lot about you
❖ shopping carts
❖ you may supply name and
❖ recommendations
e-mail to sites
❖ user session state (Web
e-mail)

how to keep “state”:


❖ protocol endpoints: maintain state at
sender/receiver over multiple
transactions
❖ cookies: http messages carry state

Application Layer 2-36


Web caches (proxy server)
goal: satisfy client request without involving origin server
❖ user sets browser: Web
accesses via cache
❖ browser sends all HTTP proxy
requests to cache server
▪ object in cache: cache client
origin
returns object server
▪ else cache requests
object from origin
server, then returns
object to client
client origin
server

Application Layer 2-37


More about Web caching
❖ cache acts as both why Web caching?
client and server ❖ reduce response time
▪ server for original for client request
requesting client
▪ client to origin server ❖ reduce traffic on an
❖ typically cache is institution’s access link
installed by ISP ❖ Internet dense with
(university, company, caches: enables “poor”
residential ISP) content providers to
effectively deliver
content (so too does
P2P file sharing)

Application Layer 2-38


Caching example:
assumptions:
❖ avg object size: 100K bits origin
❖ avg request rate from browsers to servers
origin servers:15/sec public
❖ avg data rate to browsers: 1.50 Mbps Internet
❖ RTT from institutional router to any
origin server: 2 sec
❖ access link rate: 1.54 Mbps 1.54 Mbps
consequences: access link

❖ LAN utilization: 15% problem! institutional


network
❖ access link utilization = 99% 1 Gbps LAN
❖ total delay = Internet delay + access
delay + LAN delay
= 2 sec + minutes + usecs

Application Layer 2-39


Caching example: fatter access link
assumptions:
❖ avg object size: 100K bits origin
❖ avg request rate from browsers to servers
origin servers:15/sec public
❖ avg data rate to browsers: 1.50 Mbps Internet
❖ RTT from institutional router to any
origin server: 2 sec
❖ access link rate: 1.54 Mbps
154 Mbps 1.54 Mbps
154 Mbps
consequences: access link

❖ LAN utilization: 15% institutional


❖ access link utilization = 99% 9.9% network
1 Gbps LAN
❖ total delay = Internet delay + access
delay + LAN delay
= 2 sec + minutes + usecs
msecs
Cost: increased access link speed (not cheap!)
Application Layer 2-40
Caching example: install local cache
assumptions:
❖ avg object size: 100K bits origin
❖ avg request rate from browsers to servers
origin servers:15/sec public
❖ avg data rate to browsers: 1.50 Mbps Internet
❖ RTT from institutional router to any
origin server: 2 sec
❖ access link rate: 1.54 Mbps 1.54 Mbps
consequences: access link

❖ LAN utilization: 15% institutional


access link utilization = 100% network
❖ ? 1 Gbps LAN
❖ total delay = Internet
? delay + access
delay + LAN delay local web
How to compute link
= 2 sec + minutes + usecs cache
utilization, delay?
Cost: web cache (cheap!)
Application Layer 2-41
Caching example: install local cache
Calculating access link
utilization, delay with cache: origin
❖ suppose cache hit rate is 0.4 servers
▪ 40% requests satisfied at cache, public
Internet
60% requests satisfied at origin
❖ access link utilization:
▪ 60% of requests use access link
❖ data rate to browsers over access link 1.54 Mbps
= 0.6*1.50 Mbps = .9 Mbps access link
▪ utilization = 0.9/1.54 = .58 institutional
❖ total delay network
1 Gbps LAN
▪ = 0.6 * (delay from origin servers) +0.4
* (delay when satisfied at cache) local web
▪ = 0.6 (2.01) + 0.4 (~msecs) cache
▪ = ~ 1.2 secs
▪ less than with 154 Mbps link (and
cheaper too!)
Application Layer 2-42
Conditional GET
client server
❖ Goal: don’t send object if
cache has up-to-date
cached version HTTP request msg
object
If-modified-since: <date>
▪ no object transmission not
delay modified
▪ lower link utilization HTTP response
before
HTTP/1.0
❖ cache: specify date of 304 Not Modified
<date>
cached copy in HTTP
request
If-modified-since:
<date> HTTP request msg
❖ server: response contains If-modified-since: <date> object
modified
no object if cached copy after
HTTP response
is up-to-date: HTTP/1.0 200 OK <date>
HTTP/1.0 304 Not <data>
Modified
Application Layer 2-43
Chapter 2: outline
2.1 principles of network 2.6 P2P applications
applications 2.7 socket programming
▪ app architectures with UDP and TCP
▪ app requirements
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
▪ SMTP, POP3, IMAP
2.5 DNS

Application Layer 2-44


FTP: the file transfer protocol
file transfer
FTP FTP FTP
user client server
interface
user
at host remote file
local file system
system

❖ transfer file to/from remote host


❖ client/server model
▪ client: side that initiates transfer (either to/from remote)
▪ server: remote host
❖ ftp: RFC 959
❖ ftp server: port 21
Application Layer 2-45
FTP: separate control, data connections

❖ FTP client contacts FTP server TCP control connection,


server port 21
at port 21, using TCP
❖ client authorized over control TCP data connection,
connection FTP server port 20 FTP
client server
❖ client browses remote
directory, sends commands
over control connection ❖ server opens another TCP
data connection to transfer
❖ when server receives file another file
transfer command, server
opens 2nd TCP data ❖ control connection: “out of
connection (for file) to client band”
❖ after transferring one file, ❖ FTP server maintains
server closes data connection “state”: current directory,
earlier authentication

Application Layer 2-46


FTP commands, responses
sample commands: sample return codes
❖ sent as ASCII text over ❖ status code and phrase (as
control channel in HTTP)
❖ USER username ❖ 331 Username OK,
❖ PASS password password required
❖ LIST return list of file in ❖ 125 data
current directory connection
already open;
❖ RETR filename transfer starting
retrieves (gets) file ❖ 425 Can’t open
❖ STOR filename stores data connection
(puts) file onto remote ❖ 452 Error writing
host file

Application Layer 2-47


Chapter 2: outline
2.1 principles of network 2.6 P2P applications
applications 2.7 socket programming
▪ app architectures with UDP and TCP
▪ app requirements
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
▪ SMTP, POP3, IMAP
2.5 DNS

Application Layer 2-48


Electronic mail outgoing
message queue
user mailbox
Three major components: user
agent
❖ user agents
❖ mail servers mail user
server agent
❖ simple mail transfer
protocol: SMTP SMTP mail user
server agent

User Agent SMTP


❖ a.k.a. “mail reader” SMTP user
agent
❖ composing, editing, reading mail
server
mail messages user
❖ e.g., Outlook, Thunderbird, agent
iPhone mail client user
agent
❖ outgoing, incoming
messages stored on server
Application Layer 2-49
Electronic mail: mail servers
mail servers: user
agent
❖ mailbox contains incoming
messages for user mail user
server
❖ message queue of outgoing agent

(to be sent) mail messages SMTP mail user


❖ SMTP protocol between server agent
mail servers to send email SMTP
messages user
▪ client: sending mail SMTP
agent
mail
server server
▪ “server”: receiving mail user
agent
server
user
agent

Application Layer 2-50


Electronic Mail: SMTP [RFC 2821]
❖ uses TCP to reliably transfer email message from
client to server, port 25
❖ direct transfer: sending server to receiving
server
❖ three phases of transfer
▪ handshaking (greeting)
▪ transfer of messages
▪ closure
❖ command/response interaction (like HTTP, FTP)
▪ commands: ASCII text
▪ response: status code and phrase
❖ messages must be in 7-bit ASCI
Application Layer 2-51
Scenario: Alice sends message to Bob
1) Alice uses UA to compose 4) SMTP client sends Alice’s
message “to” message over the TCP
bob@someschool.edu connection
2) Alice’s UA sends message 5) Bob’s mail server places the
to her mail server; message message in Bob’s mailbox
placed in message queue 6) Bob invokes his user agent
3) client side of SMTP opens to read message
TCP connection with Bob’s
mail server

1 user mail user


mail agent
agent server server
2 3 6
4
5
Alice’s mail server Bob’s mail server
Application Layer 2-52
Sample SMTP interaction
S: 220 hamburger.edu
C: HELO crepes.fr
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr... Sender ok
C: RCPT TO: <bob@hamburger.edu>
S: 250 bob@hamburger.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 hamburger.edu closing connection

Application Layer 2-53


Try SMTP interaction for yourself:
❖ telnet servername 25
❖ see 220 reply from server
❖ enter HELO, MAIL FROM, RCPT TO, DATA, QUIT
commands

above lets you send email without using email client (reader)

Application Layer 2-54


SMTP: final words
❖ SMTP uses persistent comparison with HTTP:
connections
❖ HTTP: pull
❖ SMTP requires message
(header & body) to be in ❖ SMTP: push
7-bit ASCII ❖ both have ASCII
❖ SMTP server uses command/response
CRLF.CRLF to interaction, status codes
determine end of message
❖ HTTP: each object
encapsulated in its own
response msg
❖ SMTP: multiple objects
sent in multipart msg

Application Layer 2-55


Mail message format

SMTP: protocol for


exchanging email msgs header
blank
RFC 822: standard for text line
message format:
❖ header lines, e.g.,
▪ To: body
▪ From:
▪ Subject:
different from SMTP MAIL
FROM, RCPT TO:
commands!
❖ Body: the “message”
▪ ASCII characters only

Application Layer 2-56


Mail access protocols
user
mail access user
SMTP SMTP protocol
agent agent
(e.g., POP,
IMAP)

sender’s mail receiver’s mail


server server

❖ SMTP: delivery/storage to receiver’s server


❖ mail access protocol: retrieval from server
▪ POP: Post Office Protocol [RFC 1939]: authorization,
download
▪ IMAP: Internet Mail Access Protocol [RFC 1730]: more
features, including manipulation of stored msgs on
server
▪ HTTP: gmail, Hotmail, Yahoo! Mail, etc.

Application Layer 2-57


POP3 protocol
S: +OK POP3 server ready
C: user bob
authorization phase S:
C:
+OK
pass hungry
❖ client commands: S: +OK user successfully logged on
▪ user: declare username
▪ pass: password C: list
S: 1 498
❖ server responses
S: 2 912
▪ +OK S: .
▪ -ERR C: retr 1
transaction phase, client: S:
S:
<message 1 contents>
.
❖ list: list message numbers C: dele 1
❖ retr: retrieve message by C: retr 2
number S: <message 1 contents>
❖ dele: delete S: .
❖ quit C: dele 2
C: quit
S: +OK POP3 server signing off
Application Layer 2-58
POP3 (more) and IMAP
more about POP3 IMAP
❖ previous example uses ❖ keeps all messages in one
POP3 “download and place: at server
delete” mode ❖ allows user to organize
▪ Bob cannot re-read e- messages in folders
mail if he changes ❖ keeps user state across
client sessions:
❖ POP3 “download-and- ▪ names of folders and
keep”: copies of messages mappings between
on different clients message IDs and folder
❖ POP3 is stateless across name
sessions

Application Layer 2-59


Chapter 2: outline
2.1 principles of network 2.6 P2P applications
applications 2.7 socket programming
▪ app architectures with UDP and TCP
▪ app requirements
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
▪ SMTP, POP3, IMAP
2.5 DNS

Application Layer 2-60


DNS: domain name system
people: many identifiers: Domain Name System:
▪ SSN, name, passport # ❖ distributed database
Internet hosts, routers: implemented in hierarchy of
▪ IP address (32 bit) - many name servers
used for addressing ❖ application-layer protocol: hosts,
datagrams name servers communicate to
▪ “name”, e.g., resolve names (address/name
www.yahoo.com - translation)
used by humans ▪ note: core Internet function,
Q: how to map between IP implemented as application-
layer protocol
address and name, and
vice versa ? ▪ complexity at network’s
“edge”

Application Layer 2-61


DNS: services, structure
DNS services why not centralize DNS?
❖ hostname to IP address ❖ single point of failure
translation ❖ traffic volume
❖ host aliasing ❖ distant centralized database
▪ canonical, alias names ❖ maintenance
❖ mail server aliasing
❖ load distribution A: doesn’t scale!
▪ replicated Web
servers: many IP
addresses correspond
to one name

Application Layer 2-62


DNS: a distributed, hierarchical database
Root DNS Servers

… …

com DNS servers org DNS servers edu DNS servers

pbs.org poly.edu umass.edu


yahoo.com amazon.com
DNS servers DNS serversDNS servers
DNS servers DNS servers

client wants IP for www.amazon.com; 1st approx:


❖ client queries root server to find com DNS server
❖ client queries .com DNS server to get amazon.com DNS server
❖ client queries amazon.com DNS server to get IP address for
www.amazon.com

Application Layer 2-63


DNS: root name servers
❖ contacted by local name server that can not resolve name
❖ root name server:
▪ contacts authoritative name server if name mapping not known
▪ gets mapping
▪ returns mapping to local name server

c. Cogent, Herndon, VA (5 other sites)


d. U Maryland College Park, MD k. RIPE London (17 other sites)
h. ARL Aberdeen, MD
j. Verisign, Dulles VA (69 other sites ) i. Netnod, Stockholm (37 other sites)

e. NASA Mt View, CA m. WIDE Tokyo


f. Internet Software C. (5 other sites)
Palo Alto, CA (and 48 other
sites)

a. Verisign, Los Angeles CA 13 root name


(5 other sites)
b. USC-ISI Marina del Rey, CA
“servers”
l. ICANN Los Angeles, CA worldwide
(41 other sites)
g. US DoD Columbus,
OH (5 other sites)

Application Layer 2-64


TLD, authoritative servers
top-level domain (TLD) servers:
▪ responsible for com, org, net, edu, aero, jobs, museums,
and all top-level country domains, e.g.: uk, fr, ca, jp
▪ Network Solutions maintains servers for .com TLD
▪ Educause for .edu TLD
authoritative DNS servers:
▪ organization’s own DNS server(s), providing
authoritative hostname to IP mappings for organization’s
named hosts
▪ can be maintained by organization or service provider

Application Layer 2-65


Local DNS name server
❖ does not strictly belong to hierarchy
❖ each ISP (residential ISP, company, university) has
one
▪ also called “default name server”
❖ when host makes DNS query, query is sent to its
local DNS server
▪ has local cache of recent name-to-address translation
pairs (but may be out of date!)
▪ acts as proxy, forwards query into hierarchy

Application Layer 2-66


DNS name root DNS server
resolution example
2
❖ host at cis.poly.edu 3
TLD DNS server
wants IP address for 4
gaia.cs.umass.edu
5

iterated query: local DNS server


dns.poly.edu
❖ contacted server 7 6
1 8
replies with name of
server to contact
authoritative DNS server
❖ “I don’t know this dns.cs.umass.edu
name, but ask this requesting host
server” cis.poly.edu

gaia.cs.umass.edu

Application Layer 2-67


DNS name root DNS server
resolution example
2 3
recursive query: 7
6
❖ puts burden of name TLD DNS
server
resolution on
contacted name local DNS server
server dns.poly.edu 5 4

❖ heavy load at upper 1 8


levels of hierarchy?
authoritative DNS server
dns.cs.umass.edu
requesting host
cis.poly.edu

gaia.cs.umass.edu

Application Layer 2-68


DNS: caching, updating records
❖ once (any) name server learns mapping, it caches
mapping
▪ cache entries timeout (disappear) after some time (TTL)
▪ TLD servers typically cached in local name servers
• thus root name servers not often visited
❖ cached entries may be out-of-date (best effort
name-to-address translation!)
▪ if name host changes IP address, may not be known
Internet-wide until all TTLs expire
❖ update/notify mechanisms proposed IETF standard
▪ RFC 2136

Application Layer 2-69


DNS records
DNS: distributed db storing resource records (RR)
RR format: (name, value, type, ttl)

type=A type=CNAME
▪ name is hostname ▪ name is alias name for some
▪ value is IP address “canonical” (the real) name
type=NS ▪ www.ibm.com is really
▪ name is domain (e.g., servereast.backup2.ibm.com
foo.com) ▪ value is canonical name
▪ value is hostname of
authoritative name type=MX
server for this domain ▪ value is name of mailserver
associated with name

Application Layer 2-70


DNS protocol, messages
❖ query and reply messages, both with same message
format 2 bytes 2 bytes

msg header identification flags

❖ identification: 16 bit # for # questions # answer RRs


query, reply to query uses
# authority RRs # additional RRs
same #
❖ flags: questions (variable # of questions)
▪ query or reply
▪ recursion desired answers (variable # of RRs)
▪ recursion available
▪ reply is authoritative authority (variable # of RRs)

additional info (variable # of RRs)

Application Layer 2-71


DNS protocol, messages

2 bytes 2 bytes

identification flags

# questions # answer RRs

# authority RRs # additional RRs

name, type fields


questions (variable # of questions)
for a query
RRs in response answers (variable # of RRs)
to query
records for
authority (variable # of RRs)
authoritative servers
additional “helpful” additional info (variable # of RRs)
info that may be used
Application Layer 2-72
Inserting records into DNS
❖ example: new startup “Network Utopia”
❖ register name networkuptopia.com at DNS registrar
(e.g., Network Solutions)
▪ provide names, IP addresses of authoritative name server
(primary and secondary)
▪ registrar inserts two RRs into .com TLD server:
(networkutopia.com, dns1.networkutopia.com, NS)
(dns1.networkutopia.com, 212.212.212.1, A)
❖ create authoritative server type A record for
www.networkuptopia.com; type MX record for
networkutopia.com

Application Layer 2-73


Attacking DNS
DDoS attacks Redirect attacks
❖ Bombard root servers ❖ Man-in-middle
with traffic ▪ Intercept queries
▪ Not successful to date ❖ DNS poisoning
▪ Traffic Filtering ▪ Send bogus relies to
▪ Local DNS servers DNS server, which
cache IPs of TLD caches
servers, allowing root Exploit DNS for DDoS
server bypass
❖ Send queries with
❖ Bombard TLD servers
spoofed source
▪ Potentially more
dangerous address: target IP
❖ Requires amplification
Application Layer 2-74
Chapter 2: outline
2.1 principles of network 2.6 P2P applications
applications 2.7 socket programming
▪ app architectures with UDP and TCP
▪ app requirements
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
▪ SMTP, POP3, IMAP
2.5 DNS

Application Layer 2-75


Pure P2P architecture
❖ no always-on server
❖ arbitrary end systems
directly communicate
❖ peers are intermittently
connected and change IP
addresses
examples:
▪ file distribution
(BitTorrent)
▪ Streaming (KanKan)
▪ VoIP (Skype)

Application Layer 2-76


File distribution: client-server vs P2P
Question: how much time to distribute file (size F) from
one server to N peers?
▪ peer upload/download capacity is limited resource

us: server upload


capacity

di: peer i download


file, size F u1 d1 capacity
us u2 d2
server
di
uN network (with abundant
bandwidth) ui
dN
ui: peer i upload
capacity

Application Layer 2-77


File distribution time: client-server
❖ server transmission: must
sequentially send (upload) N F
us
file copies:
di
▪ time to send one copy: F/us
network
▪ time to send N copies: NF/us ui

❖ client: each client must


download file copy
▪ dmin = min client download rate
▪ min client download time: F/dmin

time to distribute F
to N clients using Dc-s > max{NF/us,,F/dmin}
client-server approach

increases linearly in N
Application Layer 2-78
File distribution time: P2P
❖ server transmission: must
upload at least one copy F
us
▪ time to send one copy: F/us
di
❖ client: each client must network
download file copy ui
▪ min client download time: F/dmin
❖ clients: as aggregate must download NF bits
▪ max upload rate (limting max download rate) is us + ui

time to distribute F
to N clients using DP2P > max{F/us,,F/dmin,,NF/(us + ui)}
P2P approach

increases linearly in N …
… but so does this, as each peer brings service capacity
Application Layer 2-79
Client-server vs. P2P: example
client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us

3.5
P2P
Minimum Distribution Time

3
Client-Server
2.5

1.5

0.5

0
0 5 10 15 20 25 30 35

N
Application Layer 2-80
P2P file distribution: BitTorrent
❖ file divided into 256Kb chunks
❖ peers in torrent send/receive file chunks

tracker: tracks peers torrent: group of peers


participating in torrent exchanging chunks of a file

Alice arrives …
… obtains list
of peers from tracker
… and begins exchanging
file chunks with peers in torrent

Application Layer 2-81


P2P file distribution: BitTorrent
❖ peer joining torrent:
▪ has no chunks, but will
accumulate them over time
from other peers
▪ registers with tracker to get
list of peers, connects to
subset of peers
(“neighbors”)
❖ while downloading, peer uploads chunks to other peers
❖ peer may change peers with whom it exchanges chunks
❖ churn: peers may come and go
❖ once peer has entire file, it may (selfishly) leave or
(altruistically) remain in torrent

Application Layer 2-82


BitTorrent: requesting, sending file chunks

requesting chunks: sending chunks: tit-for-tat


❖ at any given time, different ❖ Alice sends chunks to those
peers have different subsets four peers currently sending her
of file chunks chunks at highest rate
❖ periodically, Alice asks each ▪ other peers are choked by Alice
peer for list of chunks that (do not receive chunks from her)
they have ▪ re-evaluate top 4 every10 secs
❖ Alice requests missing ❖ every 30 secs: randomly select
chunks from peers, rarest another peer, starts sending
first chunks
▪ “optimistically unchoke” this peer
▪ newly chosen peer may join top 4

Application Layer 2-83


BitTorrent: tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers

higher upload rate: find better


trading partners, get file faster !

Application Layer 2-84


Distributed Hash Table (DHT)
❖ Hash table

❖ DHT paradigm

❖ Circular DHT and overlay networks

❖ Peer churn
Simple Database
Simple database with(key, value) pairs:
• key: human name; value: social security #
Key Value
John Washington 132-54-3570
Diana Louise Jones 761-55-3791
Xiaoming Liu 385-41-0902
Rakesh Gopal 441-89-1956
Linda Cohen 217-66-5609
……. ………
Lisa Kobayashi 177-23-0199

• key: movie title; value: IP address


Hash Table
• More convenient to store and search on
numerical representation of key
• key = hash(original key)
Original Key Key Value
John Washington 8962458 132-54-3570
Diana Louise Jones 7800356 761-55-3791
Xiaoming Liu 1567109 385-41-0902
Rakesh Gopal 2360012 441-89-1956
Linda Cohen 5430938 217-66-5609
……. ………
Lisa Kobayashi 9290124 177-23-0199
Distributed Hash Table (DHT)
❖ Distribute (key, value) pairs over millions of peers
▪ pairs are evenly distributed over peers
❖ Any peer can query database with a key
▪ database returns value for the key
▪ To resolve query, small number of messages exchanged among
peers
❖ Each peer only knows about a small number of other
peers
❖ Robust to peers coming and going (churn)
Assign key-value pairs to peers
❖ rule: assign key-value pair to the peer that has the
closest ID.
❖ convention: closest is the immediate successor of
the key.
❖ e.g., ID space {0,1,2,3,…,63}
❖ suppose 8 peers: 1,12,13,25,32,40,48,60
▪ If key = 51, then assigned to peer 60
▪ If key = 60, then assigned to peer 60
▪ If key = 61, then assigned to peer 1
Circular DHT
• each peer only aware of
immediate successor and
predecessor.

12
60

13
48
25
40
32 “overlay network”
Resolving a query

1 What is the value


associated with key 53 ?
value 12

60

13
48
O(N) messages 25
on avgerage to resolve
query, when there 40
32
are N peers
Circular DHT with shortcuts
1 What is the value for
value
key 53
12
60

13
48
25
40
32
• each peer keeps track of IP addresses of predecessor,
successor, short cuts.
• reduced from 6 to 3 messages.
• possible to design shortcuts with O(log N) neighbors, O(log N)
messages in query
Peer churn handling peer churn:
1
❖peers may come and go (churn)
❖each peer knows address of its
3 two successors
15
❖each peer periodically pings its
4 two successors to check aliveness
❖if immediate successor leaves,
12
5 choose next successor as new
immediate successor
10
8
example: peer 5 abruptly leaves
Peer churn handling peer churn:
1
❖peers may come and go (churn)
❖each peer knows address of its
3 two successors
15
❖each peer periodically pings its
4 two successors to check aliveness
❖if immediate successor leaves,
12
choose next successor as new
immediate successor
10
8
example: peer 5 abruptly leaves
❖peer 4 detects peer 5’s departure; makes 8 its immediate
successor
❖ 4 asks 8 who its immediate successor is; makes 8’s
immediate successor its second successor.
Chapter 2: outline
2.1 principles of network 2.6 P2P applications
applications 2.7 socket programming
▪ app architectures with UDP and TCP
▪ app requirements
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
▪ SMTP, POP3, IMAP
2.5 DNS

Application Layer 2-95


Socket programming
goal: learn how to build client/server applications that
communicate using sockets
socket: door between application process and end-
end-transport protocol

application application
socket controlled by
process process app developer

transport transport
network network controlled
link by OS
link Internet
physical physical

Application Layer 2-96


Socket programming
Two socket types for two transport services:
▪ UDP: unreliable datagram
▪ TCP: reliable, byte stream-oriented

Application Example:
1. Client reads a line of characters (data) from its
keyboard and sends the data to the server.
2. The server receives the data and converts
characters to uppercase.
3. The server sends the modified data to the client.
4. The client receives the modified data and displays
the line on its screen.
Application Layer 2-97
Socket programming with UDP
UDP: no “connection” between client & server
❖ no handshaking before sending data
❖ sender explicitly attaches IP destination address and
port # to each packet
❖ rcvr extracts sender IP address and port# from
received packet
UDP: transmitted data may be lost or received
out-of-order
Application viewpoint:
❖ UDP provides unreliable transfer of groups of bytes
(“datagrams”) between client and server

Application Layer 2-98


Client/server socket interaction: UDP

server (running on serverIP) client


create socket:
create socket, port= x: clientSocket =
serverSocket = socket(AF_INET,SOCK_DGRAM)
socket(AF_INET,SOCK_DGRAM)
Create datagram with server IP and
port=x; send datagram via
read datagram from clientSocket
serverSocket

write reply to
serverSocket read datagram from
specifying clientSocket
client address,
port number close
clientSocket

Application 2-99
Example app: UDP client
Python UDPClient
include Python’s socket
library
from socket import *
serverName = ‘hostname’
serverPort = 12000
create UDP socket for clientSocket = socket(socket.AF_INET,
server

get user keyboard


socket.SOCK_DGRAM)
input message = raw_input(’Input lowercase sentence:’)
Attach server name, port to
message; send into socket clientSocket.sendto(message,(serverName, serverPort))
read reply characters from modifiedMessage, serverAddress =
socket into string
clientSocket.recvfrom(2048)
print out received string print modifiedMessage
and close socket
clientSocket.close()

Application Layer 2-100


Example app: UDP server
Python UDPServer
from socket import *
serverPort = 12000
create UDP socket serverSocket = socket(AF_INET, SOCK_DGRAM)
bind socket to local port
number 12000
serverSocket.bind(('', serverPort))
print “The server is ready to receive”
loop forever while 1:
Read from UDP socket into message, clientAddress = serverSocket.recvfrom(2048)
message, getting client’s
address (client IP and port) modifiedMessage = message.upper()
send upper case string serverSocket.sendto(modifiedMessage, clientAddress)
back to this client

Application Layer 2-101


Socket programming with TCP
client must contact server ❖ when contacted by client,
❖ server process must first be server TCP creates new socket
running for server process to
❖ server must have created communicate with that
socket (door) that particular client
welcomes client’s contact ▪ allows server to talk with
multiple clients
client contacts server by: ▪ source port numbers used
❖ Creating TCP socket, to distinguish clients
specifying IP address, port (more in Chap 3)
number of server process
❖ when client creates socket: application viewpoint:
client TCP establishes TCP provides reliable, in-order
connection to server TCP byte-stream transfer (“pipe”)
between client and server

Application Layer 2-102


Client/server socket interaction: TCP
server (running on hostid) client
create socket,
port=x, for incoming
request:
serverSocket = socket()

wait for incoming create socket,


connection request
TCP connect to hostid, port=x
connectionSocket = connection setup clientSocket = socket()
serverSocket.accept()

send request using


read request from clientSocket
connectionSocket

write reply to
connectionSocket read reply from
clientSocket
close
connectionSocket close
clientSocket

Application Layer 2-103


Example app: TCP client
Python TCPClient
from socket import *
serverName = ’servername’
create TCP socket for
serverPort = 12000
server, remote port 12000
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName,serverPort))
sentence = raw_input(‘Input lowercase sentence:’)
No need to attach server clientSocket.send(sentence)
name, port
modifiedSentence = clientSocket.recv(1024)
print ‘From Server:’, modifiedSentence
clientSocket.close()

Application Layer 2-104


Example app: TCP server
Python TCPServer
from socket import *
create TCP welcoming serverPort = 12000
socket serverSocket = socket(AF_INET,SOCK_STREAM)
serverSocket.bind((‘’,serverPort))
server begins listening for
incoming TCP requests serverSocket.listen(1)
print ‘The server is ready to receive’
loop forever
while 1:
server waits on accept()
for incoming requests, new
connectionSocket, addr = serverSocket.accept()
socket created on return

sentence = connectionSocket.recv(1024)
read bytes from socket (but
not address as in UDP) capitalizedSentence = sentence.upper()
close connection to this connectionSocket.send(capitalizedSentence)
client (but not welcoming
socket) connectionSocket.close()
Application Layer 2-105
Chapter 2: summary
our study of network apps now complete!
❖ application architectures ❖ specific protocols:
▪ client-server ▪ HTTP
▪ P2P ▪ FTP
❖ application service
requirements: ▪ SMTP, POP, IMAP
▪ reliability, bandwidth, delay ▪ DNS
❖ Internet transport service ▪ P2P: BitTorrent, DHT
model ❖ socket programming: TCP,
▪ connection-oriented, UDP sockets
reliable: TCP
▪ unreliable, datagrams: UDP

Application Layer 2-106


Chapter 2: summary
most importantly: learned about protocols!

❖ typical request/reply important themes:


message exchange:
❖ control vs. data msgs
▪ client requests info or
service ▪ in-band, out-of-band
▪ server responds with ❖ centralized vs. decentralized
data, status code
❖ stateless vs. stateful
❖ message formats:
❖ reliable vs. unreliable msg
▪ headers: fields giving
info about data transfer
▪ data: info being ❖ “complexity at network
communicated edge”

Application Layer 2-107


Chapter 3
Transport Layer

A note on the use of these ppt slides:


We’re making these slides freely available to all (faculty, students, readers). Computer
They’re in PowerPoint form so you see the animations; and can add, modify,
and delete slides (including this one) and slide content to suit your needs. Networking: A Top
They obviously represent a lot of work on our part. In return for use, we only
ask the following: Down Approach
❖ If you use these slides (e.g., in a class) that you mention their source
(after all, we’d like people to use our book!)
6th edition
❖ If you post any slides on a www site, that you note that they are adapted Jim Kurose, Keith Ross
from (or perhaps identical to) our slides, and note our copyright of this
material.
Addison-Wesley
March 2012
Thanks and enjoy! JFK/KWR

All material copyright 1996-2012


J.F Kurose and K.W. Ross, All Rights Reserved

Transport Layer 3-1


Chapter 3: Transport Layer
our goals:
❖ understand ❖ learn about Internet
principles behind transport layer protocols:
transport layer ▪ UDP: connectionless
services: transport
▪ multiplexing, ▪ TCP: connection-oriented
demultiplexing reliable transport
▪ reliable data transfer ▪ TCP congestion control
▪ flow control
▪ congestion control

Transport Layer 3-2


Chapter 3 outline
3.1 transport-layer 3.5 connection-oriented
services transport: TCP
3.2 multiplexing and ▪ segment structure
demultiplexing ▪ reliable data transfer
3.3 connectionless ▪ flow control
transport: UDP ▪ connection management
3.4 principles of reliable 3.6 principles of congestion
data transfer control
3.7 TCP congestion control

Transport Layer 3-3


Transport services and protocols
application
transport
❖ provide logical communication network
data link
between app processes physical

running on different hosts


❖ transport protocols run in
end systems
▪ send side: breaks app
messages into segments,
passes to network layer
▪ rcv side: reassembles application
segments into messages, transport
network
passes to app layer data link
physical

❖ more than one transport


protocol available to apps
▪ Internet: TCP and UDP
Transport Layer 3-4
Transport vs. network layer
❖ network layer: logical household analogy:
communication
between hosts 12 kids in Ann’s house sending
letters to 12 kids in Bill’s
❖ transport layer: house:
logical ❖ hosts = houses
communication ❖ processes = kids

between processes ❖ app messages = letters in


envelopes
▪ relies on, enhances, ❖ transport protocol = Ann
network layer and Bill who demux to in-
services house siblings
❖ network-layer protocol =
postal service

Transport Layer 3-5


Internet transport-layer protocols
application
❖ reliable, in-order transport
network

delivery (TCP) data link


physical
network

▪ congestion control network


data link
data link
physical
physical
▪ flow control network
data link

▪ connection setup physical

network

❖ unreliable, unordered data link


physical

delivery: UDP network


data link
physical
▪ no-frills extension of network
data link application
“best-effort” IP physical
network
data link
transport
network
data link
❖ services not available: physical
physical

▪ delay guarantees
▪ bandwidth guarantees

Transport Layer 3-6


Chapter 3 outline
3.1 transport-layer 3.5 connection-oriented
services transport: TCP
3.2 multiplexing and ▪ segment structure
demultiplexing ▪ reliable data transfer
3.3 connectionless ▪ flow control
transport: UDP ▪ connection management
3.4 principles of reliable 3.6 principles of congestion
data transfer control
3.7 TCP congestion control

Transport Layer 3-7


Multiplexing/demultiplexing
multiplexing at sender:
handle data from multiple demultiplexing at receiver:
sockets, add transport header use header info to deliver
(later used for demultiplexing) received segments to correct
socket

application

application P1 P2 application socket


P3 transport P4
process
transport network transport
network link network
link physical link
physical physical

Transport Layer 3-8


How demultiplexing works
❖ host receives IP datagrams 32 bits
▪ each datagram has source IP source port # dest port #
address, destination IP
address
other header fields
▪ each datagram carries one
transport-layer segment
▪ each segment has source, application
destination port number data
❖ host uses IP addresses & (payload)
port numbers to direct
segment to appropriate
TCP/UDP segment format
socket

Transport Layer 3-9


Connectionless demultiplexing
❖ recall: created socket has ❖ recall: when creating
host-local port #: datagram to send into
DatagramSocket mySocket1 UDP socket, must specify
= new DatagramSocket(12534);
▪ destination IP address
▪ destination port #

❖ when host receives UDP IP datagrams with same


segment: dest. port #, but different
▪ checks destination port # source IP addresses
in segment and/or source port
numbers will be directed
▪ directs UDP segment to to same socket at dest
socket with that port #

Transport Layer 3-10


Connectionless demux: example
DatagramSocket
DatagramSocket serverSocket = new
DatagramSocket DatagramSocket
mySocket2 = new mySocket1 = new
DatagramSocket (6428); DatagramSocket
(9157); application
(5775);
application application
P1
P3 P4
transport
transport transport
network
network link network
link physical link
physical physical

source port: 6428 source port: ?


dest port: 9157 dest port: ?

source port: 9157 source port: ?


dest port: 6428 dest port: ?
Transport Layer 3-11
Connection-oriented demux
❖ TCP socket identified ❖ server host may support
by 4-tuple: many simultaneous TCP
▪ source IP address sockets:
▪ source port number ▪ each socket identified by
▪ dest IP address its own 4-tuple
▪ dest port number ❖ web servers have
❖ demux: receiver uses different sockets for
all four values to direct each connecting client
segment to appropriate ▪ non-persistent HTTP will
socket have different socket for
each request

Transport Layer 3-12


Connection-oriented demux: example

application
application P4 P5 P6 application
P3 P2 P3
transport
transport transport
network
network link network
link physical link
physical server: IP physical
address B

host: IP source IP,port: B,80 host: IP


address A dest IP,port: A,9157 source IP,port: C,5775 address C
dest IP,port: B,80
source IP,port: A,9157
dest IP, port: B,80
source IP,port: C,9157
dest IP,port: B,80
three segments, all destined to IP address: B,
dest port: 80 are demultiplexed to different sockets Transport Layer 3-13
Connection-oriented demux: example
threaded server
application
application application
P4
P3 P2 P3
transport
transport transport
network
network link network
link physical link
physical server: IP physical
address B

host: IP source IP,port: B,80 host: IP


address A dest IP,port: A,9157 source IP,port: C,5775 address C
dest IP,port: B,80
source IP,port: A,9157
dest IP, port: B,80
source IP,port: C,9157
dest IP,port: B,80

Transport Layer 3-14


Chapter 3 outline
3.1 transport-layer 3.5 connection-oriented
services transport: TCP
3.2 multiplexing and ▪ segment structure
demultiplexing ▪ reliable data transfer
3.3 connectionless ▪ flow control
transport: UDP ▪ connection management
3.4 principles of reliable 3.6 principles of congestion
data transfer control
3.7 TCP congestion control

Transport Layer 3-15


UDP: User Datagram Protocol [RFC 768]
❖ “no frills,” “bare bones” ❖ UDP use:
Internet transport ▪ streaming multimedia
protocol apps (loss tolerant, rate
❖ “best effort” service, sensitive)
UDP segments may be: ▪ DNS
▪ lost ▪ SNMP
▪ delivered out-of-order ❖ reliable transfer over
to app
UDP:
❖ connectionless:
▪ add reliability at
▪ no handshaking application layer
between UDP sender,
receiver ▪ application-specific error
recovery!
▪ each UDP segment
handled independently
of others
Transport Layer 3-16
UDP: segment header
length, in bytes of
32 bits UDP segment,
source port # dest port # including header

length checksum
why is there a UDP?
❖ no connection
application establishment (which can
data add delay)
(payload) ❖ simple: no connection
state at sender, receiver
❖ small header size
UDP segment format ❖ no congestion control:
UDP can blast away as
fast as desired

Transport Layer 3-17


UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted
segment
sender: receiver:
❖ treat segment contents, ❖ compute checksum of
including header fields, received segment
as sequence of 16-bit ❖ check if computed
integers
checksum equals checksum
❖ checksum: addition field value:
(one’s complement
sum) of segment ▪ NO - error detected
contents ▪ YES - no error detected.
❖ sender puts checksum But maybe errors
value into UDP nonetheless? More later
checksum field ….
Transport Layer 3-18
Internet checksum: example
example: add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
1 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

sum 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

Note: when adding numbers, a carryout from the most


significant bit needs to be added to the result

Transport Layer 3-19


Chapter 3 outline
3.1 transport-layer 3.5 connection-oriented
services transport: TCP
3.2 multiplexing and ▪ segment structure
demultiplexing ▪ reliable data transfer
3.3 connectionless ▪ flow control
transport: UDP ▪ connection management
3.4 principles of reliable 3.6 principles of congestion
data transfer control
3.7 TCP congestion control

Transport Layer 3-20


Principles of reliable data transfer
❖ important in application, transport, link layers
▪ top-10 list of important networking topics!

❖ characteristics of unreliable channel will determine


complexity of reliable data transfer protocol (rdt)
Transport Layer 3-21
Principles of reliable data transfer
❖ important in application, transport, link layers
▪ top-10 list of important networking topics!

❖ characteristics of unreliable channel will determine


complexity of reliable data transfer protocol (rdt)
Transport Layer 3-22
Principles of reliable data transfer
❖ important in application, transport, link layers
▪ top-10 list of important networking topics!

❖ characteristics of unreliable channel will determine


complexity of reliable data transfer protocol (rdt)
Transport Layer 3-23
Reliable data transfer: getting started
rdt_send(): called from above, deliver_data(): called by
(e.g., by app.). Passed data to rdt to deliver data to upper
deliver to receiver upper layer

send receive
side side

udt_send(): called by rdt, rdt_rcv(): called when packet


to transfer packet over arrives on rcv-side of channel
unreliable channel to receiver

Transport Layer 3-24


Reliable data transfer: getting started
we’ll:
❖ incrementally develop sender, receiver sides of
reliable data transfer protocol (rdt)
❖ consider only unidirectional data transfer
▪ but control info will flow on both directions!
❖ use finite state machines (FSM) to specify sender,
receiver
event causing state transition
actions taken on state transition
state: when in this
“state” next state state state
uniquely determined 1 event
by next event 2
actions

Transport Layer 3-25


rdt1.0: reliable transfer over a reliable channel
❖ underlying channel perfectly reliable
▪ no bit errors
▪ no loss of packets
❖ separate FSMs for sender, receiver:
▪ sender sends data into underlying channel
▪ receiver reads data from underlying channel

Wait for rdt_send(data) Wait for rdt_rcv(packet)


call from call from extract (packet,data)
above packet = make_pkt(data) below deliver_data(data)
udt_send(packet)

sender receiver

Transport Layer 3-26


rdt2.0: channel with bit errors
❖ underlying channel may flip bits in packet
▪ checksum to detect bit errors
❖ the question: how to recover from errors:
▪ acknowledgements (ACKs): receiver explicitly tells sender
that pkt received OK
▪ negative acknowledgements (NAKs): receiver explicitly tells
sender that pkt had errors
▪ sender
Howretransmits
do humanspkt on receipt from
recover of NAK“errors”
❖ new mechanisms in rdt2.0 (beyond rdt1.0):
▪ error detection
during conversation?
▪ receiver feedback: control msgs (ACK,NAK) rcvr-
>sender

Transport Layer 3-27


rdt2.0: channel with bit errors
❖ underlying channel may flip bits in packet
▪ checksum to detect bit errors
❖ the question: how to recover from errors:
▪ acknowledgements (ACKs): receiver explicitly tells sender
that pkt received OK
▪ negative acknowledgements (NAKs): receiver explicitly tells
sender that pkt had errors
▪ sender retransmits pkt on receipt of NAK
❖ new mechanisms in rdt2.0 (beyond rdt1.0):
▪ error detection
▪ feedback: control msgs (ACK,NAK) from receiver to
sender

Transport Layer 3-28


rdt2.0: FSM specification
rdt_send(data)
sndpkt = make_pkt(data, checksum) receiver
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
isNAK(rcvpkt)
Wait for Wait for rdt_rcv(rcvpkt) &&
call from ACK or udt_send(sndpkt) corrupt(rcvpkt)
above NAK
udt_send(NAK)

rdt_rcv(rcvpkt) && isACK(rcvpkt)


Wait for

call from
sender below

rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
udt_send(ACK)

Transport Layer 3-29


rdt2.0: operation with no errors
rdt_send(data)
snkpkt = make_pkt(data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
isNAK(rcvpkt)
Wait for Wait for rdt_rcv(rcvpkt) &&
call from ACK or udt_send(sndpkt) corrupt(rcvpkt)
above NAK
udt_send(NAK)

rdt_rcv(rcvpkt) && isACK(rcvpkt)


Wait for
 call from
below

rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
udt_send(ACK)

Transport Layer 3-30


rdt2.0: error scenario
rdt_send(data)
snkpkt = make_pkt(data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
isNAK(rcvpkt)
Wait for Wait for rdt_rcv(rcvpkt) &&
call from ACK or udt_send(sndpkt) corrupt(rcvpkt)
above NAK
udt_send(NAK)

rdt_rcv(rcvpkt) && isACK(rcvpkt)


Wait for
 call from
below

rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
udt_send(ACK)

Transport Layer 3-31


rdt2.0 has a fatal flaw!
what happens if handling duplicates:
ACK/NAK corrupted? ❖ sender retransmits
❖ sender doesn’t know current pkt if ACK/NAK
what happened at corrupted
receiver!
❖ sender adds sequence
❖ can’t just retransmit: number to each pkt
possible duplicate
❖ receiver discards (doesn’t
deliver up) duplicate pkt
stop and wait
sender sends one packet,
then waits for receiver
response

Transport Layer 3-32


rdt2.1: sender, handles garbled ACK/NAKs
rdt_send(data)
sndpkt = make_pkt(0, data, checksum)
udt_send(sndpkt) rdt_rcv(rcvpkt) &&
( corrupt(rcvpkt) ||
Wait for Wait for
ACK or
isNAK(rcvpkt) )
call 0 from
NAK 0 udt_send(sndpkt)
above
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt) rdt_rcv(rcvpkt)
&& isACK(rcvpkt) && notcorrupt(rcvpkt)
&& isACK(rcvpkt)


Wait for Wait for
ACK or call 1 from
rdt_rcv(rcvpkt) && NAK 1 above
( corrupt(rcvpkt) ||
isNAK(rcvpkt) ) rdt_send(data)

udt_send(sndpkt) sndpkt = make_pkt(1, data, checksum)


udt_send(sndpkt)

Transport Layer 3-33


rdt2.1: receiver, handles garbled ACK/NAKs
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
&& has_seq0(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
sndpkt = make_pkt(ACK, chksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) && (corrupt(rcvpkt) rdt_rcv(rcvpkt) && (corrupt(rcvpkt)
sndpkt = make_pkt(NAK, chksum) sndpkt = make_pkt(NAK, chksum)
udt_send(sndpkt) udt_send(sndpkt)
Wait for Wait for
rdt_rcv(rcvpkt) && 0 from 1 from rdt_rcv(rcvpkt) &&
not corrupt(rcvpkt) && below below not corrupt(rcvpkt) &&
has_seq1(rcvpkt) has_seq0(rcvpkt)
sndpkt = make_pkt(ACK, chksum) sndpkt = make_pkt(ACK, chksum)
udt_send(sndpkt) udt_send(sndpkt)
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
&& has_seq1(rcvpkt)

extract(rcvpkt,data)
deliver_data(data)
sndpkt = make_pkt(ACK, chksum)
udt_send(sndpkt)

Transport Layer 3-34


rdt2.1: discussion
sender: receiver:
❖ seq # added to pkt ❖ must check if received
❖ two seq. #’s (0,1) will packet is duplicate
suffice. Why? ▪ state indicates whether
0 or 1 is expected pkt
❖ must check if received
seq #
ACK/NAK corrupted
❖ note: receiver can not
❖ twice as many states know if its last
▪ state must ACK/NAK received
“remember” whether OK at sender
“expected” pkt should
have seq # of 0 or 1

Transport Layer 3-35


rdt2.2: a NAK-free protocol
❖ same functionality as rdt2.1, using ACKs only
❖ instead of NAK, receiver sends ACK for last pkt
received OK
▪ receiver must explicitly include seq # of pkt being ACKed
❖ duplicate ACK at sender results in same action as
NAK: retransmit current pkt

Transport Layer 3-36


rdt2.2: sender, receiver fragments
rdt_send(data)
sndpkt = make_pkt(0, data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
( corrupt(rcvpkt) ||
Wait for Wait for
ACK isACK(rcvpkt,1) )
call 0 from
above 0 udt_send(sndpkt)
sender FSM
fragment rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt)
rdt_rcv(rcvpkt) && && isACK(rcvpkt,0)
(corrupt(rcvpkt) || 
has_seq1(rcvpkt)) Wait for receiver FSM
0 from
udt_send(sndpkt) below fragment
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
&& has_seq1(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
sndpkt = make_pkt(ACK1, chksum)
udt_send(sndpkt) Transport Layer 3-37
rdt3.0: channels with errors and loss

new assumption: approach: sender waits


underlying channel can “reasonable” amount of
also lose packets time for ACK
(data, ACKs) ❖ retransmits if no ACK
▪ checksum, seq. #, received in this time
ACKs, retransmissions ❖ if pkt (or ACK) just delayed
(not lost):
will be of help … but
not enough ▪ retransmission will be
duplicate, but seq. #’s
already handles this
▪ receiver must specify seq
# of pkt being ACKed
❖ requires countdown timer

Transport Layer 3-38


rdt3.0 sender
rdt_send(data)
rdt_rcv(rcvpkt) &&
sndpkt = make_pkt(0, data, checksum) ( corrupt(rcvpkt) ||
udt_send(sndpkt) isACK(rcvpkt,1) )
rdt_rcv(rcvpkt) start_timer 
 Wait for Wait
for timeout
call 0from
ACK0 udt_send(sndpkt)
above
start_timer
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt) rdt_rcv(rcvpkt)
&& isACK(rcvpkt,1) && notcorrupt(rcvpkt)
stop_timer && isACK(rcvpkt,0)
stop_timer
Wait Wait for
timeout for call 1 from
udt_send(sndpkt) ACK1 above
start_timer rdt_rcv(rcvpkt)
rdt_send(data) 
rdt_rcv(rcvpkt) &&
( corrupt(rcvpkt) || sndpkt = make_pkt(1, data, checksum)
isACK(rcvpkt,0) ) udt_send(sndpkt)
start_timer

Transport Layer 3-39


rdt3.0 in action
sender receiver sender receiver
send pkt0 pkt0 send pkt0 pkt0
rcv pkt0 rcv pkt0
ack0 send ack0 ack0 send ack0
rcv ack0 rcv ack0
send pkt1 pkt1 send pkt1 pkt1
rcv pkt1 X
ack1 send ack1 loss
rcv ack1
send pkt0 pkt0
rcv pkt0 timeout
ack0 send ack0 resend pkt1 pkt1
rcv pkt1
ack1 send ack1
rcv ack1
send pkt0 pkt0
(a) no loss rcv pkt0
ack0 send ack0

(b) packet loss


Transport Layer 3-40
rdt3.0 in action
sender receiver
sender receiver send pkt0 pkt0
send pkt0 pkt0 rcv pkt0
send ack0
rcv pkt0 ack0
send ack0 rcv ack0
ack0 send pkt1 pkt1
rcv ack0 rcv pkt1
send pkt1 pkt1
send ack1
rcv pkt1 ack1
ack1 send ack1
X
loss timeout
resend pkt1 pkt1
rcv pkt1
timeout
resend pkt1 pkt1 rcv ack1 (detect duplicate)
rcv pkt1 send pkt0
pkt0
send ack1
(detect duplicate) ack1
ack1 send ack1 rcv ack1 rcv pkt0
rcv ack1 send pkt0
ack0 send ack0
send pkt0 pkt0 pkt0
rcv pkt0
rcv pkt0 ack0 (detect duplicate)
ack0 send ack0 send ack0

(c) ACK loss (d) premature timeout/ delayed ACK

Transport Layer 3-41


Performance of rdt3.0
❖ rdt3.0 is correct, but performance stinks
❖ e.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
L 8000 bits
Dtrans = R = = 8 microsecs
109 bits/sec

▪ U sender: utilization – fraction of time sender busy sending

U L/R .008
sender = = = 0.00027
RTT + L / R 30.008

▪ if RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec thruput


over 1 Gbps link
❖ network protocol limits use of physical resources!
Transport Layer 3-42
rdt3.0: stop-and-wait operation
sender receiver
first packet bit transmitted, t = 0
last packet bit transmitted, t = L / R

first packet bit arrives


RTT last packet bit arrives, send ACK

ACK arrives, send next


packet, t = RTT + L / R

U L/R .008
sender = = = 0.00027
RTT + L / R 30.008

Transport Layer 3-43


Pipelined protocols
pipelining: sender allows multiple, “in-flight”, yet-
to-be-acknowledged pkts
▪ range of sequence numbers must be increased
▪ buffering at sender and/or receiver

❖ two generic forms of pipelined protocols: go-Back-N,


selective repeat
Transport Layer 3-44
Pipelining: increased utilization
sender receiver
first packet bit transmitted, t = 0
last bit transmitted, t = L / R

first packet bit arrives


RTT last packet bit arrives, send ACK
last bit of 2nd packet arrives, send ACK
last bit of 3rd packet arrives, send ACK
ACK arrives, send next
packet, t = RTT + L / R
3-packet pipelining increases
utilization by a factor of 3!

U 3L / R .0024
sender = = = 0.00081
RTT + L / R 30.008

Transport Layer 3-45


Pipelined protocols: overview
Go-back-N: Selective Repeat:
❖ sender can have up to ❖ sender can have up to N
N unacked packets in unack’ed packets in
pipeline pipeline
❖ receiver only sends ❖ rcvr sends individual ack
cumulative ack for each packet
▪ doesn’t ack packet if
there’s a gap
❖ sender has timer for ❖ sender maintains timer
oldest unacked packet for each unacked packet
▪ when timer expires, ▪ when timer expires,
retransmit all unacked retransmit only that
packets unacked packet

Transport Layer 3-46


Go-Back-N: sender
❖ k-bit seq # in pkt header
❖ “window” of up to N, consecutive unack’ed pkts allowed

❖ ACK(n): ACKs all pkts up to, including seq # n - “cumulative


ACK”
▪ may receive duplicate ACKs (see receiver)
❖ timer for oldest in-flight pkt
❖ timeout(n): retransmit packet n and all higher seq # pkts in
window
Transport Layer 3-47
GBN: sender extended FSM
rdt_send(data)
if (nextseqnum < base+N) {
sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
udt_send(sndpkt[nextseqnum])
if (base == nextseqnum)
start_timer
nextseqnum++
}
 else
refuse_data(data)
base=1
nextseqnum=1
timeout
start_timer
Wait
udt_send(sndpkt[base])
rdt_rcv(rcvpkt) udt_send(sndpkt[base+1])
&& corrupt(rcvpkt) …
udt_send(sndpkt[nextseqnum-1])
rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
base = getacknum(rcvpkt)+1
If (base == nextseqnum)
stop_timer
else
start_timer
Transport Layer 3-48
GBN: receiver extended FSM
default
udt_send(sndpkt) rdt_rcv(rcvpkt)
&& notcurrupt(rcvpkt)
 && hasseqnum(rcvpkt,expectedseqnum)
expectedseqnum=1 Wait extract(rcvpkt,data)
sndpkt = deliver_data(data)
make_pkt(expectedseqnum,ACK,chksum) sndpkt = make_pkt(expectedseqnum,ACK,chksum)
udt_send(sndpkt)
expectedseqnum++

ACK-only: always send ACK for correctly-received


pkt with highest in-order seq #
▪ may generate duplicate ACKs
▪ need only remember expectedseqnum
❖ out-of-order pkt:
▪ discard (don’t buffer): no receiver buffering!
▪ re-ACK pkt with highest in-order seq #
Transport Layer 3-49
GBN in action
sender window (N=4) sender receiver
012345678 send pkt0
012345678 send pkt1
012345678 send pkt2 receive pkt0, send ack0
012345678 send pkt3 Xloss receive pkt1, send ack1
(wait)
receive pkt3, discard,
012345678 rcv ack0, send pkt4 (re)send ack1
012345678 rcv ack1, send pkt5 receive pkt4, discard,
(re)send ack1
ignore duplicate ACK receive pkt5, discard,
(re)send ack1
pkt 2 timeout
012345678 send pkt2
012345678 send pkt3
012345678 send pkt4 rcv pkt2, deliver, send ack2
012345678 send pkt5 rcv pkt3, deliver, send ack3
rcv pkt4, deliver, send ack4
rcv pkt5, deliver, send ack5

Transport Layer 3-50


Selective repeat
❖ receiver individually acknowledges all correctly
received pkts
▪ buffers pkts, as needed, for eventual in-order delivery
to upper layer
❖ sender only resends pkts for which ACK not
received
▪ sender timer for each unACKed pkt
❖ sender window
▪ N consecutive seq #’s
▪ limits seq #s of sent, unACKed pkts

Transport Layer 3-51


Selective repeat: sender, receiver windows

Transport Layer 3-52


Selective repeat
sender receiver
data from above: pkt n in [rcvbase, rcvbase+N-1]
❖ if next available seq # in ❖ send ACK(n)
window, send pkt ❖ out-of-order: buffer
timeout(n): ❖ in-order: deliver (also
❖ resend pkt n, restart deliver buffered, in-order
timer pkts), advance window to
next not-yet-received pkt
ACK(n) in [sendbase,sendbase+N]:
❖ mark pkt n as received
pkt n in [rcvbase-N,rcvbase-1]
❖ ACK(n)
❖ if n smallest unACKed
pkt, advance window base otherwise:
to next unACKed seq # ❖ ignore

Transport Layer 3-53


Selective repeat in action
sender window (N=4) sender receiver
012345678 send pkt0
012345678 send pkt1
012345678 send pkt2 receive pkt0, send ack0
012345678 send pkt3 Xloss receive pkt1, send ack1
(wait)
receive pkt3, buffer,
012345678 rcv ack0, send pkt4 send ack3
012345678 rcv ack1, send pkt5 receive pkt4, buffer,
send ack4
record ack3 arrived receive pkt5, buffer,
send ack5
pkt 2 timeout
012345678 send pkt2
012345678 record ack4 arrived
012345678 rcv pkt2; deliver pkt2,
record ack4 arrived
012345678 pkt3, pkt4, pkt5; send ack2

Q: what happens when ack2 arrives?

Transport Layer 3-54


sender window receiver window
Selective repeat: (after receipt) (after receipt)

dilemma 0123012 pkt0


pkt1
0123012 0123012
pkt2 0123012
example:
0123012
0123012
pkt3
seq #’s: 0, 1, 2, 3
0123012
❖ X
0123012
❖ window size=3 pkt0 will accept packet
with seq number 0
(a) no problem
❖ receiver sees no
difference in two receiver can’t see sender side.
scenarios! receiver behavior identical in both cases!
something’s (very) wrong!
❖ duplicate data
accepted as new in 0123012 pkt0
(b) 0123012 pkt1 0123012
0123012 pkt2 0123012
X 0123012
Q: what relationship X
between seq # size timeout
retransmit pkt0 X
and window size to 0123012 pkt0
will accept packet
avoid problem in (b)? with seq number 0
(b) oops!
Transport Layer 3-55
Chapter 3 outline
3.1 transport-layer 3.5 connection-oriented
services transport: TCP
3.2 multiplexing and ▪ segment structure
demultiplexing ▪ reliable data transfer
3.3 connectionless ▪ flow control
transport: UDP ▪ connection management
3.4 principles of reliable 3.6 principles of congestion
data transfer control
3.7 TCP congestion control

Transport Layer 3-56


TCP: Overview RFCs: 793,1122,1323, 2018, 2581

❖ point-to-point: ❖ full duplex data:


▪ one sender, one receiver ▪ bi-directional data flow
❖ reliable, in-order byte in same connection
steam: ▪ MSS: maximum segment
size
▪ no “message
boundaries” ❖ connection-oriented:
❖ pipelined: ▪ handshaking (exchange
of control msgs) inits
▪ TCP congestion and sender, receiver state
flow control set window before data exchange
size
❖ flow controlled:
▪ sender will not
overwhelm receiver
Transport Layer 3-57
TCP segment structure
32 bits
URG: urgent data counting
(generally not used) source port # dest port #
by bytes
sequence number of data
ACK: ACK #
valid acknowledgement number (not segments!)
head not
PSH: push data now len used
UAP R S F receive window
(generally not used) # bytes
checksum Urg data pointer
rcvr willing
RST, SYN, FIN: to accept
options (variable length)
connection estab
(setup, teardown
commands)
application
Internet data
checksum (variable length)
(as in UDP)

Transport Layer 3-58


TCP seq. numbers, ACKs
outgoing segment from sender
sequence numbers: source port # dest port #
sequence number
▪byte stream “number” of acknowledgement number

first byte in segment’s rwnd

data
checksum urg pointer

window size
acknowledgements: N

▪seq # of next byte


expected from other side sender sequence number space
▪cumulative ACK
sent sent, not- usable not
Q: how receiver handles ACKed yet ACKed but not usable
out-of-order segments (“in-
flight”)
yet sent

▪A: TCP spec doesn’t say, incoming segment to sender


- up to implementor source port # dest port #
sequence number
acknowledgement number
A rwnd
checksum urg pointer

Transport Layer 3-59


TCP seq. numbers, ACKs
Host A Host B

User
types
‘C’ Seq=42, ACK=79, data = ‘C’
host ACKs
receipt of
‘C’, echoes
Seq=79, ACK=43, data = ‘C’ back ‘C’
host ACKs
receipt
of echoed
‘C’ Seq=43, ACK=80

simple telnet scenario

Transport Layer 3-60


TCP round trip time, timeout
Q: how to set TCP Q: how to estimate RTT?
timeout value? ❖ SampleRTT: measured
time from segment
❖ longer than RTT transmission until ACK
▪ but RTT varies receipt
❖ too short: premature ▪ ignore retransmissions
timeout, unnecessary ❖ SampleRTT will vary, want
retransmissions estimated RTT “smoother”
▪ average several recent
❖ too long: slow reaction measurements, not just
to segment loss current SampleRTT

Transport Layer 3-61


TCP round trip time, timeout
EstimatedRTT = (1- )*EstimatedRTT + *SampleRTT
❖ exponential weighted moving average
❖ influence of past sample decreases exponentially fast
❖ typical value:  = 0.125 RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

350

RTT: gaia.cs.umass.edu to fantasia.eurecom.fr


RTT (milliseconds)

300

250
RTT (milliseconds)

200

sampleRTT
150

EstimatedRTT

100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
time (seconds) Transport Layer 3-62
SampleRTT Estimated RTT
TCP round trip time, timeout
❖ timeout interval: EstimatedRTT plus “safety margin”
▪ large variation in EstimatedRTT -> larger safety margin
❖ estimate SampleRTT deviation from EstimatedRTT:
DevRTT = (1-)*DevRTT +
*|SampleRTT-EstimatedRTT|
(typically,  = 0.25)

TimeoutInterval = EstimatedRTT + 4*DevRTT

estimated RTT “safety margin”

Transport Layer 3-63


Chapter 3 outline
3.1 transport-layer 3.5 connection-oriented
services transport: TCP
3.2 multiplexing and ▪ segment structure
demultiplexing ▪ reliable data transfer
3.3 connectionless ▪ flow control
transport: UDP ▪ connection management
3.4 principles of reliable 3.6 principles of congestion
data transfer control
3.7 TCP congestion control

Transport Layer 3-64


TCP reliable data transfer
❖ TCP creates rdt service
on top of IP’s unreliable
service
▪ pipelined segments
▪ cumulative acks let’s initially consider
▪ single retransmission simplified TCP sender:
timer ▪ ignore duplicate acks
❖ retransmissions ▪ ignore flow control,
triggered by: congestion control
▪ timeout events
▪ duplicate acks

Transport Layer 3-65


TCP sender events:
data rcvd from app: timeout:
❖ create segment with ❖ retransmit segment
seq # that caused timeout
❖ seq # is byte-stream ❖ restart timer
number of first data ack rcvd:
byte in segment ❖ if ack acknowledges
❖ start timer if not previously unacked
already running segments
▪ think of timer as for ▪ update what is known
oldest unacked to be ACKed
segment
▪ start timer if there are
▪ expiration interval: still unacked segments
TimeOutInterval

Transport Layer 3-66


TCP sender (simplified)
data received from application above
create segment, seq. #: NextSeqNum
pass segment to IP (i.e., “send”)
NextSeqNum = NextSeqNum + length(data)
if (timer currently not running)
 start timer
NextSeqNum = InitialSeqNum wait
SendBase = InitialSeqNum for
event timeout
retransmit not-yet-acked segment
with smallest seq. #
start timer
ACK received, with ACK field value y
if (y > SendBase) {
SendBase = y
/* SendBase–1: last cumulatively ACKed byte */
if (there are currently not-yet-acked segments)
start timer
else stop timer
} Transport Layer 3-67
TCP: retransmission scenarios
Host A Host B Host A Host B

SendBase=92
Seq=92, 8 bytes of data Seq=92, 8 bytes of data

Seq=100, 20 bytes of data


timeout

timeout
ACK=100
X
ACK=100
ACK=120

Seq=92, 8 bytes of data Seq=92, 8


SendBase=100 bytes of data
SendBase=120
ACK=100
ACK=120

SendBase=120

lost ACK scenario premature timeout


Transport Layer 3-68
TCP: retransmission scenarios
Host A Host B

Seq=92, 8 bytes of data

Seq=100, 20 bytes of data


timeout

ACK=100
X
ACK=120

Seq=120, 15 bytes of data

cumulative ACK
Transport Layer 3-69
TCP ACK generation [RFC 1122, RFC 2581]

event at receiver TCP receiver action


arrival of in-order segment with delayed ACK. Wait up to 500ms
expected seq #. All data up to for next segment. If no next segment,
expected seq # already ACKed send ACK

arrival of in-order segment with immediately send single cumulative


expected seq #. One other ACK, ACKing both in-order segments
segment has ACK pending

arrival of out-of-order segment immediately send duplicate ACK,


higher-than-expect seq. # . indicating seq. # of next expected byte
Gap detected

arrival of segment that immediate send ACK, provided that


partially or completely fills gap segment starts at lower end of gap

Transport Layer 3-70


TCP fast retransmit
❖ time-out period often
relatively long: TCP fast retransmit
▪ long delay before if sender receives 3
resending lost packet ACKs for same data
❖ detect lost segments (“triple
(“triple duplicate
duplicate ACKs”),
ACKs”),
via duplicate ACKs. resend unacked
▪ sender often sends segment with smallest
many segments back- seq #
to-back
▪ likely that unacked
▪ if segment is lost, there segment lost, so don’t
will likely be many wait for timeout
duplicate ACKs.

Transport Layer 3-71


TCP fast retransmit
Host A Host B

Seq=92, 8 bytes of data


Seq=100, 20 bytes of data
X

ACK=100
timeout

ACK=100
ACK=100
ACK=100
Seq=100, 20 bytes of data

fast retransmit after sender


receipt of triple duplicate ACK
Transport Layer 3-72
Chapter 3 outline
3.1 transport-layer 3.5 connection-oriented
services transport: TCP
3.2 multiplexing and ▪ segment structure
demultiplexing ▪ reliable data transfer
3.3 connectionless ▪ flow control
transport: UDP ▪ connection management
3.4 principles of reliable 3.6 principles of congestion
data transfer control
3.7 TCP congestion control

Transport Layer 3-73


TCP flow control
application
application may process
remove data from application
TCP socket buffers ….
TCP socket OS
receiver buffers
… slower than TCP
receiver is delivering
(sender is sending) TCP
code

IP
flow control code
receiver controls sender, so
sender won’t overflow
receiver’s buffer by transmitting from sender
too much, too fast
receiver protocol stack

Transport Layer 3-74


TCP flow control
❖ receiver “advertises” free
buffer space by including to application process
rwnd value in TCP header
of receiver-to-sender
segments RcvBuffer buffered data
▪ RcvBuffer size set via
socket options (typical default rwnd free buffer space
is 4096 bytes)
▪ many operating systems
autoadjust RcvBuffer TCP segment payloads
❖ sender limits amount of
unacked (“in-flight”) data to receiver-side buffering
receiver’s rwnd value
❖ guarantees receive buffer
will not overflow
Transport Layer 3-75
Chapter 3 outline
3.1 transport-layer 3.5 connection-oriented
services transport: TCP
3.2 multiplexing and ▪ segment structure
demultiplexing ▪ reliable data transfer
3.3 connectionless ▪ flow control
transport: UDP ▪ connection management
3.4 principles of reliable 3.6 principles of congestion
data transfer control
3.7 TCP congestion control

Transport Layer 3-76


Connection Management
before exchanging data, sender/receiver “handshake”:
❖ agree to establish connection (each knowing the other willing
to establish connection)
❖ agree on connection parameters

application application

connection state: ESTAB connection state: ESTAB


connection variables: connection Variables:
seq # client-to-server seq # client-to-server
server-to-client server-to-client
rcvBuffer size rcvBuffer size
at server,client at server,client

network network

Socket clientSocket = Socket connectionSocket =


newSocket("hostname","port welcomeSocket.accept();
number");

Transport Layer 3-77


Agreeing to establish a connection

2-way handshake:
Q: will 2-way handshake
always work in
network?
Let’s talk
ESTAB ❖ variable delays
OK
ESTAB ❖ retransmitted messages
(e.g. req_conn(x)) due to
message loss
❖ message reordering
choose x
req_conn(x)
❖ can’t “see” other side
ESTAB
acc_conn(x)
ESTAB

Transport Layer 3-78


Agreeing to establish a connection
2-way handshake failure scenarios:

choose x choose x
req_conn(x) req_conn(x)
ESTAB ESTAB
retransmit acc_conn(x) retransmit acc_conn(x)
req_conn(x) req_conn(x)

ESTAB ESTAB
data(x+1) accept
req_conn(x)
retransmit data(x+1)
data(x+1)
connection connection
client x completes server x completes server
client
terminates forgets x terminates forgets x
req_conn(x)

ESTAB ESTAB
data(x+1) accept
half open connection! data(x+1)
(no client!)
Transport Layer 3-79
TCP 3-way handshake

client state server state


LISTEN LISTEN
choose init seq num, x
send TCP SYN msg
SYNSENT SYNbit=1, Seq=x
choose init seq num, y
send TCP SYNACK
msg, acking SYN SYN RCVD
SYNbit=1, Seq=y
ACKbit=1; ACKnum=x+1
received SYNACK(x)
ESTAB indicates server is live;
send ACK for SYNACK;
this segment may contain ACKbit=1, ACKnum=y+1
client-to-server data
received ACK(y)
indicates client is live
ESTAB

Transport Layer 3-80


TCP 3-way handshake: FSM

closed

Socket connectionSocket =
welcomeSocket.accept();

 Socket clientSocket =
SYN(x) newSocket("hostname","port
number");
SYNACK(seq=y,ACKnum=x+1)
create new socket for SYN(seq=x)
communication back to client listen

SYN SYN
rcvd sent

SYNACK(seq=y,ACKnum=x+1)
ESTAB ACK(ACKnum=y+1)
ACK(ACKnum=y+1)

Transport Layer 3-81


TCP: closing a connection
❖ client, server each close their side of connection
▪ send TCP segment with FIN bit = 1
❖ respond to received FIN with ACK
▪ on receiving FIN, ACK can be combined with own FIN
❖ simultaneous FIN exchanges can be handled

Transport Layer 3-82


TCP: closing a connection
client state server state
ESTAB ESTAB
clientSocket.close()
FIN_WAIT_1 can no longer FINbit=1, seq=x
send but can
receive data CLOSE_WAIT
ACKbit=1; ACKnum=x+1
can still
FIN_WAIT_2 wait for server send data
close

LAST_ACK
FINbit=1, seq=y
TIMED_WAIT can no longer
send data
ACKbit=1; ACKnum=y+1
timed wait
for 2*max CLOSED
segment lifetime

CLOSED

Transport Layer 3-83


Chapter 3 outline
3.1 transport-layer 3.5 connection-oriented
services transport: TCP
3.2 multiplexing and ▪ segment structure
demultiplexing ▪ reliable data transfer
3.3 connectionless ▪ flow control
transport: UDP ▪ connection management
3.4 principles of reliable 3.6 principles of congestion
data transfer control
3.7 TCP congestion control

Transport Layer 3-84


Principles of congestion control
congestion:
❖ informally: “too many sources sending too much
data too fast for network to handle”
❖ different from flow control!
❖ manifestations:
▪ lost packets (buffer overflow at routers)
▪ long delays (queueing in router buffers)
❖ a top-10 problem!

Transport Layer 3-85


Causes/costs of congestion: scenario 1
original data: in throughput: out
❖ two senders, two
receivers Host A

❖ one router, infinite unlimited shared


buffers output link buffers

❖ output link capacity: R


❖ no retransmission
Host B

R/2

delay
out

in R/2 in R/2


❖ maximum per-connection ❖ large delays as arrival rate, in,
throughput: R/2 approaches capacity
Transport Layer 3-86
Causes/costs of congestion: scenario 2
❖ one router, finite buffers
❖ sender retransmission of timed-out packet
▪ application-layer input = application-layer output: in =
out
▪ transport-layer input includes retransmissions : ‘in in

in : original data


out
'in: original data, plus
retransmitted data

Host A

finite shared output


Host B
link buffers
Transport Layer 3-87
Causes/costs of congestion: scenario 2
R/2
idealization: perfect
knowledge

out
❖ sender sends only when
router buffers available
in R/2

in : original data


out
copy 'in: original data, plus
retransmitted data

A free buffer space!

finite shared output


Host B
link buffers
Transport Layer 3-88
Causes/costs of congestion: scenario 2
Idealization: known loss
packets can be lost,
dropped at router due
to full buffers
❖ sender only resends if
packet known to be lost

in : original data


out
copy 'in: original data, plus
retransmitted data

A
no buffer space!

Host B
Transport Layer 3-89
Causes/costs of congestion: scenario 2
Idealization: known loss R/2
packets can be lost,
dropped at router due when sending at R/2,
some packets are

out
to full buffers retransmissions but

sender only resends if


asymptotic goodput
❖ is still R/2 (why?)
packet known to be lost in R/2

in : original data


out
'in: original data, plus
retransmitted data

A
free buffer space!

Host B
Transport Layer 3-90
Causes/costs of congestion: scenario 2
Realistic: duplicates R/2
❖ packets can be lost, dropped
at router due to full buffers when sending at R/2,
some packets are

out
❖ sender times out prematurely, retransmissions

sending two copies, both of including duplicated


that are delivered!
which are delivered in R/2

in
timeout
copy 'in out

A
free buffer space!

Host B
Transport Layer 3-91
Causes/costs of congestion: scenario 2
Realistic: duplicates R/2
❖ packets can be lost, dropped
at router due to full buffers when sending at R/2,
some packets are

out
❖ sender times out prematurely, retransmissions

sending two copies, both of including duplicated


that are delivered!
which are delivered in R/2

“costs” of congestion:
❖ more work (retrans) for given “goodput”
❖ unneeded retransmissions: link carries multiple copies of pkt
▪ decreasing goodput

Transport Layer 3-92


Causes/costs of congestion: scenario 3
❖ four senders Q: what happens as in and in’
increase ?
❖ multihop paths
A: as red in’ increases, all arriving
❖ timeout/retransmit blue pkts at upper queue are
dropped, blue throughput  0
Host A
in : original data out
Host B
'in: original data, plus
retransmitted data
finite shared output
link buffers

Host D
Host C

Transport Layer 3-93


Causes/costs of congestion: scenario 3

C/2
out

in’ C/2

another “cost” of congestion:


❖ when packet dropped, any “upstream
transmission capacity used for that packet was
wasted!

Transport Layer 3-94


Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion network-assisted


control: congestion control:
❖ no explicit feedback ❖ routers provide
from network feedback to end systems
❖ congestion inferred ▪ single bit indicating
from end-system congestion (SNA,
observed loss, delay DECbit, TCP/IP ECN,
❖ approach taken by ATM)
TCP ▪ explicit rate for
sender to send at

Transport Layer 3-95


Case study: ATM ABR congestion control

ABR: available bit rate: RM (resource management)


❖ “elastic service” cells:
❖ if sender’s path ❖ sent by sender, interspersed
“underloaded”: with data cells
▪ sender should use ❖ bits in RM cell set by switches
available bandwidth (“network-assisted”)
❖ if sender’s path ▪ NI bit: no increase in rate
congested: (mild congestion)
▪ sender throttled to ▪ CI bit: congestion
minimum guaranteed indication
rate ❖ RM cells returned to sender
by receiver, with bits intact

Transport Layer 3-96


Case study: ATM ABR congestion control

RM cell data cell

❖ two-byte ER (explicit rate) field in RM cell


▪ congested switch may lower ER value in cell
▪ senders’ send rate thus max supportable rate on path
❖ EFCI bit in data cells: set to 1 in congested switch
▪ if data cell preceding RM cell has EFCI set, receiver sets
CI bit in returned RM cell
Transport Layer 3-97
Chapter 3 outline
3.1 transport-layer 3.5 connection-oriented
services transport: TCP
3.2 multiplexing and ▪ segment structure
demultiplexing ▪ reliable data transfer
3.3 connectionless ▪ flow control
transport: UDP ▪ connection management
3.4 principles of reliable 3.6 principles of congestion
data transfer control
3.7 TCP congestion control

Transport Layer 3-98


TCP congestion control: additive increase
multiplicative decrease
❖ approach: sender increases transmission rate (window
size), probing for usable bandwidth, until loss occurs
▪ additive increase: increase cwnd by 1 MSS every
RTT until loss detected
▪ multiplicative decrease: cut cwnd in half after loss
additively increase window size …
…. until loss occurs (then cut window in half)
congestion window size
cwnd: TCP sender

AIMD saw tooth


behavior: probing
for bandwidth

time
Transport Layer 3-99
TCP Congestion Control: details
sender sequence number space
cwnd TCP sending rate:
❖ roughly: send cwnd
bytes, wait RTT for
last byte last byte
ACKS, then send
ACKed sent, not-
yet ACKed
sent more bytes
(“in-
flight”) cwnd
❖ sender limits transmission: rate ~
~
RTT
bytes/sec

LastByteSent- < cwnd


LastByteAcked

❖ cwnd is dynamic, function


of perceived network
congestion
Transport Layer 3-100
TCP Slow Start
Host A Host B
❖ when connection begins,
increase rate
exponentially until first
loss event:

RTT
▪ initially cwnd = 1 MSS
▪ double cwnd every RTT
▪ done by incrementing
cwnd for every ACK
received
❖ summary: initial rate is
slow but ramps up
exponentially fast time

Transport Layer 3-101


TCP: detecting, reacting to loss
❖ loss indicated by timeout:
▪ cwnd set to 1 MSS;
▪ window then grows exponentially (as in slow start)
to threshold, then grows linearly
❖ loss indicated by 3 duplicate ACKs: TCP RENO
▪ dup ACKs indicate network capable of delivering
some segments
▪ cwnd is cut in half window then grows linearly
❖ TCP Tahoe always sets cwnd to 1 (timeout or 3
duplicate acks)

Transport Layer 3-102


TCP: switching from slow start to CA
Q: when should the
exponential
increase switch to
linear?
A: when cwnd gets
to 1/2 of its value
before timeout.

Implementation:
❖ variable ssthresh
❖ on loss event, ssthresh
is set to 1/2 of cwnd just
before loss event

Transport Layer 3-103


Summary: TCP Congestion Control
New
New ACK!
ACK! new ACK
duplicate ACK
dupACKcount++ new ACK
.
cwnd = cwnd + MSS (MSS/cwnd)
dupACKcount = 0
cwnd = cwnd+MSS transmit new segment(s), as allowed
dupACKcount = 0
 transmit new segment(s), as allowed
cwnd = 1 MSS
ssthresh = 64 KB cwnd > ssthresh
dupACKcount = 0 slow  congestion
start timeout avoidance
ssthresh = cwnd/2
cwnd = 1 MSS duplicate ACK
timeout dupACKcount = 0 dupACKcount++
ssthresh = cwnd/2 retransmit missing segment
cwnd = 1 MSS
dupACKcount = 0
retransmit missing segment
timeout
New
ACK!
ssthresh = cwnd/2
cwnd = 1 New ACK
dupACKcount = 0
cwnd = ssthresh dupACKcount == 3
dupACKcount == 3 retransmit missing segment dupACKcount = 0
ssthresh= cwnd/2 ssthresh= cwnd/2
cwnd = ssthresh + 3 cwnd = ssthresh + 3
retransmit missing segment retransmit missing segment
fast
recovery
duplicate ACK
cwnd = cwnd + MSS
transmit new segment(s), as allowed

Transport Layer 3-104


TCP throughput
❖ avg. TCP thruput as function of window size, RTT?
▪ ignore slow start, assume always data to send
❖ W: window size (measured in bytes) where loss occurs
▪ avg. window size (# in-flight bytes) is ¾ W
▪ avg. thruput is 3/4W per RTT
3 W
avg TCP thruput = bytes/sec
4 RTT

W/2

Transport Layer 3-105


TCP Futures: TCP over “long, fat pipes”

❖ example: 1500 byte segments, 100ms RTT, want


10 Gbps throughput
❖ requires W = 83,333 in-flight segments
❖ throughput in terms of segment loss probability, L
[Mathis 1997]:
1.22 . MSS
TCP throughput =
RTT L

➜ to achieve 10 Gbps throughput, need a loss rate of L


= 2·10-10 – a very small loss rate!
❖ new versions of TCP for high-speed

Transport Layer 3-106


TCP Fairness
fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should have
average rate of R/K

TCP connection 1

bottleneck
router
capacity R
TCP connection 2

Transport Layer 3-107


Why is TCP fair?
two competing sessions:
❖ additive increase gives slope of 1, as throughout increases
❖ multiplicative decrease decreases throughput proportionally

R equal bandwidth share

loss: decrease window by factor of 2


congestion avoidance: additive increase
loss: decrease window by factor of 2
congestion avoidance: additive increase

Connection 1 throughput R
Transport Layer 3-108
Fairness (more)
Fairness and UDP Fairness, parallel TCP
❖ multimedia apps often connections
do not use TCP ❖ application can open
▪ do not want rate multiple parallel
throttled by congestion connections between two
control
hosts
❖ instead use UDP:
❖ web browsers do this
▪ send audio/video at
constant rate, tolerate ❖ e.g., link of rate R with 9
packet loss existing connections:
▪ new app asks for 1 TCP, gets rate
R/10
▪ new app asks for 11 TCPs, gets R/2

Transport Layer 3-109


Chapter 3: summary
❖ principles behind
transport layer services:
▪ multiplexing,
demultiplexing next:
❖ leaving the
▪ reliable data transfer
network “edge”
▪ flow control (application,
▪ congestion control transport layers)
❖ instantiation, ❖ into the network
implementation in the “core”
Internet
▪ UDP
▪ TCP
Transport Layer 3-110
Chapter 4
Network Layer

A note on the use of these ppt slides:


We’re making these slides freely available to all (faculty, students, readers). Computer
They’re in PowerPoint form so you see the animations; and can add, modify,
and delete slides (including this one) and slide content to suit your needs. Networking: A Top
They obviously represent a lot of work on our part. In return for use, we only
ask the following: Down Approach
❖ If you use these slides (e.g., in a class) that you mention their source
(after all, we’d like people to use our book!)
6th edition
❖ If you post any slides on a www site, that you note that they are adapted Jim Kurose, Keith Ross
from (or perhaps identical to) our slides, and note our copyright of this
material.
Addison-Wesley
March 2012
Thanks and enjoy! JFK/KWR

All material copyright 1996-2012


J.F Kurose and K.W. Ross, All Rights Reserved

Network Layer 4-1


Chapter 4: network layer
chapter goals:
❖ understand principles behind network layer
services:
▪ network layer service models
▪ forwarding versus routing
▪ how a router works
▪ routing (path selection)
▪ broadcast, multicast
❖ instantiation, implementation in the Internet

Network Layer 4-2


Chapter 4: outline
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing
4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ datagram format ▪ RIP
▪ IPv4 addressing ▪ OSPF
▪ ICMP ▪ BGP
▪ IPv6 4.7 broadcast and multicast
routing

Network Layer 4-3


Network layer
application

❖ transport segment from transport


network

sending to receiving host data link


physical
network network

❖ on sending side network


data link
data link
physical
data link
physical

encapsulates segments physical network


data link
network
data link

into datagrams physical physical

❖ on receiving side, delivers network


data link
network
data link

segments to transport
physical physical
network
data link

layer network
physical
application
transport
❖ network layer protocols network
data link
physical
network
data link
network
data link

in every host, router data link


physical
physical physical

❖ router examines header


fields in all IP datagrams
passing through it
Network Layer 4-4
Two key network-layer functions
❖ forwarding: move packets analogy:
from router’s input to
appropriate router ❖ routing: process of
output planning trip from source
to dest
❖ routing: determine route
taken by packets from ❖ forwarding: process of
source to dest. getting through single
interchange
▪ routing algorithms

Network Layer 4-5


Interplay between routing and forwarding

routing algorithm routing algorithm determines


end-end-path through network

local forwarding table forwarding table determines


header value output link local forwarding at this router
0100 3
0101 2
0111 2
1001 1

value in arriving
packet’s header
0111 1

3 2

Network Layer 4-6


Connection setup
❖ 3rd important function in some network
architectures:
▪ ATM, frame relay, X.25
❖ before datagrams flow, two end hosts and
intervening routers establish virtual connection
▪ routers get involved
❖ network vs transport layer connection service:
▪ network: between two hosts (may also involve intervening
routers in case of VCs)
▪ transport: between two processes

Network Layer 4-7


Network service model
Q: What service model for “channel” transporting
datagrams from sender to receiver?
example services for example services for a flow
individual datagrams: of datagrams:
❖ guaranteed delivery ❖ in-order datagram
❖ guaranteed delivery with delivery
less than 40 msec delay ❖ guaranteed minimum
bandwidth to flow
❖ restrictions on changes in
inter-packet spacing

Network Layer 4-8


Network layer service models:
Guarantees ?
Network Service Congestion
Architecture Model Bandwidth Loss Order Timing feedback

Internet best effort none no no no no (inferred


via loss)
ATM CBR constant yes yes yes no
rate congestion
ATM VBR guaranteed yes yes yes no
rate congestion
ATM ABR guaranteed no yes no yes
minimum
ATM UBR none no yes no no

Network Layer 4-9


Chapter 4: outline
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing
4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ datagram format ▪ RIP
▪ IPv4 addressing ▪ OSPF
▪ ICMP ▪ BGP
▪ IPv6 4.7 broadcast and multicast
routing

Network Layer 4-10


Connection, connection-less service
❖ datagram network provides network-layer
connectionless service
❖ virtual-circuit network provides network-layer
connection service
❖ analogous to TCP/UDP connecton-oriented /
connectionless transport-layer services, but:
▪ service: host-to-host
▪ no choice: network provides one or the other
▪ implementation: in network core

Network Layer 4-11


Virtual circuits
“source-to-dest path behaves much like telephone
circuit”
▪ performance-wise
▪ network actions along source-to-dest path

❖ call setup, teardown for each call before data can flow
❖ each packet carries VC identifier (not destination host
address)
❖ every router on source-dest path maintains “state” for
each passing connection
❖ link, router resources (bandwidth, buffers) may be
allocated to VC (dedicated resources = predictable
service)
Network Layer 4-12
VC implementation
a VC consists of:
1. path from source to destination
2. VC numbers, one number for each link along path
3. entries in forwarding tables in routers along path
❖ packet belonging to VC carries VC number
(rather than dest address)
❖ VC number can be changed on each link.
▪ new VC number comes from forwarding table

Network Layer 4-13


VC forwarding table
12 22 32

1 3
2
VC number
interface
forwarding table in number
northwest router:
Incoming interface Incoming VC # Outgoing interface Outgoing VC #

1 12 3 22
2 63 1 18
3 7 2 17
1 97 3 87
… … … …

VC routers maintain connection state information!


Network Layer 4-14
Virtual circuits: signaling protocols
❖ used to setup, maintain teardown VC
❖ used in ATM, frame-relay, X.25
❖ not used in today’s Internet

application application
transport 5. data flow begins 6. receive data
transport
network 4. call connected 3. accept call network
data link 1. initiate call 2. incoming call data link
physical physical

Network Layer 4-15


Datagram networks
❖ no call setup at network layer
❖ routers: no state about end-to-end connections
▪ no network-level concept of “connection”
❖ packets forwarded using destination host address

application application
transport transport
network 1. send datagrams
2. receive datagrams network
data link data link
physical physical

Network Layer 4-16


Datagram forwarding table
4 billion IP addresses, so
routing algorithm rather than list individual
destination address
local forwarding table
list range of addresses
dest address output link (aggregate table entries)
address-range 1 3
address-range 2 2
address-range 3 2
address-range 4 1

IP destination address in
arriving packet’s header
1
3 2

Network Layer 4-17


Datagram forwarding table
Destination Address Range
Link Interface
11001000 00010111 00010000 00000000
through 0
11001000 00010111 00010111 11111111

11001000 00010111 00011000 00000000


through 1
11001000 00010111 00011000 11111111

11001000 00010111 00011001 00000000


through 2
11001000 00010111 00011111 11111111

otherwise 3

Q: but what happens if ranges don’t divide up so nicely?


Network Layer 4-18
Longest prefix matching
longest prefix matching
when looking for forwarding table entry for given
destination address, use longest address prefix that
matches destination address.

Destination Address Range Link interface


11001000 00010111 00010*** ********* 0

11001000 00010111 00011000 ********* 1

11001000 00010111 00011*** ********* 2


3
otherwise

examples:
DA: 11001000 00010111 00010110 10100001 which interface?
DA: 11001000 00010111 00011000 10101010 which interface?
Network Layer 4-19
Datagram or VC network: why?
Internet (datagram) ATM (VC)
❖ data exchange among ❖ evolved from telephony
computers ❖ human conversation:
▪ “elastic” service, no strict ▪ strict timing, reliability
timing req. requirements
▪ need for guaranteed service
❖ many link types ❖ “dumb” end systems
▪ different characteristics ▪ telephones
▪ uniform service difficult ▪ complexity inside
❖ “smart” end systems network
(computers)
▪ can adapt, perform control,
error recovery
▪ simple inside network,
complexity at “edge”

Network Layer 4-20


Chapter 4: outline
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing
4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ datagram format ▪ RIP
▪ IPv4 addressing ▪ OSPF
▪ ICMP ▪ BGP
▪ IPv6 4.7 broadcast and multicast
routing

Network Layer 4-21


Router architecture overview
two key router functions:
❖ run routing algorithms/protocol (RIP, OSPF, BGP)
❖ forwarding datagrams from incoming to outgoing link

forwarding tables computed, routing


pushed to input ports routing, management
processor
control plane (software)

forwarding data
plane (hardware)

high-seed
switching
fabric

router input ports router output ports


Network Layer 4-22
Input port functions
lookup,
link forwarding
line layer switch
termination protocol fabric
(receive)
queueing

physical layer:
bit-level reception
data link layer: decentralized switching:
e.g., Ethernet ❖ given datagram dest., lookup output port
see chapter 5 using forwarding table in input port
memory
❖ goal: complete input port processing at
‘line speed’
❖ queuing: if datagrams arrive faster than
forwarding rate into switch fabric
Network Layer 4-23
Switching fabrics
❖ transfer packet from input buffer to appropriate
output buffer
❖ switching rate: rate at which packets can be
transferred from inputs to outputs
▪ often measured as multiple of input/output line rate
▪ N inputs: switching rate N times line rate desirable
❖ three types of switching fabrics

memory

memory bus crossbar

Network Layer 4-24


Switching via memory
first generation routers:
❖ traditional computers with switching under direct control
of CPU
❖ packet copied to system’s memory
❖ CPU extracts dest address from packet’s header, looks up
output port in forwarding table, copies to output port
❖ speed limited by memory bandwidth (2 bus crossings per
datagram)
❖ one packet at a time

input output
port memory port
(e.g., (e.g.,
Ethernet) Ethernet)

system bus

Network Layer 4-25


Switching via a bus
❖ datagram from input port memory
to output port memory via a
shared bus
❖ bus contention: switching speed
limited by bus bandwidth
❖ one packet a time bus
❖ 32 Gbps bus, Cisco 5600: sufficient
speed for access and enterprise
routers

Network Layer 4-26


Switching via interconnection network
❖ forwards multiple packets in parallel
❖ banyan networks, crossbar, other
interconnection nets initially
developed to connect processors in A

multiprocessor B

❖ When packet from port A needs to C

forwarded to port Y, controller crossbar


closes cross point at intersection of
two buses X Y Z
❖ advanced design: fragmenting
datagram into fixed length cells,
switch cells through the fabric.

Network Layer 4-27


Output ports

datagram
switch buffer link
fabric layer line
protocol termination
(send)
queueing

❖ buffering required when datagrams arrive from


fabric faster than the transmission rate
❖ scheduling discipline chooses among queued
datagrams for transmission

Network Layer 4-28


Output port queueing

switch
switch
fabric
fabric

at t, packets more one packet time later


from input to output

❖ suppose Rswitch is N times faster than Rline


❖ still have output buffering when multiple inputs
send to same output
❖ queueing (delay) and loss due to output port buffer
overflow! Network Layer 4-29
How much buffering?
❖ RFC 3439 rule of thumb: average buffering equal
to “typical” RTT (say 250 msec) times link
capacity C
▪ e.g., C = 10 Gpbs link: 2.5 Gbit buffer
❖ recent recommendation: with N flows, buffering
equal to
RTT . C
N

Network Layer 4-30


Input port queuing
❖ fabric slower than input ports combined queuing may
occur at input queues
▪ queuing delay and loss due to input buffer overflow!
❖ Head-of-the-Line (HOL) blocking: queued datagram at front
of queue prevents others in queue from moving forward

switch switch
fabric fabric

output port contention: one packet time later:


only one red datagram can be green packet
transferred. experiences HOL
lower red packet is blocked blocking

Network Layer 4-31


Chapter 4: outline
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing
4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ datagram format ▪ RIP
▪ IPv4 addressing ▪ OSPF
▪ ICMP ▪ BGP
▪ IPv6 4.7 broadcast and multicast
routing

Network Layer 4-32


The Internet network layer
host, router network layer functions:

transport layer: TCP, UDP

routing protocols IP protocol


• path selection • addressing conventions
• RIP, OSPF, BGP • datagram format
network • packet handling conventions
layer forwarding
table
ICMP protocol
• error reporting
• router
“signaling”
link layer

physical layer

Network Layer 4-33


IP datagram format
IP protocol version
number 32 bits total datagram
header length head. type of length (bytes)
(bytes) ver length
len service for
“type” of data fragment fragmentation/
16-bit identifier flgs
offset reassembly
max number time to upper header
remaining hops live layer checksum
(decremented at
each router) 32 bit source IP address

upper layer protocol 32 bit destination IP address


to deliver payload to e.g. timestamp,
options (if any)
record route
how much overhead? data taken, specify
(variable length, list of routers
❖ 20 bytes of TCP
typically a TCP to visit.
❖ 20 bytes of IP
or UDP segment)
❖ = 40 bytes + app
layer overhead

Network Layer 4-34


IP fragmentation, reassembly
❖ network links have MTU
(max.transfer size) -
largest possible link-level fragmentation:
frame


in: one large datagram
▪ different link types, out: 3 smaller datagrams
different MTUs
❖ large IP datagram divided
(“fragmented”) within net reassembly
▪ one datagram becomes
several datagrams
▪ “reassembled” only at …
final destination
▪ IP header bits used to
identify, order related
fragments
Network Layer 4-35
IP fragmentation, reassembly
length ID fragflag offset
example: =4000 =x =0 =0
❖ 4000 byte datagram
one large datagram becomes
❖ MTU = 1500 bytes several smaller datagrams

1480 bytes in length ID fragflag offset


data field =1500 =x =1 =0

offset = length ID fragflag offset


1480/8 =1500 =x =1 =185

length ID fragflag offset


=1040 =x =0 =370

Network Layer 4-36


Chapter 4: outline
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing
4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ datagram format ▪ RIP
▪ IPv4 addressing ▪ OSPF
▪ ICMP ▪ BGP
▪ IPv6 4.7 broadcast and multicast
routing

Network Layer 4-37


IP addressing: introduction
223.1.1.1
❖ IP address: 32-bit
identifier for host, router
223.1.2.1

interface 223.1.1.2
223.1.1.4 223.1.2.9
❖ interface: connection
between host/router and 223.1.3.27
physical link 223.1.1.3
223.1.2.2
▪ routers typically have
multiple interfaces
▪ host typically has one
active interface (e.g., wired 223.1.3.1 223.1.3.2

Ethernet, wireless 802.11)


❖ one IP address associated
with each interface 223.1.1.1 = 11011111 00000001 00000001 00000001

223 1 1 1

Network Layer 4-38


IP addressing: introduction
223.1.1.1
Q: how are interfaces
actually connected?
223.1.2.1

A: we’ll learn about that 223.1.1.2


223.1.1.4 223.1.2.9

in chapter 5, 6.
223.1.3.27
223.1.1.3
223.1.2.2

A: wired Ethernet interfaces


connected by Ethernet switches
223.1.3.1 223.1.3.2

For now: don’t need to worry


about how one interface is
connected to another (with no
A: wireless WiFi interfaces
intervening router)
connected by WiFi base station

Network Layer 4-39


Subnets
❖ IP address: 223.1.1.1
▪subnet part - high order
bits 223.1.1.2 223.1.2.1
223.1.1.4 223.1.2.9
▪host part - low order
bits 223.1.2.2
❖ what ’s a subnet ? 223.1.1.3 223.1.3.27

▪device interfaces with subnet


same subnet part of IP
address 223.1.3.1 223.1.3.2

▪can physically reach


each other without
intervening router network consisting of 3 subnets

Network Layer 4-40


Subnets
223.1.1.0/24
223.1.2.0/24
recipe 223.1.1.1

❖ to determine the 223.1.1.2 223.1.2.1


subnets, detach each 223.1.1.4 223.1.2.9

interface from its host 223.1.2.2


or router, creating 223.1.1.3 223.1.3.27

islands of isolated subnet


networks
❖ each isolated network 223.1.3.1 223.1.3.2

is called a subnet
223.1.3.0/24

subnet mask: /24


Network Layer 4-41
Subnets 223.1.1.2

how many? 223.1.1.1 223.1.1.4

223.1.1.3

223.1.9.2 223.1.7.0

223.1.9.1 223.1.7.1
223.1.8.1 223.1.8.0

223.1.2.6 223.1.3.27

223.1.2.1 223.1.2.2 223.1.3.1 223.1.3.2

Network Layer 4-42


IP addressing: CIDR
CIDR: Classless InterDomain Routing
▪ subnet portion of address of arbitrary length
▪ address format: a.b.c.d/x, where x is # bits in
subnet portion of address

subnet host
part part
11001000 00010111 00010000 00000000
200.23.16.0/23

Network Layer 4-43


IP addresses: how to get one?
Q: how does network get subnet part of IP addr?
A: gets allocated portion of its provider ISP’s address
space

ISP's block 11001000 00010111 00010000 00000000 200.23.16.0/20

Organization 0 11001000 00010111 00010000 00000000 200.23.16.0/23


Organization 1 11001000 00010111 00010010 00000000 200.23.18.0/23
Organization 2 11001000 00010111 00010100 00000000 200.23.20.0/23
... ….. …. ….
Organization 7 11001000 00010111 00011110 00000000 200.23.30.0/23

Network Layer 4-44


Hierarchical addressing: route aggregation
hierarchical addressing allows efficient advertisement of routing
information:

Organization 0
200.23.16.0/23
Organization 1
“Send me anything
200.23.18.0/23 with addresses
Organization 2 beginning
200.23.20.0/23 . Fly-By-Night-ISP 200.23.16.0/20”
.
. . Internet
.
Organization 7 .
200.23.30.0/23
“Send me anything
ISPs-R-Us
with addresses
beginning
199.31.0.0/16”

Network Layer 4-45


Hierarchical addressing: more specific routes

ISPs-R-Us has a more specific route to Organization 1

Organization 0
200.23.16.0/23

“Send me anything
with addresses
Organization 2 beginning
200.23.20.0/23 . Fly-By-Night-ISP 200.23.16.0/20”
.
. . Internet
.
Organization 7 .
200.23.30.0/23
“Send me anything
ISPs-R-Us
with addresses
Organization 1 beginning 199.31.0.0/16
or 200.23.18.0/23”
200.23.18.0/23

Network Layer 4-46


IP addressing: how to get a block?

Q: how does an ISP get block of addresses?


A: ICANN: Internet Corporation for Assigned
Names and Numbers http://www.icann.org/
▪ allocates addresses
▪ manages DNS
▪ assigns domain names, resolves disputes

Network Layer 4-47


IP addresses: how to get one?
Q: How does a host get IP address?

❖ hard-coded by system admin in a file


▪ Windows: control-panel->network->configuration-
>tcp/ip->properties
▪ UNIX: /etc/rc.config
❖ DHCP: Dynamic Host Configuration Protocol:
dynamically get address from as server
▪ “plug-and-play”

Network Layer 4-48


DHCP: Dynamic Host Configuration Protocol
goal: allow host to dynamically obtain its IP address from network
server when it joins network
▪ can renew its lease on address in use
▪ allows reuse of addresses (only hold address while
connected/“on”)
▪ support for mobile users who want to join network (more
shortly)
DHCP overview:
▪ host broadcasts “DHCP discover” msg [optional]
▪ DHCP server responds with “DHCP offer” msg [optional]
▪ host requests IP address: “DHCP request” msg
▪ DHCP server sends address: “DHCP ack” msg

Network Layer 4-49


DHCP client-server scenario

DHCP
223.1.1.0/24
server
223.1.1.1 223.1.2.1

223.1.1.2 arriving DHCP


223.1.1.4 223.1.2.9
client needs
address in this
223.1.3.27
223.1.2.2 network
223.1.1.3

223.1.2.0/24

223.1.3.1 223.1.3.2

223.1.3.0/24

Network Layer 4-50


DHCP client-server scenario
DHCP server: 223.1.2.5 DHCP discover arriving
client
src : 0.0.0.0, 68
dest.: 255.255.255.255,67
yiaddr: 0.0.0.0
transaction ID: 654

DHCP offer
src: 223.1.2.5, 67
dest: 255.255.255.255, 68
yiaddrr: 223.1.2.4
transaction ID: 654
lifetime: 3600 secs
DHCP request
src: 0.0.0.0, 68
dest:: 255.255.255.255, 67
yiaddrr: 223.1.2.4
transaction ID: 655
lifetime: 3600 secs

DHCP ACK
src: 223.1.2.5, 67
dest: 255.255.255.255, 68
yiaddrr: 223.1.2.4
transaction ID: 655
lifetime: 3600 secs

Network Layer 4-51


DHCP: more than IP addresses
DHCP returns:
▪ IP address
▪ address of first-hop router for client
▪ name and IP address of DNS sever
▪ network mask (indicating network versus host portion
of address)

Network Layer 4-52


DHCP: example
DHCP DHCP ❖ connecting laptop needs
DHCP UDP its IP address, addr of
IP
first-hop router, addr of
DHCP

DHCP Eth
Phy DNS server: use DHCP
DHCP request encapsulated
DHCP

in UDP, encapsulated in IP,
DHCP DHCP 168.1.1.1 encapsulated in 802.3
DHCP UDP Ethernet
IP
Ethernet frame broadcast
DHCP

DHCP Eth router with DHCP
Phy server built into (dest: FFFFFFFFFFFF) on LAN,
router received at router running
DHCP server
❖ Ethernet demuxed to IP
demuxed, UDP demuxed to
DHCP

Network Layer 4-53


DHCP: example
DHCP DHCP ❖ DCP server formulates
DHCP UDP DHCP ACK containing
DHCP IP client’s IP address, IP
DHCP Eth address of first-hop
Phy router for client, name &
IP address of DNS server
❖ encapsulation of DHCP
DHCP DHCP server, frame forwarded
DHCP UDP to client, demuxing up to
DHCP IP DHCP at client
DHCP Eth router with DHCP
DHCP
Phy server built into ❖ client now knows its IP
router address, name and IP
address of DSN server, IP
address of its first-hop
router

Network Layer 4-54


DHCP: Wireshark Message type: Boot Reply (2)
reply
output (home LAN) Hardware type: Ethernet
Hardware address length: 6
Hops: 0
Transaction ID: 0x6b3a11b7
Seconds elapsed: 0
Message type: Boot Request (1) Bootp flags: 0x0000 (Unicast)
Hardware type: Ethernet Client IP address: 192.168.1.101 (192.168.1.101)
Hardware address length: 6 Your (client) IP address: 0.0.0.0 (0.0.0.0)
Hops: 0
Transaction ID: 0x6b3a11b7
request Next server IP address: 192.168.1.1 (192.168.1.1)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Seconds elapsed: 0 Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Bootp flags: 0x0000 (Unicast) Server host name not given
Client IP address: 0.0.0.0 (0.0.0.0) Boot file name not given
Your (client) IP address: 0.0.0.0 (0.0.0.0) Magic cookie: (OK)
Next server IP address: 0.0.0.0 (0.0.0.0) Option: (t=53,l=1) DHCP Message Type = DHCP ACK
Relay agent IP address: 0.0.0.0 (0.0.0.0) Option: (t=54,l=4) Server Identifier = 192.168.1.1
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a) Option: (t=1,l=4) Subnet Mask = 255.255.255.0
Server host name not given Option: (t=3,l=4) Router = 192.168.1.1
Boot file name not given Option: (6) Domain Name Server
Magic cookie: (OK) Length: 12; Value: 445747E2445749F244574092;
Option: (t=53,l=1) DHCP Message Type = DHCP Request IP Address: 68.87.71.226;
Option: (61) Client identifier IP Address: 68.87.73.242;
Length: 7; Value: 010016D323688A; IP Address: 68.87.64.146
Hardware type: Ethernet Option: (t=15,l=20) Domain Name = "hsd1.ma.comcast.net."
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Option: (t=50,l=4) Requested IP Address = 192.168.1.101
Option: (t=12,l=5) Host Name = "nomad"
Option: (55) Parameter Request List
Length: 11; Value: 010F03062C2E2F1F21F92B
1 = Subnet Mask; 15 = Domain Name
3 = Router; 6 = Domain Name Server
44 = NetBIOS over TCP/IP Name Server
……

Network Layer 4-55


NAT: network address translation
rest of local network
Internet (e.g., home network)
10.0.0/24 10.0.0.1

10.0.0.4
10.0.0.2
138.76.29.7

10.0.0.3

all datagrams leaving local datagrams with source or


network have same single destination in this network
source NAT IP address: have 10.0.0/24 address for
138.76.29.7,different source source, destination (as usual)
port numbers
Network Layer 4-56
NAT: network address translation
motivation: local network uses just one IP address as far
as outside world is concerned:
▪ range of addresses not needed from ISP: just one
IP address for all devices
▪ can change addresses of devices in local network
without notifying outside world
▪ can change ISP without changing addresses of
devices in local network
▪ devices inside local net not explicitly addressable,
visible by outside world (a security plus)

Network Layer 4-57


NAT: network address translation
implementation: NAT router must:

▪ outgoing datagrams: replace (source IP address, port #) of


every outgoing datagram to (NAT IP address, new port #)
. . . remote clients/servers will respond using (NAT IP
address, new port #) as destination addr

▪ remember (in NAT translation table) every (source IP address,


port #) to (NAT IP address, new port #) translation pair

▪ incoming datagrams: replace (NAT IP address, new port #) in


dest fields of every incoming datagram with corresponding
(source IP address, port #) stored in NAT table

Network Layer 4-58


NAT: network address translation
NAT translation table 1: host 10.0.0.1
2: NAT router WAN side addr LAN side addr
changes datagram sends datagram to
source addr from 138.76.29.7, 5001 10.0.0.1, 3345 128.119.40.186, 80
10.0.0.1, 3345 to …… ……
138.76.29.7, 5001,
updates table S: 10.0.0.1, 3345
D: 128.119.40.186, 80
10.0.0.1
1
S: 138.76.29.7, 5001
2 D: 128.119.40.186, 80 10.0.0.4
10.0.0.2
138.76.29.7 S: 128.119.40.186, 80
D: 10.0.0.1, 3345
4
S: 128.119.40.186, 80
D: 138.76.29.7, 5001 3 10.0.0.3
4: NAT router
3: reply arrives changes datagram
dest. address: dest addr from
138.76.29.7, 5001 138.76.29.7, 5001 to 10.0.0.1, 3345

Network Layer 4-59


NAT: network address translation
❖ 16-bit port-number field:
▪ 60,000 simultaneous connections with a single
LAN-side address!
❖ NAT is controversial:
▪ routers should only process up to layer 3
▪ violates end-to-end argument
• NAT possibility must be taken into account by app
designers, e.g., P2P applications
▪ address shortage should instead be solved by
IPv6

Network Layer 4-60


NAT traversal problem
❖ client wants to connect to
server with address 10.0.0.1
▪ server address 10.0.0.1 local to 10.0.0.1
client
LAN (client can’t use it as
destination addr) ?
▪ only one externally visible NATed 10.0.0.4
address: 138.76.29.7
❖ solution1: statically configure 138.76.29.7 NAT
NAT to forward incoming router
connection requests at given
port to server
▪ e.g., (123.76.29.7, port 25000)
always forwarded to 10.0.0.1 port
25000

Network Layer 4-61


NAT traversal problem
❖ solution 2: Universal Plug and Play
(UPnP) Internet Gateway Device
(IGD) Protocol. Allows NATed 10.0.0.1
host to: IGD
❖ learn public IP address
(138.76.29.7)
❖ add/remove port mappings
(with lease times) NAT
router

i.e., automate static NAT port


map configuration

Network Layer 4-62


NAT traversal problem
❖ solution 3: relaying (used in Skype)
▪ NATed client establishes connection to relay
▪ external client connects to relay
▪ relay bridges packets between to connections

2. connection to
relay initiated 1. connection to 10.0.0.1
by client relay initiated
by NATed host
3. relaying
client established
138.76.29.7 NAT
router

Network Layer 4-63


Chapter 4: outline
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing
4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ datagram format ▪ RIP
▪ IPv4 addressing ▪ OSPF
▪ ICMP ▪ BGP
▪ IPv6 4.7 broadcast and multicast
routing

Network Layer 4-64


ICMP: internet control message protocol

❖ used by hosts & routers


to communicate network- Type Code description
0 0 echo reply (ping)
level information 3 0 dest. network unreachable
▪ error reporting: 3 1 dest host unreachable
unreachable host, network, 3 2 dest protocol unreachable
port, protocol 3 3 dest port unreachable
▪ echo request/reply (used by 3 6 dest network unknown
ping) 3 7 dest host unknown
❖ network-layer “above” IP: 4 0 source quench (congestion
▪ ICMP msgs carried in IP control - not used)
datagrams 8 0 echo request (ping)
9 0 route advertisement
❖ ICMP message: type, code 10 0 router discovery
plus first 8 bytes of IP 11 0 TTL expired
datagram causing error 12 0 bad IP header

Network Layer 4-65


Traceroute and ICMP
❖ source sends series of ❖ when ICMP messages
UDP segments to dest arrives, source records
▪ first set has TTL =1 RTTs
▪ second set has TTL=2, etc.
▪ unlikely port number stopping criteria:
❖ when nth set of datagrams ❖ UDP segment eventually
arrives to nth router: arrives at destination host
▪ router discards datagrams ❖ destination returns ICMP
▪ and sends source ICMP “port unreachable”
messages (type 11, code 0) message (type 3, code 3)
▪ ICMP messages includes
❖ source stops
name of router & IP address

3 probes 3 probes

3 probes
Network Layer 4-66
IPv6: motivation
❖ initial motivation: 32-bit address space soon to be
completely allocated.
❖ additional motivation:
▪ header format helps speed processing/forwarding
▪ header changes to facilitate QoS

IPv6 datagram format:


▪ fixed-length 40 byte header
▪ no fragmentation allowed

Network Layer 4-67


IPv6 datagram format
priority: identify priority among datagrams in flow
flow Label: identify datagrams in same “flow.”
(concept of“flow” not well defined).
next header: identify upper layer protocol for data
ver pri flow label
payload len next hdr hop limit
source address
(128 bits)
destination address
(128 bits)

data

32 bits
Network Layer 4-68
Other changes from IPv4
❖ checksum: removed entirely to reduce processing
time at each hop
❖ options: allowed, but outside of header, indicated
by “Next Header” field
❖ ICMPv6: new version of ICMP
▪ additional message types, e.g. “Packet Too Big”
▪ multicast group management functions

Network Layer 4-69


Transition from IPv4 to IPv6
❖ not all routers can be upgraded simultaneously
▪ no “flag days”
▪ how will network operate with mixed IPv4 and
IPv6 routers?
❖ tunneling: IPv6 datagram carried as payload in IPv4
datagram among IPv4 routers

IPv4 header fields IPv6 header fields


IPv4 payload
IPv4 source, dest addr IPv6 source dest addr
UDP/TCP payload

IPv6 datagram
IPv4 datagram
Network Layer 4-70
Tunneling
A B IPv4 tunnel E F
connecting IPv6 routers
logical view:
IPv6 IPv6 IPv6 IPv6

A B C D E F
physical view:
IPv6 IPv6 IPv4 IPv4 IPv6 IPv6

Network Layer 4-71


Tunneling
A B IPv4 tunnel E F
connecting IPv6 routers
logical view:
IPv6 IPv6 IPv6 IPv6

A B C D E F
physical view:
IPv6 IPv6 IPv4 IPv4 IPv6 IPv6

flow: X src:B src:B flow: X


src: A dest: E src: A
dest: F
dest: E
dest: F
Flow: X Flow: X
Src: A Src: A
data Dest: F Dest: F data

data data

A-to-B: E-to-F:
IPv6 B-to-C: B-to-C: IPv6
IPv6 inside IPv6 inside
IPv4 IPv4 Network Layer 4-72
Chapter 4: outline
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing
4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ datagram format ▪ RIP
▪ IPv4 addressing ▪ OSPF
▪ ICMP ▪ BGP
▪ IPv6 4.7 broadcast and multicast
routing

Network Layer 4-73


Interplay between routing, forwarding
routing algorithm determines
routing algorithm
end-end-path through network
forwarding table determines
local forwarding table
local forwarding at this router
dest address output link
address-range 1 3
address-range 2 2
address-range 3 2
address-range 4 1

IP destination address in
arriving packet’s header
1
3 2

Network Layer 4-74


Graph abstraction
5

v 3 w
2 5
u 2 z
1
3
1
x y 2
graph: G = (N,E) 1

N = set of routers = { u, v, w, x, y, z }

E = set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }

aside: graph abstraction is useful in other network contexts, e.g.,


P2P, where N is set of peers and E is set of TCP connections

Network Layer 4-75


Graph abstraction: costs
5 c(x,x’) = cost of link (x,x’)
3 e.g., c(w,z) = 5
v w
2 5
u cost could always be 1, or
2 1 z inversely related to bandwidth,
3
1 or inversely related to
x y 2
congestion
1

cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)

key question: what is the least-cost path between u and z ?


routing algorithm: algorithm that finds that least cost path

Network Layer 4-76


Routing algorithm classification
Q: global or decentralized Q: static or dynamic?
information?
static:
global: ❖ routes change slowly over
❖ all routers have complete time
topology, link cost info dynamic:
❖ “link state” algorithms ❖ routes change more
decentralized: quickly
❖ router knows physically- ▪ periodic update
connected neighbors, link ▪ in response to link
costs to neighbors cost changes
❖ iterative process of
computation, exchange of
info with neighbors
❖ “distance vector” algorithms
Network Layer 4-77
Chapter 4: outline
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing
4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ datagram format ▪ RIP
▪ IPv4 addressing ▪ OSPF
▪ ICMP ▪ BGP
▪ IPv6 4.7 broadcast and multicast
routing

Network Layer 4-78


A Link-State Routing Algorithm
Dijkstra’s algorithm notation:
❖ net topology, link costs ❖ c(x,y): link cost from
known to all nodes node x to y; = ∞ if not
▪ accomplished via “link state direct neighbors
broadcast” ❖ D(v): current value of
▪ all nodes have same info cost of path from source
❖ computes least cost paths to dest. v
from one node (‘source”) ❖ p(v): predecessor node
to all other nodes along path from source to
▪ gives forwarding table for v
that node ❖ N': set of nodes whose
❖ iterative: after k least cost path definitively
iterations, know least cost known
path to k destinations
Network Layer 4-79
Dijsktra’s Algorithm
1 Initialization:
2 N' = {u}
3 for all nodes v
4 if v adjacent to u
5 then D(v) = c(u,v)
6 else D(v) = ∞
7
8 Loop
9 find w not in N' such that D(w) is a minimum
10 add w to N'
11 update D(v) for all v adjacent to w and not in N' :
12 D(v) = min( D(v), D(w) + c(w,v) )
13 /* new cost to v is either old cost to v or known
14 shortest path cost to w plus cost from w to v */
15 until all nodes in N'

Network Layer 4-80


Dijkstra’s algorithm: example
D(v) D(w) D(x) D(y) D(z)
Step N' p(v) p(w) p(x) p(y) p(z)
0 u 7,u 3,u 5,u ∞ ∞
1 uw 6,w 5,u 11,w ∞ e.g., D(v) = min( D(v), D( w) + c( w, v))
2 uwx 6,w 11,w 14,x = min{7,3 + 3} = 6
3 uwxv 10,v 14,x
4 uwxvy 12,y
5 uwxvyz x
9
notes: 5 7
❖ construct shortest path tree by 4
tracing predecessor nodes 8
❖ ties can exist (can be broken u 3 w y z
arbitrarily) 2
3
7 4
v
Network Layer 4-81
Dijkstra’s algorithm: example

x resulting forwarding
9
table in u:
5 7
4 destination link
8 v (u,w)
u 3 w y z x (u,x)
2 y (u,w)
3 w (u,w)
7 4
z (u,w)
v

Network Layer 4-82


Dijkstra’s algorithm, discussion
algorithm complexity: n nodes
❖ each iteration: need to check all nodes, w, not in N
❖ n(n+1)/2 comparisons: O(n2)
❖ more efficient implementations possible: O(nlogn)
oscillations possible:
❖ e.g., support link cost equals amount of carried traffic:

1
A 1+e A A A
2+e 0 0 2+e 2+e 0
D 0 0 B D 1+e 1 B D B D 1+e 1 B
0 0
0 e 0 0
C 0 1 1+e 0
C C C
1 1
e given these costs, given these costs, given these costs,
initially find new routing…. find new routing…. find new routing….
resulting in new costs resulting in new costs resulting in new costs
Network Layer 4-83
Chapter 4: outline
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing
4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ datagram format ▪ RIP
▪ IPv4 addressing ▪ OSPF
▪ ICMP ▪ BGP
▪ IPv6 4.7 broadcast and multicast
routing

Network Layer 4-84


Distance vector algorithm
Bellman-Ford equation (dynamic programming)

let
dx(y) := cost of least-cost path from x to y
then
dx(y) = min
v
{c(x,v) + dv (y) }

cost from neighbor v to destination y


cost to neighbor v

min taken over all neighbors v of x


Network Layer 4-85
Bellman-Ford example
5
clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3
v 3 w
2 5
u 2 z B-F equation says:
1
3
1 du(z) = min { c(u,v) + dv(z),
x y 2
1 c(u,x) + dx(z),
c(u,w) + dw(z) }
= min {2 + 5,
1 + 3,
5 + 3} = 4
node achieving minimum is next
hop in shortest path, used in forwarding table
Network Layer 4-86
Distance vector algorithm
❖ Dx(y) = estimate of least cost from x to y
▪ x maintains distance vector Dx = [Dx(y): y є N ]
❖ node x:
▪ knows cost to each neighbor v: c(x,v)
▪ maintains its neighbors’ distance vectors. For
each neighbor v, x maintains
Dv = [Dv(y): y є N ]

Network Layer 4-87


Distance vector algorithm
key idea:
❖ from time-to-time, each node sends its own
distance vector estimate to neighbors
❖ when x receives new DV estimate from neighbor,
it updates its own DV using B-F equation:
Dx(y) ← minv{c(x,v) + Dv(y)} for each node y ∊ N

❖ under minor, natural conditions, the estimate Dx(y)


converge to the actual least cost dx(y)

Network Layer 4-88


Distance vector algorithm
iterative, asynchronous: each node:
each local iteration
caused by:
❖ local link cost change wait for (change in local link
cost or msg from neighbor)
❖ DV update message from
neighbor
distributed: recompute estimates
❖ each node notifies
neighbors only when its
DV changes if DV to any dest has
▪ neighbors then notify their changed, notify neighbors
neighbors if necessary

Network Layer 4-89


Dx(z) = min{c(x,y) +
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)}
= min{2+0 , 7+1} = 2 Dy(z), c(x,z) + Dz(z)}
= min{2+1 , 7+0} = 3
node x cost to cost to
table x y z x y z
x 0 2 7 x 0 2 3

from
from

y ∞∞ ∞ y 2 0 1
z ∞∞ ∞ z 7 1 0

node y cost to
table x y z y
2 1
x ∞ ∞ ∞
x z
from

y 2 0 1 7
z ∞∞ ∞

node z cost to
table x y z
x ∞∞ ∞
from

y ∞∞ ∞
z 7 1 0
time
Network Layer 4-90
Dx(z) = min{c(x,y) +
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)}
= min{2+0 , 7+1} = 2 Dy(z), c(x,z) + Dz(z)}
= min{2+1 , 7+0} = 3
node x cost to cost to cost to
table x y z x y z x y z
x 0 2 7 x 0 2 3 x 0 2 3

from
from

y ∞∞ ∞ y 2 0 1

from
y 2 0 1
z ∞∞ ∞ z 7 1 0 z 3 1 0
node y cost to cost to cost to
table x y z x y z x y z y
2 1
x ∞ ∞ ∞ x 0 2 7 x 0 2 3 x z
from

from

y 2 0 1 y 2 0 1

from
y 2 0 1 7
z ∞∞ ∞ z 7 1 0 z 3 1 0

node z cost to cost to cost to


table x y z x y z x y z

x ∞∞ ∞ x 0 2 7 x 0 2 3
from

from

y 2 0 1 y 2 0 1
from

y ∞∞ ∞
z 7 1 0 z 3 1 0 z 3 1 0
time
Network Layer 4-91
Distance vector: link cost changes
link cost changes:
1
❖ node detects local link cost change y
4 1
❖ updates routing info, recalculates x z
distance vector 50
❖ if DV changes, notify neighbors

“good t0 : y detects link-cost change, updates its DV, informs its


news neighbors.
travels t1 : z receives update from y, updates its table, computes new
fast” least cost to x , sends its neighbors its DV.

t2 : y receives z’s update, updates its distance table. y’s least costs
do not change, so y does not send a message to z.

Network Layer 4-92


Distance vector: link cost changes
link cost changes:
60
❖ node detects local link cost change y
4 1
❖ bad news travels slow - “count to x z
infinity” problem! 50
❖ 44 iterations before algorithm
stabilizes: see text
poisoned reverse:
❖ If Z routes through Y to get to X :
▪ Z tells Y its (Z’s) distance to X is infinite (so Y won’t route
to X via Z)
❖ will this completely solve count to infinity problem?

Network Layer 4-93


Comparison of LS and DV algorithms
message complexity robustness: what happens if
❖ LS: with n nodes, E links, O(nE) router malfunctions?
msgs sent LS:
❖ DV: exchange between neighbors ▪ node can advertise incorrect
only link cost
▪ convergence time varies ▪ each node computes only its
own table
speed of convergence DV:
❖ LS:O(n2) algorithm requires
O(nE) msgs ▪ DV node can advertise
incorrect path cost
▪ may have oscillations
▪ each node’s table used by
❖ DV: convergence time varies others
▪ may be routing loops • error propagate thru
▪ count-to-infinity problem network

Network Layer 4-94


Chapter 4: outline
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing
4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ datagram format ▪ RIP
▪ IPv4 addressing ▪ OSPF
▪ ICMP ▪ BGP
▪ IPv6 4.7 broadcast and multicast
routing

Network Layer 4-95


Hierarchical routing
our routing study thus far - idealization
❖ all routers identical
❖ network “flat”
… not true in practice

scale: with 600 million administrative autonomy


destinations: ❖ internet = network of
❖ can’t store all dest’s in networks
routing tables! ❖ each network admin may
❖ routing table exchange want to control routing in
would swamp links! its own network

Network Layer 4-96


Hierarchical routing
❖ collect routers into ❖ routers in same AS run
regions, “autonomous same routing protocol
systems” (AS) ▪ “intra-AS” routing
❖ Each AS within an ISP protocol
▪ ISP may consist of one ▪ routers in different AS
or more ASes can run different intra-
AS routing protocol
gateway router:
❖ at “edge” of its own AS
❖ has link to router in
another AS

Network Layer 4-97


Interconnected ASes

3c
3a 2c
3b 2a
AS3 2b
1c AS2
1a 1b AS1
1d ❖ forwarding table
configured by both intra-
and inter-AS routing
Intra-AS Inter-AS algorithm
Routing Routing
algorithm algorithm ▪ intra-AS sets entries
Forwarding
for internal dests
table ▪ inter-AS & intra-AS
sets entries for
external dests
Network Layer 4-98
Inter-AS tasks
❖ suppose router in AS1 AS1 must:
receives datagram 1. learn which dests are
destined outside of AS1: reachable through AS2,
▪ router should forward which through AS3
packet to gateway 2. propagate this
router, but which one? reachability info to all
routers in AS1
job of inter-AS routing!

3c
3a
3b
AS3 2c other
1c 2a networks
other 1a 2b
networks 1b AS2
AS1 1d

Network Layer 4-99


Example: setting forwarding table in router 1d
❖ suppose AS1 learns (via inter-AS protocol) that subnet x
reachable via AS3 (gateway 1c), but not via AS2
▪ inter-AS protocol propagates reachability info to all internal
routers
❖ router 1d determines from intra-AS routing info that its
interface I is on the least cost path to 1c
▪ installs forwarding table entry (x,I)

3c
x
3a
3b
AS3 2c other
1c 2a networks
other 1a 2b
networks 1b AS2
AS1 1d

Network Layer 4-100


Example: choosing among multiple ASes

❖ now suppose AS1 learns from inter-AS protocol that subnet


x is reachable from AS3 and from AS2.
❖ to configure forwarding table, router 1d must determine
towards which gateway it should forward packets for dest x
▪ this is also job of inter-AS routing protocol!

3c
x
3a
3b
AS3 2c other
1c 2a networks
other 1a 2b
networks 1b AS2
AS1 1d
?
Network Layer 4-101
Example: choosing among multiple ASes
❖ now suppose AS1 learns from inter-AS protocol that subnet
x is reachable from AS3 and from AS2.
❖ to configure forwarding table, router 1d must determine
towards which gateway it should forward packets for dest x
▪ this is also job of inter-AS routing protocol!
❖ hot potato routing: send packet towards closest of two
routers.

use routing info determine from


learn from inter-AS hot potato routing: forwarding table the
from intra-AS
protocol that subnet choose the gateway interface I that leads
protocol to determine
x is reachable via that has the to least-cost gateway.
costs of least-cost
multiple gateways smallest least cost Enter (x,I) in
paths to each
of the gateways forwarding table

Network Layer 4-102


Chapter 4: outline
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing
4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ datagram format ▪ RIP
▪ IPv4 addressing ▪ OSPF
▪ ICMP ▪ BGP
▪ IPv6 4.7 broadcast and multicast
routing

Network Layer 4-103


Intra-AS Routing
❖ also known as interior gateway protocols (IGP)
❖ most common intra-AS routing protocols:
▪ RIP: Routing Information Protocol
▪ OSPF: Open Shortest Path First
▪ IGRP: Interior Gateway Routing Protocol
(Cisco proprietary)

Network Layer 4-104


RIP ( Routing Information Protocol)
❖ included in BSD-UNIX distribution in 1982
❖ distance vector algorithm
▪ distance metric: # hops (max = 15 hops), each link has cost 1
▪ DVs exchanged with neighbors every 30 sec in response message (aka
advertisement)
▪ each advertisement: list of up to 25 destination subnets (in IP addressing
sense)

from router A to destination subnets:


u v subnet hops
w u 1
A B
v 2
w 2
x x 3
z C D y 3
y z 2
Network Layer 4-105
RIP: example

z
w x y
A D B

C
routing table in router D
destination subnet next router # hops to dest
w A 2
y B 2
z B 7
x -- 1
…. …. ....
Network Layer 4-106
RIP: example
A-to-D advertisement
dest next hops
w - 1
x - 1
z C 4
…. … ... z
w x y
A D B

C
routing table in router D
destination subnet next router # hops to dest
w A 2
y B 2
A 5
z B 7
x -- 1
…. …. ....
Network Layer 4-107
RIP: link failure, recovery
if no advertisement heard after 180 sec -->
neighbor/link declared dead
▪ routes via neighbor invalidated
▪ new advertisements sent to neighbors
▪ neighbors in turn send out new advertisements (if tables
changed)
▪ link failure info quickly (?) propagates to entire net
▪ poison reverse used to prevent ping-pong loops (infinite
distance = 16 hops)

Network Layer 4-108


RIP table processing
❖ RIP routing tables managed by application-level
process called route-d (daemon)
❖ advertisements sent in UDP packets, periodically
repeated
routed routed

transport transprt
(UDP) (UDP)
network forwarding forwarding network
(IP) table table (IP)
link link
physical physical

Network Layer 4-109


OSPF (Open Shortest Path First)
❖ “open”: publicly available
❖ uses link state algorithm
▪ LS packet dissemination
▪ topology map at each node
▪ route computation using Dijkstra’s algorithm
❖ OSPF advertisement carries one entry per neighbor
❖ advertisements flooded to entire AS
▪ carried in OSPF messages directly over IP (rather than
TCP or UDP)
❖ IS-IS routing protocol: nearly identical to OSPF

Network Layer 4-110


OSPF “advanced” features (not in RIP)
❖ security: all OSPF messages authenticated (to prevent
malicious intrusion)
❖ multiple same-cost paths allowed (only one path in
RIP)
❖ for each link, multiple cost metrics for different TOS
(e.g., satellite link cost set “low” for best effort ToS;
high for real time ToS)
❖ integrated uni- and multicast support:
▪ Multicast OSPF (MOSPF) uses same topology data
base as OSPF
❖ hierarchical OSPF in large domains.

Network Layer 4-111


Hierarchical OSPF
boundary router
backbone router

backbone
area
border
routers

area 3

internal
routers
area 1
area 2

Network Layer 4-112


Hierarchical OSPF
❖ two-level hierarchy: local area, backbone.
▪ link-state advertisements only in area
▪ each nodes has detailed area topology; only know
direction (shortest path) to nets in other areas.
❖ area border routers: “summarize” distances to nets in
own area, advertise to other Area Border routers.
❖ backbone routers: run OSPF routing limited to
backbone.
❖ boundary routers: connect to other AS’s.

Network Layer 4-113


Internet inter-AS routing: BGP
❖ BGP (Border Gateway Protocol): the de facto
inter-domain routing protocol
▪ “glue that holds the Internet together”
❖ BGP provides each AS a means to:
▪ obtain subnet reachability information from
neighboring AS’s: eBGP
▪ propagate reachability information to all AS-internal
routers: iBGP
▪ determine “good” routes to other networks based on
reachability information and policy.
❖ allows subnet to advertise its existence to rest of
Internet: “I am here”

Network Layer 4-114


BGP basics
❖ BGP session: two BGP routers (“peers”) exchange BGP
messages:
▪ advertising paths to different destination network prefixes (“path vector”
protocol)
▪ exchanged over semi-permanent TCP connections

❖ when AS3 advertises a prefix to AS1:


▪ AS3 promises it will forward datagrams towards that prefix
▪ AS3 can aggregate prefixes in its advertisement

3c
BGP
3a message
3b
AS3 2c other
1c 2a networks
other 1a 2b
networks 1b AS2
AS1 1d

Network Layer 4-115


BGP basics: distributing path information
❖ using eBGP session between 3a and 1c, AS3 sends prefix
reachability info to AS1.
▪ 1c can then use iBGP do distribute new prefix info to all routers
in AS1
▪ 1b can then re-advertise new reachability info to AS2 over 1b-to-
2a eBGP session
❖ when router learns of new prefix, it creates entry for
prefix in its forwarding table.

eBGP session
3a iBGP session
3b
AS3 2c other
1c 2a networks
other 1a 2b
networks 1b AS2
AS1 1d

Network Layer 4-116


Path attributes and BGP routes
❖ advertised prefix includes BGP attributes
▪ prefix + attributes = “route”
❖ two important attributes:
▪ AS-PATH: contains ASs through which prefix
advertisement has passed: e.g., AS 67, AS 17
▪ NEXT-HOP: the IP address of the router interface that
begins the AS PATH.
❖ gateway router receiving route advertisement uses
import policy to accept/decline
▪ e.g., never route through AS x
▪ policy-based routing

Network Layer 4-117


BGP route selection
❖ router may learn about more than one route to
destination AS, selects route based on:
1. local preference value attribute: policy decision
2. shortest AS-PATH
3. closest NEXT-HOP router: hot potato routing
4. additional criteria

Network Layer 4-118


BGP messages
❖ BGP messages exchanged between peers over TCP
connection
❖ BGP messages:
▪ OPEN: opens TCP connection to peer and authenticates
sender
▪ UPDATE: advertises new path (or withdraws old)
▪ KEEPALIVE: keeps connection alive in absence of
UPDATES; also ACKs OPEN request
▪ NOTIFICATION: reports errors in previous msg; also
used to close connection

Network Layer 4-119


Putting it Altogether:
How Does an Entry Get Into a
Router’s Forwarding Table?
❖ Answer is complicated!

❖ Ties together hierarchical routing (Section 4.5.3)


with BGP (4.6.3) and OSPF (4.6.2).

❖ Provides nice overview of BGP!


How does entry get in forwarding table?

routing algorithms

Assume prefix is
local forwarding table in another AS.
entry prefix output port
138.16.64/22 3
124.12/16 2
212/8 4
………….. …

Dest IP
1

3 2
How does entry get in forwarding table?

High-level overview
1. Router becomes aware of prefix
2. Router determines output port for prefix
3. Router enters prefix-port in forwarding table
Router becomes aware of prefix

3c
BGP
3a message
3b
AS3 2c other
1c 2a networks
other 1a 2b
networks 1b AS2
AS1 1d

❖ BGP message contains “routes”


❖ “route” is a prefix and attributes: AS-PATH, NEXT-
HOP,…
❖ Example: route:
❖ Prefix:138.16.64/22 ; AS-PATH: AS3 AS131 ;
NEXT-HOP: 201.44.13.125
Router may receive multiple routes

3c
BGP
3a message
3b
AS3 2c other
1c 2a networks
other 1a 2b
networks 1b AS2
AS1 1d

❖ Router may receive multiple routes for same prefix


❖ Has to select one route
Select best BGP route to prefix
❖ Router selects route based on shortest AS-PATH

❖ Example: select

❖ AS2 AS17 to 138.16.64/22


❖ AS3 AS131 AS201 to 138.16.64/22

❖ What if there is a tie? We’ll come back to that!


Find best intra-route to BGP route
❖ Use selected route’s NEXT-HOP attribute
▪ Route’s NEXT-HOP attribute is the IP address of the
router interface that begins the AS PATH.
❖ Example:
❖ AS-PATH: AS2 AS17 ; NEXT-HOP: 111.99.86.55
❖ Router uses OSPF to find shortest path from 1c to
111.99.86.55

3c
3a 111.99.86.55
3b
AS3 2c other
1c 2a networks
other 1a 2b
networks 1b AS2
AS1 1d
Router identifies port for route

❖ Identifies port along the OSPF shortest path


❖ Adds prefix-port entry to its forwarding table:
▪ (138.16.64/22 , port 4)

3c router
3a port
3b
AS3 1 2c other
1c 4 2a networks
2 3
other 1a 2b
networks 1b AS2
AS1 1d
Hot Potato Routing
❖ Suppose there two or more best inter-routes.
❖ Then choose route with closest NEXT-HOP
▪ Use OSPF to determine which gateway is closest
▪ Q: From 1c, chose AS3 AS131 or AS2 AS17?
▪ A: route AS3 AS201 since it is closer

3c
3a
3b
AS3 2c other
1c 2a networks
other 1a 2b
networks 1b AS2
AS1 1d
How does entry get in forwarding table?

Summary
1. Router becomes aware of prefix
▪ via BGP route advertisements from other routers
2. Determine router output port for prefix
▪ Use BGP route selection to find best inter-AS route
▪ Use OSPF to find best intra-AS route leading to best
inter-AS route
▪ Router identifies router port for that best route
3. Enter prefix-port entry in forwarding table
BGP routing policy
legend: provider
B network
X
W A
customer
C network:

❖ A,B,C are provider networks


❖ X,W,Y are customer (of provider networks)
❖ X is dual-homed: attached to two networks
▪ X does not want to route from B via X to C
▪ .. so X will not advertise to B a route to C

Network Layer 4-130


BGP routing policy (2)
legend: provider
B network
X
W A
customer
C network:

❖ A advertises path AW to B
❖ B advertises path BAW to X
❖ Should B advertise path BAW to C?
▪ No way! B gets no “revenue” for routing CBAW since neither W nor
C are B’s customers
▪ B wants to force C to route to w via A
▪ B wants to route only to/from its customers!

Network Layer 4-131


Why different Intra-, Inter-AS routing ?
policy:
❖ inter-AS: admin wants control over how its traffic
routed, who routes through its net.
❖ intra-AS: single admin, so no policy decisions needed
scale:
❖ hierarchical routing saves table size, reduced update
traffic
performance:
❖ intra-AS: can focus on performance
❖ inter-AS: policy may dominate over performance

Network Layer 4-132


Chapter 4: outline
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state
datagram networks ▪ distance vector
4.3 what’s inside a router ▪ hierarchical routing
4.4 IP: Internet Protocol 4.6 routing in the Internet
▪ datagram format ▪ RIP
▪ IPv4 addressing ▪ OSPF
▪ ICMP ▪ BGP
▪ IPv6 4.7 broadcast and multicast
routing

Network Layer 4-133


Broadcast routing
❖ deliver packets from source to all other nodes
❖ source duplication is inefficient:
duplicate
duplicate R1 creation/transmission R1
duplicate
R2 R2

R3 R4 R3 R4

source in-network
duplication duplication

❖ source duplication: how does source determine


recipient addresses?
Network Layer 4-134
In-network duplication
❖ flooding: when node receives broadcast packet,
sends copy to all neighbors
▪ problems: cycles & broadcast storm
❖ controlled flooding: node only broadcasts pkt if it
hasn’t broadcast same packet before
▪ node keeps track of packet ids already broadacsted
▪ or reverse path forwarding (RPF): only forward packet
if it arrived on shortest path between node and source
❖ spanning tree:
▪ no redundant packets received by any node

Network Layer 4-135


Spanning tree
❖ first construct a spanning tree
❖ nodes then forward/make copies only along
spanning tree

A A

B B
c c

D D
F E F E

G G
(a) broadcast initiated at A (b) broadcast initiated at D

Network Layer 4-136


Spanning tree: creation
❖ center node
❖ each node sends unicast join message to center
node
▪ message forwarded until it arrives at a node already
belonging to spanning tree

A A
3
B B
c c
4
2
D D
F E F E
1 5
G G
(a) stepwise construction of (b) constructed spanning
spanning tree (center: E) tree
Network Layer 4-137
Multicast routing: problem statement
goal: find a tree (or trees) connecting routers having
local mcast group members legend
❖ tree: not all paths between routers used group
❖ shared-tree: same tree used by all group members member
not group
❖ source-based: different tree from each sender to rcvrs member
router
with a
group
member
router
without
group
member

shared tree source-based trees


Network Layer 4-138
Approaches for building mcast trees
approaches:
❖ source-based tree: one tree per source
▪ shortest path trees
▪ reverse path forwarding
❖ group-shared tree: group uses one tree
▪ minimal spanning (Steiner)
▪ center-based trees

…we first look at basic approaches, then specific protocols


adopting these approaches

Network Layer 4-139


Shortest path tree
❖ mcast forwarding tree: tree of shortest path
routes from source to all receivers
▪ Dijkstra’s algorithm

s: source LEGEND
R1 2 router with attached
1 R4
group member
R2 5 router with no attached
3 4 group member
R5
i link used for forwarding,
R3 6
i indicates order link
R6 R7 added by algorithm

Network Layer 4-140


Reverse path forwarding

❖ rely on router’s knowledge of unicast shortest


path from it to sender
❖ each router has simple forwarding behavior:

if (mcast datagram received on incoming link on


shortest path back to center)
then flood datagram onto all outgoing links
else ignore datagram

Network Layer 4-141


Reverse path forwarding: example
s: source LEGEND
R1
R4 router with attached
group member
R2
router with no attached
R5 group member
R3 datagram will be forwarded
R6 R7
datagram will not be
forwarded

❖ result is a source-specific reverse SPT


▪ may be a bad choice with asymmetric links

Network Layer 4-142


Reverse path forwarding: pruning
❖ forwarding tree contains subtrees with no mcast group
members
▪ no need to forward datagrams down subtree
▪ “prune” msgs sent upstream by router with no
downstream group members
s: source
LEGEND
R1
R4
router with attached
group member
R2
P
router with no attached
R5 group member
P
R3 P prune message
R6 links with multicast
R7 forwarding

Network Layer 4-143


Shared-tree: steiner tree

❖ steiner tree: minimum cost tree connecting all


routers with attached group members
❖ problem is NP-complete
❖ excellent heuristics exists
❖ not used in practice:
▪ computational complexity
▪ information about entire network needed
▪ monolithic: rerun whenever a router needs to
join/leave

Network Layer 4-144


Center-based trees
❖ single delivery tree shared by all
❖ one router identified as “center” of tree
❖ to join:
▪ edge router sends unicast join-msg addressed to center
router
▪ join-msg “processed” by intermediate routers and
forwarded towards center
▪ join-msg either hits existing tree branch for this center,
or arrives at center
▪ path taken by join-msg becomes new branch of tree for
this router

Network Layer 4-145


Center-based trees: example
suppose R6 chosen as center:

LEGEND

R1 router with attached


R4
3 group member

R2 router with no attached


2 group member
1
R5 path order in which join
messages generated
R3
1 R6
R7

Network Layer 4-146


Internet Multicasting Routing: DVMRP

❖ DVMRP: distance vector multicast routing


protocol, RFC1075
❖ flood and prune: reverse path forwarding, source-
based tree
▪ RPF tree based on DVMRP’s own routing tables
constructed by communicating DVMRP routers
▪ no assumptions about underlying unicast
▪ initial datagram to mcast group flooded everywhere
via RPF
▪ routers not wanting group: send upstream prune msgs

Network Layer 4-147


DVMRP: continued…
❖ soft state: DVMRP router periodically (1 min.)
“forgets” branches are pruned:
▪ mcast data again flows down unpruned branch
▪ downstream router: reprune or else continue to receive
data
❖ routers can quickly regraft to tree
▪ following IGMP join at leaf
❖ odds and ends
▪ commonly implemented in commercial router

Network Layer 4-148


Tunneling
Q: how to connect “islands” of multicast routers in a
“sea” of unicast routers?

physical topology logical topology

❖ mcast datagram encapsulated inside “normal” (non-


multicast-addressed) datagram
❖ normal IP datagram sent thru “tunnel” via regular IP unicast
to receiving mcast router (recall IPv6 inside IPv4 tunneling)
❖ receiving mcast router unencapsulates to get mcast
datagram
Network Layer 4-149
PIM: Protocol Independent Multicast
❖ not dependent on any specific underlying unicast
routing algorithm (works with all)
❖ two different multicast distribution scenarios :

dense: sparse:
❖ group members densely ❖ # networks with group
packed, in “close” members small wrt #
proximity. interconnected networks
❖ bandwidth more plentiful ❖ group members “widely
dispersed”
❖ bandwidth not plentiful

Network Layer 4-150


Consequences of sparse-dense dichotomy:
dense sparse:
❖ group membership by ❖ no membership until routers
routers assumed until explicitly join
routers explicitly prune ❖ receiver- driven construction
❖ data-driven construction on of mcast tree (e.g., center-
mcast tree (e.g., RPF) based)
❖ bandwidth and non-group- ❖ bandwidth and non-group-
router processing profligate router processing conservative

Network Layer 4-151


PIM- dense mode
flood-and-prune RPF: similar to DVMRP but…
❖ underlying unicast protocol provides RPF info
for incoming datagram
❖ less complicated (less efficient) downstream
flood than DVMRP reduces reliance on
underlying routing algorithm
❖ has protocol mechanism for router to detect it
is a leaf-node router

Network Layer 4-152


PIM - sparse mode
❖ center-based approach
❖ router sends join msg to R1
R4
rendezvous point (RP) join
▪ intermediate routers R2
update state and join
forward join R5
join
❖ after joining via RP, router R3
can switch to source- R6
specific tree R7
all data multicast rendezvous
▪ increased from rendezvous point
performance: less point
concentration, shorter
paths

Network Layer 4-153


PIM - sparse mode
sender(s):
R1
❖ unicast data to RP, R4
which distributes join

down RP-rooted tree R2


join
❖ RP can extend mcast R5
tree upstream to R3
join

source R6
❖ RP can send stop msg all data multicast R7
rendezvous
if no attached from rendezvous
point
point
receivers
▪ “no one is listening!”

Network Layer 4-154


Chapter 4: done!
4.1 introduction 4.5 routing algorithms
4.2 virtual circuit and ▪ link state, distance vector,
datagram networks hierarchical routing
4.3 what’s inside a router 4.6 routing in the Internet
▪ RIP, OSPF, BGP
4.4 IP: Internet Protocol
▪ datagram format, IPv4
4.7 broadcast and multicast
addressing, ICMP, IPv6 routing

❖ understand principles behind network layer services:


▪ network layer service models, forwarding versus routing
how a router works, routing (path selection), broadcast,
multicast
❖ instantiation, implementation in the Internet

Network Layer 4-155


Chapter 5
Link Layer

A note on the use of these ppt slides:


We’re making these slides freely available to all (faculty, students, readers). Computer
They’re in PowerPoint form so you see the animations; and can add, modify,
and delete slides (including this one) and slide content to suit your needs. Networking: A Top
They obviously represent a lot of work on our part. In return for use, we only
ask the following: Down Approach
❖ If you use these slides (e.g., in a class) that you mention their source
(after all, we’d like people to use our book!)
6th edition
❖ If you post any slides on a www site, that you note that they are adapted Jim Kurose, Keith Ross
from (or perhaps identical to) our slides, and note our copyright of this
material.
Addison-Wesley
March 2012
Thanks and enjoy! JFK/KWR

All material copyright 1996-2012


J.F Kurose and K.W. Ross, All Rights Reserved

Link Layer 5-1


Chapter 5: Link layer
our goals:
❖ understand principles behind link layer
services:
▪ error detection, correction
▪ sharing a broadcast channel: multiple access
▪ link layer addressing
▪ local area networks: Ethernet, VLANs
❖ instantiation, implementation of various link
layer technologies

Link Layer 5-2


Link layer, LANs: outline
5.1 introduction, services 5.5 link virtualization:
5.2 error detection, MPLS
correction 5.6 data center
5.3 multiple access networking
protocols 5.7 a day in the life of a
5.4 LANs web request
▪ addressing, ARP
▪ Ethernet
▪ switches
▪ VLANS

Link Layer 5-3


Link layer: introduction
terminology:
❖ hosts and routers: nodes
❖ communication channels that global ISP

connect adjacent nodes along


communication path: links
▪ wired links
▪ wireless links
▪ LANs
❖ layer-2 packet: frame,
encapsulates datagram

data-link layer has responsibility of


transferring datagram from one node
to physically adjacent node over a link
Link Layer 5-4
Link layer: context
❖ datagram transferred by transportation analogy:
different link protocols over ❖ trip from Princeton to Lausanne
different links: ▪ limo: Princeton to JFK
▪ e.g., Ethernet on first link, ▪ plane: JFK to Geneva
frame relay on ▪ train: Geneva to Lausanne
intermediate links, 802.11 ❖ tourist = datagram
on last link ❖ transport segment =
❖ each link protocol provides communication link
different services ❖ transportation mode = link
▪ e.g., may or may not layer protocol
provide rdt over link ❖ travel agent = routing
algorithm

Link Layer 5-5


Link layer services
❖ framing, link access:
▪ encapsulate datagram into frame, adding header, trailer
▪ channel access if shared medium
▪ “MAC” addresses used in frame headers to identify
source, dest
• different from IP address!
❖ reliable delivery between adjacent nodes
▪ we learned how to do this already (chapter 3)!
▪ seldom used on low bit-error link (fiber, some twisted
pair)
▪ wireless links: high error rates
• Q: why both link-level and end-end reliability?

Link Layer 5-6


Link layer services (more)
❖ flow control:
▪ pacing between adjacent sending and receiving nodes
❖ error detection:
▪ errors caused by signal attenuation, noise.
▪ receiver detects presence of errors:
• signals sender for retransmission or drops frame
❖ error correction:
▪ receiver identifies and corrects bit error(s) without resorting to
retransmission
❖ half-duplex and full-duplex
▪ with half duplex, nodes at both ends of link can transmit, but not
at same time

Link Layer 5-7


Where is the link layer implemented?
❖ in each and every host
❖ link layer implemented in
“adaptor” (aka network
interface card NIC) or on a
chip application
▪ Ethernet card, 802.11 transport
network cpu memory
card; Ethernet chipset link

▪ implements link, physical host


layer controller
bus
(e.g., PCI)
attaches into host’s system
link
❖ physical
buses
physical
transmission

❖ combination of hardware,
software, firmware network adapter
card

Link Layer 5-8


Adaptors communicating

datagram datagram

controller controller

sending host receiving host


datagram

frame

❖ sending side: ❖ receiving side


▪ encapsulates datagram in ▪ looks for errors, rdt,
frame flow control, etc
▪ adds error checking bits, ▪ extracts datagram, passes
rdt, flow control, etc. to upper layer at
receiving side
Link Layer 5-9
Link layer, LANs: outline
5.1 introduction, services 5.5 link virtualization:
5.2 error detection, MPLS
correction 5.6 data center
5.3 multiple access networking
protocols 5.7 a day in the life of a
5.4 LANs web request
▪ addressing, ARP
▪ Ethernet
▪ switches
▪ VLANS

Link Layer 5-10


Error detection
EDC= Error Detection and Correction bits (redundancy)
D = Data protected by error checking, may include header fields

• Error detection not 100% reliable!


• protocol may miss some errors, but rarely
• larger EDC field yields better detection and correction

otherwise

Link Layer 5-11


Parity checking
single bit parity: two-dimensional bit parity:
❖ detect single bit ❖ detect and correct single bit errors
errors

0 0

Link Layer 5-12


Internet checksum (review)
goal: detect “errors” (e.g., flipped bits) in transmitted packet
(note: used at transport layer only)

sender: receiver:
❖ treat segment contents ❖ compute checksum of
as sequence of 16-bit received segment
integers ❖ check if computed
❖ checksum: addition (1’s checksum equals checksum
complement sum) of field value:
segment contents ▪ NO - error detected
❖ sender puts checksum ▪ YES - no error detected.
value into UDP But maybe errors
checksum field nonetheless?

Link Layer 5-13


Cyclic redundancy check
❖ more powerful error-detection coding
❖ view data bits, D, as a binary number
❖ choose r+1 bit pattern (generator), G
❖ goal: choose r CRC bits, R, such that
▪ <D,R> exactly divisible by G (modulo 2)
▪ receiver knows G, divides <D,R> by G. If non-zero remainder:
error detected!
▪ can detect all burst errors less than r+1 bits
❖ widely used in practice (Ethernet, 802.11 WiFi, ATM)

Link Layer 5-14


CRC example
want: G D r=3
D.2r XOR R = nG 101000
equivalently: 1001 101110000
1001
D.2r = nG XOR R 101
equivalently: 000
if we divide D.2r by 1010
G, want remainder R 1001
to satisfy: 010
000
100
D.2r 000
R = remainder[ ] R 1000
G
0000
1000

Link Layer 5-15


Link layer, LANs: outline
5.1 introduction, services 5.5 link virtualization:
5.2 error detection, MPLS
correction 5.6 data center
5.3 multiple access networking
protocols 5.7 a day in the life of a
5.4 LANs web request
▪ addressing, ARP
▪ Ethernet
▪ switches
▪ VLANS

Link Layer 5-16


Multiple access links, protocols
two types of “links”:
❖ point-to-point
▪ PPP for dial-up access
▪ point-to-point link between Ethernet switch, host
❖ broadcast (shared wire or medium)
▪ old-fashioned Ethernet
▪ upstream HFC
▪ 802.11 wireless LAN

shared wire (e.g., shared RF shared RF humans at a


cabled Ethernet) (e.g., 802.11 WiFi) (satellite) cocktail party
(shared air, acoustical)

Link Layer 5-17


Multiple access protocols
❖ single shared broadcast channel
❖ two or more simultaneous transmissions by nodes:
interference
▪ collision if node receives two or more signals at the same
time

multiple access protocol


❖ distributed algorithm that determines how nodes share
channel, i.e., determine when node can transmit
❖ communication about channel sharing must use channel itself!
▪ no out-of-band channel for coordination

Link Layer 5-18


An ideal multiple access protocol
given: broadcast channel of rate R bps
desiderata:
1. when one node wants to transmit, it can send at rate R.
2. when M nodes want to transmit, each can send at average
rate R/M
3. fully decentralized:
• no special node to coordinate transmissions
• no synchronization of clocks, slots
4. simple

Link Layer 5-19


MAC protocols: taxonomy
three broad classes:
❖ channel partitioning
▪ divide channel into smaller “pieces” (time slots, frequency, code)
▪ allocate piece to node for exclusive use
❖ random access
▪ channel not divided, allow collisions
▪ “recover” from collisions
❖ “taking turns”
▪ nodes take turns, but nodes with more to send can take longer
turns

Link Layer 5-20


Channel partitioning MAC protocols: TDMA
TDMA: time division multiple access
❖ access to channel in "rounds"
❖ each station gets fixed length slot (length = pkt
trans time) in each round
❖ unused slots go idle
❖ example: 6-station LAN, 1,3,4 have pkt, slots
2,5,6 idle

6-slot 6-slot
frame frame
1 3 4 1 3 4

Link Layer 5-21


Channel partitioning MAC protocols: FDMA
FDMA: frequency division multiple access
❖ channel spectrum divided into frequency bands
❖ each station assigned fixed frequency band
❖ unused transmission time in frequency bands go idle
❖ example: 6-station LAN, 1,3,4 have pkt, frequency bands 2,5,6
idle

frequency bands

FDM cable

Link Layer 5-22


Random access protocols
❖ when node has packet to send
▪ transmit at full channel data rate R.
▪ no a priori coordination among nodes
❖ two or more transmitting nodes ➜ “collision”,
❖ random access MAC protocol specifies:
▪ how to detect collisions
▪ how to recover from collisions (e.g., via delayed
retransmissions)
❖ examples of random access MAC protocols:
▪ slotted ALOHA
▪ ALOHA
▪ CSMA, CSMA/CD, CSMA/CA

Link Layer 5-23


Slotted ALOHA
assumptions: operation:
❖ all frames same size ❖ when node obtains fresh
❖ time divided into equal size frame, transmits in next slot
slots (time to transmit 1 ▪ if no collision: node can send
frame) new frame in next slot
❖ nodes start to transmit ▪ if collision: node retransmits
only slot beginning frame in each subsequent
❖ nodes are synchronized slot with prob. p until
❖ if 2 or more nodes transmit success
in slot, all nodes detect
collision

Link Layer 5-24


Slotted ALOHA
node 1 1 1 1 1

node 2 2 2 2

node 3 3 3 3

C E C S E C E S S

Pros: Cons:
❖ single active node can ❖ collisions, wasting slots
continuously transmit at ❖ idle slots
full rate of channel
❖ nodes may be able to
❖ highly decentralized: only detect collision in less
slots in nodes need to be
in sync than time to transmit
packet
❖ simple
❖ clock synchronization
Link Layer 5-25
Slotted ALOHA: efficiency

efficiency: long-run ❖ max efficiency: find p* that


fraction of successful slots maximizes
(many nodes, all with many Np(1-p)N-1
frames to send) ❖ for many nodes, take limit
of Np*(1-p*)N-1 as N goes
❖ suppose: N nodes with to infinity, gives:
many frames to send, each max efficiency = 1/e = .37
transmits in slot with
probability p

!
❖ prob that given node has at best: channel
success in a slot = p(1- used for useful
p)N-1 transmissions 37%
❖ prob that any node has a of time!
success = Np(1-p)N-1
Link Layer 5-26
Pure (unslotted) ALOHA
❖ unslotted Aloha: simpler, no synchronization
❖ when frame first arrives
▪ transmit immediately
❖ collision probability increases:
▪ frame sent at t0 collides with other frames sent in [t0-
1,t0+1]

Link Layer 5-27


Pure ALOHA efficiency
P(success by given node) = P(node transmits) .
P(no other node transmits in [t0-1,t0] .
P(no other node transmits in [t0-1,t0]

= p . (1-p)N-1 . (1-p)N-1
= p . (1-p)2(N-1)
… choosing optimum p and then letting n
= 1/(2e) = .18

even worse than slotted Aloha!

Link Layer 5-28


CSMA (carrier sense multiple access)

CSMA: listen before transmit:


if channel sensed idle: transmit entire frame
❖ if channel sensed busy, defer transmission

❖ human analogy: don’t interrupt others!

Link Layer 5-29


CSMA collisions spatial layout of nodes

❖ collisions can still occur:


propagation delay means
two nodes may not hear
each other’s
transmission
❖ collision: entire packet
transmission time
wasted
▪ distance & propagation
delay play role in in
determining collision
probability

Link Layer 5-30


CSMA/CD (collision detection)
CSMA/CD: carrier sensing, deferral as in CSMA
▪ collisions detected within short time
▪ colliding transmissions aborted, reducing channel wastage
❖ collision detection:
▪ easy in wired LANs: measure signal strengths, compare
transmitted, received signals
▪ difficult in wireless LANs: received signal strength
overwhelmed by local transmission strength
❖ human analogy: the polite conversationalist

Link Layer 5-31


CSMA/CD (collision detection)
spatial layout of nodes

Link Layer 5-32


Ethernet CSMA/CD algorithm
1. NIC receives datagram 4. If NIC detects another
from network layer, transmission while
creates frame transmitting, aborts and
2. If NIC senses channel sends jam signal
idle, starts frame 5. After aborting, NIC
transmission. If NIC enters binary (exponential)
senses channel busy, backoff:
waits until channel idle, ▪ after mth collision, NIC
then transmits. chooses K at random
3. If NIC transmits entire from {0,1,2, …, 2m-1}.
frame without detecting NIC waits K·512 bit
another transmission, times, returns to Step 2
NIC is done with frame ! ▪ longer backoff interval
with more collisions
Link Layer 5-33
CSMA/CD efficiency
❖ Tprop = max prop delay between 2 nodes in LAN
❖ ttrans = time to transmit max-size frame

1
efficiency =
1 + 5t prop /t trans
❖ efficiency goes to 1
▪ as tprop goes to 0
▪ as ttrans goes to infinity
❖ better performance than ALOHA: and simple, cheap,
decentralized!

Link Layer 5-34


“Taking turns” MAC protocols
channel partitioning MAC protocols:
▪ share channel efficiently and fairly at high load
▪ inefficient at low load: delay in channel access, 1/N
bandwidth allocated even if only 1 active node!
random access MAC protocols
▪ efficient at low load: single node can fully utilize
channel
▪ high load: collision overhead
“taking turns” protocols
look for best of both worlds!

Link Layer 5-35


“Taking turns” MAC protocols
polling:
❖ master node “invites”
slave nodes to transmit data
in turn poll
❖ typically used with
“dumb” slave devices master
data
❖ concerns:
▪ polling overhead
▪ latency
▪ single point of slaves
failure (master)

Link Layer 5-36


“Taking turns” MAC protocols
token passing:
T
❖ control token passed
from one node to next
sequentially.
❖ token message (nothing
❖ concerns: to send)
▪ token overhead T
▪ latency
▪ single point of failure
(token)

data
Link Layer 5-37
Cable access network
Internet frames,TV channels, control transmitted
downstream at different frequencies

cable headend

CMTS

cable
cable modem … splitter
modem
termination system

ISP upstream Internet frames, TV control, transmitted


upstream at different frequencies in time slots

❖ multiple 40Mbps downstream (broadcast) channels


▪ single CMTS transmits into channels
❖ multiple 30 Mbps upstream channels
▪ multiple access: all users contend for certain upstream
channel time slots (others assigned)
Cable access network
cable headend MAP frame for
Interval [t1, t2]

Downstream channel i
CMTS
Upstream channel j

t1 t2 Residences with cable modems

Minislots containing Assigned minislots containing cable modem


minislots request frames upstream data frames

DOCSIS: data over cable service interface spec


❖ FDM over upstream, downstream frequency channels
❖ TDM upstream: some slots assigned, some have contention
▪ downstream MAP frame: assigns upstream slots
▪ request for upstream slots (and data) transmitted
random access (binary backoff) in selected slots
Link Layer 5-39
Summary of MAC protocols
❖ channel partitioning, by time, frequency or code
▪ Time Division, Frequency Division
❖ random access (dynamic),
▪ ALOHA, S-ALOHA, CSMA, CSMA/CD
▪ carrier sensing: easy in some technologies (wire), hard
in others (wireless)
▪ CSMA/CD used in Ethernet
▪ CSMA/CA used in 802.11
❖ taking turns
▪ polling from central site, token passing
▪ bluetooth, FDDI, token ring

Link Layer 5-40


Link layer, LANs: outline
5.1 introduction, services 5.5 link virtualization:
5.2 error detection, MPLS
correction 5.6 data center
5.3 multiple access networking
protocols 5.7 a day in the life of a
5.4 LANs web request
▪ addressing, ARP
▪ Ethernet
▪ switches
▪ VLANS

Link Layer 5-41


MAC addresses and ARP
❖ 32-bit IP address:
▪ network-layer address for interface
▪ used for layer 3 (network layer) forwarding
❖ MAC (or LAN or physical or Ethernet) address:
▪ function: used ‘locally” to get frame from one interface to
another physically-connected interface (same network, in IP-
addressing sense)
▪ 48 bit MAC address (for most LANs) burned in NIC
ROM, also sometimes software settable
▪ e.g.: 1A-2F-BB-76-09-AD
hexadecimal (base 16) notation
(each “number” represents 4 bits)

Link Layer 5-42


LAN addresses and ARP
each adapter on LAN has unique LAN address

1A-2F-BB-76-09-AD

LAN
(wired or adapter
wireless)
71-65-F7-2B-08-53
58-23-D7-FA-20-B0

0C-C4-11-6F-E3-98

Link Layer 5-43


LAN addresses (more)
❖ MAC address allocation administered by IEEE
❖ manufacturer buys portion of MAC address space
(to assure uniqueness)
❖ analogy:
▪ MAC address: like Social Security Number
▪ IP address: like postal address
❖ MAC flat address ➜ portability
▪ can move LAN card from one LAN to another
❖ IP hierarchical address not portable
▪ address depends on IP subnet to which node is
attached

Link Layer 5-44


ARP: address resolution protocol
Question: how to determine
interface’s MAC address,
knowing its IP address? ARP table: each IP node (host,
router) on LAN has table
137.196.7.78
▪ IP/MAC address
mappings for some LAN
1A-2F-BB-76-09-AD
nodes:
137.196.7.23
137.196.7.14 < IP address; MAC address; TTL>
▪ TTL (Time To Live):
LAN time after which address
71-65-F7-2B-08-53 mapping will be
forgotten (typically 20
58-23-D7-FA-20-B0

min)
0C-C4-11-6F-E3-98
137.196.7.88

Link Layer 5-45


ARP protocol: same LAN
❖ A wants to send datagram
to B
▪ B’s MAC address not in ❖ A caches (saves) IP-to-
A’s ARP table. MAC address pair in its
❖ A broadcasts ARP query ARP table until
packet, containing B's IP information becomes old
address (times out)
▪ dest MAC address = FF-FF- ▪ soft state: information that
FF-FF-FF-FF times out (goes away)
▪ all nodes on LAN receive unless refreshed
ARP query ❖ ARP is “plug-and-play”:
❖ B receives ARP packet, ▪ nodes create their ARP
replies to A with its (B's) tables without intervention
from net administrator
MAC address
▪ frame sent to A’s MAC
address (unicast)
Link Layer 5-46
Addressing: routing to another LAN
walkthrough: send datagram from A to B via R
▪ focus on addressing – at IP (datagram) and MAC layer (frame)
▪ assume A knows B’s IP address
▪ assume A knows IP address of first hop router, R (how?)
▪ assume A knows R’s MAC address (how?)

A B
R
111.111.111.111
222.222.222.222
74-29-9C-E8-FF-55
49-BD-D2-C7-56-2A
222.222.222.220
1A-23-F9-CD-06-9B

111.111.111.112 111.111.111.110 222.222.222.221


CC-49-DE-D0-AB-7D E6-E9-00-17-BB-4B 88-B2-2F-54-1A-0F

Link Layer 5-47


Addressing: routing to another LAN
❖ A creates IP datagram with IP source A, destination B
❖ A creates link-layer frame with R's MAC address as dest, frame
contains A-to-B IP datagram
MAC src: 74-29-9C-E8-FF-55
MAC dest: E6-E9-00-17-BB-4B
IP src: 111.111.111.111
IP dest: 222.222.222.222

IP
Eth
Phy

A B
R
111.111.111.111
222.222.222.222
74-29-9C-E8-FF-55
49-BD-D2-C7-56-2A
222.222.222.220
1A-23-F9-CD-06-9B

111.111.111.112 111.111.111.110 222.222.222.221


CC-49-DE-D0-AB-7D E6-E9-00-17-BB-4B 88-B2-2F-54-1A-0F

Link Layer 5-48


Addressing: routing to another LAN
❖ frame sent from A to R
❖ frame received at R, datagram removed, passed up to IP

MAC src: 74-29-9C-E8-FF-55


MAC dest: E6-E9-00-17-BB-4B
IP src: 111.111.111.111
IP dest: 222.222.222.222
IP src: 111.111.111.111
IP dest: 222.222.222.222

IP IP
Eth Eth
Phy Phy

A B
R
111.111.111.111
222.222.222.222
74-29-9C-E8-FF-55
49-BD-D2-C7-56-2A
222.222.222.220
1A-23-F9-CD-06-9B

111.111.111.112 111.111.111.110 222.222.222.221


CC-49-DE-D0-AB-7D E6-E9-00-17-BB-4B 88-B2-2F-54-1A-0F

Link Layer 5-49


Addressing: routing to another LAN
❖ R forwards datagram with IP source A, destination B
❖ R creates link-layer frame with B's MAC address as dest, frame
contains A-to-B IP datagram

MAC src: 1A-23-F9-CD-06-9B


MAC dest: 49-BD-D2-C7-56-2A
IP src: 111.111.111.111
IP dest: 222.222.222.222
IP
IP Eth
Eth Phy
Phy

A B
R
111.111.111.111
222.222.222.222
74-29-9C-E8-FF-55
49-BD-D2-C7-56-2A
222.222.222.220
1A-23-F9-CD-06-9B

111.111.111.112 111.111.111.110 222.222.222.221


CC-49-DE-D0-AB-7D E6-E9-00-17-BB-4B 88-B2-2F-54-1A-0F

Link Layer 5-50


Addressing: routing to another LAN
❖ R forwards datagram with IP source A, destination B
❖ R creates link-layer frame with B's MAC address as dest, frame
contains A-to-B IP datagram

MAC src: 1A-23-F9-CD-06-9B


MAC dest: 49-BD-D2-C7-56-2A
IP src: 111.111.111.111
IP dest: 222.222.222.222
IP
IP Eth
Eth Phy
Phy

A B
R
111.111.111.111
222.222.222.222
74-29-9C-E8-FF-55
49-BD-D2-C7-56-2A
222.222.222.220
1A-23-F9-CD-06-9B

111.111.111.112 111.111.111.110 222.222.222.221


CC-49-DE-D0-AB-7D E6-E9-00-17-BB-4B 88-B2-2F-54-1A-0F

Link Layer 5-51


Addressing: routing to another LAN
❖ R forwards datagram with IP source A, destination B
❖ R creates link-layer frame with B's MAC address as dest, frame
contains A-to-B IP datagram
MAC src: 1A-23-F9-CD-06-9B
MAC dest: 49-BD-D2-C7-56-2A
IP src: 111.111.111.111
IP dest: 222.222.222.222

IP
Eth
Phy

A B
R
111.111.111.111
222.222.222.222
74-29-9C-E8-FF-55
49-BD-D2-C7-56-2A
222.222.222.220
1A-23-F9-CD-06-9B

111.111.111.112 111.111.111.110 222.222.222.221


CC-49-DE-D0-AB-7D E6-E9-00-17-BB-4B 88-B2-2F-54-1A-0F

Link Layer 5-52


Link layer, LANs: outline
5.1 introduction, services 5.5 link virtualization:
5.2 error detection, MPLS
correction 5.6 data center
5.3 multiple access networking
protocols 5.7 a day in the life of a
5.4 LANs web request
▪ addressing, ARP
▪ Ethernet
▪ switches
▪ VLANS

Link Layer 5-53


Ethernet
“dominant” wired LAN technology:
❖ cheap $20 for NIC
❖ first widely used LAN technology
❖ simpler, cheaper than token LANs and ATM
❖ kept up with speed race: 10 Mbps – 10 Gbps

Metcalfe’s Ethernet sketch


Link Layer 5-54
Ethernet: physical topology
❖ bus: popular through mid 90s
▪ all nodes in same collision domain (can collide with each
other)
❖ star: prevails today
▪ active switch in center
▪ each “spoke” runs a (separate) Ethernet protocol (nodes
do not collide with each other)

switch
star
bus: coaxial cable
Link Layer 5-55
Ethernet frame structure
sending adapter encapsulates IP datagram (or other
network layer protocol packet) in Ethernet frame
type
dest. source data
preamble address address (payload) CRC

preamble:
❖ 7 bytes with pattern 10101010 followed by one
byte with pattern 10101011
❖ used to synchronize receiver, sender clock rates

Link Layer 5-56


Ethernet frame structure (more)
❖ addresses: 6 byte source, destination MAC addresses
▪ if adapter receives frame with matching destination
address, or with broadcast address (e.g. ARP packet), it
passes data in frame to network layer protocol
▪ otherwise, adapter discards frame
❖ type: indicates higher layer protocol (mostly IP but
others possible, e.g., Novell IPX, AppleTalk)
❖ CRC: cyclic redundancy check at receiver
▪ error detected: frame is dropped
type
dest. source data
preamble address address (payload) CRC

Link Layer 5-57


Ethernet: unreliable, connectionless
❖ connectionless: no handshaking between sending and
receiving NICs
❖ unreliable: receiving NIC doesnt send acks or nacks
to sending NIC
▪ data in dropped frames recovered only if initial
sender uses higher layer rdt (e.g., TCP), otherwise
dropped data lost
❖ Ethernet’s MAC protocol: unslotted CSMA/CD wth
binary backoff

Link Layer 5-58


802.3 Ethernet standards: link & physical layers

❖ many different Ethernet standards


▪ common MAC protocol and frame format
▪ different speeds: 2 Mbps, 10 Mbps, 100 Mbps, 1Gbps,
10G bps
▪ different physical layer media: fiber, cable

MAC protocol
application and frame format
transport
network 100BASE-TX 100BASE-T2 100BASE-FX
link 100BASE-T4 100BASE-SX 100BASE-BX
physical

copper (twister fiber physical layer


pair) physical layer
Link Layer 5-59
Link layer, LANs: outline
5.1 introduction, services 5.5 link virtualization:
5.2 error detection, MPLS
correction 5.6 data center
5.3 multiple access networking
protocols 5.7 a day in the life of a
5.4 LANs web request
▪ addressing, ARP
▪ Ethernet
▪ switches
▪ VLANS

Link Layer 5-60


Ethernet switch
❖ link-layer device: takes an active role
▪ store, forward Ethernet frames
▪ examine incoming frame’s MAC address,
selectively forward frame to one-or-more
outgoing links when frame is to be forwarded on
segment, uses CSMA/CD to access segment
❖ transparent
▪ hosts are unaware of presence of switches
❖ plug-and-play, self-learning
▪ switches do not need to be configured

Link Layer 5-61


Switch: multiple simultaneous transmissions
❖ hosts have dedicated, direct A
connection to switch
B
❖ switches buffer packets C’

❖ Ethernet protocol used on each 6 1 2


incoming link, but no collisions;
full duplex 5 4 3
▪ each link is its own collision
C
domain B’

❖ switching: A-to-A’ and B-to-B’


can transmit simultaneously, A’
without collisions switch with six interfaces
(1,2,3,4,5,6)

Link Layer 5-62


Switch forwarding table

Q: how does switch know A’ A


reachable via interface 4, B’ B
reachable via interface 5? C’

❖ A: each switch has a switch 6 1 2


table, each entry:
5 4 3
▪ (MAC address of host, interface to
reach host, time stamp) B’ C

▪ looks like a routing table!


A’
Q: how are entries created, switch with six interfaces
maintained in switch table? (1,2,3,4,5,6)
▪ something like a routing protocol?

Link Layer 5-63


Switch: self-learning Source: A
Dest: A’

A A A’
❖ switch learns which hosts
can be reached through B
which interfaces C’

▪ when frame received, 6 1 2


switch “learns”
location of sender: 5 4 3
incoming LAN segment
▪ records sender/location B’ C
pair in switch table
A’

MAC addr interface TTL


A 1 60 Switch table
(initially empty)

Link Layer 5-64


Switch: frame filtering/forwarding
when frame received at switch:

1. record incoming link, MAC address of sending host


2. index switch table using MAC destination address
3. if entry found for destination
then {
if destination on segment from which frame arrived
then drop frame
else forward frame on interface indicated by entry
}
else flood /* forward on all interfaces except arriving
interface */

Link Layer 5-65


Self-learning, forwarding: example Source: A
Dest: A’

A A A’
❖ frame destination, A’,
locaton unknown: flood C’ B

❖ destination A location 6 1 2

known: selectively send A A’


5 4 3
on just one link B’ C
A’ A

A’

MAC addr interface TTL


A 1 60 switch table
A’ 4 60 (initially empty)

Link Layer 5-66


Interconnecting switches
❖ switches can be connected together
S4

S1
S3
A S2
F
D I
B C
G H
E

Q: sending from A to G - how does S1 know to


forward frame destined to F via S4 and S3?
❖ A: self learning! (works exactly the same as in
single-switch case!)
Link Layer 5-67
Self-learning multi-switch example
Suppose C sends frame to I, I responds to C

S4

S1
S3
A S2
F
D I
B C
G H
E

❖ Q: show switch tables and packet forwarding in S1, S2, S3, S4

Link Layer 5-68


Institutional network
mail server
to external
network
router web server

IP subnet

Link Layer 5-69


Switches vs. routers
application
transport
both are store-and-forward:
datagram network
▪ routers: network-layer frame link
devices (examine network- physical link frame
layer headers) physical
▪ switches: link-layer devices
(examine link-layer switch
headers)
network datagram
both have forwarding tables: link frame
physical
▪ routers: compute tables
using routing algorithms, IP application
addresses transport
▪ switches: learn forwarding network
table using flooding, link
learning, MAC addresses physical

Link Layer 5-70


VLANs: motivation
consider:
❖ CS user moves office to
EE, but wants connect to
CS switch?
❖ single broadcast domain:
▪ all layer-2 broadcast
traffic (ARP, DHCP,
Computer
unknown location of
Science Electrical
Computer
Engineering
destination MAC
Engineering address) must cross
entire LAN
▪ security/privacy,
efficiency issues

Link Layer 5-71


port-based VLAN: switch ports
VLANs grouped (by switch management
software) so that single physical
switch ……
Virtual Local
1 7 9 15
Area Network 2 8 10 16

switch(es) supporting
VLAN capabilities can … …
be configured to Electrical Engineering Computer Science
(VLAN ports 9-15)
define multiple virtual (VLAN ports 1-8)

LANS over single … operates as multiple virtual switches


physical LAN
infrastructure. 1 7 9 15
2 8 10 16

… …

Electrical Engineering Computer Science


(VLAN ports 1-8) (VLAN ports 9-16)

Link Layer 5-72


Port-based VLAN
router
❖ traffic isolation: frames to/from
ports 1-8 can only reach ports
1-8
▪ can also define VLAN based on
MAC addresses of endpoints,
rather than switch port 1 7 9 15

2 8 10 16

❖ dynamic membership: ports


can be dynamically assigned … …
among VLANs Electrical Engineering Computer Science
(VLAN ports 1-8) (VLAN ports 9-15)

❖ forwarding between VLANS: done via


routing (just as with separate
switches)
▪ in practice vendors sell combined
switches plus routers

Link Layer 5-73


VLANS spanning multiple switches
1 7 9 15 1 3 5 7
2 8 10 16 2 4 6 8

… …

Electrical Engineering Computer Science Ports 2,3,5 belong to EE VLAN


(VLAN ports 1-8) (VLAN ports 9-15) Ports 4,6,7,8 belong to CS VLAN

❖ trunk port: carries frames between VLANS defined over


multiple physical switches
▪ frames forwarded within VLAN between switches can’t be vanilla
802.1 frames (must carry VLAN ID info)
▪ 802.1q protocol adds/removed additional header fields for frames
forwarded between trunk ports

Link Layer 5-74


802.1Q VLAN frame format
type

preamble dest. source data (payload) CRC


address address 802.1 frame

type
dest. source
preamble
address address
data (payload) CRC 802.1Q frame

2-byte Tag Protocol Identifier Recomputed


(value: 81-00) CRC

Tag Control Information (12 bit VLAN ID field,


3 bit priority field like IP TOS)

Link Layer 5-75


Link layer, LANs: outline
5.1 introduction, services 5.5 link virtualization:
5.2 error detection, MPLS
correction 5.6 data center
5.3 multiple access networking
protocols 5.7 a day in the life of a
5.4 LANs web request
▪ addressing, ARP
▪ Ethernet
▪ switches
▪ VLANS

Link Layer 5-76


Multiprotocol label switching (MPLS)
❖ initial goal: high-speed IP forwarding using fixed
length label (instead of IP address)
▪ fast lookup using fixed length identifier (rather than
shortest prefix matching)
▪ borrowing ideas from Virtual Circuit (VC) approach
▪ but IP datagram still keeps IP address!

PPP or Ethernet
MPLS header IP header remainder of link-layer frame
header

label Exp S TTL

20 3 1 5
Link Layer 5-77
MPLS capable routers
❖ a.k.a. label-switched router
❖ forward packets to outgoing interface based only on
label value (don’t inspect IP address)
▪ MPLS forwarding table distinct from IP forwarding tables
❖ flexibility: MPLS forwarding decisions can differ from
those of IP
▪ use destination and source addresses to route flows to
same destination differently (traffic engineering)
▪ re-route flows quickly if link fails: pre-computed backup
paths (useful for VoIP)

Link Layer 5-78


MPLS versus IP paths

R6
D
R4 R3
R5
A
R2

❖ IP routing: path to destination determined


IP router
by destination address alone

Link Layer 5-79


MPLS versus IP paths
entry router (R4) can use different MPLS
routes to A based, e.g., on source address
R6
D
R4 R3
R5
A
R2

❖ IP routing: path to destination determined IP-only


by destination address alone router

❖ MPLS routing: path to destination can be MPLS and


IP router
based on source and dest. address
▪ fast reroute: precompute backup routes in
case of link failure
Link Layer 5-80
MPLS signaling
❖ modify OSPF, IS-IS link-state flooding protocols to
carry info used by MPLS routing,
▪ e.g., link bandwidth, amount of “reserved” link bandwidth
❖ entry MPLS router uses RSVP-TE signaling protocol to set
up MPLS forwarding at downstream routers

RSVP-TE
R6
D
R4
R5 modified
link state A
flooding

Link Layer 5-81


MPLS forwarding tables
in out out
label label dest interface
10 A 0 in out out
12 D 0 label label dest interface

8 A 1 10 6 A 1
12 9 D 0

R6
0 0
D
1 1
R4 R3
R5
0 0
A
R2 in outR1 out
label label dest interface
in out out
label label dest interface 6 - A 0
8 6 A 0
Link Layer 5-82
Link layer, LANs: outline
5.1 introduction, services 5.5 link virtualization:
5.2 error detection, MPLS
correction 5.6 data center
5.3 multiple access networking
protocols 5.7 a day in the life of a
5.4 LANs web request
▪ addressing, ARP
▪ Ethernet
▪ switches
▪ VLANS

Link Layer 5-83


Data center networks
❖ 10’s to 100’s of thousands of hosts, often closely
coupled, in close proximity:
▪ e-business (e.g. Amazon)
▪ content-servers (e.g., YouTube, Akamai, Apple, Microsoft)
▪ search engines, data mining (e.g., Google)
❖ challenges:
▪ multiple applications, each
serving massive numbers of
clients
▪ managing/balancing load,
avoiding processing,
networking, data bottlenecks
Inside a 40-ft Microsoft container,
Chicago data center
Link Layer 5-84
Data center networks
load balancer: application-layer routing
▪ receives external client requests
▪ directs workload within data center
▪ returns results to external client (hiding data
Internet center internals from client)

Border router
Load Load
balancer Access router
balancer

Tier-1 switches
B

A C Tier-2 switches

TOR switches

Server racks

1 2 3 4 5 6 7 8
Link Layer 5-85
Data center networks
❖ rich interconnection among switches, racks:
▪ increased throughput between racks (multiple routing
paths possible)
▪ increased reliability via redundancy

Tier-1 switches

Tier-2 switches

TOR switches

Server racks

1 2 3 4 5 6 7 8
Link layer, LANs: outline
5.1 introduction, services 5.5 link virtualization:
5.2 error detection, MPLS
correction 5.6 data center
5.3 multiple access networking
protocols 5.7 a day in the life of a
5.4 LANs web request
▪ addressing, ARP
▪ Ethernet
▪ switches
▪ VLANS

Link Layer 5-87


Synthesis: a day in the life of a web request
❖ journey down protocol stack complete!
▪ application, transport, network, link
❖ putting-it-all-together: synthesis!
▪ goal: identify, review, understand protocols (at all
layers) involved in seemingly simple scenario:
requesting www page
▪ scenario: student attaches laptop to campus network,
requests/receives www.google.com

Link Layer 5-88


A day in the life: scenario

browser DNS server


Comcast network
68.80.0.0/13

school network
68.80.2.0/24

web page

web server Google’s network


64.233.169.105 64.233.160.0/19

Link Layer 5-89


A day in the life… connecting to the Internet
DHCP DHCP ❖ connecting laptop needs to
DHCP UDP
IP
get its own IP address, addr
of first-hop router, addr of
DHCP

DHCP Eth
Phy DNS server: use DHCP
DHCP

❖ DHCP request encapsulated


in UDP, encapsulated in IP,
DHCP
DHCP
DHCP UDP
encapsulated in 802.3
DHCP IP Ethernet
DHCP Eth router
Phy (runs DHCP) ❖ Ethernet frame broadcast
(dest: FFFFFFFFFFFF) on LAN,
received at router running
DHCP server
❖ Ethernet demuxed to IP
demuxed, UDP demuxed to
DHCP
Link Layer 5-90
A day in the life… connecting to the Internet
DHCP DHCP ❖ DHCP server formulates
DHCP UDP DHCP ACK containing
DHCP IP client’s IP address, IP
DHCP Eth address of first-hop router
Phy for client, name & IP
address of DNS server
❖ encapsulation at DHCP
DHCP DHCP server, frame forwarded
DHCP UDP (switch learning) through
DHCP IP LAN, demultiplexing at
DHCP Eth router client
(runs DHCP)
DHCP
Phy ❖ DHCP client receives
DHCP ACK reply

Client now has IP address, knows name & addr of DNS


server, IP address of its first-hop router

Link Layer 5-91


A day in the life… ARP (before DNS, before HTTP)
DNS DNS ❖ before sending HTTP request, need
DNS UDP IP address of www.google.com:
DNS
ARP
IP DNS
ARP query Eth
Phy ❖ DNS query created, encapsulated in
UDP, encapsulated in IP,
encapsulated in Eth. To send frame
ARP
to router, need MAC address of
ARP reply Eth
Phy router interface: ARP
router ❖ ARP query broadcast, received by
(runs DHCP) router, which replies with ARP
reply giving MAC address of
router interface
❖ client now knows MAC address
of first hop router, so can now
send frame containing DNS
query
Link Layer 5-92
A day in the life… using DNS DNS
DNS UDP DNS server
DNS IP
DNS DNS DNS Eth
DNS UDP DNS Phy
DNS IP
DNS Eth
Phy
DNS
Comcast network
68.80.0.0/13

router
❖ IP datagram forwarded from
(runs DHCP) campus network into comcast
❖ IP datagram containing DNS network, routed (tables created
query forwarded via LAN by RIP, OSPF, IS-IS and/or BGP
switch from client to 1st hop routing protocols) to DNS server
router ❖ demux’ed to DNS server
❖ DNS server replies to client
with IP address of
www.google.com Link Layer 5-93
A day in the life…TCP connection carrying HTTP
HTTP
HTTP
SYNACK
SYN TCP
SYNACK
SYN IP
SYNACK
SYN Eth
Phy

❖ to send HTTP request,


client first opens TCP socket
to web server
router ❖ TCP SYN segment (step 1 in 3-
(runs DHCP)
SYNACK
SYN TCP way handshake) inter-domain
SYNACK
SYN IP routed to web server
SYNACK
SYN Eth
Phy ❖ web server responds with TCP
SYNACK (step 2 in 3-way
web server handshake)
64.233.169.105 ❖ TCP connection established!

Link Layer 5-94


A day in the life… HTTP request/reply
HTTP
HTTP HTTP ❖ web page finally (!!!) displayed
HTTP
HTTP TCP
HTTP
HTTP IP
HTTP
HTTP Eth
Phy

❖ HTTP request sent into TCP


socket
router ❖ IP datagram containing HTTP
HTTP
HTTP
HTTP TCP
(runs DHCP) request routed to
HTTP IP www.google.com
HTTP Eth ❖ web server responds with
Phy HTTP reply (containing web
page)
web server
64.233.169.105
❖ IP datagram containing HTTP
reply routed back to client
Link Layer 5-95
Chapter 5: Summary
❖ principles behind data link layer services:
▪ error detection, correction
▪ sharing a broadcast channel: multiple access
▪ link layer addressing
❖ instantiation and implementation of various link
layer technologies
▪ Ethernet
▪ switched LANS, VLANs
▪ virtualized networks as a link layer: MPLS
❖ synthesis: a day in the life of a web request

Link Layer 5-96


Chapter 5: let’s take a breath
❖ journey down protocol stack complete (except
PHY)
❖ solid understanding of networking principles,
practice
❖ ….. could stop here …. but lots of interesting
topics!
▪ wireless
▪ multimedia
▪ security
▪ network management

Link Layer 5-97

You might also like