Our
Course
Lectures
twice
a
week
Switch
and
Router
Architectures
Tuesdays,
12:00-13:45,
Rothberg
A-510
Wednesday
10:00-11:45,
Rothberg
A-510
Prof.
David
Hay
Rothberg
A-411
dhay@cs.huji.ac.il
50%
-
Home
assignments
50%
-
Final
Project
Introduction
1-1
Introduction
Calendar
1-2
Syllabus
March
Algorithms
at
the
end
of
the
wire
Sunday
Monday
Tuesday
Wednesday
Thursday
10
11
12
15
16
17
18
19
Design
principles
Scheduling
algorithms
for
packet
switches
Packet
classicaVon
(algorithms/hardware)
AcVve
queue
management
(=buer
management)
April
Passover Vacation
12
13
14
15
16
19
20
21
22
23
26
27
28
29
30
Introduction
Switch
and
router
architectures
Middleboxes
So_ware
Dened
Networks/Network
FuncVon
VirtualizaVon
1-3
Introduction
1-4
Syllabus
Course
Material
Focuses
on
the
a
single
device
(mostly)
Mostly,
only
in
scienVc
papers
(very
recent
stu)
Some
material
appears
in
Switches
Routers
Firewalls
Network
Intrusion
DetecVon
Systems
(NIDS)
Computer
Networking:
A
Top
Down
Approach.
Jim
Kurose,
Keith
Ross,
2007.
Network
Algorithmics:
An
Interdisciplinary
Approach
to
Designing
Fast
Networked
Devices.
George
Varghese,
2004.
Lecture
notes
at
Stanford
University
hip://www.stanford.edu/class/ee384x/
Interdisciplinary
subject
System
design
Hardware
Algorithms
Queuing
Networking
Introduction
1-6
Routers
process
headers
001100
011001
110101
11011
Data
Head
Head
http://www.adalovelace.com
http://www.adalovelace.com
Ada Lovelace
DeniVons
What
a
Big
Router
Looks
Like
Cisco GSR 12816
Juniper T640
Capacity: 640Gb/s
Power: 5kW
0.5 m
Capacity: 320Gb/s
Power: 3kW
0.5 m
3
4
5
6
7
1.8 m
N = number of linecards. Typically 8-32 per chassis
For
R=40
RGb/s,
packet=
1Gb/s,
40b:
2.5Gb/s, 10Gb/s, 40Gb/s, 100Gb/s
= line-rate.
Packets
per
second
=
Capacity
109
of router = N x R
Time
for
each
packet
=
1
ns
0.9 m
0.6 m
0.75 m
What
MulVrack
Routers
Looks
Like?
Cisco CRS-1
Juniper T1600 + TX Matrix
Data
Header
01000111100010101001110100011001
1. Internet Address
2. Age
3. Checksum to protect header
Barebones
Router
Lookup internet address
Check and update checksum
Check and update age
Barebones
Router
Barebones
Router
Boilenecks
Early
days:
Modied
Computer
Memory,
memory,
Must run at rate N x R
R
R
R
R
R
R
R
R
3
Bottlenecks
2nd
GeneraVon
Router
Early
days:
Modied
Computer
R
R
R
R
FuncVon
more
important
than
speed
1993
(WWW)
changed
everything
We
badly
needed
Some
new
architecture
Some
theory
Some
higher
performance
3rd
GeneraVon
Router:
Switch
Nx
R
1xR
Arbiter Arbiter Arbiter Arbiter
Arbiter Arbiter Arbiter Arbiter
Arbiter
Arbiter
44thth
GGeneration
eneraVon
Router
Router
More
4th
GeneraVon
Routers
Mul$rack;
op$cs
inside
Multirack;
optics
inside
Alcatel 7670 RSP
Juniper TX
TX
Optical links
Avici TSR
Cisco CRS-1
100s
of metres
Linecards
Switch
Power
consumpVon
per
chassis
th
GeneraVon
Router?
(Future)
4th5Generation
Router
Op$cal
switch
fabricinside
Multirack;
optics
16
14
Power (kW)
12
Optical links
10
8
6
4
2
100s
of metres
0
1990 1993 1996 1999 2002 2003 2004
Linecards
Optical
Switch
Backbone
Router
Capacity
Trend
Does
Not
Seem
to
Stop
10 Tb/s
Larger
Bandwidth
100 Gb/s
1 Gb/s
x 5 every 3 years
Bandwidth-Hungry
Applications
10 Mb/s
1987
1990
1993
1996
1999
2002
2005
2008
29
Why
it
is
dicult
to
build
faster
routers?
Link
speeds
followed
Moores
Law
User
demand
doubled
every
year
30
Routers
are
even
more
complex
End
of
the
Internet
end-to-end
model
+
Uncertainty
+
Lack
of
compeVVon
Router
capacity
limited
by
memory
speed
DRAM
is
no
faster
now
than
10
years
ago
a
Explosion
of
complexity
in
routers
a
Routers
should
have
fallen
behind
aPower-hungry,
expensive,
unreliable
aMany
people
predict
signicant
change
in
next
few
years
(So_ware
Dened
Networks)
IPv6,
mulVcast,
ACLs,
rewall,
virtual
rouVng,
MPLS,
Diserv,
IntServ,
RSVP,
ATM,
IP
Tunnels,
IPSec,
VPNs,
Virtual
rouVng,
Calea,
Boilenecks
Packet
Processing
Examples
Memory,
memory,
Address
Lookup
(IP/Ethernet)
Where
to
send
an
incoming
packet?
Use
output-port
3,
to
send
packets
to
MAC
address
01:23:45:67:89:ab
Exact
Match
Use
output-port
4,
to
send
packets
to
des$na$on
network
111.15/16
-
(Longest
Prex
Match)
Firewall,
ACL
Which
packet
to
accept
or
deny?
Drop
all
packets
from
evil
source
network
66.66/16
on
ports
6-666
Usually
needs
5
elds:
source-address,
dest-address,
source-
port,
dest-port,
protocol
34
Packet
Processing
Examples
Packet
Processing
Rate
Intrusion
DetecVon
Schemes
Deep
Packet
inspecVon
(DPI)
Drop
all
packets
that
contains
the
string
EvilWorm
anywhere
within
the
packet
Line
40B
packets
(Mpkt/s)
622Mb/s
2.5Gb/s
10Gb/s
40Gb/s
1.94
7.81
31.25
125
SNORT
rule
set
1. Lookup mechanism must be simple and easy to implement
2. (Surprise?) Memory access time is the long-term bottleneck
35
Memory
Technology
(not
up
to
date)
Simplest
Task:
Exact
Matching
Technology
Single
chip
density
$/chip
($/MByte)
Access
speed
Watts/
chip
Mostly
in
Layer
2
(bridges/switches)
Networking
DRAM
64 MB
$30-$50
($0.50-$0.75)
40-80ns
0.5-2W
SRAM
4 MB
$20-$30
($5-$8)
4-8ns
1-3W
TCAM
1 MB
$200-$250
($200-$250)
4-8ns
15-30W
Connects
two
Ethernet
networks
Wire-speed
forwarding:
Each
Vme
a
packet
arrives
at
a
switch,
forward
it
according
to
the
desVnaVon
MAC
address
Store/update
also
the
source
MAC
address
(learning)
Should
be
done
at
wire
speed
Note: Price, speed and power are manufacturer and market dependent.
Numbers are a bit outdated but give the general idea
SoluVon
1:
Binary
Search
Scaling
using
Hashing
MAC
addresses
have
values
which
can
be
sorted
Thus,
when
keeping
them
sorted,
one
can
perform
a
binary
search
on
the
array
and
nd
the
right
MAC
address
However,
each
iteraVon
is
a
memory
access
log
N
memory
accesses
works
ne
(even
using
DRAM)
for
small
speed,
N
(around
10Mb/s,
8K
values)
but
doesnt
scale
for
large
N/higher
speeds
(not
even
for
100
Mb/s,
64K
values)
Using
faster
hardware
(SRAM)
wont
really
solve
the
problem
(and
it
is
more
expensive)
Hashing
is
much
faster
than
binary
search
on
average,
however
much
slower
on
the
worst
case
(up
to
linear
Vme)
However,
one
can
choose
(pre-compute)
good
hash
funcVons,
so
the
number
of
collision
can
be
small
and
bounded
PrecomputaVon
takes
a
lot
of
Vme,
but
addresses
are
not
added
in
rapid
rate
Applying
the
hash
funcVons
is
done
on
wire-speed
More
sophisVcated
data
structure/hashing
techniques
can
also
be
applied
(e.g.
to
reduce
memory)
Bloom
Filters,
ngerprinVng,
etc.
10
Example
(Gigaswitch,
1994)
IP
Addressing
N
=
64K;
binary
search
takes
16
memory
accesses
For
each
48-bit
address
addr,
we
rst
apply
h(addr),
to
get
48-bit
value:
11111111 00010001 10000111 00000000
255
16
LSB
are
the
hash-table
entry
index
(64K
entries)
Each
entry
is
a
balanced
binary
tree
of
height
at
most
3,
sorted
by
the
remaining
32
MSB
The
hash
funcVon
should
guarantee
that
no
more
than
8
addresses
are
in
the
same
tree,
and
that
we
can
disambiguate
between
addresses
using
the
32
MSB
17
134
255.17.134.0
Dotted quad notation
Solve
corner-cases
separately
(CAM);
rehashing
4
memory
accesses
IPv4 addresses have 32 bits
From Griffin Tutorial in Sigcomm 2001
ExponenVal
Growth
in
RouVng
Table
Sizes
Class A
0nnnnnnn hhhhhhhh hhhhhhhh hhhhhhhh
Class B
10nnnnnn nnnnnnnn hhhhhhhh hhhhhhhh
Class C
110nnnnn nnnnnnnn nnnnnnnn hhhhhhhh
n = network address bit
Number of BGP routes advertised
Classful
Addresses
h = host identifier bit
Leads to a rigid, flat, inefficient use of address space
44
11
CIDR:
Hierarchal
Address
AllocaVon
Prefixes are key to Internet scalability
Address allocated in contiguous chunks (prefixes)
Routing protocols and packet forwarding based on prefixes
Today, routing tables contain ~150,000-200,000 prefixes
12.0.0.0/16
12.1.0.0/16
12.2.0.0/16
12.3.0.0/16
12.0.0.0/8
:
:
:
12.254.0.0/16
:
:
:
12.3.0.0/24
12.3.1.0/24
:
:
12.3.254.0/24
Hierarchical
addressing:
route
aggregaVon
Hierarchical
addressing
allows
ecient
adverVsement
of
rouVng
informaVon:
Organization 0
200.23.16.0/23
Organization 1
200.23.18.0/23
Organization 2
200.23.20.0/23
Organization 7
12.253.0.0/19
12.253.32.0/19
12.253.64.0/19
12.253.96.0/19
12.253.128.0/19
12.253.160.0/19
.
.
.
.
.
.
Fly-By-Night-ISP
Send me anything
with addresses
beginning
200.23.16.0/20
Internet
200.23.30.0/23
ISPs-R-Us
RFC
1519:
Classless
Inter-Domain
RouVng
(CIDR)
Send me anything
with addresses
beginning
199.31.0.0/16
Classless
Forwarding
Use two 32 bit numbers to represent a network.
Network number = IP address + Mask
IP Address : 12.4.0.0
Address
IP Mask: 255.254.0.0
00001100 00000100 00000000 00000000
Destination =12.5.9.16
------------------------------payload
Prefix
Mask
11111111 11111110 00000000 00000000
Network Prefix
Usually written as 12.4.0.0/15
for hosts
OK
better
Next Hop
Interface
0.0.0.0/0
10.14.11.33
Output-port 1
12.0.0.0/8
10.14.22.19
Output-port 2
even better
12.4.0.0/15 10.1.3.77
Output-port 3
best!
12.5.8.0/23 10.1.3.23
Output-port 4
IP Forwarding Table
12
Longest
Prex
Match
is
Harder
than
Exact
Match
The
desVnaVon
address
of
an
arriving
packet
does
not
carry
with
it
the
informaVon
to
determine
the
length
of
the
longest
matching
prex
Hence,
one
needs
to
search
among
the
space
of
all
prex
lengths;
as
well
as
the
space
of
all
prexes
of
a
given
length
Problem
DeniVon
192.2.2/24, R3
192.2.0/22, R2
Current
PracVcal
Data
Caching
works
poorly
in
backbone
routers
250,000
concurrent
ows
Wire
speed
lookup
needed
for
40-byte
packets
50%
are
TCP
acks
32
nsec/packet
in
10
Gbs
and
8
nsec/packet
in
40
Gbs
Lookup
dominated
by
memory
accesses
speed
is
measured
by
memory
accesses
Prex
length
8-32
Today
150,000
prexes
with
growth
1
million
prexes
Higher
speeds
need
SRAM
Worth
minimizing
memory
LPM
in
IPv4
Use
32
exact
match
algorithms
for
LPM!
Exact match
against prefixes
of length 1
192.2.2/24
192.2.0/22 200.11.0/22
Network Address
200.11.0/22, R4
Exact match
against prefixes
of length 2
192.2.0.1 192.2.2.100200.11.0.33
LPM: Find the most specific route, or the longest
matching prefix among all the prefixes matching the
destination address of an incoming packet
Priority
Encode
and pick
Port
Exact match
against prefixes
of length 32
We
can
start
with
prex
length
8
13