1
P2P Systems and Distributed Hash Tables
Section 9.4.2
COS 461: Computer Networks
Spring 2011
Mike Freedman
http://www.cs.princeton.edu/courses/archive/spring11/cos461/
2
P2P as Overlay Networking
• P2P applications need to:
  – Track identities & IP addresses of peers
    • May be many, and may have significant churn
  – Route messages among peers
    • If you don't keep track of all peers, this is "multi-hop"
• Overlay network
  – Peers doing both naming and routing
  – IP becomes "just" the low-level transport
3
Early P2P
4
Early P2P I: Client-Server
• Napster
  – Client-server search
  – "P2P" file xfer
[Figure: 1. insert xyz.mp3; 2. search "xyz.mp3?"; 3. transfer]
5
Early P2P II: Flooding on Overlays
[Figure, animated over several slides: the search "xyz.mp3?" is flooded across the overlay until it reaches a peer holding xyz.mp3; the transfer then happens directly between the two peers]
8
Early P2P II: "Ultra/super peers"
• Ultra-peers can be installed (KaZaA) or self-promoted (Gnutella)
  – Also useful for NAT circumvention, e.g., in Skype
9
Lessons and Limitations
• Client-Server performs well
  – But not always feasible: performance is often not the key issue!
• Things that flood-based systems do well
  – Organic scaling
  – Decentralization of visibility and liability
  – Finding popular stuff
  – Fancy local queries
• Things that flood-based systems do poorly
  – Finding unpopular stuff
  – Fancy distributed queries
  – Vulnerabilities: data poisoning, tracking, etc.
  – Guarantees about anything (answer quality, privacy, etc.)
10
Structured Overlays: Distributed Hash Tables
11
Basic Hashing for Partitioning?
• Consider the problem of data partitioning:
  – Given document X, choose one of k servers to use
• Suppose we use modulo hashing
  – Number servers 1..k
  – Place X on server i = (X mod k)
• Problem? Data may not be uniformly distributed
  – Place X on server i = hash(X) mod k
• Problem? (see the sketch below)
  – What happens if a server fails or joins (k → k±1)?
  – What if different clients have different estimates of k?
  – Answer: All entries get remapped to new nodes!
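A quick way to see that remapping in code; this is an illustrative sketch, with SHA-1 and the key set chosen arbitrarily rather than taken from the slides:

    import hashlib

    def server_for(key: str, k: int) -> int:
        """Place key on server hash(X) mod k."""
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return h % k

    keys = [f"doc-{i}" for i in range(1000)]
    before = {x: server_for(x, 8) for x in keys}   # k = 8 servers
    after = {x: server_for(x, 9) for x in keys}    # one server joins: k = 9

    moved = sum(1 for x in keys if before[x] != after[x])
    print(f"{moved}/{len(keys)} keys changed servers")   # typically ~8/9 of all keys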
12
Consistent Hashing
[Figure: a client issues insert(key1, value) and lookup(key1); keys key1, key2, key3 are spread across nodes, with key1 = value stored at the responsible node]
• Consistent hashing partitions the key-space among nodes
• Contact the appropriate node to lookup/store a key
  – Blue node determines red node is responsible for key1
  – Blue node sends lookup or insert to red node
13
Consistent Hashing
[Figure: nodes at IDs 0000, 0010, 0110, 1010, 1100, 1110, 1111; URL1, URL2, URL3 hash to 0001, 0100, 1011]
• Partitioning key-space among nodes
  – Nodes choose random identifiers: e.g., hash(IP)
  – Keys randomly distributed in ID-space: e.g., hash(URL)
  – Keys assigned to node "nearest" in ID-space
  – Spreads ownership of keys evenly across nodes
14
Consistent Hashing
[Figure: hash buckets on a mod 2^k circle; positions 0, 4, 8, 12, 14 labeled]
• Construction (see the sketch below)
  – Assign n hash buckets to random points on a mod 2^k circle; hash key size = k
  – Map object to a random position on the circle
  – Hash of object = closest clockwise bucket
  – successor(key) → bucket
• Desired features
  – Balanced: no bucket has a disproportionate number of objects
  – Smoothness: addition/removal of a bucket does not cause movement among existing buckets (only immediate buckets)
  – Spread and load: small set of buckets that lie near object
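A minimal sketch of this construction, assuming a 16-bit ID space and SHA-1 for the hash; the bucket names and the example key are made up:

    import bisect
    import hashlib

    K = 16                    # hash key size; IDs live on a mod 2^K circle
    RING = 2 ** K

    def hid(s: str) -> int:
        """Hash a string onto the circle."""
        return int(hashlib.sha1(s.encode()).hexdigest(), 16) % RING

    class Ring:
        def __init__(self, buckets):
            # each bucket sits at a (pseudo-)random point on the circle
            self.points = sorted((hid(b), b) for b in buckets)
            self.ids = [p for p, _ in self.points]

        def successor(self, key: str) -> str:
            """The closest clockwise bucket owns the key."""
            i = bisect.bisect_left(self.ids, hid(key)) % len(self.points)
            return self.points[i][1]

    ring = Ring(["bucket-A", "bucket-B", "bucket-C", "bucket-D"])
    print(ring.successor("http://example.com/xyz.mp3"))   # whichever bucket is next clockwise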
15
Consistent hashing and failures
[Figure: hash buckets on a mod 2^k circle; positions 0, 4, 8, 12, 14 labeled]
• Consider a network of n nodes
• If each node has 1 bucket
  – Owns 1/nth of keyspace in expectation
  – Says nothing of request load per bucket
• If a node fails:
  – Its successor takes over the bucket
  – Achieves the smoothness goal: only a localized shift, not O(n)
  – But now the successor owns 2 buckets: keyspace of size 2/n
• Instead, if each node maintains v random nodeIDs, not 1 (see the sketch below)
  – "Virtual" nodes spread over ID space, each of size 1/vn
  – Upon failure, v successors take over, each now storing (v+1)/vn
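A small simulation of the virtual-node idea, under assumed parameters (16-bit IDs, made-up node addresses and keys); it shows that with v > 1 a failed node's keyspace is spread across several survivors instead of landing on one:

    import bisect, hashlib

    RING = 2 ** 16

    def hid(s):
        return int(hashlib.sha1(s.encode()).hexdigest(), 16) % RING

    def ring(nodes, v):
        # v virtual IDs per physical node, spread over the circle
        return sorted((hid(f"{n}#{i}"), n) for n in nodes for i in range(v))

    def owner(points, key):
        # closest clockwise point; returns the physical node that owns the key
        ids = [p for p, _ in points]
        return points[bisect.bisect_left(ids, hid(key)) % len(points)][1]

    nodes = [f"10.0.0.{i}" for i in range(1, 9)]
    keys = [f"key-{i}" for i in range(10000)]

    for v in (1, 16):
        full, degraded = ring(nodes, v), ring(nodes[1:], v)   # 10.0.0.1 fails
        before = {k: owner(full, k) for k in keys}
        after = {k: owner(degraded, k) for k in keys}
        heirs = {after[k] for k in keys if before[k] != after[k]}
        print(f"v={v}: the failed node's keys moved to {len(heirs)} surviving node(s)")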
16
Consistent hashing vs. DHTs

                                   Consistent Hashing    Distributed Hash Tables
  Routing table size               O(n)                  O(log n)
  Lookup / Routing                 O(1)                  O(log n)
  Routing updates (join/leave)     O(n)                  O(log n)
  Key movement (join/leave)        O(1)                  O(1)
17
Distributed Hash Table
[Figure: nodes at IDs 0000, 0010, 0110, 1010, 1100, 1110, 1111 and keys 0001, 0100, 1011, with the keyspace drawn as a tree]
• Nodes' neighbors selected from a particular distribution
  - Visualize the keyspace as a tree, organized by distance from a node
  - At least one neighbor known per subtree of increasing size / distance from the node
• Route greedily towards the desired key via overlay hops
20
The Chord DHT
• Chord ring: ID space mod 2^160 (see the sketch below)
  – nodeid = SHA1(IP address, i) for i = 1..v virtual IDs
  – keyid = SHA1(name)
• Routing correctness:
  – Each node knows successor and predecessor on ring
• Routing efficiency:
  – Each node knows O(log n) well-distributed neighbors
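A sketch of how these IDs could be derived; the exact byte encoding fed to SHA-1 (here "ip:i") is an assumption rather than something the slide specifies, and the IP address and file name are made up:

    import hashlib

    BITS = 160
    MOD = 2 ** BITS                    # Chord ring: ID space mod 2^160

    def node_ids(ip: str, v: int):
        """One virtual nodeid per (IP address, i) pair, i = 1..v."""
        return [int(hashlib.sha1(f"{ip}:{i}".encode()).hexdigest(), 16) % MOD
                for i in range(1, v + 1)]

    def key_id(name: str) -> int:
        # SHA-1 already yields 160 bits, so the mod is a formality here
        return int(hashlib.sha1(name.encode()).hexdigest(), 16) % MOD

    print(node_ids("128.112.7.1", v=2))
    print(key_id("xyz.mp3"))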
21
Basic lookup in Chord

lookup(id):
  if (id > pred.id && id <= my.id)    // i.e., id falls in (pred.id, my.id] on the ring
    return my.id;
  else
    return succ.lookup(id);

• Route hop by hop via successors
  – O(n) hops to find destination id (toy sketch below)
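A runnable toy version of this successor-only lookup, with small made-up node IDs instead of 160-bit ones and the ring wraparound handled explicitly:

    class Node:
        def __init__(self, node_id):
            self.id, self.pred, self.succ = node_id, None, None

        def owns(self, key_id):
            # a node owns keys in (pred.id, my.id], wrapping at the ring edge
            if self.pred.id < self.id:
                return self.pred.id < key_id <= self.id
            return key_id > self.pred.id or key_id <= self.id

        def lookup(self, key_id, hops=0):
            if self.owns(key_id):
                return self.id, hops
            return self.succ.lookup(key_id, hops + 1)

    # a ring of 10 nodes with IDs 0, 10, ..., 90
    nodes = [Node(i * 10) for i in range(10)]
    for i, n in enumerate(nodes):
        n.succ, n.pred = nodes[(i + 1) % len(nodes)], nodes[i - 1]

    print(nodes[0].lookup(77))   # (80, 8): one successor at a time, O(n) hops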
22
Efficient lookup in Chord

lookup(id):
  if (id > pred.id && id <= my.id)    // i.e., id falls in (pred.id, my.id] on the ring
    return my.id;
  else
    // fingers() by decreasing distance
    for finger in fingers():
      if id >= finger.id              // farthest finger that does not overshoot id
        return finger.lookup(id);
    return succ.lookup(id);

• Route greedily via distant "finger" nodes
  – O(log n) hops to find destination id
23
Building routing tables

For i in 1 ... log n:
  finger[i] = successor( (my.id + 2^i) mod 2^160 )
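Putting the last three slides together: a toy simulation (7-bit IDs, made-up node IDs) that builds finger tables with the rule above and routes greedily, reaching a key's owner in a few hops rather than O(n):

    import bisect

    M = 7                          # toy ID space: 2^7 = 128 IDs instead of 2^160
    RING = 2 ** M
    IDS = [2, 11, 23, 40, 57, 69, 81, 96, 110, 120]   # made-up node IDs, sorted

    def successor(point):
        """First node clockwise from point (wrapping past 2^M)."""
        return IDS[bisect.bisect_left(IDS, point % RING) % len(IDS)]

    def fingers(n):
        """finger[i] = successor((n + 2^i) mod 2^M), as on the slide above."""
        return [successor((n + 2 ** i) % RING) for i in range(M)]

    def lookup(n, key, hops=0):
        if successor(key) == n:                       # n owns keys in (pred, n]
            return n, hops
        # greedy: jump to the farthest finger that does not overshoot the key
        for f in sorted(fingers(n), key=lambda f: (f - n) % RING, reverse=True):
            if (f - n) % RING <= (key - n) % RING:
                return lookup(f, key, hops + 1)
        return lookup(successor(n + 1), key, hops + 1)   # fall back to successor

    print(lookup(2, 100))   # (110, 3): reaches the owner of key 100 in 3 hops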
24
Joining and managing routing
• Join: (see the sketch below)
  – Choose nodeid
  – Lookup(my.id) to find place on ring
  – During lookup, discover future successor
  – Learn predecessor from successor
  – Update succ and pred that you joined
  – Find fingers by lookup( (my.id + 2^i) mod 2^160 )
• Monitor:
  – If a neighbor doesn't respond for some time, find a new one
• Leave: Just go, already!
  – (Warn your neighbors if you feel like it)
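A minimal, self-contained sketch of those join steps (successor-only lookup, no monitoring or failure handling); the Node class and the node IDs are illustrative, not Chord's actual implementation:

    M = 160
    RING = 2 ** M

    class Node:
        """Toy ring node holding only the state the join steps touch."""
        def __init__(self, node_id):
            self.id, self.succ, self.pred, self.fingers = node_id, self, self, []

        def lookup(self, key_id):
            """Walk successors to the node owning key_id (keys in (pred.id, id])."""
            n = self
            while n.succ is not n and (key_id - n.pred.id) % RING > (n.id - n.pred.id) % RING:
                n = n.succ
            return n

    def join(new, bootstrap):
        succ = bootstrap.lookup(new.id)          # lookup(my.id) finds your successor
        new.succ, new.pred = succ, succ.pred     # learn predecessor from successor
        succ.pred.succ = new                     # update pred that you joined
        succ.pred = new                          # update succ that you joined
        # find fingers by lookup((my.id + 2^i) mod 2^160)
        new.fingers = [new.lookup((new.id + 2 ** i) % RING) for i in range(M)]

    ring = Node(0)                               # bootstrap: a one-node ring
    for nid in (2 ** 80, 2 ** 120, 2 ** 40):     # made-up node IDs
        join(Node(nid), ring)
    print([ring.id, ring.succ.id, ring.succ.succ.id, ring.succ.succ.succ.id])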
25
DHT Design Goals
• An "overlay" network with:
  – Flexible mapping of keys to physical nodes
  – Small network diameter
  – Small degree (fanout)
  – Local routing decisions
  – Robustness to churn
  – Routing flexibility
  – Decent locality (low "stretch")
• Different "storage" mechanisms considered:
  – Persistence w/ additional mechanisms for fault recovery
  – Best-effort caching and maintenance via soft state
26
Storage models
• Store only on key's immediate successor
  – Churn, routing issues, packet loss make lookup failure more likely
• Store on k successors (see the sketch below)
  – When nodes detect succ/pred fail, re-replicate
• Cache along reverse lookup path
  – Provided data is immutable
  – ...and performing recursive responses
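A sketch of the "store on k successors" model, with made-up node IDs and k = 3; re-replication after a detected failure is only noted in a comment:

    import bisect

    RING = 2 ** 16
    IDS = [3, 20, 41, 77, 105, 150, 200, 250]    # made-up node IDs, sorted

    def successors(key_id, k=3):
        """The k nodes clockwise from key_id; the first is the primary owner."""
        i = bisect.bisect_left(IDS, key_id % RING)
        return [IDS[(i + j) % len(IDS)] for j in range(k)]

    store = {n: {} for n in IDS}                 # each node's local storage

    def put(key_id, value, k=3):
        # write to the primary and its k-1 successors; on detecting a failed
        # succ/pred, a node would re-replicate to keep k copies alive
        for n in successors(key_id, k):
            store[n][key_id] = value

    put(60, "xyz.mp3")
    print([n for n in IDS if 60 in store[n]])    # [77, 105, 150]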
27
Summary
• Peer-to-peer systems
  – Unstructured systems
    • Finding hay, performing keyword search
  – Structured systems (DHTs)
    • Finding needles, exact match
• Distributed hash tables
  – Based around consistent hashing with views of O(log n)
  – Chord, Pastry, CAN, Koorde, Kademlia, Tapestry, Viceroy, ...
• Lots of systems issues
  – Heterogeneity, storage models, locality, churn management, underlay issues, ...
  – DHTs deployed in the wild: Vuze (Kademlia) has 1M+ active users