
Distributed Systems

Principles and Paradigms

Maarten van Steen

VU Amsterdam, Dept. Computer Science


Room R4.20, steen@cs.vu.nl

Chapter 07: Consistency & Replication


Version: November 22, 2010
Contents

Chapter
01: Introduction
02: Architectures
03: Processes
04: Communication
05: Naming
06: Synchronization
07: Consistency & Replication
08: Fault Tolerance
09: Security
10: Distributed Object-Based Systems
11: Distributed File Systems
12: Distributed Web-Based Systems
13: Distributed Coordination-Based Systems


Consistency & replication

Introduction (what’s it all about)


Data-centric consistency
Client-centric consistency
Replica management
Consistency protocols

Consistency & Replication 7.1 Introduction

Performance and scalability

Main issue
To keep replicas consistent, we generally need to ensure that all conflicting
operations are done in the same order everywhere

Conflicting operations
From the world of transactions:
Read–write conflict: a read operation and a write operation act
concurrently
Write–write conflict: two concurrent write operations

Issue
Guaranteeing a global ordering on conflicting operations may be a costly
operation, degrading scalability.

Solution
Weaken the consistency requirements so that hopefully global synchronization
can be avoided.

Consistency & Replication 7.2 Data-Centric Consistency Models

Data-centric consistency models

Consistency model
A contract between a (distributed) data store and processes, in which the
data store specifies precisely what the results of read and write operations
are in the presence of concurrency.

Essential
A data store is a physically distributed collection of storages; each process
operates on a local copy of (part of) the store.

[Figure: three processes, each with a local copy, on top of a distributed data store]


Continuous Consistency

Observation
We can actually talk about a degree of consistency:
replicas may differ in their numerical value
replicas may differ in their relative staleness
there may be differences with respect to the number and order of
performed update operations

Conit
Consistency unit ⇒ specifies the data unit over which consistency is to
be measured.


Example: Conit
Replica A (x = 6; y = 3)
  Operation             Result
  < 5,B>  x := x + 2    [x = 2]
  < 8,A>  y := y + 2    [y = 2]
  <12,A>  y := y + 1    [y = 3]
  <14,A>  x := y * 2    [x = 6]
  Vector clock A = (15, 5)
  Order deviation = 3
  Numerical deviation = (1, 5)

Replica B (x = 2; y = 5)
  Operation             Result
  < 5,B>  x := x + 2    [x = 2]
  <10,B>  y := y + 5    [y = 5]
  Vector clock B = (0, 11)
  Order deviation = 2
  Numerical deviation = (3, 6)

Conit (contains the variables x and y)

Each replica has a vector clock: ([known] time @ A, [known] time @ B)
B has sent A the operation <5,B>: x := x + 2; A has made this operation
permanent (it cannot be rolled back)
A has three pending operations ⇒ order deviation = 3
A has missed one operation from B, yielding a max diff of 5 units ⇒ (1, 5)
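
To make these two metrics concrete, here is a minimal sketch (not from the
slides; the (timestamp, origin, weight) representation of writes and the use
of a write's numerical effect as its weight are assumptions) of how a replica
could compute its order and numerical deviation from the logs above:

```python
# Sketch only: a write is a (timestamp, origin, weight) tuple, where weight is
# the numerical effect of the write on the conit, and "tentative" writes are
# writes a replica has executed locally but that are not yet globally committed.

def order_deviation(tentative_writes):
    # Number of locally executed writes that may still have to be rolled back.
    return len(tentative_writes)

def numerical_deviation(local_log, remote_log):
    # (number of remote writes not yet seen locally, total weight of those writes)
    seen = {(ts, origin) for ts, origin, _ in local_log}
    missing = [w for w in remote_log if (w[0], w[1]) not in seen]
    return len(missing), sum(weight for _, _, weight in missing)

# Replica A from the example: it has seen B's write <5,B> but not <10,B>.
log_A = [(5, "B", 2), (8, "A", 2), (12, "A", 1), (14, "A", 4)]
log_B = [(5, "B", 2), (10, "B", 5)]
tentative_A = [op for op in log_A if op[1] == "A"]   # A's own, uncommitted writes

print(order_deviation(tentative_A))       # 3
print(numerical_deviation(log_A, log_B))  # (1, 5): one missed write, of weight 5
```

Running it reproduces replica A's order deviation of 3 and numerical deviation
of (1, 5) from the figure.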


Sequential consistency

Definition
The result of any execution is the same as if the operations of all
processes were executed in some sequential order, and the operations
of each individual process appear in this sequence in the order
specified by its program.
(a)  P1: W(x)a
     P2:        W(x)b
     P3:               R(x)b  R(x)a
     P4:               R(x)b  R(x)a

(b)  P1: W(x)a
     P2:        W(x)b
     P3:               R(x)b  R(x)a
     P4:               R(x)a  R(x)b
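
A brute-force way to see why (a) is admissible under sequential consistency
while (b) is not: search for a single interleaving of all operations that
respects each process's program order and in which every read returns the most
recently written value. The sketch below (illustrative only, exponential in the
number of operations, and not taken from the book) does exactly this:

```python
from itertools import permutations

# Sketch only: an operation is ('W', item, value) or ('R', item, value);
# histories is a list of per-process operation lists, given in program order.

def respects_program_order(order):
    done = set()
    for p, i, _ in order:
        if i > 0 and (p, i - 1) not in done:
            return False
        done.add((p, i))
    return True

def reads_are_legal(order):
    latest = {}
    for _, _, (kind, item, value) in order:
        if kind == 'W':
            latest[item] = value
        elif latest.get(item) != value:
            return False
    return True

def sequentially_consistent(histories):
    ops = [(p, i, op) for p, hist in enumerate(histories) for i, op in enumerate(hist)]
    return any(respects_program_order(order) and reads_are_legal(order)
               for order in permutations(ops))

# Figure (a): P3 and P4 both see b first, then a  ->  allowed.
a = [[('W', 'x', 'a')], [('W', 'x', 'b')],
     [('R', 'x', 'b'), ('R', 'x', 'a')], [('R', 'x', 'b'), ('R', 'x', 'a')]]
# Figure (b): P3 and P4 see the two writes in different orders  ->  not allowed.
b = [[('W', 'x', 'a')], [('W', 'x', 'b')],
     [('R', 'x', 'b'), ('R', 'x', 'a')], [('R', 'x', 'a'), ('R', 'x', 'b')]]

print(sequentially_consistent(a), sequentially_consistent(b))   # True False
```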


Causal consistency
Definition
Writes that are potentially causally related must be seen by all
processes in the same order. Concurrent writes may be seen in a
different order by different processes.

P1: W(x)a
P2: R(x)a W(x)b
P3: R(x)b R(x)a
P4: R(x)a R(x)b
(a) violates causal consistency: W(x)a and W(x)b are potentially causally
related (P2 read a before writing b), yet P3 and P4 see them in different orders

P1: W(x)a
P2: W(x)b
P3: R(x)b R(x)a
P4: R(x)a R(x)b
(b) is allowed: here W(x)a and W(x)b are concurrent, so different processes
may see them in different orders

Grouping operations

Definition
Accesses to synchronization variables are sequentially consistent.
No access to a synchronization variable is allowed to be
performed until all previous writes have completed everywhere.
No data access is allowed to be performed until all previous
accesses to synchronization variables have been performed.

Basic idea
You don’t care that reads and writes of a series of operations are
immediately known to other processes. You just want the effect of the
series itself to be known.


Grouping operations

P1: Acq(Lx) W(x)a Acq(Ly) W(y)b Rel(Lx) Rel(Ly)


P2: Acq(Lx) R(x)a R(y) NIL
P3: Acq(Ly) R(y)b

Observation
Weak consistency implies that we need to lock and unlock data
(implicitly or not).

Question
What would be a convenient way of making this consistency more or
less transparent to programmers?
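
One possible answer, sketched below as a toy single-address-space model (the
Lock/Process classes and their guarded-variable bookkeeping are illustrative
assumptions, not an actual implementation): associate each lock with the shared
data it guards, pull in the most recent values of that data on acquire, and
publish local updates on release, as in entry consistency.

```python
# Toy sketch of the idea: each lock guards a known set of data items; acquiring
# the lock pulls in the freshest values of those items, releasing it publishes
# the local updates.

class Lock:
    def __init__(self, guarded_items):
        self.guarded = set(guarded_items)
        self.home = {}                 # most recently published values of guarded items

class Process:
    def __init__(self):
        self.local = {}                # this process's local copies

    def acquire(self, lock):
        # Pull in the freshest values of everything the lock guards.
        for item in lock.guarded:
            if item in lock.home:
                self.local[item] = lock.home[item]

    def release(self, lock):
        # Publish local updates of the guarded items.
        for item in lock.guarded:
            if item in self.local:
                lock.home[item] = self.local[item]

Lx, Ly = Lock({"x"}), Lock({"y"})
p1, p2 = Process(), Process()

p1.acquire(Lx); p1.local["x"] = "a"
p1.acquire(Ly); p1.local["y"] = "b"
p1.release(Lx); p1.release(Ly)

p2.acquire(Lx)
print(p2.local.get("x"))   # 'a'  -- synchronized via Lx
print(p2.local.get("y"))   # None -- y was never synchronized by P2 (R(y)NIL)
```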

Consistency & Replication 7.3 Client-Centric Consistency Models

Client-centric consistency models

Overview
System model
Monotonic reads
Monotonic writes
Read-your-writes
Write-follows-reads

Goal
Show how we can perhaps avoid systemwide consistency, by
concentrating on what specific clients want, instead of what should be
maintained by servers.


Consistency for mobile users

Example
Consider a distributed database to which you have access through
your notebook. Assume your notebook acts as a front end to the
database.
At location A you access the database doing reads and updates.
At location B you continue your work, but unless you access the
same server as the one at location A, you may detect
inconsistencies:
your updates at A may not have yet been propagated to B
you may be reading newer entries than the ones available at A
your updates at B may eventually conflict with those at A


Consistency for mobile users

Note
The only thing you really want is that the entries you updated and/or
read at A are in B the way you left them. In that case, the database will
appear to be consistent to you.


Basic architecture

[Figure: the principle of a mobile user accessing different replicas of a
distributed and replicated database over a wide-area network. A portable
computer issues read and write operations against one replica; when the client
moves to another location and (transparently) connects to another replica, the
replicas need to maintain client-centric consistency.]


Monotonic reads

Definition
If a process reads the value of a data item x, any successive read
operation on x by that process will always return that same or a more
recent value.

L1: WS(x1)        R(x1)
L2: WS(x1;x2)              R(x2)

L1: WS(x1)        R(x1)
L2: WS(x2)                 R(x2)

(Top: monotonic-read consistent; bottom: not, since WS(x2) need not
contain WS(x1).)


Client-centric consistency: notation

Notation
WS(xi[t]) is the set of write operations (at Li) that lead to version
xi of x (at time t)
WS(xi[t1]; xj[t2]) indicates that it is known that WS(xi[t1]) is part of
WS(xj[t2]).
Note: Parameter t is omitted from figures.


Monotonic reads

Example
Automatically reading your personal calendar updates from different
servers. Monotonic Reads guarantees that the user sees all updates,
no matter from which server the automatic reading takes place.

Example
Reading (not modifying) incoming mail while you are on the move.
Each time you connect to a different e-mail server, that server fetches
(at least) all the updates from the server you previously visited.
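
As a minimal sketch of how client-side middleware could enforce this guarantee
(assuming each data item carries a monotonically increasing version number and
a hypothetical replica.get() call returning a (version, value) pair):

```python
class MonotonicReadsClient:
    """Sketch only: remember the highest version read per item and refuse
    reads from replicas that are older than what this client already saw."""

    def __init__(self):
        self.last_read = {}                 # item -> highest version read so far

    def read(self, replica, item):
        version, value = replica.get(item)  # hypothetical replica API
        if version < self.last_read.get(item, -1):
            raise RuntimeError(f"replica too stale for a monotonic read of {item}")
        self.last_read[item] = version
        return value
```

In practice the "version" would typically be the write set WS(xi), or a vector
timestamp summarizing it, rather than a single counter.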


Monotonic writes

Definition
A write operation by a process on a data item x is completed before
any successive write operation on x by the same process.

L1: W(x1)
L2:        WS(x1)   W(x2)

L1: W(x1)
L2:                 W(x2)

(Top: monotonic-write consistent; bottom: not, since W(x2) is performed
without the earlier write W(x1) having been propagated to L2.)


Monotonic writes

Example
Updating a program at server S2, and ensuring that all components on
which compilation and linking depend are also placed at S2.

Example
Maintaining versions of replicated files in the correct order everywhere
(propagate the previous version to the server where the newest
version is installed).


Read your writes

Definition
The effect of a write operation by a process on data item x will always
be seen by a successive read operation on x by the same process.

L1: W(x1)
L2:        WS(x1;x2)   R(x2)

L1: W(x1)
L2:        WS(x2)      R(x2)

(Top: read-your-writes consistent; bottom: not, since WS(x2) need not
contain the effect of W(x1).)

Example
Updating your Web page and guaranteeing that your Web browser shows the
newest version instead of its cached copy.
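
Under the same assumptions as the monotonic-reads sketch earlier (integer
versions; a hypothetical replica API whose put() returns the version it
created and whose get() returns a (version, value) pair), read-your-writes can
be enforced on the client side by remembering the versions of the client's own
writes:

```python
class ReadYourWritesClient:
    """Sketch only: track the version of my own last write per item and only
    accept reads from replicas that have already applied it."""

    def __init__(self):
        self.my_writes = {}                  # item -> version of my own last write

    def write(self, replica, item, value):
        self.my_writes[item] = replica.put(item, value)   # hypothetical API

    def read(self, replica, item):
        version, value = replica.get(item)                # hypothetical API
        if version < self.my_writes.get(item, -1):
            raise RuntimeError(f"replica has not yet applied my own write of {item}")
        return value
```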


Writes follow reads

Definition
A write operation by a process on a data item x following a previous
read operation on x by the same process, is guaranteed to take place
on the same or a more recent value of x that was read.

L1: WS(x1)      R(x1)
L2:                    WS(x1;x2)   W(x2)

L1: WS(x1)      R(x1)
L2:                    WS(x2)      W(x2)

(Top: writes-follow-reads consistent; bottom: not, since the write at L2 is
performed on a copy that need not contain what was read at L1.)

Example
See reactions to posted articles only if you have the original posting (a
read “pulls in” the corresponding write operation).

Consistency & Replication 7.4 Replica Management

Distribution protocols

Replica server placement


Content replication and placement
Content distribution


Replica placement

Essence
Figure out what the best K places are out of N possible locations.
Greedily select the best location from the not-yet-chosen candidates, i.e.,
the one for which the average distance to clients is minimal; then choose the
next best, and so on. (Note: the first chosen location minimizes the average
distance to all clients.) Computationally expensive; a greedy sketch follows
below.
Select the K largest autonomous systems and place a server at the
best-connected host of each. Computationally expensive.
Position nodes in a d-dimensional geometric space, where distance reflects
latency. Identify the K regions with highest density and place a server in
each one. Computationally cheap.
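
A sketch of the first (greedy) approach, assuming a precomputed distance
matrix between clients and candidate locations (all names and numbers below
are made up for illustration):

```python
# Sketch only: repeatedly pick the candidate location that minimizes the
# average distance from every client to its closest chosen replica.
# dist[c][l] is the assumed, precomputed distance from client c to location l.

def greedy_placement(dist, candidates, K):
    chosen = []
    for _ in range(K):
        def avg_client_distance(extra):
            placed = chosen + [extra]
            return sum(min(row[l] for l in placed) for row in dist) / len(dist)
        best = min((l for l in candidates if l not in chosen), key=avg_client_distance)
        chosen.append(best)
    return chosen

# Three clients, four candidate locations.
dist = [
    {0: 1, 1: 5, 2: 9, 3: 4},
    {0: 8, 1: 2, 2: 9, 3: 5},
    {0: 7, 1: 6, 2: 1, 3: 5},
]
print(greedy_placement(dist, candidates=[0, 1, 2, 3], K=2))   # [1, 2]
```

Each greedy step inspects all remaining candidates against all clients, which
is why the slide calls this approach computationally expensive.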


Content replication

Distinguish different processes


A process is capable of hosting a replica of an object or data:
Permanent replicas: Process/machine always having a replica
Server-initiated replica: Process that can dynamically host a
replica on request of another server in the data store
Client-initiated replica: Process that can dynamically host a
replica on request of a client (client cache)


Content replication

[Figure: the logical organization of different kinds of copies of a data store,
from permanent replicas, via server-initiated replicas, to client-initiated
replicas, and finally the clients themselves; arrows mark server-initiated and
client-initiated replication.]


Server-initiated replicas

[Figure: clients C1 and C2 issue requests for file F, held by server Q; server P
is the server closest to both clients but has no copy of F.]

Server Q counts accesses from C1 and C2 as if they came from P
Keep track of access counts per file, aggregated by the server closest to
the requesting clients (decision rule sketched below):
Number of accesses drops below threshold D ⇒ drop file
Number of accesses exceeds threshold R ⇒ replicate file
Number of accesses between D and R ⇒ migrate file
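
The thresholds can be turned into a small decision rule, sketched here for a
single file (the dictionary of per-server access counts and the return values
are assumptions made for illustration):

```python
# Sketch only: counts[P] is the number of recent accesses to file F that
# server Q attributes to server P (the server closest to the requesting
# clients); D < R are the deletion and replication thresholds.

def replication_decision(counts, D, R):
    total = sum(counts.values())
    if total < D:
        return ("drop", None)                  # barely used: delete the local copy
    busiest = max(counts, key=counts.get)
    if counts[busiest] > R:
        return ("replicate", busiest)          # hot from P's direction: copy F to P
    return ("consider-migration", busiest)     # in between: maybe move F to P

print(replication_decision({"P": 120, "S": 10}, D=20, R=100))  # ('replicate', 'P')
print(replication_decision({"P": 5}, D=20, R=100))             # ('drop', None)
```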

Content distribution

Model
Consider only a client-server combination:
Propagate only notification/invalidation of update (often used for
caches)
Transfer data from one copy to another (distributed databases):
passive replication
Propagate the update operation to other copies: active replication

Note
No single approach is the best; the choice depends highly on the available
bandwidth and the read-to-write ratio at the replicas.


Content distribution: client/server system

Pushing updates: server-initiated approach, in which an update is
propagated regardless of whether the target asked for it.
Pulling updates: client-initiated approach, in which the client requests
to be updated.

Issue                        Push-based                                Pull-based
State at server              List of client caches                     None
Messages to be exchanged     Update (and possibly fetch update)        Poll and update
Response time at the client  Immediate (or fetch-update time)          Fetch-update time


Content distribution

Observation
We can dynamically switch between pulling and pushing using leases:
A contract in which the server promises to push updates to the client
until the lease expires.


Content distribution

Issue
Make the lease expiration time dependent on the system’s behavior (adaptive
leases); a toy combination of these policies is sketched after this list:
Age-based leases: an object that hasn’t changed for a long time will not
change in the near future, so provide a long-lasting lease
Renewal-frequency-based leases: the more often a client requests a
specific object, the longer the expiration time for that client (for that
object) will be
State-based leases: the more loaded a server is, the shorter the
expiration times become
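
```python
# Sketch only: one way to combine the three policies above into a single
# expiration time.  Every constant and the formula itself are assumptions made
# purely for illustration; they are not taken from a particular system.
import time

def lease_duration(last_modified, renewals_per_hour, server_load,
                   base=60.0, max_lease=3600.0):
    age_factor = min((time.time() - last_modified) / 3600.0, 4.0)   # age-based
    renewal_factor = 1.0 + min(renewals_per_hour / 10.0, 4.0)       # renewal-frequency
    load_factor = max(1.0 - server_load, 0.1)                       # state-based
    return min(base * (1.0 + age_factor) * renewal_factor * load_factor, max_lease)

# Object unchanged for two hours, requested often, server lightly loaded:
print(lease_duration(time.time() - 7200, renewals_per_hour=30, server_load=0.2))
```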

Question
Why are we doing all this?

Consistency & Replication 7.5 Consistency Protocols

Consistency protocols

Consistency protocol
Describes the implementation of a specific consistency model.
Continuous consistency
Primary-based protocols
Replicated-write protocols


Continuous consistency: Numerical errors

Principal operation
Every server Si has a log, denoted as log(Si).
Consider a data item x and let weight(W) denote the numerical change in its
value after a write operation W. Assume that

    ∀W : weight(W) > 0

W is initially forwarded to one of the N replicas, denoted as origin(W).
TW[i, j] is the total weight of the writes executed by server Si that
originated from Sj:

    TW[i, j] = ∑ { weight(W) | origin(W) = Sj and W ∈ log(Si) }


Continuous consistency: Numerical errors

Note
Actual value v(t) of x:

    v(t) = vinit + ∑_{k=1..N} TW[k, k]

Value vi of x at replica i:

    vi = vinit + ∑_{k=1..N} TW[i, k]


Continuous consistency: Numerical errors

Problem
We need to ensure that v(t) − vi < δi for every server Si.

Approach
Let every server Sk maintain a view TWk[i, j] of what it believes is the
value of TW[i, j]. This information can be gossiped when an update is
propagated.

Note
0 ≤ TWk[i, j] ≤ TW[i, j] ≤ TW[j, j]


Continuous consistency: Numerical errors

Solution
Sk sends operations from its log to Si when it sees that TWk [i, k ] is
getting too far from TW [k , k], in particular, when

TW [k, k] − TWk [i, k ] > δi /(N − 1)

Question
To what extent are we being pessimistic here: where does δi /(N − 1)
come from?

Note
Staleness can be done analogously, by essentially keeping track of
what has been seen last from Si (see book).
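
The bookkeeping behind this push rule can be sketched as follows (illustrative
only; the class layout, method names, and the way the log is represented are
assumptions):

```python
# Sketch only: the state server S_k keeps to decide when it must forward its
# writes to another server S_i so that S_i's numerical error stays below delta_i.

class Server:
    def __init__(self, k, N):
        self.k, self.N = k, N
        self.log = []                              # writes executed locally
        self.TW = [[0.0] * N for _ in range(N)]    # TW[i][j]: what this server
                                                   # believes S_i has executed of
                                                   # the writes originating at S_j

    def local_write(self, weight):
        self.log.append(weight)
        self.TW[self.k][self.k] += weight          # I executed my own write

    def must_push_to(self, i, delta_i):
        # Push when S_i may lag too far behind on writes that originated here.
        lag = self.TW[self.k][self.k] - self.TW[i][self.k]
        return lag > delta_i / (self.N - 1)

    def pushed_log_to(self, i):
        # After forwarding the log, S_i is known to be up to date w.r.t. us.
        self.TW[i][self.k] = self.TW[self.k][self.k]

s0 = Server(k=0, N=3)
for w in (2.0, 2.0, 1.0):
    s0.local_write(w)
print(s0.must_push_to(i=1, delta_i=4.0))   # True: 5.0 > 4.0 / (3 - 1)
```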


Primary-based protocols

Primary-backup protocol
[Figure: the remote-write (primary-backup) protocol for item x. A client's write
enters at a backup server, is forwarded to the primary, the primary tells all
backups to update and collects their acknowledgements, and only then is the
write acknowledged to the client; another client reads directly from a backup
server in the data store.]

W1. Write request                    R1. Read request
W2. Forward request to primary       R2. Response to read
W3. Tell backups to update
W4. Acknowledge update
W5. Acknowledge write completed
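
A minimal sketch of the W1–W5 flow above, with direct method calls standing in
for the network messages (class and method names are made up for illustration;
blocking, concurrency, and failure handling are omitted):

```python
class BackupServer:
    def __init__(self):
        self.store = {}

    def apply_update(self, item, value):    # W3: primary tells this backup to update
        self.store[item] = value
        return "ack"                        # W4: backup acknowledges the update

    def read(self, item):                   # R1/R2: reads are served locally
        return self.store.get(item)

class PrimaryServer(BackupServer):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups

    def write(self, item, value):           # W2: the forwarded write arrives here
        self.store[item] = value
        for backup in self.backups:         # W3/W4: update every backup and wait
            backup.apply_update(item, value)
        return "write completed"            # W5: only now acknowledge the write

backup = BackupServer()
primary = PrimaryServer([backup])
print(primary.write("x", 42))               # W1: the client's original write request
print(backup.read("x"))                     # 42: any replica can serve the read
```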


Primary-based protocols

Example primary-backup protocol


Traditionally applied in distributed databases and file systems that
require a high degree of fault tolerance. Replicas are often placed on
the same LAN.


Primary-based protocols

Primary-backup protocol with local writes


[Figure: primary-backup protocol with local writes: the primary for item x
migrates to the server of the process wanting to perform an update; the write is
acknowledged locally, after which the new primary tells the old primary and the
backups to update; reads are still served locally.]

W1. Write request                    R1. Read request
W2. Move item x to new primary       R2. Response to read
W3. Acknowledge write completed
W4. Tell backups to update
W5. Acknowledge update


Primary-based protocols

Example primary-backup protocol with local writes


Mobile computing in disconnected mode (ship all relevant files to user
before disconnecting, and update later on).


Replicated-write protocols

Quorum-based protocols
Ensure that each operation is carried out in such a way that a majority vote is
established: distinguish read quorum and write quorum:
[Figure: three examples of read and write quorums with twelve replica servers A–L:
(a) NR = 3, NW = 10 — a correct choice of read and write set;
(b) NR = 7, NW = 6 — a choice that may lead to write–write conflicts;
(c) NR = 1, NW = 12 — read-one, write-all (ROWA).]

required: NR + NW > N and NW > N/2
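
A tiny sketch checking the three quorum choices from the figure against these
two constraints:

```python
# Sketch only: N = 12 servers as in the figure above.

def valid_quorums(N, NR, NW):
    # NR + NW > N: every read quorum overlaps every write quorum (no read-write
    # conflicts); NW > N/2: any two write quorums overlap (no write-write conflicts).
    return NR + NW > N and NW > N / 2

for NR, NW in [(3, 10), (7, 6), (1, 12)]:
    print((NR, NW), valid_quorums(12, NR, NW))
# (3, 10) True; (7, 6) False (write-write conflicts possible); (1, 12) True (ROWA)
```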
