2phase commit
introduction
A distributed database is in many ways more complex than a centralized one. Some of the
failures that may be experienced in a distributed DBMS include:
-Loss of message
-Failure at a site which is hosting a subtransaction
-Failure of communication link
2 phase commit protocol
In a distributed transactions,all global transactions must satisfy atomicity principle so as to
ensure consistency after updating records on all sites
This protocol ensures all the sites involved in a global transaction are ready to commit before all
changes become permanent.
They are two modules in this protocol:
Coordinator
participant
2 phase commit protocol
A 2PC protocol consists of two main phases:
Voting protocol.
Decision protocol.
2 phase commit protocol
Voting protocol- this is where the participant sites decide whether they are ready to commit a
transaction or not.
Decision protocol - this is where the coordinator site decides whether the transaction should be
committed or aborted.
Voting phase
Voting Phase
Example:
A transaction T is to be initiated at site s1,s2,s3,s4.
The coordinating site-s1
Participating site-s2,s2,s4
The coordinator site writes [T,Prepare](log) to its log and sends this message to all other participating
sites indicating that it is ready to commit.
If a site is ready it writes [T,Ready](log) to its log and sends the message to the coordinator site.
If a site is ready it writes [T,Not Ready](log) to its log and sends the message to the coordinator site as
shown in the previous diagram.
Decision phase
If the coordinator site receives [Ready,T] message from all participants, it writes [T,Commit](log)
to its own log and sends the message to the participants sites. The participant sites write the
same message into their logs and finally perform the commit.
If the coordinator site receives [Not Ready,T] message from any one participant, it writes
[T,Abort](log) to its own log and sends the message to the participants sites. The participant sites
write the same message into their logs and finally perform abort the commit process. All this is
done to ensure ATOMIC commitments of transactions always.
Decision phase
Decision Phase
Key takeaways:
The coordinator aborts a transaction commit if and only of at least one participant votes to abort it.
A coordinator commits a transaction if and only if all the participant sites vote to commit it.
Termination Protocols for 2pc
Termination protocol is invoked when a coordinator or participant times out due to not
receiving an expected message.
For coordinators in states INITIAL, WAITING, DECIDED, and COMPLETED:
Timeout in WAITING state: Coordinator waiting for all participants' votes, can't commit but can
decide to globally abort.
Timeout in DECIDED state: Coordinator waiting for acknowledgments, resends global decision
to sites not acknowledged.
Termination in a participant
Thee simplest termination protocol is to leave the participant process blocked until communication with the
coordinator is re-established
For participants in states INITIAL, PREPARED, ABORTED, and COMMITTED:
Timeout in INITIAL state: Participant waits for a PREPARE message, can unilaterally abort the transaction, or
send an ABORT message to the coordinator.
Timeout in PREPARED state: Participant waits for a global commit/abort instruction, unable to change its vote.
Can use the cooperative termination protocol to contact other participants for the decision.
Cooperative termination protocol reduces blocking likelihood but doesn't eliminate it entirely.
If the coordinator fails and all participants detect it, they can elect a new coordinator to resolve the block.
Recovery process for 2pc
They are three stages for failure of the coordinator .They include:
1)Failure in INITIAL state:
-Coordinator hasn't started the commit procedure
-Recovery initiates the commit procedure.
2)Failure in WAITING state:
-Coordinator sent the PREPARE message.
-Hasn't received all responses or abort responses.
-Recovery restarts the commit procedure.
3)Failure in DECIDED state:
-Coordinator instructed participants to globally abort or commit.
-If all acknowledgments received, successful completion.
-Otherwise, initiate the previously discussed termination protocol.
Participant Failure
1)Failure in INITIAL State:
-Participant hasn't voted on the transaction.
-Can unilaterally abort the transaction on recovery.
-Coordinator can't make a global commit decision without the participant's vote.
2)Failure in PREPARED State:
-Participant has sent its vote to the coordinator.
-Recovery follows the previously discussed termination protocol.
3)Failure in ABORTED/COMMITTED States:
-Participant has completed the transaction.
-No further action is needed upon restart.
Election Protocols
● If the participants detect the failure of the coordinator (by timing out), they can elect a new site to act as
coordinator.
● One election protocol is for the sites to have an agreed-upon linear ordering.
● We assume that site Si has order i in the sequence,
● The lowest being the coordinator, and that each site knows the identification and ordering of the other
sites in the system, some of which may also have failed
Cont…
● One election protocol asks each operational participant to send a message to the sites with a greater
identification number.
● That means, site Si would send a message to sites Si+1, Si+2, . . . , Sn in that order.
● If a site Sk receives a message from a lower-numbered participant, then Sk knows that it is not to be the
new coordinator and stops sending messages.
● This protocol is relatively efficient and most participants stop sending messages quite quickly.
Cont…
● Eventually, each participant will know whether there is an operational participant with a lower number.
● If there is not, the site becomes the new coordinator.
● If the newly elected coordinator also times out during this process, the election protocol is invoked again.
● If there are no operational sites with a lower number, the site forces all higher-numbered sites to let it
become the new coordinator, regardless of whether there is a new coordinator.
Bully Election Algorithm
● Any participant can trigger a request for an election. However, one participant can only issue a singular
request at one point in time. The algorithm operates by identifying all the non-faulty participants and
electing the participant with the largest identifier as the coordinator.
● There can be three kinds of messages that participants would exchange between each other during the
bully algorithm:
a) Election message
b) OK message
c) Coordinator message
Cont…
● Suppose there are n different participants with
unique identifiers ranging from 0 to n-1.
● Given that 5 is the highest ID amongst the nodes, it
is the leader. Assuming that the leader crashes and
node 2 are the first to notice the breakdown of the
leader, the node with ID 2 initiates an election
Cont…
● Accordingly, the participant with ID 2 sends an election
message to all nodes with an ID greater than its own ID.
● Participants 3 and 4 both receive the election message.
However, since 5 is crashed, it does not respond or
receives the ping. Participants 3 and 4 accordingly initiate
election iteratively by broadcasting election messages to
those with IDs greater than their own respective IDs.
Cont…
● Moreover, they respond with an OK message to the
node that sent them a request for election since they
are not the participants with the highest IDs. This
means that 3 and 4 would confirm to participant 2
that they are alive and non-crashed.
Cont…
● Node 4 receives the election message and
accordingly responds with an OK message to node 3
to confirm its operating state. As of the previous
case, node 5 does not respond as it is unavailable.
● Node 4 has already broadcasted an election message
to node 5, and received no response. It simply figures
out that node 5 has crashed, and the new node with
the highest ID is node 4.
Cont…
● Node 4 figures out that it is the node with the
highest ID, then sends a coordinator message to all
of the alive nodes.
● Consecutively, all nodes are updated with the new
leader.
Communication topologies for 2PC
● There are several different communication topologies (ways of exchanging messages) that can be employed
to implement 2PC.
a. Centralized 2PC
● In this topology, all communication is funneled through the coordinator.
Cont…
● A number of improvements to the centralized 2PC protocol have been proposed that attempt to improve
its overall performance, either by reducing the number of messages that need to be exchanged or by
speeding up the decision-making process.
● These improvements depend upon adopting different ways of exchanging messages.
b. Linear 2PC
● In linear 2PC, sites are ordered 1, 2, . . . ,n, where site 1 is the
coordinator and the remaining sites are the participants.
● The 2PC protocol is implemented by a forward chain of
communication from coordinator to participant n for the
voting phase and a backward chain of communication from
participant n to the coordinator for the decision phase
Cont…
● Linear 2PC can be improved if the voting process adopts the forward linear chaining of messages while the
decision process adopts the centralized topology, so that site n can broadcast the global decision to all
participants in parallel.
3. Distributed 2PC
● Uses a distributed topology.
● The coordinator sends the PREPARE message to all participants, which in turn send their decision to all
other sites.
● Each participant waits for messages from the other sites before deciding whether to commit or abort the
transaction.
● This in effect eliminates the need for the decision phase of the 2PC protocol, as the participants can reach
a decision consistently, but independently.