RV Educational Institutions®
RV Institute of Technology and Management
(Affiliated to VTU, Belagavi)
JP Nagar 8th Phase, Bengaluru - 560076

Department of Computer Science and Engineering

Course Name: Database Management System

Course Code: BCS403

IV Semester
2022 Scheme
Module-5
Concurrency Control in Databases

5.0 Introduction to Concurrency Control

Purpose of Concurrency Control

To enforce Isolation (through mutual exclusion) among conflicting transactions.


To preserve database consistency through consistency-preserving execution of transactions.
To resolve read-write and write-write conflicts.

Example:
In a concurrent execution environment, if T1 conflicts with T2 over a data item A, then the
concurrency control mechanism decides which of T1 and T2 gets access to A first and whether the other transaction is rolled back or must wait.

5.1 Two-Phase Locking Techniques for Concurrency Control

The concept of locking data items is one of the main techniques used for controlling the concurrent
execution of transactions.
A lock is a variable associated with a data item in the database. Generally, there is a lock for each data
item in the database.
A lock describes the status of the data item with respect to possible operations that can be applied to that
item.
It is used for synchronizing the access by concurrent transactions to the database items. A transaction
locks an object before using it.
When an object is locked by another transaction, the requesting transaction must wait

5.1.1 Types of Locks and System Lock Tables

1. Binary Locks
A binary lock can have two states or values: locked and unlocked (or 1 and 0).
If the value of the lock on X is 1, item X cannot be accessed by a database operation that requests the item

If the value of the lock on X is 0, the item can be accessed when requested, and the lock value is
changed to 1. We refer to the current value (or state) of the lock associated with item X as lock(X).
Two operations, lock_item and unlock_item, are used with binary locking. A transaction requests
access to an item X by first issuing a lock_item(X) operation.
If LOCK(X) = 1, the transaction is forced to wait.
If LOCK(X) = 0, it is set to 1 (the transaction locks the item) and the transaction is allowed to access item X.
When the transaction is through using the item, it issues an unlock_item(X) operation, which sets
LOCK(X) back to 0 (unlocks the item) so that X may be accessed by other transactions.
Hence, a binary lock enforces mutual exclusion on the data item.
A description of the lock_item(X) and unlock_item(X) operations is shown in
Figure 5.1.

Fig 5.1: Lock and unlock operations for binary locks.

Both operations must be implemented as indivisible units (critical sections); that
is, no interleaving should be allowed once a lock or unlock operation is started until the operation
terminates or the transaction waits.
The wait command within the lock_item(X) operation is usually implemented by putting the transaction in
a waiting queue for item X until X is unlocked and the transaction can be granted access to it.
Other transactions that also want to access X are placed in the same queue. Hence, the wait command is
considered to be outside the lock_item operation.
It is quite simple to implement a binary lock; all that is needed is a binary-valued variable, LOCK,
associated with each data item X in the database.
In its simplest form, each lock can be a record with three fields: <Data_item_name, LOCK,
Locking_transaction> plus a queue for transactions that are waiting to access the item.
If the simple binary locking scheme described here is used, every transaction must obey the following
rules:
1. A transaction T must issue the operation lock_item(X) before any
read_item(X) or write_item(X) operations are performed in T.
2. A transaction T must issue the operation unlock_item(X) after all
read_item(X) and write_item(X) operations are completed in T.
3. A transaction T will not issue a lock_item(X) operation if it already holds the lock on
item X.
4. A transaction T will not issue an unlock_item(X) operation unless it already holds the lock
on item X.
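Since only the caption of Fig 5.1 is given above, the following is a minimal Python sketch of how the lock_item(X) and unlock_item(X) operations and the lock-table record described above could be implemented. The class and method names are illustrative only; a real DBMS lock manager is considerably more elaborate.

```python
import threading
from collections import defaultdict

class BinaryLockManager:
    """Sketch of binary locking: one LOCK bit plus a wait queue per data item."""
    def __init__(self):
        self._cv = threading.Condition()
        # lock table entry: <Data_item_name, LOCK, Locking_transaction>
        self._lock = defaultdict(lambda: 0)   # 0 = unlocked, 1 = locked
        self._holder = {}                     # data item -> locking transaction

    def lock_item(self, x, txn):
        with self._cv:                        # the operation itself is indivisible
            while self._lock[x] == 1:         # LOCK(X) = 1: transaction is forced to wait
                self._cv.wait()
            self._lock[x] = 1                 # LOCK(X) = 0: set it to 1 and grant access
            self._holder[x] = txn

    def unlock_item(self, x, txn):
        with self._cv:
            assert self._holder.get(x) == txn # rule 4: only the lock holder may unlock
            self._lock[x] = 0                 # set LOCK(X) back to 0
            del self._holder[x]
            self._cv.notify_all()             # wake transactions waiting for X
```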

2. Shared/Exclusive (or Read/Write) Locks


• Binary locking scheme is too restrictive for database items because, at most, one transaction can
hold a lock on a given item
• We should allow several transactions to access the same item X if they all access X for reading
purposes only
• If a transaction is to write an item X, it must have exclusive access to X
• For this purpose, a different type of lock called a multiple-mode lock is used
• In this scheme, called shared/exclusive or read/write locks, there are three locking
operations: read_lock(X), write_lock(X), and unlock(X), as shown in Fig 5.2.

A read-locked item is also called share-locked because other transactions are allowed to read the item,
whereas a write-locked item is called exclusive-locked because a single transaction exclusively holds the
lock on the item
Method to implement read/write lock is to
- keep track of the number of transactions that hold a shared (read) lock on an
item in the lock table
- Each record in the lock table will have four fields:
<Data_item_name, LOCK, No_of_reads, Locking_transaction(s)>.
If LOCK(X) = write-locked, the value of locking_transaction(s) is a single transaction that holds the
exclusive (write) lock on X.
If LOCK(X) = read-locked, the value of locking_transaction(s) is a list of one or more transactions that
hold the shared (read) lock on X.

Fig 5.2: Locking and unlocking operations for two mode (read/write, or shared/exclusive) locks.

When we use the shared/exclusive locking scheme, the system must enforce the following rules:

1. A transaction T must issue the operation read_lock(X) or write_lock(X) before any


read_item(X) operation is performed in T.
2. A transaction T must issue the operation write_lock(X) before any write_item(X)
operation is performed in T.

3. A transaction T must issue the operation unlock(X) after all read_item(X) and write_item(X)
operations are completed in T.

4. A transaction T will not issue a read_lock(X) operation if it already holds a read (shared)lock or a write
(exclusive) lock on item X.

Conversion of Locks

A transaction that already holds a lock on item X is allowed under certain conditions to
convert the lock from one locked state to another
For example, it is possible for a transaction T to issue a read_lock(X) and then later to
upgrade the lock by issuing a write_lock(X) operation
- If T is the only transaction holding a read lock on X at the time it issues the write_lock(X) operation,
the lock can be upgraded; otherwise, the transaction must wait.
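As a companion to Fig 5.2, here is a hedged Python sketch of the shared/exclusive scheme, including the lock-table information <Data_item_name, LOCK, No_of_reads, Locking_transaction(s)> and the upgrade condition just described. It is an illustration under simplified assumptions (one lock manager object, no deadlock handling), not the code of any particular DBMS.

```python
import threading

class RWLockManager:
    """Sketch of read/write (shared/exclusive) locks with lock upgrade."""
    def __init__(self):
        self._cv = threading.Condition()
        self._table = {}   # item -> {"mode": "R" or "W", "readers": set, "writer": txn}

    def read_lock(self, x, txn):
        with self._cv:
            while x in self._table and self._table[x]["mode"] == "W":
                self._cv.wait()                        # write-locked: must wait
            e = self._table.setdefault(x, {"mode": "R", "readers": set(), "writer": None})
            e["readers"].add(txn)                      # No_of_reads = len(e["readers"])

    def write_lock(self, x, txn):
        with self._cv:
            while True:
                e = self._table.get(x)
                if e is None:
                    break                              # unlocked: grant exclusive lock
                if e["mode"] == "R" and e["readers"] == {txn}:
                    break                              # upgrade: txn is the only reader
                self._cv.wait()
            self._table[x] = {"mode": "W", "readers": set(), "writer": txn}

    def unlock(self, x, txn):
        with self._cv:
            e = self._table.get(x)
            if e is None:
                return
            if e["mode"] == "W" and e["writer"] == txn:
                del self._table[x]                     # release exclusive lock
            else:
                e["readers"].discard(txn)              # release one shared lock
                if not e["readers"]:
                    del self._table[x]
            self._cv.notify_all()
```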

5.1.2 Guaranteeing Serializability by Two-Phase Locking

A transaction is said to follow the two-phase locking protocol if all locking operations (read_lock,
write_lock) precede the first unlock operation in the transaction.
Such a transaction can be divided into two phases:
Expanding or growing (first) phase, during which new locks on items can be acquired but none can be
released.
Shrinking (second) phase, during which existing locks can be released but no new locks can be
acquired.
If lock conversion is allowed, then upgrading of locks (from read-locked to write-locked) must be done
during the expanding phase, and downgrading of locks (from write-locked to read-locked) must be done in
the shrinking phase.
Transactions T1 and T2 in Fig 5.3(a) do not follow the two-phase locking protocol because the
write_lock(X) operation follows the unlock(Y) operation in T1, and similarly the write_lock(Y) operation
follows the unlock(X) operation in T2.

Fig 5.3 Transactions that do not obey two-phase locking (a) Two transactions T1 and T2 (b) Results of possible
serial schedules of T1 and T2
(c) A nonserializable schedule S that uses locks

If we enforce two-phase locking, the transactions can be rewritten as T1′ and T2′, as shown
in Fig 5.4.
Now, the schedule shown in Fig 5.3(c) is not permitted for T1′ and T2′ (with their modified order of
locking and unlocking operations) under the rules of locking, because T1′ will issue its write_lock(X)
before it unlocks item Y; consequently, when T2′ issues its read_lock(X), it is forced to wait until T1′
releases the lock by issuing an unlock(X) in the schedule.

Fig 5.4: Transactions T1′ and T2′, which are the same as T1 and T2 in Fig 5.3(a) but follow the two-phase locking protocol

If every transaction in a schedule follows the two-phase locking protocol, the schedule is guaranteed to be
serializable.
Two-phase locking may limit the amount of concurrency that can occur in a schedule.
Some serializable schedules will be prohibited by the two-phase locking protocol.
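To make the two-phase condition concrete, the short sketch below checks whether a transaction's sequence of locking operations obeys 2PL, that is, whether every lock (or upgrade) precedes the first unlock. The operation encoding and the lock/unlock orders of T1 and T1′ assumed here follow the standard form of this example and are illustrative only.

```python
def follows_two_phase_locking(ops):
    """ops: list of (operation, item) tuples, e.g. ("read_lock", "X"), ("unlock", "Y").
    Returns True if all lock/upgrade operations precede the first unlock."""
    shrinking = False
    for op, _item in ops:
        if op in ("read_lock", "write_lock"):       # acquiring or upgrading a lock
            if shrinking:
                return False                        # a new lock after an unlock violates 2PL
        elif op == "unlock":
            shrinking = True                        # shrinking phase has begun
    return True

# T1 as in Fig 5.3(a): lock and unlock operations interleaved, so not two-phase
t1 = [("read_lock", "Y"), ("unlock", "Y"), ("write_lock", "X"), ("unlock", "X")]
# T1' as in Fig 5.4: all locks acquired before the first unlock, so two-phase
t1_prime = [("read_lock", "Y"), ("write_lock", "X"), ("unlock", "Y"), ("unlock", "X")]
assert not follows_two_phase_locking(t1)
assert follows_two_phase_locking(t1_prime)
```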

5.2 Variations of Two-Phase Locking


Basic 2PL
Technique described previously
Conservative (static) 2PL
▪ Requires a transaction to lock all the items it accesses before the transaction begins execution, by
predeclaring its read-set and write-set
▪ It is a deadlock-free protocol

Strict 2PL
▪ guarantees strict schedules
▪ Transaction does not release exclusive locks until after it commits or aborts
▪ no other transaction can read or write an item that is written by T unless T has committed,
leading to a strict schedule for recoverability
▪ Strict 2PL is not deadlock-free

Rigorous 2PL
▪ guarantees strict schedules
▪ Transaction does not release any locks until after it commits or aborts
▪ easier to implement than strict 2PL

5.3 Dealing with Deadlock and Starvation


Deadlock occurs when each transaction T in a set of two or more transactions is waiting for some
item that is locked by some other transaction T′ in the set.
Hence, each transaction in the set is in a waiting queue, waiting for one of the other transactions in the
set to release the lock on an item.
But because the other transaction is also waiting, it will never release the lock. A simple example
is shown in Fig 5.5(a).

Fig 5.5: Illustrating the deadlock problem (a) A partial schedule of T1′ and T2′ that is in a
state of deadlock (b) A wait-for graph for the partial schedule in (a)

Deadlock prevention protocols


One way to prevent deadlock is to use a deadlock prevention protocol
One deadlock prevention protocol, which is used in conservative two-phase locking, requires that every
transaction lock all the items it needs in advance. If any of the items cannot be obtained, none of the items
are locked. Rather, the transaction waits and then tries again to lock all the items it needs.
A second protocol, which also limits concurrency, involves ordering all the items in the database and
making sure that a transaction that needs several items will lock them according to that order. This
requires that the programmer (or the system) is aware of the chosen order of the items.
Both approaches are generally impractical.
Some of these techniques use the concept of transaction timestamp TS(T), which is a unique identifier
assigned to each transaction
The timestamps are typically based on the order in which transactions are started; hence, if transaction T1 starts
before transaction T2, then TS(T1) < TS(T2).
The older transaction (which starts first) has the smaller timestamp value.
Two protocols based on transaction timestamps are:
Wait-die
Wound-wait
Suppose that transaction Ti tries to lock an item X but is not able to because X is locked by some other
transaction Tj with a conflicting lock. The rules followed by these schemes are:
Wait-die. If TS(Ti) < TS(Tj), then (Ti older than Tj) Ti is allowed to wait; otherwise (Ti younger than Tj)
abort Ti (Ti dies) and restart it later with the same timestamp.
Wound-wait. If TS(Ti) < TS(Tj), then (Ti older than Tj) abort Tj (Ti wounds Tj) and restart it later with the
same timestamp; otherwise (Ti younger than Tj) Ti is allowed to wait.
In wait-die, an older transaction is allowed to wait for a younger transaction, whereas a younger
transaction requesting an item held by an older transaction is aborted and restarted.
The wound-wait approach does the opposite: A younger transaction is allowed to wait for an older one,
whereas an older transaction requesting an item held by a younger transaction preempts the younger
transaction by aborting it.

Both schemes end up aborting the younger of the two transactions (the transaction that started later) that
may be involved in a deadlock, assuming that this will waste less processing.
It can be shown that these two techniques are deadlock-free, since in wait-die, transactions only wait for
younger transactions, so no cycle is created.
Similarly, in wound-wait, transactions only wait for older transactions, so no cycle is created.
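The two timestamp-based prevention rules can be written as a small decision function. The sketch below is illustrative only (timestamps are just the transaction start order) and returns the action taken when Ti requests an item held by Tj.

```python
def deadlock_prevention(ts_i, ts_j, scheme):
    """Ti (timestamp ts_i) requests an item locked by Tj (timestamp ts_j).
    Returns 'wait', 'abort_Ti', or 'abort_Tj' according to the chosen scheme."""
    if scheme == "wait-die":
        # older Ti may wait for younger Tj; a younger Ti dies (aborts, restarts with same TS)
        return "wait" if ts_i < ts_j else "abort_Ti"
    if scheme == "wound-wait":
        # older Ti wounds (aborts) the younger Tj; a younger Ti waits for the older Tj
        return "abort_Tj" if ts_i < ts_j else "wait"
    raise ValueError("unknown scheme")

# Ti started before Tj (Ti is older):
assert deadlock_prevention(1, 2, "wait-die") == "wait"
assert deadlock_prevention(1, 2, "wound-wait") == "abort_Tj"
# Ti is younger than Tj:
assert deadlock_prevention(2, 1, "wait-die") == "abort_Ti"
assert deadlock_prevention(2, 1, "wound-wait") == "wait"
```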
Another group of protocols that prevent deadlock do not require timestamps. These include the
no waiting (NW) and
cautious waiting (CW) algorithms.
No waiting algorithm,
if a transaction is unable to obtain a lock, it is immediately aborted and then restarted after a certain time
delay without checking whether a deadlock will actually occur or not.
no transaction ever waits, so no deadlock will occur
this scheme can cause transactions to abort and restart needlessly
cautious waiting
try to reduce the number of needless aborts/restarts
Suppose that transaction Ti tries to lock an item X but is not able to do so because
X is locked by some other transaction Tj with a conflicting lock. The cautious waiting
rules are as follows:
If Tj is not blocked (not waiting for some other locked item), then Ti is blocked and allowed to wait;
otherwise abort Ti.
It can be shown that cautious waiting is deadlock-free, because no transaction will ever wait for another
blocked transaction.
5.4 Deadlock Detection.
A second, more practical approach to dealing with deadlock is deadlock detection, where the system
checks if a state of deadlock actually exists.
This solution is attractive if we know there will be little interference among the transactions; that is, if
different transactions will rarely access the same items at the same time.

This can happen if the transactions are short and each transaction locks only a few items, or if the
transaction load is light.
On the other hand, if transactions are long and each transaction uses many items, or if the transaction load
is quite heavy, it may be advantageous to use a deadlock prevention scheme.
A simple way to detect a state of deadlock is for the system to construct and maintain a
wait-for graph.
One node is created in the wait-for graph for each transaction that is currently executing. Whenever a
transaction Ti is waiting to lock an item X that is currently locked by a transaction Tj, a directed edge (Ti
→ Tj) is created in the wait-for graph.
When Tj releases the lock(s) on the items that Ti was waiting for, the directed edge is dropped from the
wait-for graph. We have a state of deadlock if and only if the wait-for graph has a cycle.
One problem with this approach is the matter of determining when the system should check for a deadlock.
One possibility is to check for a cycle every time an edge is added to the wait-for graph, but this may cause
excessive overhead.
Criteria such as the number of currently executing transactions or the period of time several transactions
have been waiting to lock items may be used instead to decide when to check for a cycle. Fig 5.5(b) shows the wait-
for graph for the (partial) schedule shown in Fig 5.5(a).
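A wait-for graph check can be implemented with a standard cycle-detection traversal. This sketch uses a plain Python dictionary as a hypothetical edge representation: an edge Ti → Tj is recorded whenever Ti waits for Tj, and deadlock is reported exactly when the graph contains a cycle.

```python
def has_deadlock(wait_for):
    """wait_for: dict mapping a transaction to the set of transactions it waits for.
    Returns True iff the wait-for graph contains a cycle (a state of deadlock)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GREY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GREY:          # back edge found: cycle
                return True
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in wait_for)

# T1' waits for T2' and T2' waits for T1', as in Fig 5.5(b): deadlock
assert has_deadlock({"T1'": {"T2'"}, "T2'": {"T1'"}})
assert not has_deadlock({"T1'": {"T2'"}, "T2'": set()})
```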
If the system is in a state of deadlock, some of the transactions causing the deadlock must be aborted.
Choosing which transactions to abort is known as victim selection.
The algorithm for victim selection should generally avoid selecting transactions that have been running for
a long time and that have performed many updates, and it should try instead to select transactions that
have not made many changes (younger transactions).
Timeouts
Another simple scheme to deal with deadlock is the use of timeouts. This method is practical
because of its low overhead and simplicity.
In this method, if a transaction waits for a period longer than a system-defined timeout period, the system
assumes that the transaction may be deadlocked and aborts it regardless of whether a deadlock actually
exists or not.

Starvation.
Another problem that may occur when we use locking is starvation, which occurs when a transaction
cannot proceed for an indefinite period of time while other transactions in the system continue normally.
This may occur if the waiting scheme for locked items is unfair, giving priority to some transactions over
others
One solution for starvation is to have a fair waiting scheme, such as using a first-come-first-served
queue; transactions are enabled to lock an item in the order in which they originally requested the lock.
Another scheme allows some transactions to have priority over others but increases the priority of a
transaction the longer it waits, until it eventually gets the highest priority and proceeds.
Starvation can also occur because of victim selection if the algorithm selects the same transaction as
victim repeatedly, thus causing it to abort and never finish execution.
The algorithm can use higher priorities for transactions that have been aborted multiple times to avoid this
problem.
5.5 Concurrency Control Based on Timestamp Ordering

This guarantees serializability using transaction timestamps to order transaction execution for an equivalent
serial schedule.

5.5.1 Timestamps
A timestamp is a unique identifier created by the DBMS to identify a transaction. Typically, timestamp
values are assigned in the order in which the transactions are submitted to the system, so a timestamp can
be thought of as the transaction start time.
We will refer to the timestamp of transaction T as TS(T).
Concurrency control techniques based on timestamp ordering do not use locks; hence, deadlocks cannot
occur.
Timestamps can be generated in several ways.
One possibility is to use a counter that is incremented each time its value is assigned to a transaction. The
transaction timestamps are numbered 1, 2, 3,
... in this scheme. A computer counter has a finite maximum value, so the system must periodically
reset the counter to zero when no transactions are executing for some short period of time.
Another way to implement timestamps is to use the current date/time value of the system clock
and
ensure that no two timestamp values are generated during the same tick of the clock.

5.5.2 The Timestamp Ordering Algorithm


The idea for this scheme is to order the transactions based on their timestamps.

• A schedule in which the transactions participate is then serializable, and the only equivalent
serial schedule permitted has the transactions in order of their timestamp values. This is called
timestamp ordering (TO).

• The algorithm must ensure that, for each item accessed by conflicting Operations in the
schedule, the order in which the item is accessed does not violate the timestamp order.

To do this, the algorithm associates with each database item X two timestamp (TS) values:
1. read_TS(X). The read timestamp of item X is the largest timestamp among all the
timestamps of transactions that have successfully read item X; that is, read_TS(X) = TS(T), where T is
the youngest transaction that has read X successfully.
2. write_TS(X). The write timestamp of item X is the largest of all the timestamps of
transactions that have successfully written item X; that is, write_TS(X) = TS(T), where T is the
youngest transaction that has written X successfully.
Basic Timestamp Ordering (TO).
Whenever some transaction T tries to issue a read_item(X) or a write_item(X) operation, the basic TO
algorithm compares the timestamp of T with read_TS(X) and write_TS(X) to ensure that the timestamp
order of transaction execution is not violated.
If this order is violated, then transaction T is aborted and resubmitted to the system as a new
transaction with a new timestamp.
If T is aborted and rolled back, any transaction T1 that may have used a value written by T
must also be rolled back.

Similarly, any transaction T2 that may have used a value written by T1 must also be rolled back, and so on.
This effect is known as cascading rollback and is one of the problems associated with basic TO, since the
schedules produced are not guaranteed to be recoverable.
An additional protocol must be enforced to ensure that the schedules are recoverable, cascadeless, or
strict.
The basic TO algorithm:
The concurrency control algorithm must check whether conflicting operations violate the timestamp
ordering in the following two cases:
1. Whenever a transaction T issues a write_item(X) operation, the following is checked:
a. If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then abort and roll back T and reject
the operation. This should be done because some younger transaction with a timestamp greater than
TS(T) and hence after T in the timestamp ordering has already read or written the value of item X
before T had a chance to write X, thus violating the timestamp ordering.
b. If the condition in part (a) does not occur, then execute the write_item(X) operation of T
and set write_TS(X) to TS(T).
2. Whenever a transaction T issues a read_item(X) operation, the following is checked:
a. If write_TS(X) > TS(T), then abort and roll back T and reject the operation. This should be
done because some younger transaction with timestamp greater than TS(T) and hence after T in the
timestamp ordering has already written the value of item X before T had a chance to read X.
b. If write_TS(X) ≤ TS(T), then execute the read_item(X) operation of T and set read_TS(X)
to the larger of TS(T) and the current read_TS(X).
Whenever the basic TO algorithm detects two conflicting operations that occur in the incorrect order, it
rejects the later of the two operations by aborting the transaction that issued it. The schedules produced by
basic TO are hence guaranteed to be conflict serializable
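The two checks of basic TO can be expressed compactly. In this hedged sketch, read_ts and write_ts are dictionaries standing in for read_TS(X) and write_TS(X), and "abort" stands for rolling back T and resubmitting it with a new timestamp; the names are illustrative only.

```python
read_ts, write_ts = {}, {}   # read_TS(X) and write_TS(X); missing items default to 0

def basic_to_write(x, ts_t):
    # Case 1a: some younger transaction has already read or written X
    if read_ts.get(x, 0) > ts_t or write_ts.get(x, 0) > ts_t:
        return "abort"                        # roll back T and reject the operation
    write_ts[x] = ts_t                        # case 1b: execute write_item(X), set write_TS(X) = TS(T)
    return "ok"

def basic_to_read(x, ts_t):
    # Case 2a: some younger transaction has already written X
    if write_ts.get(x, 0) > ts_t:
        return "abort"
    read_ts[x] = max(read_ts.get(x, 0), ts_t) # case 2b: set read_TS(X) to the larger value
    return "ok"
```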
Strict Timestamp Ordering (TO)
A variation of basic TO called strict TO ensures that the schedules are both strict
(for easy recoverability) and (conflict) serializable.

In this variation, a transaction T that issues a read_item(X) or write_item(X) such that TS(T) >
write_TS(X) has its read or write operation delayed until the transaction T′ that wrote the value of X (hence
TS(T′) = write_TS(X)) has committed or aborted. To implement this algorithm, it is necessary to simulate the
locking of an item X that has been written by transaction T′ until T′ is either committed or aborted. This
algorithm does not cause deadlock, since T waits for T′ only if TS(T) > TS(T′).

A modification of the basic TO algorithm, known as Thomas's write rule, does not
enforce conflict serializability, but it rejects fewer write operations by modifying the checks for the
write_item(X) operation as follows:
1. If read_TS(X) > TS(T), then abort and roll back T and reject the operation.
2. If write_TS(X) > TS(T), then do not execute the write operation but continue processing.
This is because some transaction with timestamp greater than TS(T), and hence after T in the timestamp
ordering, has already written the value of X. Thus, we must ignore the write_item(X) operation of T
because it is already outdated and obsolete. Notice that any conflict arising from this situation would be
detected by case (1).
3. If neither the condition in part (1) nor the condition in part (2) occurs, then execute the
write_item(X) operation of T and set write_TS(X) to TS(T).
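Under the same illustrative assumptions as the previous sketch, Thomas's write rule changes only the write check: a write that arrives "too late" is silently skipped instead of aborting the transaction.

```python
def thomas_write(x, ts_t):
    if read_ts.get(x, 0) > ts_t:
        return "abort"          # rule 1: a younger transaction has already read X
    if write_ts.get(x, 0) > ts_t:
        return "skip"           # rule 2: the write is outdated; ignore it and continue
    write_ts[x] = ts_t          # rule 3: execute the write and set write_TS(X) = TS(T)
    return "ok"
```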
5.6 Multiversion Concurrency Control Techniques
Other protocols for concurrency control keep the old values of a data item when the item is updated.
These are known as multiversion concurrency control, because several versions (values) of an item are
maintained
When a transaction requires access to an item, an appropriate version is chosen to maintain the
serializability of the currently executing schedule, if possible.
The idea is that some read operations that would be rejected in other techniques can still be accepted by
reading an older version of the item to maintain serializability. When a transaction writes an item, it writes
a new version and the old version(s) of the item are retained.
An obvious drawback of multiversion techniques is that more storage is needed to maintain multiple versions of
the database items

5.6.1 Multiversion Technique Based on Timestamp Ordering


In this method, several versions X1, X2, ..., Xk of each data item X are maintained. For each version, the
value of version Xi and the following two timestamps are kept:
1. read_TS(Xi). The read timestamp of Xi is the largest of all the timestamps of
transactions that have successfully read version Xi.
2. write_TS(Xi). The write timestamp of Xi is the timestamp of the transactionthat
wrote the value of version Xi.
Whenever a transaction T is allowed to execute a write_item(X) operation, a new version Xk+1 of item
X is created, with both the write_TS(Xk+1) and the read_TS(Xk+1) set to TS(T).
Correspondingly, when a transaction T is allowed to read the value of version Xi, the value of
read_TS(Xi) is set to the larger of the current read_TS(Xi) and TS(T).
To ensure serializability, the following rules are used:
1. If transaction T issues a write_item(X) operation, and version i of X has the highest
write_TS(Xi) of all versions of X that is also less than or equal to TS(T), and read_TS(Xi) > TS(T), then
abort and roll back transaction T; otherwise, create a new version Xj of X with read_TS(Xj) = write_TS(Xj)
= TS(T).
2. If transaction T issues a read_item(X) operation, find the version i of X that has the
highest write_TS(Xi) of all versions of X that is also less than or equal to TS(T); then return the value of Xi
to transaction T, and set the value of read_TS(Xi) to the larger of TS(T) and the current read_TS(Xi).
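The two multiversion TO rules can be sketched as follows, keeping for each item a list of versions as (value, read_TS, write_TS) triples. The data layout is an illustrative assumption, and the sketch presumes that an initial version of each item exists.

```python
versions = {}   # item -> list of [value, read_TS, write_TS]

def _latest_version_leq(x, ts_t):
    """Version of X with the highest write_TS that is <= TS(T), or None."""
    cands = [v for v in versions.get(x, []) if v[2] <= ts_t]
    return max(cands, key=lambda v: v[2]) if cands else None

def mv_read(x, ts_t):
    v = _latest_version_leq(x, ts_t)          # rule 2: pick version Xi
    v[1] = max(v[1], ts_t)                    # bump read_TS(Xi)
    return v[0]                               # return the value of Xi

def mv_write(x, value, ts_t):
    v = _latest_version_leq(x, ts_t)
    if v is not None and v[1] > ts_t:         # rule 1: a younger transaction already read Xi
        return "abort"
    versions.setdefault(x, []).append([value, ts_t, ts_t])  # create version Xj
    return "ok"
```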
5.6.2 Multiversion Two-Phase Locking Using Certify Locks
In this multiple-mode locking scheme, there are three locking modes for an item: read, write, and
certify.

Hence, the state of LOCK(X) for an item X can be one of read-locked, write-locked, certify-locked, or
unlocked.

We can describe the relationship between read and write locks in the standard scheme by means of
the lock compatibility table shown in Fig 5.6(a).
An entry of Yes means that if a transaction T holds the type of lock specified in the column header on
item X and if transaction T′ requests the type of lock specified in
the row header on the same item X, then T′ can obtain the lock because the locking modes are compatible.

Fig 5.6: Lock compatibility tables. (a) A compatibility table for read/write locking scheme.
(b) A compatibility table for read/write/certify locking scheme.

On the other hand, an entry of No in the table indicates that the locks are not compatible, so T′ must wait
until T releases the lock.
The idea behind multiversion 2PL is to allow other transactions T′ to read an item X
while a single transaction T holds a write lock on X.
This is accomplished by allowing two versions for each item X; one version must always have been written
by some committed transaction.
The second version X′ is created when a transaction T acquires a write lock on the item.

5.7 Validation (Optimistic) Concurrency Control Techniques


In optimistic concurrency control techniques, also known as validation or
certification techniques, no checking is done while the transaction is executing
In this scheme, updates in the transaction are not applied directly to the database items until the transaction
reaches its end.

During transaction execution, all updates are applied to local copies of the data items that are kept for the
transaction
At the end of transaction execution, a validation phase checks whether any of the transaction's
updates violate serializability.
There are three phases for this concurrency control protocol:
1. Read phase. A transaction can read values of committed data items from the database.
However, updates are applied only to local copies (versions) of the data items kept in the transaction
workspace.
2. Validation phase. Checking is performed to ensure that serializability will not be violated if
the transaction updates are applied to the database.
3. Write phase. If the validation phase is successful, the transaction updates are applied to the
database; otherwise, the updates are discarded and the transaction is restarted.
The idea behind optimistic concurrency control is to do all the checks at once; hence, transaction execution
proceeds with a minimum of overhead until the validation phase is reached
The techniques are called optimistic because they assume that little interference will occur and hence that
there is no need to do checking during transaction execution.
The validation phase for Ti checks that, for each such transaction Tj that is either committed or is
in its validation phase, one of the following conditions holds:
1. Transaction Tj completes its write phase before Ti starts its read phase.
2. Ti starts its write phase after Tj completes its write phase, and the read_set of Ti
has no items in common with the write_set of Tj.
3. Both the read_set and write_set of Ti have no items in common with the write_set of Tj,
and Tj completes its read phase before Ti completes its read phase.
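The three validation conditions above can be checked directly from per-transaction read/write sets and phase timestamps. This sketch assumes each transaction is represented as a dictionary recording its read_set, write_set and the times at which its phases start and finish; the field names are invented for illustration.

```python
def validate(ti, others):
    """ti, others: dicts with keys read_set, write_set, read_start, read_end,
    write_start, write_end (None if the phase has not finished).
    Returns True if Ti passes validation against every committed/validating Tj."""
    for tj in others:
        # condition 1: Tj finished its write phase before Ti started its read phase
        if tj["write_end"] is not None and tj["write_end"] < ti["read_start"]:
            continue
        # condition 2: write phases ordered, and Ti reads nothing that Tj wrote
        if (tj["write_end"] is not None and tj["write_end"] < ti["write_start"]
                and not (ti["read_set"] & tj["write_set"])):
            continue
        # condition 3: no overlap at all with Tj's write set, read phases ordered
        if (not (ti["read_set"] & tj["write_set"])
                and not (ti["write_set"] & tj["write_set"])
                and tj["read_end"] < ti["read_end"]):
            continue
        return False          # none of the conditions hold: discard updates, restart Ti
    return True
```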

5.8 Granularity of Data Items and Multiple Granularity Locking


All concurrency control techniques assume that the database is formed of a number of named data
items. A database item could be chosen to be one of the following:
A database record
A field value of a database record
A disk block
A whole file

The whole database


The granularity can affect the performance of concurrency control and recovery

5.8.1 Granularity Level Considerations for Locking

The size of data items is often called the data item granularity.
Fine granularity refers to small item sizes, whereas coarse granularity refers to large item sizes
The larger the data item size is, the lower the degree of concurrency permitted.
For example, if the data item size is a disk block, a transaction T that needs to lock a record B must lock
the whole disk block X that contains B because a lock is associated with the whole data item (block). Now,
if another transaction S wants to lock a different record C that happens to reside in the same block X in a
conflicting lock mode, it is forced to wait. If the data item size was a single record, transaction S would be
able to proceed, because it would be locking a different data item (record).
The smaller the data item size is, the more the number of items in the database. Because every item is
associated with a lock, the system will have a larger number of active locks to be handled by the lock
manager. More lock and unlock operations will be performed, causing a higher overhead
The best item size depends on the types of transactions involved.
If a typical transaction accesses a small number of records, it is advantageous to have the data item
granularity be one record
On the other hand, if a transaction typically accesses many records in the same file, it may be better to
have block or file granularity so that the transaction will consider all those records as one (or a few) data
items

5.8.2 Multiple Granularity Level Locking


Since the best granularity size depends on the given transaction, it seems appropriate that a database
system should support multiple levels of granularity, where the granularity level can be different for various
mixes of transactions
Fig 5.7 shows a simple granularity hierarchy with a database containing two files, each file containing
several disk pages, and each page containing several records.
This can be used to illustrate a multiple granularity level 2PL protocol, where a lock can be requested at
any level

Fig 5.7: A granularity hierarchy for illustrating multiple granularity level locking

To make multiple granularity level locking practical, additional types of locks, called
intention locks, are needed
The idea behind intention locks is for a transaction to indicate, along the path from the root to the desired
node, what type of lock (shared or exclusive) it will require from one of the node's descendants.

There are three types of intention locks:


1. Intention-shared (IS) indicates that one or more shared locks will be requested on some
descendant node(s).
2. Intention-exclusive (IX) indicates that one or more exclusive locks will be requested on some
descendant node(s).
3. Shared-intention-exclusive (SIX) indicates that the current node is locked in shared mode
but that one or more exclusive locks will be requested on some descendant node(s).
The compatibility table of the three intention locks, and the shared and exclusive locks, is shown in Fig
5.8.

Fig 5.8: Lock compatibility matrix for multiple granularity locking.

The multiple granularity locking (MGL) protocol consists of the following rules:
1. The lock compatibility (based on Fig 5.8) must be adhered to.
2. The root of the tree must be locked first, in any mode.
3. A node N can be locked by a transaction T in S or IS mode only if the parent node of N
is already locked by transaction T in either IS or IX mode.
4. A node N can be locked by a transaction T in X, IX, or SIX mode only if the parent
of node N is already locked by transaction T in either IX or SIX mode.
5. A transaction T can lock a node only if it has not unlocked any node (to
enforce the 2PL protocol).
6. A transaction T can unlock a node, N, only if none of the children of node N
are currently locked by T.
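As an illustration of rule 1 of the MGL protocol, the compatibility matrix of Fig 5.8 can be encoded as a lookup table and consulted when a lock is requested on a node. This is a hedged sketch; the bookkeeping of which transactions currently hold which modes on each node is omitted.

```python
# Lock compatibility matrix for IS, IX, S, SIX, X (True = compatible), as in Fig 5.8.
COMPATIBLE = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(requested_mode, held_modes):
    """Rule 1: a requested lock on a node is granted only if it is compatible with
    every lock currently held on that node by other transactions."""
    return all(COMPATIBLE[held][requested_mode] for held in held_modes)

print(can_grant("IX", ["IS"]))   # True: IS and IX are compatible
print(can_grant("X", ["IS"]))    # False: X conflicts with every other mode
```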
The multiple granularity level protocol is especially suited when processing a mix of transactions that
include
1. short transactions that access only a few items (records or fields) and
2. long transactions that access entire files.

NOSQL Databases and Big Data Storage Systems

5.9 Introduction to NOSQL Systems


Many companies and organizations are faced with applications that store vast amounts of data. Consider a free e-
mail application, such as Google Mail or Yahoo Mail or other similar service—this application can have millions of
users, and each user can have thousands of e-mail messages.

There is a need for a storage system that can manage all these e-mails; a structured relational SQL system may not
be appropriate because (1) SQL systems offer too many services (powerful query language, concurrency control,
etc.), which this application may not need; and (2) a structured data model such as the traditional relational model
may be too restrictive.

5.9.1 Characteristics of NOSQL Systems

NOSQL characteristics related to distributed databases and distributed systems. NOSQL systems emphasize high
availability, so replicating the data is inherent in many of these systems.

1. Increased reliability and fault tolerance:

An important advantage of a distributed computing system is reliability. If a segment of machines in a

cluster fails, then the rest of the machines continue to work. When datasets are replicated at a number of data
nodes, fault tolerance increases further. The datasets in the remaining segments continue the same
computations that were being done at the failed segment's machines.

2. Flexibility makes it very easy to install, implement and debug new services in a distributed
environment.

3. Sharding is storing the different parts of data onto different sets of data nodes, clusters or
servers. For example, a huge database of university students can be sharded into smaller databases, called shards.
Each shard may correspond to the database for an individual course and year. Each shard is stored at a different
node or server.

4. Speed: Computing power increases in a distributed computing system because shards run in parallel on
individual data nodes in clusters independently (with no data sharing between shards).

5. Scalability: Consider sharding of a large database into a number of shards, distributed for
computing in different systems. When the database expands further, adding more machines and
increasing the number of shards provides horizontal scalability. Increased computing power and running a
larger number of algorithms on the same machines provides vertical scalability.
Resource sharing: Shared resources of memory, machines and network architecture reduce the cost.

6. Performance: The collection of processors in the system provides higher performance than a
centralized computer, due to the lower cost of communication among machines (cost here means the time taken up in
communication).

NOSQL characteristics related to data models and query languages. NOSQL systems emphasize performance and
flexibility over modeling power and complex querying.

1. Not Requiring a Schema: The flexibility of not requiring a schema is achieved in many NOSQL systems by
allowing semi-structured, self-describing data. The users can specify a partial schema in some systems to improve
storage efficiency, but a schema is not required in most of the NOSQL systems.

As there may not be a schema to specify constraints, any constraints on the data would have to be programmed in
the application programs that access the data items.

There are various languages for describing semi-structured data, such as JSON (JavaScript Object Notation) and
XML. JSON is used in several NOSQL systems, but other methods for describing semi-structured data can also be
used.

2. Less Powerful Query Languages: Many applications that use NOSQL systems may not require a powerful
query language such as SQL, because search (read) queries in these systems often locate single objects in a
single file based on their object keys.

NOSQL systems typically provide a set of functions and operations as a programming API (application
programming interface), so reading and writing the data objects is accomplished by calling the appropriate
operations by the programmer. In many cases, the operations are called CRUD operations, for Create, Read,
Update, and Delete. In particular, many NOSQL systems do not provide join operations as part of the query
language itself; the joins need to be implemented in the application programs.

3. Versioning: Some NOSQL systems provide storage of multiple versions of the data items, with the timestamps
of when each data version was created. This aspect is discussed further when column-based NOSQL systems are
presented.

5.9.2 Categories of NOSQL System

NOSQL systems have been characterized into four major categories, with some additional categories that
encompass other types of systems. The most common categorization lists the following four major categories.

1. Document-based NOSQL systems: These systems store data in the form of documents using well-known
formats, such as JSON (JavaScript Object Notation). Documents are accessible via their document id, but
can also be accessed rapidly using other indexes.

2. NOSQL key-value stores: These systems have a simple data model based on fast access by the key to the
value associated with the key; the value can be a record or an object or a document or even have a more
complex data structure.

3. Column-based or wide column NOSQL systems: These systems partition a table by column into
column families, where each column family is stored in its own files. They also allow versioning of data
values.

4. Graph-based NOSQL systems: Data is represented as graphs, and related nodes can be found by
traversing the edges using path expressions.

5.10 The CAP Theorem

The CAP theorem states that, among C, A and P, at most two can be fully guaranteed at the same time for an
application, service or process.
Consistency means all copies have the same value, as in traditional DBs. Availability means at
least one copy is available in case a partition becomes inactive or fails. For example, in web
applications, the other copy in the other partition is available. Partition means parts which are active
but may not cooperate (share data), as in distributed DBs.

1. Consistency in distributed databases means that all nodes observe the same data at the same
time. Therefore, the operations in one partition of the database should be reflected in other related
partitions in the case of a distributed database. Operations which change the sales data of a specific
showroom in a table should also be reflected as changes in related tables which use that sales data.

2. Availability means that during the transactions, the field values must be available in other
partitions of the database so that each request receives a response on success as well as failure.
(On failure, the response to the request comes from a replica of the data.) Distributed databases require
transparency between one another. Network failure may lead to data unavailability in a certain
partition in case of no replication. Replication ensures availability.

3. Partition means division of a large database into different databases without affecting the
operations on them by adopting specified procedures.

4. Partition tolerance: Refers to continuation of operations as a whole even in case of message


loss, node failure or node not reachable.

Brewer's CAP (Consistency, Availability and Partition tolerance) theorem demonstrates that a
distributed system cannot guarantee C, A and P together.

5.11 Document-Based NOSQL Systems and MongoDB

Document-based or document-oriented NOSQL systems typically store data as collections of


similar documents. These types of systems are also sometimes known as document stores.

The individual documents somewhat resemble complex objects or XML documents, but a major
difference between document-based systems versus object and object-relational systems and XML
is that there is no requirement to specify a schema; rather, the documents are specified as self-
describing data.

There are many document-based NOSQL systems, including MongoDB and CouchDB, among
many others. It is important to note that different systems can use different models, languages, and
implementation methods.

5.11.1 MongoDB Data Model

MongoDB documents are stored in BSON (Binary JSON) format, which is a variation of JSON with some
additional data types and is more efficient for storage than JSON. Individual documents are stored in a
collection.

For example, the following command can be used to create a collection called project to hold PROJECT
objects from the COMPANY database.

db.createCollection("project", { capped : true, size : 1310720, max : 500 } )

The first parameter "project" is the name of the collection, which is followed by an optional document that
specifies collection options. In our example, the collection is capped; this means it has upper limits on its
storage space (size) and number of documents (max). The capping parameters help the system choose the
storage options for each collection.

A collection does not have a schema. The structure of the data fields in documents is chosen based on how
documents will be accessed and used, and the user can choose a normalized or a denormalized design. Fig 5.9 shows
simplified MongoDB documents.

Fig 5.9 Example of simple documents in MongoDB. (a) Denormalized document design with embedded
subdocuments. (b) Embedded array of document references. (c) Normalized documents (d) Inserting the
documents into collections.

5.11.2 MongoDB CRUD Operations

MongoDB has several CRUD operations, where CRUD stands for create, read, update, delete.
Documents can be created and inserted into their collections using the insert operation, whose
format is: db.<collection>.insert(<document>).

The delete operation is called remove, and the format is: db.<collection>.remove(<condition>). The documents to be
removed from the collection are specified by a Boolean condition on some of the fields in the
collection documents.

There is also an update operation, which has a condition to select certain documents, and a $set
clause to specify the update. It is also possible to use the update operation to replace an existing
document with another one but keep the same ObjectId.

For read queries, the main command is called find, and the format is: db.<collection>.find(<condition>). General Boolean
conditions can be specified as <condition>, and the documents in the collection that return true are selected for
the query result.
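For illustration only, the same CRUD operations can be expressed through the Python driver (pymongo). The connection string, database, collection and field names below are placeholders; the shell forms in the text (db.<collection>.insert(), remove(), find()) map onto the driver methods shown here.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder connection string
project = client["COMPANY"]["project"]              # assumed database and collection names

# Create: insert a document into the collection
project.insert_one({"_id": "P1", "Pname": "ProductX", "Plocation": "Bellaire"})

# Read: find documents matching a Boolean condition
for doc in project.find({"Plocation": "Bellaire"}):
    print(doc)

# Update: select documents by a condition and apply a $set clause
project.update_one({"_id": "P1"}, {"$set": {"Plocation": "Houston"}})

# Delete (remove): documents matching the condition are removed
project.delete_many({"Plocation": "Houston"})
```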

5.11.3 MongoDB Distributed Systems Characteristics

Most MongoDB updates are atomic if they refer to a single document, but MongoDB also provides
a pattern for specifying transactions on multiple documents.

Since MongoDB is a distributed system, the two-phase commit method is used to ensure atomicity
and consistency of multi-document transactions.

Replication in MongoDB. The concept of replica set is used in MongoDB to create multiple copies
of the same data set on different nodes in the distributed system, and it uses a variation of the
master-slave approach for replication.

For example, suppose that we want to replicate a particular document collection C. A replica set
will have one primary copy of the collection C stored in one node N1, and at least one secondary
copy (replica) of C stored at another node N2. Additional copies can be stored in nodes N3, N4,
etc., as needed, but the cost of storage and update (write) increases with the number of replicas.

• In MongoDB replication, all write operations must be applied to the primary copy and then
propagated to the secondaries.

• For read operations, the user can choose the particular read preference for their application.

• The default read preference processes all reads at the primary copy, so all read and write operations
are performed at the primary node.

• In this case, secondary copies are mainly to make sure that the system continues operation if the
primary fails, and MongoDB can ensure that every read request gets the latest document value.

• To increase read performance, it is possible to set the read preference so that read requests can be
processed at any replica (primary or secondary).

• However, a read at a secondary is not guaranteed to get the latest version of a document because
there can be a delay in propagating writes from the primary to the secondaries.

Sharding in MongoDB. When a collection holds a very large number of documents or requires a large storage
space, storing all the documents in one node can lead to performance problems, particularly if there are many
user operations accessing the documents concurrently using various CRUD operations.

• Sharding of the documents in the collection, also known as horizontal partitioning, divides the
documents into disjoint partitions known as shards.

• This allows the system to add more nodes as needed by a process known as horizontal scaling of the
distributed system, and to store the shards of the collection on different nodes to achieve load balancing.

• Each node will process only those operations pertaining to the documents in the shard stored at that
node.

There are two ways to partition a collection into shards in MongoDB:

• Range partitioning: Range partitioning creates the chunks by specifying a range of key values; for
example, if the shard key values ranged from one to ten million, it is possible to create ten ranges—1
to 1,000,000; 1,000,001 to 2,000,000; … ; 9,000,001 to 10,000,000—and each chunk would contain
the key values in one range.

• Hash partitioning: Hash partitioning applies a hash function h(K) to each shard key K, and the
partitioning of keys into chunks is based on the hash values.

• Both partitioning methods require that the user specify a particular document field to be used as the basis for
partitioning the documents into shards.

• The partitioning field, known as the shard key in MongoDB, must have two characteristics: it
must exist in every document in the collection, and it must have an index.

• The ObjectId can be used, but any other field possessing these two characteristics can also be used
as the basis for sharding.

• The values of the shard key are divided into chunks either through range partitioning or hash
partitioning, and the documents are partitioned based on the chunks of shard key values.

• When sharding is used, MongoDB queries are submitted to a module called the query router, which
keeps track of which nodes contain which shards based on the particular partitioning method used
on the shard keys.

• The query (CRUD operation) will be routed to the nodes that contain the shards that hold the
documents that the query is requesting.

• If the system cannot determine which shards hold the required documents, the query will be
submitted to all the nodes that hold shards of the collection.

• Sharding and replication are used together; sharding focuses on improving performance via load
balancing and horizontal scalability, whereas replication focuses on ensuring system availability
when certain nodes fail in the distributed system.
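To illustrate the two partitioning methods described in the bullets above, the small sketch below assigns a document's shard-key value to a chunk by range or by hash. The chunk boundaries and the hash function are arbitrary illustrations, not MongoDB internals.

```python
import hashlib

RANGES = [(1, 1_000_000), (1_000_001, 2_000_000), (2_000_001, 3_000_000)]  # example chunks

def range_chunk(shard_key_value):
    """Range partitioning: the chunk is the range containing the key value."""
    for i, (lo, hi) in enumerate(RANGES):
        if lo <= shard_key_value <= hi:
            return i
    raise ValueError("key outside all chunk ranges")

def hash_chunk(shard_key_value, num_chunks=3):
    """Hash partitioning: the chunk is derived from h(K), scattering nearby keys."""
    digest = hashlib.sha1(str(shard_key_value).encode()).hexdigest()
    return int(digest, 16) % num_chunks

print(range_chunk(1_500_000))   # neighbouring keys land in the same chunk
print(hash_chunk(1_500_000))    # hashing spreads neighbouring keys across chunks
```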

5.12 NOSQL Key-Value Stores


The simplest way to implement a schema-less data store is to use key-value pairs. The data store
characteristics are high performance, scalability and flexibility. Data retrieval is fast in key-value pairs data
store. The key is a unique identifier associated with a data item and is used to locate this data item rapidly. The
value is the data item itself, and it can have very different formats for different key-value storage systems.

The main characteristic of key-value stores is the fact that every value (data item) must be associated with a
unique key, and that retrieving the value by supplying the key must be very fast. There are many systems that
fall under the key-value store label. Some of these systems are DynamoDB and the Voldemort key-value distributed
data store.

5.12.1 DynamoDB Overview

The DynamoDB system is an Amazon product and is available as part of Amazon’s AWS/SDK platforms
(Amazon Web Services/Software Development Kit). It can be used as part of Amazon’s cloud computing
services, for the data storage component.

DynamoDB data model.

• The basic data model in DynamoDB uses the concepts of tables, items, and attributes.

• A table in DynamoDB does not have a schema; it holds a collection of self-describing items.

• Each item will consist of a number of (attribute, value) pairs, and attribute values can be
single-valued or multivalued.

• So basically, a table will hold a collection of items, and each item is a self-describing record
(or object). DynamoDB also allows the user to specify the items in JSON format, and the
system will convert them to the internal storage format of DynamoDB.

• When a table is created, it is required to specify a table name and a primary key; the primary
key will be used to rapidly locate the items in the table.

• Thus, the primary key is the key and the item is the value for the DynamoDB key-value store.

• The primary key attribute must exist in every item in the table. The primary key can be one of


■ A single attribute. The DynamoDB system will use this attribute to build a hash index on the items
in the table. This is called a hash type primary key. The items are not ordered in storage by the value
of the hash attribute.

■ A pair of attributes. This is called a hash and range type primary key. The primary key will be a
pair of attributes (A, B): attribute A will be used for hashing, and because there will be multiple
items with the same value of A, the B values will be used for ordering the records with the same A
value.
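To make the two primary key types concrete, the following minimal sketch uses the boto3 Python SDK (the library, region, credentials, and the EMPLOYEE_REVIEW/EmpId/ReviewDate names are assumptions, not part of the original text). It creates a table with a hash and range type primary key and then writes and reads one item.

# Minimal sketch (assumptions: AWS credentials are configured; table and
# attribute names are hypothetical).
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

# Hash and range type primary key: EmpId is hashed to locate items,
# ReviewDate orders the items that share the same EmpId value.
table = dynamodb.create_table(
    TableName="EMPLOYEE_REVIEW",
    KeySchema=[
        {"AttributeName": "EmpId", "KeyType": "HASH"},
        {"AttributeName": "ReviewDate", "KeyType": "RANGE"},
    ],
    AttributeDefinitions=[
        {"AttributeName": "EmpId", "AttributeType": "S"},
        {"AttributeName": "ReviewDate", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# Items are self-describing; only the primary key attributes are mandatory.
table.put_item(Item={"EmpId": "123456789", "ReviewDate": "2024-06-01",
                     "Job": "Engineer", "Salary": 55000})
print(table.get_item(Key={"EmpId": "123456789", "ReviewDate": "2024-06-01"})["Item"])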

5.12.2 Voldemort Key-Value Distributed Data Store

Voldemort is an open source system available through Apache 2.0 open source licensing rules. It is
based on Amazon’s DynamoDB. The focus is on high performance and horizontal scalability, as well
as on providing replication for high availability and sharding for improving latency (response time)
of read and write requests.

Some of the features of Voldemort are as follows:

■ Simple basic operations. A collection of (key, value) pairs is kept in a Voldemort store. The
operation s.put(k, v) inserts an item as a key-value pair with key k and value v. The operation
s.delete(k) deletes the item whose key is k from the store, and the operation v = s.get(k) retrieves the
value v associated with key k. The application can use these basic operations to build its own
requirements. At the basic storage level, both keys and values are arrays of bytes (strings).

■ High-level formatted data values. The values v in the (k, v) items can be specified in JSON
(JavaScript Object Notation), and the system will convert between JSON and the internal storage
format. Other data object formats can also be specified if the application provides the conversion
(also known as serialization) between the user format and the storage format as a Serializer class.

■ Consistent hashing for distributing (key, value) pairs. A variation of the data distribution algorithm known as consistent hashing is used in Voldemort for data distribution among the nodes in the distributed cluster of nodes. A hash function h(k) is applied to the key k of each (k, v) pair, and h(k) determines where the item will be stored (a minimal sketch of this idea follows this list).


■ Consistency and versioning. Voldemort uses a method similar to the one developed for DynamoDB for consistency in the presence of replicas. Concurrent write operations by different processes are allowed, so when items are replicated there can exist two or more different values associated with the same key at different nodes. Consistency is achieved when the item is read, by using a technique known as versioning and read repair.
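The basic put/get/delete operations and the role of h(k) described in this list can be sketched in a few lines of Python. This is not the Voldemort client API (the real client is Java-based, and the simple modulo mapping below stands in for the consistent-hashing ring plus the replication, versioning, and read-repair machinery that Voldemort adds on top); it only models how a hash of the key can decide which node stores a (key, value) pair.

# Illustrative model only: s.put/s.get/s.delete plus hash-based placement.
import hashlib
from typing import Any

class TinyKeyValueStore:
    def __init__(self, nodes: list[str]):
        self.nodes = nodes                      # e.g. ["node0", "node1", "node2"]
        self.storage = {n: {} for n in nodes}   # per-node in-memory dictionaries

    def _node_for(self, key: str) -> str:
        # h(k): hash the key and map it onto one of the nodes.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key: str, value: Any) -> None:
        self.storage[self._node_for(key)][key] = value

    def get(self, key: str) -> Any:
        return self.storage[self._node_for(key)].get(key)

    def delete(self, key: str) -> None:
        self.storage[self._node_for(key)].pop(key, None)

s = TinyKeyValueStore(["node0", "node1", "node2"])
s.put("emp:123", {"Lname": "Smith", "Salary": 55000})
print(s.get("emp:123"))    # the same hash routes the read to the same node
s.delete("emp:123")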

5.12.3 Examples of Other Key-Value Stores


Oracle key-value store. Oracle, in addition to its well-known SQL relational database system, also offers a system based on the key-value store concept; this system is called the Oracle NoSQL Database.

Redis key-value cache and store. Redis differs from the other systems discussed here because it caches its data in main memory to further improve performance. It offers master-slave replication and high availability, and it also offers persistence by backing up the cache to disk (a small usage sketch follows these examples).

Apache Cassandra. Cassandra is a NOSQL system that is not easily categorized into one
category; it is sometimes listed in the column-based NOSQL category or in the key-value category.
It offers features from several NOSQL categories and is used by Facebook as well as many other
customers.
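As referenced above, a small usage sketch with the redis-py client (the client library, host, and key names are assumptions) shows the basic cache-and-store pattern: values live in the Redis server's main memory, while replication and disk persistence are configured on the server rather than in application code.

# Minimal sketch (assumptions: a Redis server on localhost:6379 and the
# redis-py package installed).
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("emp:123:Lname", "Smith")     # simple key-value write, held in memory
r.expire("emp:123:Lname", 3600)     # optional: treat the entry as a cache with a TTL
print(r.get("emp:123:Lname"))       # fast read back from memory

r.hset("emp:123", mapping={"Fname": "John", "Job": "Engineer"})  # hash-structured value
print(r.hgetall("emp:123"))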

5.13 Column-Based or Wide Column NOSQL Systems


Another category of NOSQL systems is known as column-based or wide column systems. The
Google distributed storage system for big data, known as BigTable, is a well-known example of
this class of NOSQL systems, and it is used in many Google applications that require large
amounts of data storage, such as Gmail.

BigTable uses the Google File System (GFS) for data storage and distribution. An open source system known as Apache Hbase is somewhat similar to Google BigTable, but it typically uses HDFS (Hadoop Distributed File System) for data storage. HDFS is used in many cloud computing applications. Hbase can also use Amazon's Simple Storage System (known as S3) for data storage.


BigTable (and Hbase) is sometimes described as a sparse multidimensional distributed persistent sorted map, where the word map means a collection of (key, value) pairs (the key is mapped to the value).

5.13.1 Hbase Data Model and Versioning

Hbase data model.

▪ The data model in Hbase organizes data using the concepts of namespaces, tables, column
families, column qualifiers, columns, rows, and data cells.

▪ A column is identified by a combination of (column family:column qualifier). Data is stored in a self-describing form by associating columns with data values, where data values are strings.

▪ Hbase also stores multiple versions of a data item, with a timestamp associated with each
version, so versions and timestamps are also part of the Hbase data model.

▪ It is important to note that the use of the words table, row, and column is not identical to
their use in relational databases, but the uses are related.

▪ Tables and Rows. Data in Hbase is stored in tables, and each table has a table name. Data
in a table is stored as self-describing rows. Each row has a unique row key.

▪ Column Families, Column Qualifiers, and Columns. A table is associated with one or
more column families. Each column family will have a name, and the column families
associated with a table must be specified when the table is created and cannot be changed
later. When the data is loaded into a table, each column family can be associated with
many column qualifiers, but the column qualifiers are not specified as part of creating a
table. So the column qualifiers make the model a self-describing data model because the
qualifiers can be dynamically specified as new rows are created and inserted into the table.
A column is specified by a combination of ColumnFamily:ColumnQualifier.

▪ Versions and Timestamps. Hbase can keep several versions of a data item, along with the
timestamp associated with each version. The timestamp is a long integer number that
represents the system time when the version was created, so newer versions have larger
timestamp values.


▪ Cells. A cell holds a basic data item in Hbase. The key (address) of a cell is specified by a
combination of (table, rowid, columnfamily, columnqualifier, timestamp).

▪ Namespaces. A namespace is a collection of tables. A namespace basically specifies a collection of one or more tables that are typically used together by user applications.

5.13.2 Hbase CRUD Operations

Hbase has low-level CRUD (create, read, update, delete) operations, as in many of the NOSQL
systems. The formats of some of the basic CRUD operations in Hbase are shown in Fig. 5.10.

Fig. 5.10 Examples in Hbase. (a) Creating a table called EMPLOYEE with three column families: Name, Address, and Details. (b) Inserting some data in the EMPLOYEE table; different rows can have different self-describing column qualifiers (Fname, Lname, Nickname, Mname, Minit, Suffix, … for column family Name; Job, Review, Supervisor, Salary for column family Details). (c) Some CRUD operations of Hbase.
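Since the figure itself is not reproduced in this text, the following minimal sketch shows roughly equivalent operations through the happybase Python client (the client library, the Thrift server address, and the specific row and cell values are assumptions; the figure uses the HBase shell). Column names follow the ColumnFamily:ColumnQualifier convention from the data model above.

# Minimal sketch (assumptions: an HBase Thrift server on localhost:9090 and
# the happybase package installed).
import happybase

connection = happybase.Connection("localhost", port=9090)

# (a) Create a table with three column families: Name, Address, and Details.
connection.create_table("EMPLOYEE", {"Name": {}, "Address": {}, "Details": {}})
table = connection.table("EMPLOYEE")

# (b) Insert a row; column qualifiers are chosen per row (self-describing).
table.put(b"row1", {
    b"Name:Fname": b"John",
    b"Name:Lname": b"Smith",
    b"Details:Job": b"Engineer",
    b"Details:Salary": b"55000",
})

# (c) Read, then delete, the row; each cell is addressed by
# (table, rowid, columnfamily:columnqualifier) plus an implicit timestamp.
print(table.row(b"row1"))
table.delete(b"row1")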


5.13.3 Hbase Storage and Distributed System Concepts

Each Hbase table is divided into a number of regions, where each region will hold a range of the
row keys in the table; this is why the row keys must be lexicographically ordered. Each region will
have a number of stores, where each column family is assigned to one store within the region.
Regions are assigned to region servers (storage nodes) for storage. A master server (master node) is
responsible for monitoring the region servers and for splitting a table into regions and assigning
regions to region servers.

Hbase uses the Apache Zookeeper open source system for services related to managing the
naming, distribution, and synchronization of the Hbase data on the distributed Hbase server nodes,
as well as for coordination and replication services. Hbase also uses Apache HDFS (Hadoop
Distributed File System) for distributed file services. So Hbase is built on top of both HDFS and
Zookeeper. Zookeeper can itself have several replicas on several nodes for availability, and it keeps
the data it needs in main memory to speed access to the master servers and region servers.

5.14 NOSQL Graph Databases and Neo4j


In graph databases or graph oriented NOSQL systems, the data is represented as a graph, which is
a collection of vertices (nodes) and edges. Both nodes and edges can be labeled to indicate the
types of entities and relationships they represent, and it is generally possible to store data
associated with both individual nodes and individual edges.

Many systems can be categorized as graph databases. Neo4j is one such graph database system and is used in many applications. Neo4j is an open source system, and it is implemented in Java.

5.14.1 Neo4j Data Model

The data model in Neo4j organizes data using the concepts of nodes and relationships. Both nodes
and relationships can have properties, which store the data items associated with nodes and
relationships. Nodes can have labels; the nodes that have the same label are grouped into a
collection that identifies a subset of the nodes in the database graph for querying purposes.

A node can have zero, one, or several labels. Relationships are directed; each relationship has a start node and an end node as well as a relationship type.


A relationship type serves a similar role to a node label by identifying similar relationships that have the same relationship type. Properties can be specified via a map pattern, which is made of one or more “name : value” pairs enclosed in curly brackets; for example {Lname : ‘Smith’, Fname : ‘John’, Minit : ‘B’}.

Neo4j uses the high-level declarative query language Cypher. Neo4j has many options and variations for creating nodes and relationships using various scripting interfaces.

Labels and properties. When a node is created, the node label can be specified. It is also possible to create nodes without any labels. In Fig. 5.11(a), the node labels are EMPLOYEE, DEPARTMENT, PROJECT, and LOCATION, and the created nodes correspond to some of the data from the COMPANY database.

Relationships and relationship types. The → specifies the direction of the relationship, but the relationship can be traversed in either direction. The relationship types (labels) in Fig. 5.11(b) are WorksFor, Manager, LocatedIn, and WorksOn; only relationships with the relationship type WorksOn have properties (Hours).

Paths. A path specifies a traversal of part of the graph. It is typically used as part of a query to
specify a pattern, where the query will retrieve from the graph data that matches the pattern.

Optional Schema. A schema is optional in Neo4j. Graphs can be created and used without a
schema, but in Neo4j version 2.0, a few schema-related functions were added. The main features
related to schema creation involve creating indexes and constraints based on the labels and
properties.

Indexing and node identifiers. When a node is created, the Neo4j system creates an internal unique system-defined identifier for each node. To retrieve individual nodes efficiently using other properties of the nodes, the user can create indexes for the collection of nodes that have a particular label.


Fig. 5.11 Examples in Neo4j using the Cypher language. (a) Creating some nodes. (b) Creating some relationships.
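Since the figure is only referenced here, the sketch below shows comparable Cypher statements issued through the official Neo4j Python driver (the connection URI, credentials, and property names such as Empid, Dno, and Dname are assumptions introduced for illustration). The node labels and the WorksFor relationship type follow the COMPANY example described above.

# Minimal sketch (assumptions: a Neo4j server at bolt://localhost:7687 with the
# given credentials, and the neo4j Python driver installed).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # (a) Create some labeled nodes with properties (map patterns in curly brackets).
    session.run("CREATE (:EMPLOYEE {Empid: 1, Lname: 'Smith', Fname: 'John', Minit: 'B'})")
    session.run("CREATE (:DEPARTMENT {Dno: 5, Dname: 'Research'})")

    # (b) Create a directed relationship of type WorksFor between the two nodes.
    session.run(
        "MATCH (e:EMPLOYEE {Empid: 1}), (d:DEPARTMENT {Dno: 5}) "
        "CREATE (e)-[:WorksFor]->(d)"
    )

driver.close()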

5.14.2 The Cypher Query Language of Neo4j

Neo4j has a high-level query language, Cypher. There are declarative commands for creating nodes and relationships as well as for finding nodes and relationships based on specifying patterns. Deletion and modification of data is also possible in Cypher.


Fig. 5.11 (continued) Examples in Neo4j using the Cypher language. (c) Basic syntax of Cypher queries.

A Cypher query is made up of clauses. Fig. 5.11(c) summarizes some of the main clauses that can be part of a Cypher query. The Cypher language can specify complex queries and updates on a graph database.
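As a complement, a small read query (issued through the same Python driver setup as the creation sketch above; the property names remain assumptions) illustrates the typical MATCH ... WHERE ... RETURN ... ORDER BY clause structure summarized in Fig. 5.11(c).

# Minimal sketch: find the employees who work for the 'Research' department.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = (
    "MATCH (e:EMPLOYEE)-[:WorksFor]->(d:DEPARTMENT) "
    "WHERE d.Dname = $dname "
    "RETURN e.Fname AS Fname, e.Lname AS Lname "
    "ORDER BY Lname"
)

with driver.session() as session:
    for record in session.run(query, dname="Research"):  # $dname is bound here
        print(record["Fname"], record["Lname"])

driver.close()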

5.14.3 Neo4j Interfaces and Distributed System Characteristics

Neo4j has other interfaces that can be used to create, retrieve, and update nodes and relationships
in a graph database. It also has two main versions: the enterprise edition, which comes with
additional capabilities, and the community edition.

Additional features of Neo4j are:

Enterprise edition vs. community edition. Both editions support the Neo4j graph data model
and storage system, as well as the Cypher graph query language, and several other interfaces,
including a high-performance native API, language drivers for several popular programming
languages, such as Java, Python, PHP, and the REST (Representational State Transfer) API.

Graph visualization interface. Neo4j has a graph visualization interface, so that a subset of the
nodes and edges in a database graph can be displayed as a graph. This tool can be used to visualize
query results in a graph representation.

Master-slave replication. Neo4j can be configured on a cluster of distributed system nodes (computers), where one node is designated the master node. The data and indexes are fully replicated on each node in the cluster.


Various ways of synchronizing the data between master and slave nodes can be configured in the distributed cluster.

Caching. A main memory cache can be configured to store the graph data for improved
performance.

Logical logs. Logs can be maintained to recover from failures.

End of Module-5
