
Module 5

Module 5 covers transaction processing concepts including concurrency control, transaction states, and recovery mechanisms. It discusses various transaction problems like lost updates and dirty reads, as well as the significance of ACID properties for maintaining database integrity. Additionally, it introduces NoSQL databases and their characteristics, alongside the importance of system logs for transaction recovery.

Uploaded by Aann Mariya Sabu

• Transaction Processing Concepts: overview of concurrency control, Transaction Model, Significance of Concurrency Control & Recovery, Transaction States, System Log, Desirable Properties of Transactions.
• Serial Schedules, Concurrent and Serializable Schedules, Conflict Equivalence and Conflict Serializability, Recoverable and Cascadeless Schedules, Locking, Two-Phase Locking and its variations, Log-Based Recovery, Deferred Database Modification, Checkpointing.
• Introduction to NoSQL Databases, main characteristics of Key-Value DB (examples from: Redis), Document DB (examples from: MongoDB)
• Main characteristics of Column-Family DB (examples from: Cassandra) and Graph DB (examples from: ArangoDB)
Transaction Processing

• An action, or series of actions, performed by a user or an application program that reads or updates the contents of the database.
• A transaction is an executing program that forms a logical unit of database processing.
• The operations performed in a transaction include one or more database operations such as insert, delete, update, or retrieve.
• Examples of transaction processing: airline reservations, banking transactions, etc.
• One way of specifying the transaction boundaries is with explicit begin transaction and end transaction statements.
• The basic database access operations that a transaction can include are as follows:

• read_item(X) − reads data item X from storage into main memory.

• modify_item(X) − changes the value of item X in main memory.

• write_item(X) − writes the modified value of X from main memory back to storage.
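The three basic operations can be sketched as a toy Python model. This is only an illustration under stated assumptions: the class `ToyDatabase` and its dictionaries `disk` (standing for storage) and `buffer` (standing for main memory) are hypothetical names, and a real DBMS moves whole disk blocks and may flush buffers later, which this sketch ignores.

```python
class ToyDatabase:
    """Hypothetical model: `disk` plays the role of storage, `buffer` of main memory."""

    def __init__(self, items):
        self.disk = dict(items)
        self.buffer = {}

    def read_item(self, x):
        # read_item(X): copy X from storage into a main-memory buffer
        self.buffer[x] = self.disk[x]
        return self.buffer[x]

    def write_item(self, x, value):
        # write_item(X): put the new value in the buffer, then copy it back to storage
        self.buffer[x] = value
        self.disk[x] = self.buffer[x]


db = ToyDatabase({"X": 100})
x = db.read_item("X")   # read_item(X)
x = x - 10              # modify X in main memory (a program variable)
db.write_item("X", x)   # write_item(X)
print(db.disk["X"])     # 90
```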


Single-User and Multi-User Database Systems
Interleaved Processing vs. Parallel Processing
► The figure shows two processes, A and B, executing concurrently in an interleaved fashion.
► If the computer system has multiple hardware processors (CPUs), parallel processing of multiple processes is possible, as illustrated by processes C and D in the figure.
Source: https://www.geeksforgeeks.org/
Concurrent Transactions

• Several transactions may be executed concurrently.

• Concurrency control and recovery mechanisms are mainly concerned with the database commands in a transaction.

• If this concurrent execution is uncontrolled, it may lead to problems.

Why Is Concurrency Control Needed?
• When multiple transactions execute concurrently in an uncontrolled or unrestricted manner, several problems can arise.
• These problems are commonly referred to as concurrency problems
in database environment.
• The concurrency problems that can occur in database are:
– Temporary Update Problem (Dirty Read Problem)
– Incorrect Summary Problem
– Lost Update Problem
– Unrepeatable Read Problem
Lost Update Problem

• In the lost update problem, an update made to a data item by one transaction is lost because it is overwritten by an update made by another transaction.
• At time t1, transaction TX reads the value of account A, i.e., $300 (read only).
• At time t2, TX deducts $50 from its copy of account A, which becomes $250 (deducted but not yet written).
• Meanwhile, at time t3, transaction TY reads the value of account A, which is still $300 because TX has not yet written its update.
• At time t4, TY adds $100 to its copy of account A, which becomes $400 (added but not yet written).
• At time t6, TX writes the value of account A, which is updated to $250, since TY has not yet written its value.
• Similarly, at time t7, TY writes its value of account A as computed at time t4, i.e., $400. The value written by TX ($250) is therefore lost; a serial execution of the two transactions would have left A at $350.
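The timeline above can be replayed in a few lines of Python to show the lost update next to the serial result. The dictionary `db` and the local copies `tx_a` and `ty_a` are illustrative names, not part of the original example; each local variable stands for the value a transaction holds in main memory before writing.

```python
def lost_update(initial=300):
    """Replay the TX/TY interleaving on account A; returns the stored balance."""
    db = {"A": initial}
    tx_a = db["A"]          # t1: TX reads A ($300)
    tx_a -= 50              # t2: TX deducts $50 in its local copy ($250)
    ty_a = db["A"]          # t3: TY reads A, still $300
    ty_a += 100             # t4: TY adds $100 in its local copy ($400)
    db["A"] = tx_a          # t6: TX writes $250
    db["A"] = ty_a          # t7: TY writes $400, overwriting TX's update
    return db["A"]


def serial(initial=300):
    """The same two updates run one after the other (TX, then TY)."""
    db = {"A": initial}
    db["A"] = db["A"] - 50   # TX: read, deduct, write
    db["A"] = db["A"] + 100  # TY: read, add, write
    return db["A"]


print(lost_update())  # 400
print(serial())       # 350
```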
Temporary Update Problem

• The temporary update (or dirty read) problem occurs when one transaction updates an item and then fails, but the updated item is read by another transaction before it is reverted to its last (original) value.
► In this example, if transaction 1 fails for some reason, X will revert to its previous value.
► But transaction 2 has already read the incorrect value of X.
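A minimal sketch of the dirty read, assuming hypothetical values (X starts at 80 and transaction 1 subtracts N = 5); the function name `dirty_read` and the `before_image` variable are illustrative. It shows that transaction 2 ends up holding a value that was never committed.

```python
def dirty_read(x0=80, n=5):
    """Hypothetical values: X starts at x0 and T1 subtracts n before failing."""
    db = {"X": x0}
    before_image = db["X"]      # old value kept so T1 can be rolled back
    db["X"] = db["X"] - n       # T1: write_item(X), not yet committed
    t2_value = db["X"]          # T2: read_item(X) sees the uncommitted value
    db["X"] = before_image      # T1 fails: its update is undone
    return t2_value, db["X"]


print(dirty_read())  # (75, 80): T2 used 75, a value that was never committed
```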
Incorrect Summary Problem

• Consider a situation where one transaction is applying an aggregate function to some records while another transaction is updating these records.
• The aggregate function may calculate some values before they have been updated and others after they have been updated.

► In this example, transaction 2 is calculating the sum of some records while transaction 1 is updating them.
► Therefore, the aggregate function may calculate some values before they have been updated and others after they have been updated.
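The interleaving can be sketched as follows, assuming hypothetical balances A = 10, X = 80, Y = 20 and a transfer of N = 5 from X to Y by transaction 1. Transaction 2 reads X after the deduction but Y before the addition, so its sum misses N even though the true total never changes.

```python
def incorrect_summary(n=5):
    """Hypothetical balances; T1 moves n from X to Y while T2 sums A, X and Y."""
    db = {"A": 10, "X": 80, "Y": 20}   # true total is 110 before and after T1
    total = 0
    total += db["A"]   # T2 reads A
    db["X"] -= n       # T1 deducts n from X and writes X
    total += db["X"]   # T2 reads X after the update
    total += db["Y"]   # T2 reads Y before the update
    db["Y"] += n       # T1 adds n to Y and writes Y
    return total, sum(db.values())


print(incorrect_summary())  # (105, 110): the sum computed by T2 is off by N
```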
Unrepeatable Read Problem

• A transaction T reads the same item twice, and the item is changed by another transaction T' between the two reads. T therefore receives different values for its two reads of the same item.
• Example: a customer inquires about seat availability on several flights. While the customer is checking seat availability on a particular flight and before completing the reservation, another transaction may change that value, so the customer may end up reading a different value for the same item.
Unrepeatable Read Problem

• The unrepeatable read problem occurs when two or more read operations of the same transaction read different values of the same variable.

► In this example, after transaction 2 reads the variable X, a write operation in transaction 1 changes the value of X.
► Thus, when transaction 2 performs another read operation, it reads the new value of X written by transaction 1.
Why Is Recovery Needed?
• Whenever a transaction is submitted to a DBMS for execution:
– The system is responsible for making sure that either all the operations in the transaction are completed successfully and their effect is recorded permanently in the database (committed), or the transaction has no effect on the database or on any other transactions (aborted).
Types of Failures

• Computer failure (system crash)
• Transaction or system error
• Local errors or exception conditions detected by the transaction
• Disk failure
• Physical problems and catastrophes

Types of Failures

• A computer failure (system crash): a hardware, software, or network error occurs in the computer system during transaction execution.
• A transaction or system error: some operation in the transaction may cause it to fail, for example:
– Integer overflow or division by zero
– Erroneous parameter values
– A logical programming error
– The user interrupting the transaction during its execution
Why Is Recovery Needed? (contd)

• Types of Failures
• Local errors or exception conditions detected by the transaction
– During transaction execution, certain conditions may occur that necessitate cancellation of the transaction.
– E.g., an insufficient account balance in a banking database may cause a transaction, such as a fund withdrawal, to be canceled.
• Disk failure
– Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash.
– This may happen during a read or a write operation of the transaction.
• Physical problems and catastrophes
– This refers to an endless list of problems that includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator.
What to Do If a Transaction Fails

• If a transaction fails after executing some of its operations but before executing all of them, the operations already executed must be undone.

• Whenever a failure occurs, the system must quickly recover from the failure.
States of Transactions

1. Once a transaction starts execution, it becomes active. It can issue READ or WRITE operations.
2. Once the READ and WRITE operations complete, the transaction enters the partially committed state.
3. Next, recovery protocols check that a system failure will not result in an inability to record the transaction's changes permanently. If this check succeeds, the transaction commits and enters the committed state.
4. If the check fails, the transaction goes to the failed state.
5. If the transaction is aborted while it is in the active state, it goes to the failed state. The transaction should be rolled back to undo the effect of its write operations on the database.
6. The terminated state corresponds to the transaction leaving the system.
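The numbered steps can be encoded as a small state machine. This is a sketch under assumptions: the `TRANSITIONS` table, the `Transaction` class and the state strings are hypothetical names, and because the list above has no separate "aborted" state, a failed transaction goes straight to terminated once rollback is done.

```python
# Legal moves between the transaction states described above (hypothetical encoding).
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "committed": {"terminated"},
    "failed": {"terminated"},   # after rollback has undone the writes
    "terminated": set(),
}


class Transaction:
    def __init__(self, name):
        self.name = name
        self.state = "active"   # a transaction is active once it starts
        self.history = [self.state]

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: cannot go from {self.state} to {new_state}")
        self.state = new_state
        self.history.append(new_state)


t1 = Transaction("T1")                 # normal path: commit
for s in ("partially committed", "committed", "terminated"):
    t1.move_to(s)

t2 = Transaction("T2")                 # aborted while active
t2.move_to("failed")
t2.move_to("terminated")

print(t1.history)
print(t2.history)
```

Trying an illegal move, such as `Transaction("T3").move_to("committed")`, raises `ValueError`, since a transaction must pass through the partially committed state first.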
Transaction Operations
► The low-level operations performed in a transaction are:
► begin_transaction − A marker that specifies the start of transaction execution.
► read_item or write_item − Database operations that may be interleaved with main memory operations as part of the transaction.
► end_transaction − A marker that specifies end of transaction.
► commit − A signal to specify that the transaction has been successfully
completed in its entirety and will not be undone.
► rollback − A signal to specify that the transaction has been unsuccessful and
so all temporary changes in the database are undone. A committed transaction
cannot be rolled back.
THE SYSTEM LOG

• To be able to recover from failures that affect transactions, the system maintains a log to keep track of all transaction operations that affect the values of database items, as well as other transaction information that may be needed to permit recovery from failures.
• The log is a file that is kept on disk.
• The following are the types of entries, called log records, that are written to the log file:
1. [start_transaction, T]. Indicates that transaction T has started execution.
2. [write_item, T, X, old_value, new_value]. Indicates that transaction T has
changed the value of database item X from old_value to new_value.
3. [read_item, T, X]. Indicates that transaction T has read the value of database
item X.
4. [commit, T]. Indicates that transaction T has completed successfully, and
affirms that its effect can be committed (recorded permanently) to the
database.
5. [abort, T]. Indicates that transaction T has been aborted.
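To show why a write_item record stores old_value, here is a sketch assuming (hypothetically) that log records are Python tuples in the same shape as the list above, with made-up contents. After a crash, the writes of any transaction without a [commit, T] record are undone by scanning the log backward and restoring old values. This illustrates undo only; full log-based recovery (redo, checkpoints) is more involved.

```python
# Log records in the same shape as the list above (hypothetical contents).
log = [
    ("start_transaction", "T1"),
    ("read_item", "T1", "X"),
    ("write_item", "T1", "X", 500, 400),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 200, 250),
    ("commit", "T2"),
    ("write_item", "T1", "Y", 250, 350),
    # system crash here: T1 never wrote a commit record
]


def undo_uncommitted(db, log):
    """Restore old_value for every write by a transaction without a commit record."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):                  # newest write first
        if rec[0] == "write_item" and rec[1] not in committed:
            _, _, item, old_value, _new_value = rec
            db[item] = old_value
    return db


db_after_crash = {"X": 400, "Y": 350}          # T1's writes reached the disk
print(undo_uncommitted(db_after_crash, log))   # {'X': 500, 'Y': 250}
```

T2's committed write (Y = 250) survives, while both of T1's writes are rolled back.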
Properties of Transactions
► ACID Properties are used for maintaining the integrity of database during
transaction processing.
► ACID in DBMS stands for Atomicity, Consistency, Isolation, and Durability.
Atomicity
► By this, we mean that either the entire transaction takes place at once or
doesn’t happen at all.
► There is no midway i.e. transactions do not occur partially.
► Each transaction is considered as one unit and either runs to completion or is
not executed at all.
► It involves the following two operations.
► Abort: If a transaction aborts, the changes it made to the database are not visible.

► Commit: If a transaction commits, the changes it made are visible.

► Atomicity is also known as the ‘All or nothing rule’.


Atomicity
► Consider the following transaction T, consisting of T1 and T2, which transfers 100 from account X to account Y:
► T1: read(X); X := X - 100; write(X)
► T2: read(Y); Y := Y + 100; write(Y)
► If the transaction fails after T1 completes but before T2 completes (say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y.
► This results in an inconsistent database state. Therefore, the transaction must be executed in its entirety to ensure the correctness of the database state.
Consistency
► This means that integrity constraints must be maintained so that the database
is consistent before and after the transaction.
► It refers to the correctness of a database.
► Referring to the example above,
► The total amount before and after the transaction must be maintained.
► Total before T occurs = 500 + 200 = 700.
► Total after T occurs = 400 + 300 = 700.
► Therefore, the database is consistent. Inconsistency occurs if T1 completes but T2 fails, leaving T incomplete.
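The transfer can be sketched with an all-or-nothing wrapper. The names `transfer`, `TransferFailed` and the `fail_midway` flag are hypothetical: the flag injects a failure between write(X) and write(Y), and the saved before-image is restored on abort, so the total X + Y stays at 700 whether the transaction commits or aborts.

```python
class TransferFailed(Exception):
    """Hypothetical failure injected between write(X) and write(Y)."""


def transfer(db, src, dst, amount, fail_midway=False):
    before_image = dict(db)            # saved so the transaction can be rolled back
    try:
        db[src] -= amount              # T1: read(X); X := X - 100; write(X)
        if fail_midway:
            raise TransferFailed("failure after write(X), before write(Y)")
        db[dst] += amount              # T2: read(Y); Y := Y + 100; write(Y)
    except TransferFailed:
        db.clear()
        db.update(before_image)        # abort: undo the partial update (atomicity)
        return False
    return True                        # commit


ok_db = {"X": 500, "Y": 200}
print(transfer(ok_db, "X", "Y", 100), ok_db, sum(ok_db.values()))
# True {'X': 400, 'Y': 300} 700

failed_db = {"X": 500, "Y": 200}
print(transfer(failed_db, "X", "Y", 100, fail_midway=True), failed_db, sum(failed_db.values()))
# False {'X': 500, 'Y': 200} 700
```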
Isolation

• Isolation matters in a database system where more than one transaction executes simultaneously.
• The term 'isolation' means separation.
• In a DBMS, isolation is the property that concurrently executing transactions do not affect one another; the result should be as if the operations of one transaction began only after those of the other had completed.
• The property of isolation states that every transaction is carried out and executed as if it were the only transaction in the system.
• No transaction will affect the existence of any other transaction.
• Isolation can be ensured trivially by running transactions serially, that is, one after the other.
• Although multiple transactions may execute concurrently, each transaction must be unaware of the other concurrently executing transactions.
• Intermediate transaction results must be hidden from other concurrently executing transactions.
Isolation
► Let X = 500, Y = 500. Consider two transactions T and T'':
► T: read(X); X := X * 100; write(X); read(Y); Y := Y - 50; write(Y)
► T'': read(X); read(Y); Z := X + Y; write(Z)
► Suppose T has been executed up to read(Y) and then T'' starts. As a result, operations are interleaved, so T'' reads the correct (updated) value of X but an incorrect (stale) value of Y, and the sum computed by
► T'': (X + Y = 50,000 + 500 = 50,500) is thus not consistent with the sum at the end of transaction
► T: (X + Y = 50,000 + 450 = 50,450).
► This results in database inconsistency, a discrepancy of 50 units. Hence, transactions must take place in isolation, and changes should become visible to other transactions only after the transaction that made them has committed.
Durability
► The database should be durable enough to hold all its latest updates even if
the system fails or restarts.
► If a transaction updates a chunk of data in a database and commits, then the
database will hold the modified data.
► If a transaction commits but the system fails before the data could be written to disk, then that data will be updated once the system comes back up.
Schedules

• A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the operations of the transactions.
• Operations from different transactions can be interleaved in the schedule S.
• A schedule is defined as an execution sequence of transactions.
• It is the arrangement of transaction operations.
• A shorthand notation for describing schedules uses the symbols b, r, w, e, c, and a:
• b- begin_transaction
• r- read_item
• w- write_item
• e- end_transaction
• c- commit
• a- abort
Serial Schedule

• Schedules in which the transactions are executed non-interleaved, i.e., no transaction starts until the running transaction has ended, are called serial schedules.
• If some transaction Tj reads a value updated or written by some other transaction Ti, then the commit of Tj must occur after the commit of Ti.
• Example: consider the schedule involving two transactions T1 and T2 [Figure: schedule of T1 and T2 with R(A), W(A) and commit operations]. This is a serial schedule since the transactions perform serially in the order T1 → T2.
Non-Serial Schedule

• This is a type of scheduling where the operations of multiple transactions are interleaved (mixed).
• Unlike a serial schedule, where one transaction must wait for another to complete all its operations, in a non-serial schedule the other transaction proceeds without waiting for the previous transaction to complete.
• This can give rise to concurrency problems.
• It can be of two types, namely serializable and non-serializable schedules.
Conflicting Operations in a Schedule

Conflicting Operations
• Two operations conflict if they satisfy all of the following conditions:
– They belong to different transactions.
– They access the same data item.
– At least one of them is a write operation.
• In the schedule Sa:
• Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
Conflicting operations:
• r1(X) and w2(X)
• r2(X) and w1(X)
• w1(X) and w2(X)
Non-conflicting operations:
• r1(X) and r2(X) (both are reads)
• w2(X) and w1(Y) (they operate on different data items X and Y)
• r1(X) and w1(X) (they belong to the same transaction)
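The three conditions translate directly into a test on pairs of operations. The sketch below assumes the ri(X)/wi(X) shorthand and uses hypothetical helper names (`parse`, `conflicts`, `show`); applied to every ordered pair in Sa, it recovers exactly the three conflicting pairs listed above.

```python
import re
from itertools import combinations

OP = re.compile(r"([rw])(\d+)\((\w+)\)", re.IGNORECASE)


def parse(schedule):
    """'r1(X); w2(X)' -> [('r', 1, 'X'), ('w', 2, 'X')]"""
    return [(k.lower(), int(t), x.upper()) for k, t, x in OP.findall(schedule)]


def conflicts(a, b):
    """True when a and b are in different transactions, touch the same item,
    and at least one of them is a write."""
    (k1, t1, x1), (k2, t2, x2) = a, b
    return t1 != t2 and x1 == x2 and "w" in (k1, k2)


def show(op):
    kind, t, item = op
    return f"{kind}{t}({item})"


sa = parse("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)")
pairs = [(show(a), show(b)) for a, b in combinations(sa, 2) if conflicts(a, b)]
print(pairs)
# [('r1(X)', 'w2(X)'), ('r2(X)', 'w1(X)'), ('w1(X)', 'w2(X)')]
```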
Changing the Order of Conflicting Operations

• Two operations are conflicting if changing their order can result in a different outcome.
• E.g., r1(X); w2(X) -> the value is read by transaction T1 before it is written by transaction T2 (read-write conflict).
• This can be changed to w2(X); r1(X) -> the value of X is changed by w2(X) before it is read by r1(X).
• w1(X); w2(X) -> w2(X); w1(X)
• This type is called a write-write conflict.
• The last value of X will differ, because in one case it is written by T2 and in the other by T1.
• Two read-read operations do not conflict, because changing their order makes no difference.
Serializable Schedule

• This is used to maintain the consistency of the database.
• It is mainly used with non-serial scheduling to verify whether the schedule will lead to any inconsistency.
• A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions.
• There are two types:
– Conflict serializable
– View serializable
Equivalent Schedules

• For two schedules to be equivalent, the operations applied to each data item in both schedules must be in the same order.
• Two definitions of equivalence of schedules are generally used:
– Conflict equivalence
– View equivalence
Conflict Equivalence
Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules.
Conflict Serializable Schedule

• A schedule is called conflict serializable if, after swapping non-conflicting operations, it can be transformed into a serial schedule.
• In other words, a schedule is conflict serializable if it is conflict equivalent to a serial schedule.
Conflict Serializable Schedule
T1 T2

Read(A)
Write(A)
Read(B)
Write(B)
Read(A)
Write(A)
Read(B)
Write(B)

Schedule S1 can be transformed into a serial schedule by swapping non-conflicting operations of S1.
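Testing for Conflict Serializability of a Schedule – Precedence Graph
• Create one node for each transaction Ti in the schedule S.
• Draw an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj in S.
• S is conflict serializable if and only if the precedence graph has no cycle.
[Figure: example precedence graph over T1, T2, T3]
• If the schedule is conflict serializable, apply a topological ordering to the graph to find the equivalent serial schedule.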
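The test can be automated in a short sketch. It assumes the ri(X)/wi(X) shorthand, uses hypothetical helper names (`parse`, `precedence_graph`, `serial_order`), and finds one topological order with Kahn's algorithm (repeatedly remove a node with in-degree 0); if some nodes can never be removed, the graph has a cycle and the schedule is not conflict serializable. The usage runs it on the university question r3(X), r2(X), w3(X), r1(X), w1(X).

```python
import re
from collections import defaultdict

# Matches operations such as r3(X) or W1(y); commits/aborts are ignored here.
OP = re.compile(r"([rw])(\d+)\((\w+)\)", re.IGNORECASE)


def parse(schedule):
    """'r3(X), w3(X)' -> [('r', 'T3', 'X'), ('w', 'T3', 'X')]"""
    return [(k.lower(), "T" + t, x.upper()) for k, t, x in OP.findall(schedule)]


def precedence_graph(ops):
    nodes = []
    for _, t, _ in ops:
        if t not in nodes:
            nodes.append(t)                    # one node per transaction
    edges = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            # conflict: different transactions, same item, at least one write
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                edges.add((t1, t2))            # Ti's operation comes first
    return nodes, edges


def serial_order(nodes, edges):
    """Kahn's topological sort; returns None when the graph has a cycle."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    return order if len(order) == len(nodes) else None


nodes, edges = precedence_graph(parse("r3(X), r2(X), w3(X), r1(X), w1(X)"))
print(sorted(edges))                # [('T2', 'T1'), ('T2', 'T3'), ('T3', 'T1')]
print(serial_order(nodes, edges))   # ['T2', 'T3', 'T1']
```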
Example 1
[Figure: example schedule and its precedence graph]
There is no cycle, so the schedule is conflict serializable.
2. Check whether the given schedule S is conflict serializable or not.
Testing for Conflict Serializability of a Schedule - Example
Schedule S: r1(x); r3(y); r3(x); r2(y); r2(z); w3(y); w2(z); r1(z); w1(x); w1(z)
• Whenever there is a conflict, draw an edge (conflict = edge) from the transaction whose operation comes first. Conflicting pairs are:
– read_item(X) and write_item(X)
– write_item(X) and read_item(X)
– write_item(X) and write_item(X)
► T1-R(x): no later W(x) in T2 or T3
► T3-R(y): no later W(y) in T2 or T1
► T3-R(x): W(x) in T1, so draw the edge T3 -> T1
► T2-R(y): W(y) in T3, so draw the edge T2 -> T3
► T2-R(z): W(z) in T1, so draw the edge T2 -> T1
► T3-W(y): no later R(y) or W(y) in T2 or T1
► T2-W(z): R(z) and W(z) in T1, so draw the edge T2 -> T1 (already present)
Precedence graph edges: T3 -> T1, T2 -> T3, T2 -> T1
► As there is no cycle in the precedence graph, this schedule is conflict serializable.
Testing for Conflict Serializability of a Schedule – Example 2
[Figure: schedule of T1, T2, T3 with operations R(A), W(A), W(A), W(A), and its precedence graph]
• Draw an edge whenever there is a conflicting pair:
– read_item(X) and write_item(X)
– write_item(X) and read_item(X)
– write_item(X) and write_item(X)
► As there is a cycle in the precedence graph, this schedule is not conflict serializable. Is it serializable?
University Previous Question Paper Question

Check if the following schedule is conflict serializable using a precedence graph. If so, give the equivalent serial schedule(s): r3(X), r2(X), w3(X), r1(X), w1(X). (Note: ri(X)/wi(X) means transaction Ti issues a read/write on item X.)
• Draw an edge whenever there is a conflicting pair:
– read_item(X) and write_item(X)
– write_item(X) and read_item(X)
– write_item(X) and write_item(X)
• Conflicting pairs: r3(X), w1(X) (T3 -> T1); r2(X), w3(X) (T2 -> T3); r2(X), w1(X) (T2 -> T1); w3(X), r1(X) (T3 -> T1); w3(X), w1(X) (T3 -> T1)
• Precedence graph edges: T2 -> T3, T3 -> T1, T2 -> T1
► As there is no cycle in the precedence graph, this schedule is conflict serializable.
► All the possible topological orderings of the precedence graph are the possible serialized schedules. Finding them by repeatedly removing a node with in-degree 0:
– In-degrees T1 = 2, T2 = 0, T3 = 1: remove T2
– In-degrees T1 = 1, T3 = 0: remove T3
– In-degree T1 = 0: remove T1
► Serialized schedule: T2 -> T3 -> T1
Testing for Conflict Serializability of a Schedule – Example 3

Check whether the given schedule S is conflict serializable or not. If yes, then determine all the possible serialized schedules.
[Figure: schedule S over transactions T1, T2, T3, T4]

Step-01:
List all the conflicting operations and determine the dependency between the transactions:
• R4(A), W2(A) (T4 → T2)
• R3(A), W2(A) (T3 → T2)
• W1(B), R3(B) (T1 → T3)
• W1(B), W2(B) (T1 → T2)
• R3(B), W2(B) (T3 → T2)

Step-02:
Draw the precedence graph (edges T1 → T2, T1 → T3, T3 → T2, T4 → T2).
Clearly, there is no cycle in the precedence graph. Therefore, the given schedule S is conflict serializable.

Finding the Serialized Schedules:
• All the possible topological orderings of the above precedence graph are the possible serialized schedules.
• The topological orderings can be found by performing a topological sort of the precedence graph.
• After performing the topological sort, the possible serialized schedules are:
1. T1 → T3 → T4 → T2
2. T1 → T4 → T3 → T2
3. T4 → T1 → T3 → T2
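Listing all serialized schedules means enumerating every topological ordering, not just one. A backtracking sketch follows (the function name `all_topological_orders` is hypothetical): at each step it tries every node whose in-degree is currently 0, "removes" it, recurses, and then restores it. Run on the Step-01 edges of Example 3, it reproduces the three orders above; a graph with a cycle yields an empty list.

```python
def all_topological_orders(nodes, edges):
    """Enumerate every topological ordering by backtracking."""
    successors = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for a, b in set(edges):                 # duplicate conflicts give one edge
        successors[a].append(b)
        indegree[b] += 1

    orders, current = [], []

    def extend():
        if len(current) == len(nodes):
            orders.append(list(current))
            return
        for n in nodes:
            if indegree[n] == 0 and n not in current:
                current.append(n)
                for m in successors[n]:
                    indegree[m] -= 1        # "remove" n from the graph
                extend()
                for m in successors[n]:
                    indegree[m] += 1        # put n back and try the next choice
                current.pop()

    extend()
    return orders


# Edges from Step-01 of Example 3 (T3 -> T2 arises from two conflicts).
edges = [("T4", "T2"), ("T3", "T2"), ("T1", "T3"), ("T1", "T2"), ("T3", "T2")]
for order in all_topological_orders(["T1", "T2", "T3", "T4"], edges):
    print(" -> ".join(order))
# T1 -> T3 -> T4 -> T2
# T1 -> T4 -> T3 -> T2
# T4 -> T1 -> T3 -> T2
```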
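Testing for Conflict Serializability of a Schedule – University Question

Check whether the given schedules are conflict serializable or not.

i) S1: R1(X), R2(X), R1(Y), R2(Y), R3(Y), W1(X), W2(Y)
Ans: S1 is not conflict serializable: R2(X), W1(X) gives T2 -> T1 and R1(Y), W2(Y) gives T1 -> T2, forming a cycle.

ii) S2: R1(X), R2(X), R2(Y), W2(Y), R1(Y), W1(X)
Ans: S2 is conflict serializable: the only edge is T2 -> T1 (from R2(X), W1(X) and W2(Y), R1(Y)), so S2 is equivalent to the serial schedule T2 -> T1.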
Non-Serializability in DBMS

► A non-serial schedule that is not serializable is called a non-serializable schedule. Non-serializable schedules may or may not be consistent or recoverable. Non-serializable schedules are divided into two types:
► Recoverable Schedule: a schedule is recoverable if each transaction commits only after all the transactions from which it has read have committed.
► Non-recoverable Schedule: if a transaction reads a value written by an uncommitted transaction and commits before the transaction from which it read the value, the schedule is called non-recoverable.
► Recoverable schedules are further categorized into 3 types:
► Cascading Schedule
► Cascadeless Schedule
► Strict Schedule
Recoverable Schedules
Irrecoverable Schedule

• If a transaction reads a value written by an uncommitted transaction and commits before the transaction from which it read the value, the schedule is called a non-recoverable schedule.

► Suppose that the system allows T9 to commit immediately after executing the read(A) instruction. Thus T9 commits before T8 does.
► Now suppose that T8 fails before it commits. Since T9 has read the value of data item A written by T8, we must abort T9 to ensure transaction atomicity.
► However, T9 has already committed and cannot be aborted. Thus we have a situation where it is impossible to recover correctly from the failure of T8.
Recoverable Schedules
• Consider the schedules:
1. sa: r1(x); r2(x); w1(x); r1(y); w2(x); c2; w1(y); c1;
2. sc: r1(x); w1(x); r2(x); r1(y); w2(x); c2; a1;
3. sd: r1(x); w1(x); r2(x); r1(y); w2(x); w1(y); c1; c2;
4. se: r1(x); w1(x); r2(x); r1(y); w2(x); w1(y); a1; a2;
• sa is recoverable.
• sc is not recoverable because T2 reads item x from T1, but T2 commits before T1 commits.
• The problem occurs if T1 aborts after the c2 operation in sc.
• The value of x that T2 read is then no longer valid, and T2 would have to be aborted after it has committed, so the schedule is not recoverable.
• For the schedule to be recoverable, the c2 operation in sc must be postponed until after T1 commits, as shown in sd.
• If T1 aborts instead of committing, then T2 should also abort, as shown in se, because the value of x it read is no longer valid.
• In se, aborting T2 is acceptable since it has not committed yet.
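Recoverable Schedules with Cascading Rollback
• In a recoverable schedule, a transaction may still read an item written by an uncommitted transaction; if that writer aborts, every transaction that read from it must also be rolled back (cascading rollback).
Cascadeless Recoverable Schedules
• A schedule is cascadeless if every transaction reads only items written by committed transactions, so no cascading rollback can occur.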

• More restrictive type of schedule, called a strict schedule


• Transactions can neither read nor write an item X until the last
transaction that wrote X has committed (or aborted)
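The three definitions can be checked mechanically. The sketch below uses hypothetical helpers (`parse`, `classify`) over the ri(x)/wi(x)/ci/ai shorthand and tracks, for each item, the last transaction that wrote it. Recoverable: a transaction commits only after every transaction it read from has committed. Cascadeless: no read of an item whose writer has not yet finished. Strict: no read or write of such an item. Simplifying assumption: an abort does not restore the previous writer in this bookkeeping.

```python
def parse(schedule):
    """'r1(x); w1(x); c1' -> [('r', 1, 'X'), ('w', 1, 'X'), ('c', 1, None)]"""
    ops = []
    for tok in schedule.replace(" ", "").split(";"):
        if not tok:
            continue
        kind, rest = tok[0].lower(), tok[1:]
        if kind in ("r", "w"):
            tid, item = rest.rstrip(")").split("(")
            ops.append((kind, int(tid), item.upper()))
        else:                                   # 'c' = commit, 'a' = abort
            ops.append((kind, int(rest), None))
    return ops


def classify(schedule):
    recoverable = cascadeless = strict = True
    committed, finished = set(), set()          # finished = committed or aborted
    last_writer = {}                            # item -> last transaction that wrote it
    reads_from = {}                             # reader -> writers it read from
    for kind, t, item in parse(schedule):
        if kind in ("r", "w"):
            w = last_writer.get(item)
            other = w is not None and w != t
            if other and w not in finished:
                strict = False                  # touches X before X's writer finished
                if kind == "r":
                    cascadeless = False         # reads uncommitted data
            if kind == "r" and other:
                reads_from.setdefault(t, set()).add(w)
            if kind == "w":
                last_writer[item] = t
        else:
            if kind == "c":
                if any(w not in committed for w in reads_from.get(t, ())):
                    recoverable = False         # commits before a writer it read from
                committed.add(t)
            finished.add(t)
    return {"recoverable": recoverable, "cascadeless": cascadeless, "strict": strict}


for name, s in [
    ("sa", "r1(x); r2(x); w1(x); r1(y); w2(x); c2; w1(y); c1;"),
    ("sc", "r1(x); w1(x); r2(x); r1(y); w2(x); c2; a1;"),
    ("sd", "r1(x); w1(x); r2(x); r1(y); w2(x); w1(y); c1; c2;"),
]:
    print(name, classify(s))
# sa {'recoverable': True, 'cascadeless': True, 'strict': False}
# sc {'recoverable': False, 'cascadeless': False, 'strict': False}
# sd {'recoverable': True, 'cascadeless': False, 'strict': False}
```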
What is NoSQL?
► NoSQL stands for “Not Only SQL” or “Not SQL.”
► It is a non-relational data management system that does not require a fixed schema.
► It avoids joins and is easy to scale.
► The major purpose of using a NoSQL database is for distributed data stores with enormous data storage needs. NoSQL is used for big data and real-time web apps.
► For example, companies like Twitter, Facebook, and Google collect terabytes of user data every single day.
[Figure: What is NoSQL? (diagram label: Online analytical processing (OLAP))]
Why NoSQL?

► The concept of NoSQL databases became popular with Internet giants like Google, Facebook, and Amazon, which deal with huge volumes of data.
► System response time becomes slow when you use an RDBMS for massive volumes of data.
► To resolve this problem, we could “scale up” our systems by upgrading the existing hardware. This process is expensive.
► The alternative is to distribute the database load across multiple hosts whenever the load increases. This method is known as “scaling out.”
► NoSQL databases are non-relational, so they scale out better than relational databases; they are designed with web applications in mind.
[Figure: key NoSQL features: non-relational, schema-free, distributed, simple API]
Features of NoSQL

• Non-relational
– NoSQL databases never follow the relational model
– Never provide tables with flat, fixed-column records
– Work with self-contained aggregates or BLOBs (Binary Large Objects), which can hold complex data such as images, video, and audio
– Don't require object-relational mapping and data normalization
– Typically lack complex features like query languages, query planners, referential integrity, joins, and ACID
• Schema-free
– NoSQL databases are either schema-free or have relaxed schemas
– Do not require any sort of definition of the schema of the data
– Offer heterogeneous structures of data in the same domain
Features of NoSQL

• Simple API
– Offers easy-to-use interfaces for storing and querying data
– APIs allow low-level data manipulation and selection methods
– Text-based protocols, mostly HTTP REST with JSON
– Mostly no standards-based NoSQL query language
– Web-enabled databases running as internet-facing services
Features of NoSQL
• Distributed
– Multiple NoSQL databases can be executed in a distributed fashion
– Offer auto-scaling and fail-over capabilities
– Often the ACID concept can be sacrificed for scalability and throughput
– Mostly no synchronous replication between distributed nodes; instead asynchronous multi-master replication, peer-to-peer replication, or HDFS replication
– Often provide only eventual consistency
– Shared-nothing architecture, which requires less coordination and allows higher distribution
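Asynchronous replication and eventual consistency can be illustrated with a toy model. Everything here is a hypothetical teaching sketch, not how any specific NoSQL product replicates: the class `AsyncReplicatedStore` acknowledges a write after updating a single node and queues copies for the other nodes, so a read from another node can return stale data until the background `replicate()` step runs and the replicas converge.

```python
class AsyncReplicatedStore:
    """Toy model: a write is acknowledged by one node and copied to the others later."""

    def __init__(self, nodes=3):
        self.replicas = [{} for _ in range(nodes)]
        self.pending = []                    # (node index, key, value) not yet applied

    def write(self, key, value, node=0):
        self.replicas[node][key] = value     # acknowledged after a single node
        for i in range(len(self.replicas)):
            if i != node:
                self.pending.append((i, key, value))

    def read(self, key, node):
        return self.replicas[node].get(key)

    def replicate(self):
        # Background step: apply every pending update to the lagging replicas.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


store = AsyncReplicatedStore()
store.write("likes", 10, node=0)
print(store.read("likes", node=2))   # None: node 2 has not caught up yet
store.replicate()
print(store.read("likes", node=2))   # 10: the replicas have converged
```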
Types of NoSQL Databases

► NoSQL databases are mainly categorized into four types:
► Key-value pair based
► Column-oriented
► Graph-based
► Document-oriented
Key-Value Database

• A key-value database (sometimes called a key-value store) uses a simple key-value method to store data.
• These databases contain a simple string (the key) that is always unique and an arbitrarily large data field (the value).
• They are easy to design and implement.
What is a Key-Value Database?

• As the name suggests, this type of NoSQL database implements a hash table to store unique keys along with pointers to the corresponding data values.
► The values can be of scalar data types such as integers, or complex structures such as JSON, lists, BLOBs, and so on.
► A value can be stored as an integer, a string, JSON, or an array, with a key used to reference that value.
► It typically offers excellent performance and can be optimized to fit an organization’s needs.
► Key-value stores have no query language, but they do provide a way to add and remove key-value pairs.
► Values cannot be queried or searched upon; only the key can be queried.
[Figure: characteristics of a key-value database, with a simple example of a key-value data store]
When to Use a Key-Value Database
► When your application needs to handle lots of small, continuous reads and writes that may be volatile.
► Key-value databases offer fast in-memory access.
► When storing basic information, such as customer details; storing webpages with the URL as the key and the webpage as the value; or storing shopping-cart contents, product categories, and e-commerce product details.
► For applications that do not need frequent updates or support for complex queries.
Use cases for key-value databases
► Session management on a large scale.
► Using cache to accelerate application responses.
► Storing personal data on specific users.
► Product recommendations, storing personalized lists of items for individual
customers.
► Managing each player’s session in massive multiplayer online games.
► Redis, Dynamo, and Riak are some examples of NoSQL key-value databases.

Prof. Sarju S, Department of Computer Science and Engineering, SJCET Palai
Key-Value Database: Redis

► Redis is an in-memory key/value store.
► Redis allows you to set and retrieve pairs of keys and values.
► Redis supports the following data types and data manipulations:
– Lists, sets, hashes, increments
– Command repetition, random keys
– Secondary indexes, scripts
► Features of Redis:
– Enables low-latency, high-throughput data access
– Flexible data structures
– Simplicity and ease of use
► Basic commands: SET stores a value under a key and GET retrieves it, e.g. SET author "aann" followed by GET author.
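To make these data types concrete without a running Redis server, here is a hypothetical pure-Python toy class, `MiniRedis`, that imitates the documented behaviour of a handful of real Redis commands (SET, GET, INCR, LPUSH, LRANGE, SADD, HSET, HGET). It is a sketch under that assumption, not the redis-py client API and not Redis itself; against a real server these commands would be issued through redis-cli or a client library.

```python
class MiniRedis:
    """Toy in-memory key-value store imitating a few Redis commands (hypothetical)."""

    def __init__(self):
        self.data = {}                      # key -> str, int, list, set or dict

    def set(self, key, value):              # SET key value
        self.data[key] = value
        return "OK"

    def get(self, key):                     # GET key (None if missing)
        return self.data.get(key)

    def incr(self, key):                    # INCR: a missing key starts at 0
        self.data[key] = int(self.data.get(key, 0)) + 1
        return self.data[key]

    def lpush(self, key, *values):          # LPUSH: each value goes to the head
        items = self.data.setdefault(key, [])
        for v in values:
            items.insert(0, v)
        return len(items)

    def lrange(self, key, start, stop):     # LRANGE: inclusive, negative = from end
        items = self.data.get(key, [])
        n = len(items)
        start = max(start + n if start < 0 else start, 0)
        stop = stop + n if stop < 0 else stop
        return items[start:stop + 1] if stop >= 0 else []

    def sadd(self, key, *members):          # SADD: returns count of new members
        members_set = self.data.setdefault(key, set())
        before = len(members_set)
        members_set.update(members)
        return len(members_set) - before

    def hset(self, key, field, value):      # HSET: returns 1 if the field is new
        fields = self.data.setdefault(key, {})
        is_new = field not in fields
        fields[field] = value
        return int(is_new)

    def hget(self, key, field):             # HGET key field
        return self.data.get(key, {}).get(field)


r = MiniRedis()
r.set("author", "aann")
print(r.get("author"))                      # aann
print(r.incr("page_views"))                 # 1
r.lpush("recent", "a", "b", "c")
print(r.lrange("recent", 0, -1))            # ['c', 'b', 'a']
```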
Column-oriented

• While a relational database stores data in rows and reads data


row by row, a column store is organized as a set of columns.
• When you want to run analytics on a small number of columns,
you can read those columns directly without consuming
memory with unwanted data
• Columns are often of the same type and benefit from more
efficient compression, making reads even faster.
• Columnar databases can quickly aggregate the value of a given
column (adding up the total sales for the year, for example). Use
cases include analytics.
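The difference between row and column layouts can be sketched with plain Python lists. The sales records below are made up for illustration; the point is that summing one field in a column layout reads only that field's list, while the row layout has to visit every full record.

```python
# Hypothetical sales records, used only for illustration.
rows = [
    {"id": 1, "region": "north", "year": 2023, "amount": 120.0},
    {"id": 2, "region": "south", "year": 2023, "amount": 80.0},
    {"id": 3, "region": "north", "year": 2024, "amount": 200.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)

# Row layout: every full record is visited to sum a single field.
row_total = sum(r["amount"] for r in rows)

# Column layout: only the "amount" column is read.
col_total = sum(columns["amount"])

print(row_total, col_total)  # 400.0 400.0
```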
► Column databases use the concept of a keyspace, which is roughly analogous to a
schema in the relational model.
► The keyspace contains all the column families, which in turn contain rows, which
in turn contain columns.

► If we take a specific row as an example:

► The row key is the identifier of that row and is always unique.
► Each column contains a name, a value, and a timestamp.
► The name/value pair holds the data itself, and the timestamp records the date and
time the data was written to the database.
► Some examples of column-store databases include Cassandra, Cosmos DB,
Bigtable, and HBase.
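The keyspace, column family, row, and column hierarchy described above can be modelled as nested dictionaries. This is a simplified sketch under stated assumptions: the keyspace `shop`, the `customers` column family, and the helper `put` are hypothetical, and timestamps are taken in microseconds (the unit Cassandra uses for write timestamps).

```python
import time
from typing import Dict, Tuple

# column name -> (value, write timestamp in microseconds)
Row = Dict[str, Tuple[str, int]]
ColumnFamily = Dict[str, Row]       # row key -> row
Keyspace = Dict[str, ColumnFamily]  # column family name -> column family


def put(keyspace: Keyspace, family: str, row_key: str, column: str, value: str) -> None:
    """Write one column; a later write to the same column simply overwrites it here."""
    ts = time.time_ns() // 1000
    keyspace.setdefault(family, {}).setdefault(row_key, {})[column] = (value, ts)


shop: Keyspace = {}  # hypothetical keyspace
put(shop, "customers", "cust-001", "name", "Ann")
put(shop, "customers", "cust-001", "city", "Palai")
put(shop, "customers", "cust-002", "name", "Ben")  # rows may hold different columns

value, ts = shop["customers"]["cust-001"]["name"]
print(value)                                  # Ann
print(sorted(shop["customers"]["cust-002"]))  # ['name']
```

Because each row is its own dictionary of columns, two rows in the same column family do not need the same set of columns.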
Column-oriented - Use cases
► Developers mainly use column databases in:
► Content management systems
► Blogging platforms
► Systems that maintain counters
► Services that have expiring usage (data with a time-to-live)
► Systems that require heavy write requests (like log aggregators)
Benefits of Column Databases
► There are several benefits that go along with columnar databases:
► Column stores are excellent at compression and therefore are efficient in terms of
storage.
► You can reduce disk resources while holding massive amounts of information
in a single column
► Because the values of a column are stored together, aggregation queries read only
the columns they need and are therefore fast, which is important for projects that
must run large numbers of queries in a short time.
► Scalability is excellent with column-store databases.
► They can be scaled out across large clusters of machines, sometimes numbering
in the thousands.
► That also makes them a good fit for massively parallel processing (MPP).
► Load times can also be very short: on suitable hardware, a billion-row table
can be loaded in a few seconds.
► Data can be loaded and queried almost immediately.
► Column databases offer a lot of flexibility, since columns do not all have to
look like each other.
► You can add new and different columns without disrupting the rest of the
database.
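Why same-typed, repetitive column values compress well can be shown with run-length encoding, one of several compression schemes column stores may use (which scheme a given product applies is not specified in these notes). The `country` column below is hypothetical and already sorted, which is what makes the runs long.

```python
from itertools import groupby
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def rle_encode(column: List[T]) -> List[Tuple[T, int]]:
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def rle_decode(runs: List[Tuple[T, int]]) -> List[T]:
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# A hypothetical, sorted "country" column: many repeated values of one type.
country = ["IN"] * 5 + ["US"] * 3 + ["UK"] * 2
encoded = rle_encode(country)
print(encoded)  # [('IN', 5), ('US', 3), ('UK', 2)]
```

Ten stored values shrink to three pairs here; the same trick works poorly on a row layout, where adjacent values belong to different fields and rarely repeat.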
Column database: Cassandra
► A columnar database is a database management system (DBMS) that stores data
in columns instead of rows.
► Cassandra is an open-source, wide-column (column-family) database designed to
handle large amounts of data across many commodity servers.
► Features of Cassandra:
► Efficiency and speed at scale
► Reduced data storage costs
► Significantly improved query performance
► CQL (Cassandra Query Language):
► Cassandra provides CQL, a SQL-like language, to interact with the database.
► CQL allows you to create tables, insert and retrieve data, and perform various
other operations on the database.
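Spreading data "across many commodity servers" is typically done by hashing each row's partition key onto a ring of nodes. The sketch below is a simplified consistent-hashing ring with one token per node and MD5 as the hash; real Cassandra clusters use their own partitioner (Murmur3 by default), virtual nodes, and replication, none of which is modelled here. The node names are hypothetical.

```python
import bisect
import hashlib
from typing import List


class TokenRing:
    """Simplified consistent-hashing ring: a key belongs to the first node at or after its token."""

    def __init__(self, nodes: List[str]) -> None:
        # Place each node on the ring at the hash of its name (one token per node).
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def owner(self, partition_key: str) -> str:
        """Return the first node clockwise from the key's token, wrapping around the ring."""
        i = bisect.bisect_left(self._tokens, self._token(partition_key))
        return self._ring[i % len(self._ring)][1]


ring = TokenRing(["node-a", "node-b", "node-c"])
for key in ["cust-001", "cust-002", "cust-003"]:
    print(key, "->", ring.owner(key))
```

A useful property of this design is that adding a node only moves keys onto the new node; every other key keeps its previous owner, so the cluster can grow without reshuffling all data.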
Document-oriented databases
► A document-oriented database stores data as JSON-like documents rather than as
basic rows and columns, i.e. in a form close to the data's native structure.
► This storage model lets you store, retrieve, and manage document-oriented
information.
► It is a very popular category of modern NoSQL databases, implemented by products
such as MongoDB, Cosmos DB, DocumentDB, SimpleDB, PostgreSQL (through its JSON
types), OrientDB, Elasticsearch, and RavenDB.
► This is an example of a document that might appear in a document database like
MongoDB.
► This sample document represents a company contact card, describing an employee
called Sammy:
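A document of this kind might look like the following. The field names and values are hypothetical, chosen only to show that a single document can nest objects and arrays; the Python standard `json` module serializes it to JSON text and parses it back without loss.

```python
import json

# Hypothetical contact-card document; field names and values are illustrative only.
contact = {
    "_id": "sammy-001",
    "first_name": "Sammy",
    "email": "sammy@example.com",
    "department": {"name": "Finance", "floor": 5},  # embedded object
    "phone_numbers": [                               # embedded array of objects
        {"type": "work", "number": "+1-555-0100"},
        {"type": "mobile", "number": "+1-555-0101"},
    ],
}

text = json.dumps(contact, indent=2)  # serialize the nested document to JSON text
restored = json.loads(text)           # parse it back into Python objects
print(restored["department"]["name"])  # Finance
```

In a relational schema the department and phone numbers would typically live in separate tables joined by keys; here they travel together inside one record.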
What are document-oriented databases?
► Notice that the document is written as a JSON object.
► JSON is a human-readable data format that has become quite popular in recent years.
► While many different formats can be used to represent data within a document
database, such as XML or YAML, JSON is one of the most common choices.
► For example, MongoDB adopted JSON-like documents as its primary format for
defining and managing data.
Characteristics

Relational vs. Document Database


MongoDB
• MongoDB is an open-source NoSQL database that is available for all major operating systems.
• NoSQL stands for “not only SQL” or “non-SQL.”
• NoSQL databases store and operate on data that is not structured as relational rows and
columns.
• There are four main types of NoSQL databases: document, key-value, column-
oriented, and graph.
• MongoDB is a document database: it stores data as JSON-like documents, encoded
internally as BSON (binary JSON), with a flexible schema.
• MongoDB supports all the essential CRUD (create, read, update, delete) operations.
Atlas (Cloud) & Compass

• MongoDB Atlas is a multi-cloud database service built by the same company that develops
MongoDB.
• Atlas simplifies deploying and managing your databases on the cloud provider of your choice
(AWS, Azure, or Google Cloud).
• MongoDB Compass is a GUI client for querying, aggregating, and analyzing your MongoDB
data in a visual environment.
MongoDB
• Create or insert operations add new documents to a collection. If the collection does
not currently exist, insert operations will create it. MongoDB provides
db.collection.insertOne() and db.collection.insertMany() for this.
• Read operations retrieve documents from a collection, i.e. they query a collection for
documents. MongoDB provides db.collection.find() to read documents from a
collection.
• Update operations modify existing documents in a collection. MongoDB provides the
following methods to update documents of a collection: db.collection.updateOne(),
db.collection.updateMany(), and db.collection.replaceOne().
• Delete operations remove documents from a collection. MongoDB provides the
following methods to delete documents of a collection: db.collection.deleteOne()
and db.collection.deleteMany().
MongoDB CRUD Operations
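The four CRUD operations can be sketched with an in-memory stand-in. `Database` below is a hypothetical class, not a MongoDB driver: it passes the collection name as an argument, supports only equality filters, and applies updates the way a `$set` would, whereas the real shell methods (`insertOne`, `find`, `updateOne`, `deleteMany`) accept a much richer query and update language. It does mirror one behaviour described above: a collection is created implicitly on first insert.

```python
import itertools
from typing import Any, Dict, List

Document = Dict[str, Any]


class Database:
    """Hypothetical in-memory stand-in for a document database (not a MongoDB driver)."""

    def __init__(self) -> None:
        self.collections: Dict[str, List[Document]] = {}
        self._ids = itertools.count(1)

    @staticmethod
    def _matches(doc: Document, query: Document) -> bool:
        # Equality-only filter; an empty query matches every document.
        return all(doc.get(k) == v for k, v in query.items())

    def insert_one(self, collection: str, doc: Document) -> Any:
        # Create: the collection is created implicitly on first insert.
        doc = {"_id": next(self._ids), **doc}
        self.collections.setdefault(collection, []).append(doc)
        return doc["_id"]

    def find(self, collection: str, query: Document) -> List[Document]:
        # Read: return every document whose fields equal all values in the query.
        return [d for d in self.collections.get(collection, []) if self._matches(d, query)]

    def update_one(self, collection: str, query: Document, changes: Document) -> int:
        # Update: set the given fields on the first matching document, like "$set".
        for d in self.find(collection, query):
            d.update(changes)
            return 1
        return 0

    def delete_many(self, collection: str, query: Document) -> int:
        # Delete: remove every matching document and report how many were removed.
        docs = self.collections.get(collection, [])
        keep = [d for d in docs if not self._matches(d, query)]
        if collection in self.collections:
            self.collections[collection] = keep
        return len(docs) - len(keep)


db = Database()
db.insert_one("students", {"name": "Ann", "dept": "CSE"})
db.insert_one("students", {"name": "Ben", "dept": "ECE"})
db.update_one("students", {"name": "Ben"}, {"dept": "CSE"})
print(len(db.find("students", {"dept": "CSE"})))    # 2
print(db.delete_many("students", {"dept": "CSE"}))  # 2
```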
Benefits of Document Databases
► A few of the most important benefits are:
► Flexibility and adaptability: with a high level of control over the data
structure, document databases enable experimentation and adaptation to
new requirements as they emerge.
► New fields can be added right away, and existing ones can be changed at any
time.
► It is up to the developer to decide whether old documents must be amended or
the change can be applied only going forward.
► Ability to manage structured and unstructured data: document databases
can handle structured data, but they are also useful for storing unstructured
data where necessary.
► Scalability by design: unlike relational databases, which traditionally scale
vertically (by adding resources to a single server), document databases are
designed as distributed systems that scale horizontally (meaning that a single
database is split across multiple servers).
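The "only going forward" option can be sketched as upgrading documents when they are read instead of rewriting the whole collection. The product documents, the `tags` field, and `read_product` are hypothetical; the sketch only shows that old-shape and new-shape documents can coexist in one collection.

```python
from typing import Any, Dict, List

# Hypothetical product documents: the newer one carries a field the older one lacks.
products: List[Dict[str, Any]] = [
    {"_id": 1, "name": "Pen", "price": 10},                          # old shape
    {"_id": 2, "name": "Notebook", "price": 45, "tags": ["paper"]},  # new shape
]


def read_product(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in defaults for documents written before the "tags" field existed."""
    if "tags" not in doc:
        doc = {**doc, "tags": []}  # returns a copy; the stored document is untouched
    return doc


for p in products:
    print(p["name"], read_product(p)["tags"])
# Pen []
# Notebook ['paper']
```

The alternative, amending old documents, would mean running a one-off update that adds `tags` to every document that lacks it.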
Graph-Based NoSQL
► Graph databases are generally straightforward in how they are structured.
They are composed of two main components:
► The Node
► This is the actual piece of data itself.
► It can be the number of viewers of a YouTube video, the number of people who have read
a tweet, or basic information such as people’s names, addresses, and so
forth.
► The Edge
► This represents the relationship between two nodes.
► Edges can also carry their own information, such as the nature of the
relationship between the two nodes. Edges may also have a direction, describing
how the data flows between them.
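Nodes, directed edges, and edge properties map naturally onto an adjacency list, and following edges answers path questions. The page-navigation graph below is hypothetical, and breadth-first search is used here as one simple way to find the shortest path by number of hops; graph databases expose their own query languages for this.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical directed graph: node -> list of (neighbour, edge properties).
graph: Dict[str, List[Tuple[str, Dict[str, str]]]] = {
    "home":     [("search", {"action": "query"})],
    "search":   [("product", {"action": "click"}), ("home", {"action": "back"})],
    "product":  [("cart", {"action": "add_to_cart"})],
    "cart":     [("checkout", {"action": "pay"})],
    "checkout": [],
}


def shortest_path(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search that follows edge directions; returns node names or None."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:  # walk back along the recorded predecessors
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour, _props in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


print(shortest_path("home", "checkout"))
# ['home', 'search', 'product', 'cart', 'checkout']
```

Because edges are directed, `shortest_path("checkout", "home")` returns `None`: the checkout node has no outgoing edges.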
Translating NoSQL Knowledge to Graphs
► With the advent of the NoSQL movement, businesses of all sizes have a
variety of modern options from which to build solutions relevant to their
use cases.
► Calculating average income? Ask a relational database.
► Building a shopping cart? Use a key-value store.
► Storing structured product information? Store as a document.
► Describing how a user got from point A to point B? Follow a graph.
► Examples of graph databases:
► Neo4j, ArangoDB
ArangoDB
► ArangoDB is a native multi-model, open-source database with flexible
data models for documents, graphs, and key-values.
► Build high-performance applications using a convenient SQL-like query
language or JavaScript extensions.
► Use ACID transactions if you require them. Scale horizontally and
vertically with a few mouse clicks.
► Key features include:
► Installing ArangoDB on a cluster is as easy as installing an app on
your mobile phone.
► A powerful query language (AQL) to retrieve and modify data
► ArangoDB can be used as an application server, fusing your application
and database together for maximal throughput.
ArangoDB
► Flexible data modeling: model your data as a combination of key-value
pairs, documents, or graphs, which is well suited to social relations.
► Transactions: run queries on multiple documents or collections with
optional transactional consistency and isolation.
► Configurable durability: let the application decide whether it needs more
durability or more performance.
► No-nonsense storage: ArangoDB uses all of the power of modern
storage hardware, like SSDs and large caches.
► JavaScript for all: no language zoo; you can use one language
from your browser to your back end.
► ArangoDB can be deployed as a fault-tolerant distributed state
machine, which can serve as the animal brain of distributed appliances.
► It is open source (Apache License 2.0)
ArangoDB Use Cases
► ArangoDB is a database system with a large solution space
because it combines graph, document, key-value, search-engine,
and machine-learning capabilities in one system.
► ArangoDB as a Graph Database
► ArangoDB as a graph database is a great fit for use cases like fraud
detection, knowledge graphs, recommendation engines, identity and
access management, network and IT operations, social media
management, traffic management, and many more.
► ArangoDB as a Document Database
► ArangoDB can be used as the backend for heterogeneous content
management, e-commerce systems, Internet of Things applications,
and more generally as a persistence layer for a broad range of
services that benefit from an agile and scalable data store.
► ArangoDB as a Key-Value Database
► Key-value stores are the simplest kind of database system.
Each record is stored as a block of data under a key that
uniquely identifies the record.
► The data is opaque, which means the system knows nothing
about the contained information; it simply stores the data
and retrieves it for you via its key.
► ArangoDB as a Search Engine
► ArangoDB has a natively integrated search engine for a
broad range of information retrieval needs.
► It is powered by inverted indexes and can index full-text,
GeoJSON, as well as arbitrary JSON data.
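An inverted index maps each term to the documents that contain it, so a search looks up terms instead of scanning every document. The sketch below assumes a very simple tokenizer (lowercase alphanumeric runs, no stemming) and AND semantics for multi-word queries; it illustrates the data structure only and is not how ArangoDB's search engine is implemented internally. The sample documents are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of document keys that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for key, text in docs.items():
        for term in tokenize(text):
            index[term].add(key)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return keys of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "d1": "Graph databases store nodes and edges",
    "d2": "Key-value stores map keys to opaque values",
    "d3": "Document databases store JSON documents",
}
index = build_index(docs)
print(sorted(search(index, "databases store")))  # ['d1', 'd3']
```

Without stemming, "stores" in `d2` does not match the query term "store"; production search engines usually add analyzers for stemming, stop words, and similar normalization.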