Mod5 CH2

NOSQL systems are characterized by their scalability, availability, and flexible data models, often employing horizontal scalability and replication for data management. They do not require a fixed schema, allowing for semi-structured data storage, and typically use simpler query languages compared to traditional SQL. The CAP theorem highlights the trade-offs in distributed systems, emphasizing that it is not possible to achieve consistency, availability, and partition tolerance simultaneously, leading NOSQL systems to often prioritize availability and partition tolerance over strict consistency.


NOSQL characteristics related to distributed databases and distributed systems.

1. Scalability:
 There are two kinds of scalability in distributed systems: horizontal and vertical
 In NOSQL systems, horizontal scalability is generally used, where the distributed
system is expanded by adding more nodes for data storage and processing as the
volume of data grows.
 Vertical scalability refers to expanding the storage and computing power of existing
nodes.
 In NOSQL systems, horizontal scalability is employed while the system is operational,
so techniques for distributing the existing data among new nodes without interrupting
system operation are necessary.
2. Availability, Replication and Eventual Consistency:
a. Many applications that use NOSQL systems require continuous system availability.
To accomplish this, data is replicated over two or more nodes in a transparent
manner, so that if one node fails, the data is still available on other nodes.
b. Replication improves data availability and can also improve read performance,
because read requests can often be serviced from any of the replicated data nodes.
However, write performance suffers because an update must be applied to every copy of
the replicated data item; the slowdown is worse if serializable consistency is required.
3. Replication Models:
a. Two major replication models are used in NOSQL systems:
i. master-slave
ii. master-master replication.
i. Master-slave replication requires one copy to be the master copy; all write operations
must be applied to the master copy and then propagated to the slave copies, usually using
eventual consistency (the slave copies will eventually be the same as the master copy).
a. For read, the master-slave paradigm can be configured in various ways.
b. One configuration requires all reads to also be at the master copy, so this would
be similar to the primary site or primary copy methods of distributed
concurrency control, with similar advantages and disadvantages.
c. Another configuration would allow reads at the slave copies but would not
guarantee that the values are the latest writes, since writes to the slave nodes can
be done after they are applied to the master copy.
ii. The master-master replication allows reads and writes at any of the replicas but may not
guarantee that reads at nodes that store different copies see the same values. Different
users may write the same data item concurrently at different nodes of the system, so the
values of the item will be temporarily inconsistent.

A reconciliation method to resolve conflicting write operations of the same data item at
different nodes must be implemented as part of the master-master replication scheme.

4. Sharding of Files:
a. Sharding of files refers to the process of dividing a large file into smaller, more
manageable pieces known as shards (this is also known as horizontal partitioning). This
technique is commonly used in distributed systems and databases to improve
performance, scalability, and fault tolerance.
b. In many NOSQL applications, files (or collections of data objects) can have many
millions of records (or documents or objects), and these records can be accessed
concurrently by thousands of users. So it is not practical to store the whole file in
one node.
c. The combination of sharding the file records and replicating the shards works in
tandem to improve load balancing as well as data availability

5. High-Performance Data Access:


a. In many NOSQL applications, it is necessary to find individual records or objects
(data items) from among the millions of data records or objects in a file. To achieve
this, most systems use one of two techniques: hashing or range partitioning on object
keys.
b. The majority of accesses to an object will be by providing the key value rather than
by using complex query conditions.
c. The object key is similar to the concept of object id.
d. In hashing, a hash function h(K) is applied to the key K, and the location of the
object with key K is determined by the value of h(K).
e. In range partitioning, the location is determined via a range of key values;
i. for example, location i would hold the objects whose key values K are in the
range Ki_min ≤ K ≤ Ki_max. In applications that require range queries, where
multiple objects within a range of key values are retrieved, range partitioning
is preferred (a short sketch follows this list).
ii. Other indexes can also be used to locate objects based on attribute conditions
different from the key K.
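To make the two techniques concrete, here is a minimal Python sketch of mapping a key to a node by hashing versus by range partitioning; the node count, hash function, and key ranges are illustrative assumptions, not taken from any particular NOSQL system.

```python
import zlib

NUM_NODES = 4  # assumed cluster size, for illustration only

def node_by_hashing(key: str) -> int:
    """Hashing: the node is determined by h(K) modulo the number of nodes."""
    return zlib.crc32(key.encode()) % NUM_NODES

# Range partitioning: node i holds objects whose keys fall in [Ki_min, Ki_max].
RANGES = [(0, 249_999), (250_000, 499_999), (500_000, 749_999), (750_000, 999_999)]

def node_by_range(key: int) -> int:
    for i, (k_min, k_max) in enumerate(RANGES):
        if k_min <= key <= k_max:
            return i
    raise ValueError("key outside all partitions")

# A range query such as 200_000 <= K <= 400_000 touches only nodes 0 and 1 under
# range partitioning, whereas hashing would scatter those keys across all nodes.
print(node_by_hashing("object-42"), node_by_range(300_000))
```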

NOSQL characteristics related to data models and query languages.

NOSQL systems emphasize performance and flexibility over modeling power and complex
querying. We discuss some of these characteristics next.

1. Not Requiring a Schema:


a. The flexibility of not requiring a schema is achieved in many NOSQL systems by
allowing semi-structured, self-describing data.
b. The users can specify a partial schema in some systems to improve storage
efficiency, but it is not required to have a schema in most of the NOSQL systems.
As there may not be a schema to specify constraints, any constraints on the data
would have to be programmed in the application programs that access the data
items. There are various languages for describing semi-structured data, such as
JSON (JavaScript Object Notation) and XML (Extensible Markup Language).
c. JSON is used in several NOSQL systems, but other methods for describing semi-
structured data can also be used, as we shall see when document-based NOSQL
systems are presented.
2. Less Powerful Query Languages:
a. Many applications that use NOSQL systems may not require a powerful query
language such as SQL, because search (read) queries in these systems often locate
single objects in a single file based on their object keys.
b. NOSQL systems typically provide a set of functions and operations as a
programming API (application programming interface), so reading and writing the
data objects is accomplished by calling the appropriate operations by the
programmer.
c. In many cases, the operations are called CRUD operations, for Create, Read,
Update, and Delete. In other cases, they are known as SCRUD because of an added
Search (or Find) operation.
d. Some NOSQL systems also provide a high-level query language, but it may not
have the full power of SQL; only a subset of SQL querying capabilities would be
provided.
e. In particular, many NOSQL systems do not provide join operations as part of the
query language itself; the joins need to be implemented in the application programs
(a sketch of this pattern follows this list).
3. Versioning:
a. Some NOSQL systems provide storage of multiple versions of the data items, with
the timestamp of when each version was created, as we shall see when column-
based NOSQL systems are presented.
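The sketch below illustrates the points above with a hypothetical in-memory key-value style API (the `store` dictionary, field names, and helper functions are invented for illustration and do not belong to any specific NOSQL product): constraints and joins that SQL would normally handle must be coded in the application.

```python
# Hypothetical schema-less "store": each key maps to a self-describing document.
store = {
    "E1": {"type": "employee", "name": "John Smith", "dept": "D5"},
    "E2": {"type": "employee", "name": "Alicia Zelaya"},        # no dept field: allowed
    "D5": {"type": "department", "dname": "Research"},
}

def create(key, doc):
    # Any constraint (e.g., every employee document must carry a name) is enforced
    # by the application program, not by a schema in the data store.
    if doc.get("type") == "employee" and "name" not in doc:
        raise ValueError("employee documents must have a name")
    store[key] = doc

def read(key):
    return store.get(key)

def employees_with_department_names():
    # A "join" between employees and departments, written in application code
    # because the store itself offers no join operation.
    result = []
    for doc in store.values():
        if doc.get("type") == "employee":
            dept = read(doc["dept"]) if "dept" in doc else None
            result.append((doc["name"], dept["dname"] if dept else None))
    return result

create("E3", {"type": "employee", "name": "Franklin Wong"})
print(employees_with_department_names())
```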

24.1.3 Categories of NOSQL Systems:

NOSQL systems have been characterized into four major categories, with some additional
categories that encompass other types of systems. The most common categorization lists the
following four major categories:

1. Document-based NOSQL systems:


a. These systems store data in the form of documents using well-known formats, such
as JSON (JavaScript Object Notation). Documents are accessible via their document
id, but can also be accessed rapidly using other indexes.
2. NOSQL key-value stores:
a. These systems have a simple data model based on fast access by the key to the value
associated with the key; the value can be a record or an object or a document or even
have a more complex data structure.
3. Column-based or wide column NOSQL systems:
a. These systems partition a table by column into column families, where each column
family is stored in its own files. They also allow versioning of data values.
4. Graph-based NOSQL systems:
a. Data is represented as graphs, and related nodes can be found by traversing the edges
using path expressions. Additional categories can be added as follows to include
some systems that are not easily categorized into the above four categories, as well
as some other types of systems that have been available even before the term
NOSQL became widely used.

5. Hybrid NOSQL systems: These systems have characteristics from two or more of the above
four categories.
6. Object databases:

An object database is a type of database management system (DBMS) that stores data in the form
of objects, making it particularly suitable for object-oriented programming languages. Unlike
traditional relational databases, which store data in tables with rows and columns, object databases
store data as objects, which can encapsulate both data (attributes) and behavior (methods).

Here are some key characteristics of object databases:

o Object-Oriented Data Model: Object databases support an object-oriented data


model, which closely aligns with the programming paradigm used in object-oriented
programming languages like Java, C++, or Python. This allows developers to work
with data in a way that is more natural and intuitive for object-oriented applications.

o Complex Data Structures: Object databases can store complex data structures
directly, including hierarchical structures, nested objects, arrays, and other non-
primitive data types. This flexibility makes them well-suited for applications with
rich and complex data models.

o Transparent Persistence: Object databases provide transparent persistence, meaning that


objects in memory are automatically stored and retrieved from the database without the need
for explicit mapping or conversion between object-oriented data structures and relational
tables. This simplifies the development process and reduces the amount of boilerplate code
required.

o Relationships: Object databases support relationships between objects, such as one-


to-one, one-to-many, and many-to-many relationships, similar to relational
databases. However, in object databases, these relationships are typically
represented through object references rather than foreign keys.
o Querying: Object databases usually provide query languages or APIs that allow
developers to query and manipulate objects stored in the database. These query
languages may support object-oriented concepts such as inheritance, polymorphism,
and encapsulation.

o Performance: Object databases can offer performance advantages for certain types
of applications, especially those that involve complex data structures and frequent
object manipulations. However, the performance of object databases can vary
depending on factors such as the specific implementation, query complexity, and
data access patterns.

o Usage: Object databases are commonly used in domains where the data model is
inherently object-oriented, such as in software engineering tools, CAD/CAM
systems, multimedia applications, and complex modeling and simulation
applications.
7. XML databases:

XML databases are a type of database management system (DBMS) designed specifically for
storing, querying, and managing XML (eXtensible Markup Language) data. XML is a markup
language commonly used for structuring and representing data in a hierarchical format, making it
suitable for a wide range of applications, including web development, data interchange, and
document storage.

Here are some key aspects of XML databases:

 Native XML Storage: XML databases store XML documents natively, meaning that XML
data is stored directly without the need for any intermediate conversion or mapping. This
allows for efficient storage and retrieval of XML documents, preserving their hierarchical
structure and element relationships.

 XML Querying: XML databases typically provide query languages or APIs that allow users
to query and manipulate XML data. These query languages are designed to support the
hierarchical structure of XML documents and may include features such as XPath and
XQuery for navigating and selecting XML elements.

 Schema Support: XML databases may support XML schema languages such as Document
Type Definitions (DTDs) or XML Schema Definition (XSD), which define the structure,
constraints, and data types of XML documents. Schema validation ensures data integrity
and helps enforce data consistency.

 Indexing and Full-Text Search: XML databases often include indexing and full-text search
capabilities to facilitate efficient retrieval of XML documents based on content and
structure. Indexing allows for fast lookup of XML elements and attributes, while full-text
search enables keyword-based searching within XML documents.

 Integration with XML Technologies: XML databases are designed to integrate seamlessly
with other XML technologies and standards, such as XSLT (eXtensible Stylesheet
Language Transformations) for transforming XML data, XInclude for including external
XML documents, and XLink for creating hyperlinks between XML documents.

 Scalability and Performance: XML databases are optimized for handling large volumes of
XML data and can scale to accommodate growing datasets and user loads. They often
employ techniques such as data partitioning, caching, and parallel processing to improve
performance and scalability.

 Application Areas: XML databases are commonly used in various application domains
where XML is the primary data format, such as content management systems, document
repositories, web services, electronic publishing, and scientific data management.

24.2 The CAP Theorem:


As we discussed concurrency control in distributed databases , we assumed that the distributed
database system (DDBS) is required to enforce the ACID properties (atomicity, consistency,
isolation, durability) of transactions that are running concurrently.

In a system with data replication, concurrency control becomes more complex because there can
be multiple copies of each data item. So if an update is applied to one copy of an item, it must be
applied to all other copies in a consistent manner.

The possibility exists that one copy of an item X is updated by a transaction T1 whereas another
copy is updated by a transaction T2, so two inconsistent copies of the same item exist at two
different nodes in the distributed system.

If two other transactions T3 and T4 want to read X, each may read a different copy of item X.

We saw in Section 23.3 that there are distributed concurrency control methods that do not allow
this inconsistency among copies of the same data item, thus enforcing serializability and hence the
isolation property in the presence of replication. However, these techniques often come with high
overhead, which would defeat the purpose of creating multiple copies to improve performance and
availability in distributed database systems such as NOSQL. In the field of distributed systems,
there are various levels of consistency among replicated data items, from weak consistency to strong
consistency. Enforcing serializability is considered the strongest form of consistency, but it has high
overhead so it can reduce performance of read and write operations and hence adversely affect
system performance.

The CAP theorem, which was originally introduced as the CAP principle, can be used to explain
some of the competing requirements in a distributed system with replication.

The three letters in CAP refer to three desirable properties of distributed systems with replicated
data:

consistency (among replicated copies),

availability (of the system for read and write operations) and

partition tolerance (in the face of the nodes in the system being partitioned by a network fault).

Consistency means that the nodes will have the same copies of a replicated data item visible for
various transactions
Availability means that each read or write request for a data item will either be processed
successfully or will receive a message that the operation cannot be completed.

Partition tolerance means that the system can continue operating if the network connecting the
nodes has a fault that results in two or more partitions, where the nodes in each partition can only
communicate among each other.

It is important to note here that the use of the word consistency in CAP and its use in ACID do not
refer to the same identical concept.

 In CAP, the term consistency refers to the consistency of the values in different
copies of the same data item in a replicated distributed system.
 In ACID, it refers to the fact that a transaction will not violate the integrity
constraints specified on the database schema.
 However, if we consider that the consistency of replicated copies is a specified
constraint, then the two uses of the term consistency would be related.
 The CAP theorem states that it is not possible to guarantee all three of the desirable
properties—consistency, availability, and partition tolerance—at the same time in a
distributed system with data replication. If this is the case, then the distributed
system designer would have to choose two properties out of the three to guarantee.
It is generally assumed that in many traditional (SQL) applications, guaranteeing
consistency through the ACID properties is important. On the other hand, in a
NOSQL distributed data store, a weaker consistency level is often acceptable, and
guaranteeing the other two properties (availability, partition tolerance) is important.
 Hence, weaker consistency levels are often used in NOSQL systems instead of
guaranteeing serializability.
 In particular, a form of consistency known as eventual consistency is often adopted
in NOSQL systems.

24.3 Document-Based NOSQL Systems and MongoDB


 Document-based or document-oriented NOSQL systems typically store data as collections
of similar documents.
 These types of systems are also sometimes known as document stores.
 The individual documents somewhat resemble complex objects or XML documents, but a
major difference between document-based systems versus object and object-relational
systems and XML is that there is no requirement to specify a schema—rather, the documents
are specified as self-describing data .
 Although the documents in a collection should be similar, they can have different data
elements (attributes), and new documents can have new data elements that do not exist in
any of the current documents in the collection.
 The system basically extracts the data element names from the self-describing documents
in the collection, and the user can request that the system create indexes on some of the data
elements.
 Documents can be specified in various formats, such as XML.
 A popular language to specify documents in NOSQL systems is JSON (JavaScript Object
Notation).
 There are many document-based NOSQL systems, including MongoDB and CouchDB,
among many others.
 We will give an overview of MongoDB in this section. It is important to note that different
systems can use different models, languages, and implementation methods, but giving a
complete survey of all document-based NOSQL systems is beyond the scope of our
presentation.

24.3.1 MongoDB Data Model:

 MongoDB documents are stored in BSON (Binary JSON) format, which is a variation of
JSON with some additional data types and is more efficient for storage than JSON.
 Individual documents are stored in a collection. We will use a simple example based on
our COMPANY database that we used throughout this book.
 The operation createCollection is used to create each collection.
 For example, the following command can be used to create a collection called project to
hold PROJECT objects from the COMPANY database
o db.createCollection("project", { capped : true, size : 1310720, max : 500 } )
 The first parameter “project” is the name of the collection, which is followed by an optional
document that specifies collection options.
 In our example, the collection is capped; this means it has upper limits on its storage space
(size) and number of documents (max).
 The capping parameters help the system choose the storage options for each collection.
There are other collection options, but we will not discuss them here.
 For our example, we will create another document collection called worker to hold
information about the EMPLOYEEs who work on each project;
 for example:
o db.createCollection("worker", { capped : true, size : 5242880, max : 2000 } )
 Each document in a collection has a unique ObjectId field, called _id, which is automatically
indexed in the collection unless the user explicitly requests no index for the _id field.
 The value of ObjectId can be specified by the user, or it can be system-generated if the user
does not specify an _id field for a particular document.
 System-generated ObjectIds have a specific format, which combines the timestamp when the
object is created (4 bytes, in an internal MongoDB format), the node id (3 bytes), the process id
(2 bytes), and a counter (3 bytes) into a 12-byte Id value.
 User-generated ObjectIds can have any value specified by the user as long as it uniquely
identifies the document, and so these Ids are similar to primary keys in relational systems.
 A collection does not have a schema.
 The structure of the data fields in documents is chosen based on how documents will be
accessed and used, and the user can choose a normalized design (similar to normalized relational
tuples) or a denormalized design (similar to XML documents or complex objects).
Interdocument references can be specified by storing in one document the ObjectId or ObjectIds
of other related documents.

 Figure 24.1(a) shows a simplified MongoDB document with some of the data from Figure
5.6 of the COMPANY database example.
 In our example, the _id values are user-defined, and the documents whose _id starts with P
(for project) will be stored in the “project” collection, whereas those whose _id starts with W
(for worker) will be stored in the “worker” collection. In Figure 24.1(a), the workers’ information
is embedded in the project document, so there is no need for the “worker” collection.

 This is known as the denormalized pattern, which is similar to creating a complex object or an
XML document. A list of values that is enclosed in square brackets [ … ] within a document
represents a field whose value is an array.

Another option is to use the design in Figure 24.1(b), where worker references are embedded
in the project document, but the worker documents themselves are stored in a separate “worker”
collection. A third option in Figure 24.1(c) would use a normalized design, similar to First
Normal Form relations. The choice of which design option to use depends on how the data will
be accessed. It is important to note that the simple design in Figure 24.1(c) is not the general
normalized design for a many-to-many relationship, such as the one between employees and
projects; rather, we would need three collections for “project”, “employee”, and “works_on”.
In the design in Figure 24.1(c), an EMPLOYEE who works on several projects would be
represented by multiple worker documents with different _id values; each document would
represent the employee as worker for a particular project. This is similar to the design decisions
for XML schema design (see Section 13.6). However, it is again important to note that the
typical document-based system does not have a schema, so the design rules would have to be
followed whenever individual documents are inserted into a collection.
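To make the three design options concrete, the sketch below writes them out as Python dictionaries; the field names and values are illustrative, modeled loosely on the discussion of Figure 24.1 rather than copied from it.

```python
# (a) Denormalized design: worker information is embedded inside the project document.
project_denormalized = {
    "_id": "P1",
    "Pname": "ProductX",
    "Plocation": "Bellaire",
    "Workers": [                                   # an array value ([ ... ])
        {"Ename": "John Smith", "Hours": 32.5},
        {"Ename": "Joyce English", "Hours": 20.0},
    ],
}

# (b) Embedded references: the project stores only the _ids of worker documents,
#     which live in a separate "worker" collection.
project_with_refs = {"_id": "P1", "Pname": "ProductX", "WorkerIds": ["W1", "W2"]}

# (c) Normalized design (similar to First Normal Form relations): each worker
#     document carries the project _id; an employee working on several projects
#     appears as several worker documents with different _id values.
workers_normalized = [
    {"_id": "W1", "ProjectId": "P1", "Ename": "John Smith", "Hours": 32.5},
    {"_id": "W2", "ProjectId": "P1", "Ename": "Joyce English", "Hours": 20.0},
]
```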
24.3.2 MongoDB CRUD Operations
 MongoDB has several CRUD operations, where CRUD stands for create, read, update,
and delete.
 Documents can be created and inserted into their collections using the insert operation,
whose format is:
o db. <collection_name>.insert(<document(s)>)
 The parameters of the insert operation can include either a single document or an array
of documents, as shown in Figure 24.1(d). The delete operation is called remove, and
the format is:
o db. <collection_name>.remove(<condition>)
 The documents to be removed from the collection are specified by a Boolean condition on
some of the fields in the collection documents.
 There is also an update operation, which has a condition to select certain documents, and a
$set clause to specify the update.
 It is also possible to use the update operation to replace an existing document with another
one but keep the same ObjectId. For read queries, the main command is called find, and the
format is:
o db. <collection_name>. find (<condition>)
 General Boolean conditions can be specified as <condition>, and the documents in the
collection for which the condition evaluates to true are selected for the query result. For a
full discussion of the MongoDB CRUD operations, see the MongoDB online documentation
in the chapter references.
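The same operations can be issued from a program. The sketch below uses the PyMongo driver as one possible client; the connection string, database name, and field names are assumptions for illustration (note that PyMongo names the operations insert_one/insert_many, update_one, and delete_many rather than insert, update, and remove).

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed local server
db = client["company"]                              # assumed database name

# insert: a single document or an array of documents
db.project.insert_many([
    {"_id": "P1", "Pname": "ProductX", "Plocation": "Bellaire"},
    {"_id": "P2", "Pname": "ProductY", "Plocation": "Sugarland"},
])

# find: return the documents satisfying a Boolean condition
for doc in db.project.find({"Plocation": "Bellaire"}):
    print(doc)

# update: the condition selects documents, and $set specifies the new field values
db.project.update_one({"_id": "P1"}, {"$set": {"Plocation": "Houston"}})

# remove: delete the documents satisfying a condition
db.project.delete_many({"Plocation": "Sugarland"})
```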

24.3.3 MongoDB Distributed Systems Characteristics

Most MongoDB updates are atomic if they refer to a single document, but MongoDB also provides
a pattern for specifying transactions on multiple documents.

Since MongoDB is a distributed system, the two-phase commit method is used to ensure atomicity
and consistency of multi-document transactions.

The concept of replica set is used in MongoDB to create multiple copies of the same data set on
different nodes in the distributed system, and it uses a variation of the master-slave approach for
replication.
For example, suppose that we want to replicate a particular document collection C.

A replica set will have one primary copy of the collection C stored in one node N1, and at least
one secondary copy (replica) of C stored at another node N2.

Additional copies can be stored in nodes N3, N4, etc., as needed, but the cost of storage and update
(write) increases with the number of replicas.

The total number of participants in a replica set must be at least three, so if only one secondary copy
is needed, a participant in the replica set known as an arbiter must run on the third node N3.

The arbiter does not hold a replica of the collection but participates in elections to choose a new
primary if the node storing the current primary copy fails.

If the total number of members in a replica set is n (one primary plus i secondaries, for a total of n
= i + 1), then n must be an odd number; if it is not, an arbiter is added to ensure the election
process works correctly if the primary fails.

In MongoDB replication, all write operations must be applied to the primary copy and then
propagated to the secondaries.

For read operations, the user can choose the particular read preference for their application.

The default read preference processes all reads at the primary copy, so all read and write operations
are performed at the primary node.

In this case, secondary copies are mainly to make sure that the system continues operation if the
primary fails, and MongoDB can ensure that every read request gets the latest document value.

To increase read performance, it is possible to set the read preference so that read requests can be
processed at any replica (primary or secondary); however, a read at a secondary is not guaranteed
to get the latest version of a document because there can be a delay in propagating writes from the
primary to the secondaries.
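As a sketch of how an application might choose a read preference with the PyMongo driver (the host names and replica set name are assumptions), trading the guarantee of reading the latest value for better read scalability:

```python
from pymongo import MongoClient, ReadPreference

# Connect to an assumed three-member replica set named "rs0".
client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")

# Default read preference: all reads go to the primary and always see the latest writes.
projects_primary = client.company.project

# Allow reads at secondaries: faster and spreads load, but a read may return a
# slightly stale document if a write has not yet been propagated from the primary.
projects_secondary_ok = client.company.get_collection(
    "project", read_preference=ReadPreference.SECONDARY_PREFERRED
)
doc = projects_secondary_ok.find_one({"_id": "P1"})
```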

Sharding in MongoDB.

When a collection holds a very large number of documents or requires a large storage space, storing
all the documents in one node can lead to performance problems, particularly if there are many user
operations accessing the documents concurrently using various CRUD operations.

Sharding of the documents in the collection—also known as horizontal partitioning— divides the
documents into disjoint partitions known as shards. This allows the system to add more nodes as
needed by a process known as horizontal scaling of the distributed system and to store the shards
of the collection on different nodes to achieve load balancing.

Each node will process only those operations pertaining to the documents in the shard stored at that
node. Also, each shard will contain fewer documents than if the entire collection were stored at one
node, thus further improving performance.

There are two ways to partition a collection into shards in MongoDB—

 Range Partitioning
 Hash Partitioning.

Both require that the user specify a particular document field to be used as the basis for partitioning
the documents into shards.

The partitioning field—known as the shard key in MongoDB—must have two characteristics:

 it must exist in every document in the collection, and it must have an index.
 The ObjectId can be used, but any other field possessing these two characteristics can also
be used as the basis for sharding.

The values of the shard key are divided into chunks either through range partitioning or hash
partitioning, and the documents are partitioned based on the chunks of shard key values.

Range partitioning creates the chunks by specifying a range of key values;

for example, if the shard key values ranged from one to ten million, it is possible to create
ten ranges—1 to 1,000,000; 1,000,001 to 2,000,000; ..… ; 9,000,001 to 10,000,000—and each
chunk would contain the key values in one range.

Hash partitioning applies a hash function h(K) to each shard key K, and the partitioning of keys
into chunks is based on the hash values

In general, if range queries are commonly applied to a collection (for example, retrieving all
documents whose shard key value is between 200 and 400), then range partitioning is preferred
because each range query will typically be submitted to a single node that contains all the required
documents in one shard.

If most searches retrieve one document at a time, hash partitioning may be preferable because it
randomizes the distribution of shard key values into chunks.
When sharding is used, MongoDB queries are submitted to a module called the query router, which
keeps track of which nodes contain which shards based on the particular partitioning method used
on the shard keys.

The query (CRUD operation) will be routed to the nodes that contain the shards that hold the
documents that the query is requesting. If the system cannot determine which shards hold the
required documents, the query will be submitted to all the nodes that hold shards of the collection.
Sharding and replication are used together; sharding focuses on improving performance via load
balancing and horizontal scalability, whereas replication focuses on ensuring system availability
when certain nodes fail in the distributed system.
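As a rough illustration only (not a complete deployment recipe), the commands below, issued through PyMongo against an assumed mongos query router of an already-configured cluster, enable sharding and choose hash or range partitioning through the form of the shard key specification; the database, collection, and field names are assumptions.

```python
from pymongo import MongoClient

# Connect to the mongos query router (assumed address) rather than to a single node.
client = MongoClient("mongodb://mongos-host:27017")

# Enable sharding for the "company" database.
client.admin.command("enableSharding", "company")

# Hash partitioning on _id: chunks are based on hashed key values (good for point lookups).
client.admin.command("shardCollection", "company.project", key={"_id": "hashed"})

# Range partitioning on an indexed field such as Pno would instead favour range queries:
# client.admin.command("shardCollection", "company.worker", key={"Pno": 1})
```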

24.4 NOSQL Key-Value Stores

Key-value stores focus on high performance, availability, and scalability by storing data in a
distributed storage system.

The data model used in key-value stores is relatively simple, and in many of these systems, there is
no query language but rather a set of operations that can be used by the application programmers.
The key is a unique identifier associated with a data item and is used to locate this data item rapidly.
The value is the data item itself, and it can have very different formats for different key-value
storage systems.

In some cases, the value is just a string of bytes or an array of bytes, and the application using the
key-value store has to interpret the structure of the data value.

In other cases, some standard formatted data is allowed; for example, structured data rows (tuples)
similar to relational data, or semi-structured data using JSON or some other self-describing data
format.

Different key-value stores can thus store unstructured, semi-structured, or structured data items

The main characteristic of key-value stores is the fact that every value (data item) must be associated
with a unique key, and that retrieving the value by supplying the key must be very fast.

There are many systems that fall under the key-value store label, so rather than provide a lot of
details on one particular system, we will give a brief introductory overview for some of these
systems and their characteristics.
24.4.1 DynamoDB Overview

The DynamoDB system is an Amazon product and is available as part of Amazon’s AWS/SDK
platforms (Amazon Web Services/Software Development Kit). It can be used as part of Amazon’s
cloud computing services, for the data storage component.

DynamoDB data model. The basic data model in DynamoDB uses the concepts of tables, items, and
attributes.

A table in DynamoDB does not have a schema; it holds a collection of self-describing items. Each
item will consist of a number of (attribute, value) pairs, and attribute values can be single-valued
or multivalued. So basically, a table will hold a collection of items, and each item is a self-describing
record (or object).

DynamoDB also allows the user to specify the items in JSON format, and the system will convert
them to the internal storage format of DynamoDB. When a table is created, it is required to specify
a table name and a primary key; the primary key will be used to rapidly locate the items in the table.
Thus, the primary key is the key and the item is the value for the DynamoDB key-value store.

The primary key attribute must exist in every item in the table. The primary key can be one of the
following two types:

■ A single attribute. The DynamoDB system will use this attribute to build a hash index on the
items in the table. This is called a hash type primary key. The items are not ordered in storage on
the value of the hash attribute.

■ A pair of attributes. This is called a hash and range type primary key. The primary key will be a
pair of attributes (A, B): attribute A will be used for hashing, and because there will be multiple
items with the same value of A, the B values will be used for ordering the records with the same A
value. A table with this type of key can have additional secondary indexes defined on its attributes.
For example, if we want to store multiple versions of some type of items in a table, we could use
ItemID as hash and Date or Timestamp (when the version was created) as range in a hash and range
type primary key.
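A sketch of creating a table with a hash and range type primary key through boto3, the AWS SDK for Python; the table name, attribute names, and billing mode are illustrative assumptions.

```python
import boto3

dynamodb = boto3.resource("dynamodb")   # assumes AWS credentials and region are configured

table = dynamodb.create_table(
    TableName="ItemVersions",                                 # hypothetical table
    KeySchema=[
        {"AttributeName": "ItemId", "KeyType": "HASH"},       # hash attribute A
        {"AttributeName": "CreatedAt", "KeyType": "RANGE"},   # range attribute B (e.g., a timestamp)
    ],
    AttributeDefinitions=[
        {"AttributeName": "ItemId", "AttributeType": "S"},
        {"AttributeName": "CreatedAt", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# Each item is a self-describing set of (attribute, value) pairs; only the key is fixed.
table.put_item(Item={"ItemId": "X1", "CreatedAt": "2024-03-01T10:00:00",
                     "Color": "red", "Sizes": ["S", "M", "L"]})
```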

DynamoDB Distributed Characteristics. Because DynamoDB is proprietary, in the next
subsection we will discuss the mechanisms used for replication, sharding, and other distributed
system concepts in an open source key-value system called Voldemort. Voldemort is based on many
of the techniques proposed for DynamoDB.

24.4.2 Voldemort Key-Value Distributed Data Store


Voldemort is an open source system available through Apache 2.0 open source licensing rules. It is
based on Amazon’s DynamoDB. The focus is on high performance and horizontal scalability, as
well as on providing replication for high availability and sharding for improving latency (response
time) of read and write requests. All three of those features—replication, sharding, and horizontal
scalability—are realized through a technique to distribute the key-value pairs among the nodes of
a distributed cluster; this distribution is known as consistent hashing. Voldemort has been used by
LinkedIn for data storage. Some of the features of Voldemort are as follows:

■ Simple basic operations. A collection of (key, value) pairs is kept in a Voldemort store. In our
discussion, we will assume the store is called s. The basic interface for data storage and retrieval is
very simple and includes three operations: get, put, and delete. The operation s.put(k, v) inserts an
item as a key-value pair with key k and value v. The operation s.delete(k) deletes the item whose
key is k from the store, and the operation v = s.get(k) retrieves the value v associated with key k.
The application can use these basic operations to build its own requirements. At the basic storage
level, both keys and values are arrays of bytes (strings).

■ High-level formatted data values. The values v in the (k, v) items can be specified in JSON
(JavaScript Object Notation), and the system will convert between JSON and the internal storage
format. Other data object formats can also be specified if the application provides the conversion
(also known as serialization) between the user format and the storage format as a Serializer class.
The Serializer class must be provided by the user and will include operations to convert the user
format into a string of bytes for storage as a value, and to convert back a string (array of bytes)
retrieved via s.get(k) into the user format. Voldemort has some built-in serializers for formats other
than JSON.

■ Consistent hashing for distributing (key, value) pairs. A variation of the data distribution
algorithm known as consistent hashing is used in Voldemort for data distribution among the nodes
in the distributed cluster of nodes.

 A hash function h(k) is applied to the key k of each (k, v) pair, and h(k) determines where the
item will be stored.
 The method assumes that h(k) is an integer value,
o usually in the range 0 to Hmax = 2^n − 1,
 where n is chosen based on the desired range for the hash values.

This method is best visualized by considering the range of all possible integer hash values 0 to
Hmax to be evenly distributed on a circle (or ring). The nodes in the distributed system are then
also located on the same ring; usually each node will have several locations on the ring (see Figure
24.2).

 The positioning of the points on the ring that represent the nodes is done in a pseudorandom
manner.

An item (k, v) will be stored on the node whose position in the ring follows the position of h(k) on
the ring in a clockwise direction.

In Figure 24.2(a), we assume there are three nodes in the distributed cluster labelled A, B, and C,
where node C has a bigger capacity than nodes A and B.

In a typical system, there will be many more nodes. On the circle, two instances each of A and B
are placed, and three instances of C (because of its higher capacity), in a pseudorandom manner to
cover the circle.

Figure 24.2(a) indicates which (k, v) items are placed in which nodes based on the h(k) values.
The h(k) values that fall in the parts of the circle marked as range 1 in Figure 24.2(a) will have their
(k, v) items stored in node A because that is the node whose label follows h(k) on the ring in a
clockwise direction; those in range 2 are stored in node B; and those in range 3 are stored in node
C.

This scheme allows horizontal scalability because when a new node is added to the distributed
system, it can be added in one or more locations on the ring depending on the node capacity.

Only a limited percentage of the (k, v) items will be reassigned to the new node from the existing
nodes based on the consistent hashing placement algorithm.

Also, those items assigned to the new node may not all come from only one of the existing nodes
because the new node can have multiple locations on the ring.

For example, if a node D is added and it has two placements on the ring as shown in Figure 24.2(b),
then some of the items from nodes B and C would be moved to node D.

The items whose keys hash to range 4 on the circle (see Figure 24.2(b)) would be migrated to node
D. This scheme also allows replication by placing the number of specified replicas of an item on
successive nodes on the ring in a clockwise direction.
The sharding is built into the method, and different items in the store (file) are located on different
nodes in the distributed cluster, which means the items are horizontally partitioned (sharded) among
the nodes in the distributed system.

When a node fails, its load of data items can be distributed to the other existing nodes whose labels
follow the labels of the failed node in the ring. And nodes with higher capacity can have more
locations on the ring, as illustrated by node C in Figure 24.2(a), and thus store more items than
smaller-capacity nodes.
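The self-contained Python sketch below mimics the ring placement just described; the hash function, node names, and the number of positions per node are illustrative choices, not Voldemort's actual implementation.

```python
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 32

def h(key: str) -> int:
    """Map a key to an integer position on the ring [0, 2^32)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

class ConsistentHashRing:
    def __init__(self, nodes):
        # nodes: mapping of node name -> number of positions on the ring; a
        # higher-capacity node (like node C in Figure 24.2(a)) gets more positions.
        self.ring = sorted((h(f"{name}#{i}"), name)
                           for name, count in nodes.items()
                           for i in range(count))

    def node_for(self, key: str) -> str:
        """Store (k, v) on the first node position clockwise from h(k)."""
        idx = bisect_right(self.ring, (h(key),))
        return self.ring[idx % len(self.ring)][1]   # wrap around the ring

ring = ConsistentHashRing({"A": 2, "B": 2, "C": 3})   # C has the highest capacity
print(ring.node_for("item-1001"))

# Adding a node D moves only the keys whose hash positions now fall just before
# D's positions on the ring; all other (k, v) pairs stay where they are.
ring_with_d = ConsistentHashRing({"A": 2, "B": 2, "C": 3, "D": 2})
```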

■ Consistency and versioning. Voldemort uses a method similar to the one developed for
DynamoDB for consistency in the presence of replicas.

Basically, concurrent write operations are allowed by different processes so there could exist two
or more different values associated with the same key at different nodes when items are replicated.
Consistency is achieved when the item is read by using a technique known as versioning and read
repair.

Concurrent writes are allowed, but each write is associated with a vector clock value. When a read
occurs, it is possible that different versions of the same value (associated with the same key) are
read from different nodes.

If the system can reconcile to a single final value, it will pass that value to the read; otherwise, more
than one version can be passed back to the application, which will reconcile the various versions
into one version based on the application semantics and give this reconciled value back to the nodes.
24.4.3 Examples of Other Key-Value Stores

In this section, we briefly review three other key-value stores. It is important to note that there are
many systems that can be classified in this category, and we can only mention a few of them.

■ Oracle key-value store. Oracle is well known for its SQL relational database system, but it also
offers a system based on the key-value store concept, called the Oracle NoSQL Database.

■ Redis key-value cache and store. Redis differs from the other systems discussed here because it
caches its data in main memory to further improve performance. It offers master-slave replication
and high availability, and it also offers persistence by backing up the cache to disk.

■ Apache Cassandra. Cassandra is a NOSQL system that is not easily categorized; it is sometimes
listed in the column-based NOSQL category (see Section 24.5) and sometimes in the key-value
category. It offers features from several NOSQL categories and is used by Facebook as well as many
other organizations.
24.5 Column-Based or Wide Column NOSQL Systems

Another category of NOSQL systems is known as column-based or wide column systems.

The Google distributed storage system for big data, known as BigTable, is a well-known example
of this class of NOSQL systems, and it is used in many Google applications that require large
amounts of data storage, such as Gmail. BigTable uses the Google File System (GFS) for data
storage and distribution.

An open source system known as Apache Hbase is somewhat similar to Google BigTable, but it
typically uses HDFS (Hadoop Distributed File System) for data storage. HDFS is used in many
cloud computing applications.

We will focus on Hbase in this section as an example of this category of NOSQL systems.

BigTable (and Hbase) is sometimes described as a sparse multidimensional distributed persistent
sorted map, where the word map means a collection of (key, value) pairs (the key is mapped to the
value).

One of the main differences that distinguish column-based systems from key-value stores (see
Section 24.4) is the nature of the key.

In column-based systems such as Hbase, the key is multidimensional and so has several
components: typically, a combination of table name, row key, column, and timestamp.

As we shall see, the column is typically composed of two components:

column family and column qualifier.

The data model in Hbase organizes data using the concepts of namespaces, tables, column families,
column qualifiers, columns, rows, and data cells.

A column is identified by a combination of (column family:column qualifier).

Data is stored in a self-describing form by associating columns with data values, where data values
are strings.

Hbase also stores multiple versions of a data item, with a timestamp associated with each version,
so versions and timestamps are also part of the Hbase data model

As with other NOSQL systems, unique keys are associated with stored data items for fast access,
but the keys identify cells in the storage system. Because the focus is on high performance when
storing huge amounts of data, the data model includes some storage-related concepts. We discuss
the Hbase data modeling concepts and define the terminology next. It is important to note that the
use of the words table, row, and column is not identical to their use in relational databases, but the
uses are related.

■ Tables and Rows. Data in Hbase is stored in tables, and each table has a table name. Data in a
table is stored as self-describing rows. Each row has a unique row key, and row keys are strings
that must have the property that they can be lexicographically ordered, so characters that do not
have a lexicographic order in the character set cannot be used as part of a row key.

■ Column Families, Column Qualifiers, and Columns.

o A table is associated with one or more column families.


o Each column family will have a name, and the column families associated with a table must
be specified when the table is created and cannot be changed later.
o Figure 24.3(a) shows how a table may be created; the table name is followed by the names
of the column families associated with the table.
o When the data is loaded into a table, each column family can be associated with many
column qualifiers, but the column qualifiers are not specified as part of creating a table.
o So the column qualifiers make the model a self-describing data model because the
qualifiers can be dynamically specified as new rows are created and inserted into the table.
o A column is specified by a combination of Column Family: Column Qualifier. Basically,
column families are a way of grouping together related columns (attributes in relational
terminology) for storage purposes, except that the column qualifier names are not specified
during table creation.
o Rather, they are specified when the data is created and stored in rows, so the data is
self-describing since any column qualifier name can be used in a new row of data (see Figure
24.3(b)). However, it is important that the application programmers know which column
qualifiers belong to each column family, even though they have the flexibility to create new
column qualifiers on the fly when new data rows are created.
o The concept of column family is somewhat similar to vertical partitioning , because
columns (attributes) that are accessed together because they belong to the same column
family are stored in the same files. Each column family of a table is stored in its own files
using the HDFS file system.
■ Versions and Timestamps. Hbase can keep several versions of a data item, along with the
timestamp associated with each version. The timestamp is a long integer number that represents the
system time when the version was created, so newer versions have larger timestamp values. Hbase
uses midnight ‘January 1, 1970 UTC’ as timestamp value zero, and uses a long integer that
measures the number of milliseconds since that time as the system timestamp value (this is similar
to the value returned by the Java utility java.util.Date.getTime() and is also used in MongoDB). It
is also possible for

the user to define the timestamp value explicitly in a Date format rather than using the system-
generated timestamp.

■ Cells. A cell holds a basic data item in Hbase. The key (address) of a cell is specified by a
combination of (table, rowid, columnfamily, columnqualifier, timestamp). If timestamp is left out,
the latest version of the item is retrieved unless a default number of versions is specified, say the
latest three versions. The default number of versions to be retrieved, as well as the default number
of versions that the system needs to keep, are parameters that can be specified during table creation.
■ Namespaces. A namespace is a collection of tables. A namespace basically specifies a collection
of one or more tables that are typically used together by user applications, and it corresponds to a
database that contains a collection of tables in relational terminology.

24.5.2 Hbase CRUD Operations

- **CRUD Operations**: HBase provides low-level CRUD (create, read, update, delete) operations.
- **Complex Operations**: More complex operations, such as joins between rows in different tables, must be implemented by application programs.

CRUD operation formats (Figure 24.3(c)):

- **Create Operation**: creates a new table and specifies one or more column families for the table, but does not specify column qualifiers.
- **Put Operation**: inserts new data or new versions of existing data items.
- **Get Operation**: retrieves the data associated with a single row in a table.
- **Scan Operation**: retrieves all rows in a table.
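A sketch of these operations from Python using the third-party happybase client, which talks to HBase through its Thrift gateway; the host, table name, column families, and qualifiers are illustrative assumptions.

```python
import happybase

connection = happybase.Connection("localhost")   # assumed HBase Thrift server

# create: the table name and its column families (no column qualifiers yet)
connection.create_table("employee", {"info": dict(max_versions=3), "address": dict()})
table = connection.table("employee")

# put: column qualifiers are supplied on the fly, so each row is self-describing
table.put(b"row-0001", {b"info:fname": b"John", b"info:lname": b"Smith",
                        b"address:city": b"Houston"})

# get: all cells of a single row
print(table.row(b"row-0001"))

# versions: the latest versions of one cell together with their timestamps
print(table.cells(b"row-0001", b"address:city", versions=3, include_timestamp=True))

# scan: iterate over all rows in the table
for key, data in table.scan():
    print(key, data)
```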

24.5.3 Hbase Storage and Distributed System Concepts

HBase architecture and components:

 Table and Region Structure
- **Regions**: each HBase table is divided into regions; a region holds a range of row keys, which is why row keys must be lexicographically ordered.
- **Stores**: each region contains several stores, and each column family is assigned to one store within a region.

 Server Roles
- **Region Servers (Storage Nodes)**: store the regions assigned to them.
- **Master Server (Master Node)**: monitors the region servers, splits tables into regions, and assigns regions to region servers.

 Integration with Other Systems
- **Apache Zookeeper**: manages naming, distribution, and synchronization of HBase data; provides coordination and replication services; can have multiple replicas for availability; keeps the data it needs in main memory for fast access.
- **Apache HDFS (Hadoop Distributed File System)**: provides distributed file services for HBase. HBase is built on top of both HDFS and Zookeeper.

24.6 NOSQL Graph Databases and Neo4j


Another category of NOSQL systems is known as graph databases or graph oriented NOSQL
systems.

The data is represented as a graph, which is a collection of vertices (nodes) and edges.

Both nodes and edges can be labeled to indicate the types of entities and relationships they
represent, and it is generally possible to store data associated with both individual nodes and
individual edges.

Many systems can be categorized as graph databases. We will focus our discussion on one particular
system, Neo4j, which is used in many applications.

Neo4j is an open source system, and it is implemented in Java.

24.6.1 Neo4j Data Model

- **Data Organization in Neo4j**:

- **Nodes and Relationships**: Fundamental components for organizing data.

- **Properties**: Both nodes and relationships can have properties to store data items.

- **Labels**: Nodes can have zero, one, or several labels, grouping them into subsets for querying.

- **Directed Relationships**: Each relationship has a start node, an end node, and a relationship
type.

- **Properties Specification**:

- Defined using a map pattern with “name: value” pairs enclosed in curly brackets, e.g., `{Lname:
‘Smith’, Fname: ‘John’, Minit: ‘B’}`.
- **Terminology**:

- In graph theory, nodes and relationships are called vertices and edges, respectively.

- **Comparison with ER/EER Models**:

- **Nodes**: Correspond to entities in ER/EER models.

- **Node Labels**: Correspond to entity types and subclasses.

- **Relationships**: Correspond to relationship instances.

- **Relationship Types**: Correspond to relationship types in ER/EER models.

- **Properties**: Correspond to attributes in ER/EER models.

- **Directed Relationships**: Unlike ER/EER models, relationships in Neo4j are directed.

- **Labels on Nodes**: Nodes can have no label in Neo4j, unlike entities in ER/EER models that
must belong to an entity type.

- **Practical Usage**: Neo4j’s graph model is used for high-performance distributed databases,
whereas the ER/EER model is primarily for database design.

- **Node and Relationship Creation in Neo4j**:

- Created using the Neo4j CREATE command in Cypher, Neo4j’s declarative query language.

- Various ways to create nodes and relationships using Neo4j APIs and scripting interfaces.

- **Figure Reference**:

- Figure 24.4(a) illustrates node creation in Neo4j.

■ Labels and Properties

- **Node Labeling**:

- Node labels can be specified when a node is created.


- Nodes can be created without any labels.

- Example labels from Figure 24.4(a): EMPLOYEE, DEPARTMENT, PROJECT, LOCATION.

- Node data corresponds to the COMPANY database (Figure 5.6) with modifications, e.g., EmpId
instead of SSN.

- **Properties**:

- Enclosed in curly brackets `{ ... }`.

- Nodes can have multiple labels, e.g., `PERSON:EMPLOYEE:MANAGER`.

- Multiple labels can represent an entity belonging to an entity type and its subclasses in the EER
model but can also serve other purposes.

 Relationships and Relationship Types

- **Relationship Creation**:

- Illustrated in Figure 24.4(b) with the COMPANY database.

- Direction specified by `→`, but can be traversed in either direction.

- Example relationship types (labels): WorksFor, Manager, LocatedIn, WorksOn.

- Only WorksOn relationships have properties (e.g., Hours).

 Paths

- **Path Specification**:

- Specifies a traversal part of the graph.

- Used in queries to specify patterns, retrieving data matching the pattern.

- Composed of a start node, followed by one or more relationships, leading to end nodes.

- Similar to path expressions in OQL and XML query languages (XPath and XQuery).

 Optional Schema
- **Schema Usage**:

- Optional in Neo4j.

- Graphs can be created and used without a schema.

- Neo4j version 2.0 introduced some schema-related functions.

- Features include creating indexes and constraints based on labels and properties.

- Example: Key constraint on a property to ensure unique values for all nodes with a specific label.

 Indexing and Node Identifiers

- **Node Identifiers**:

- Neo4j assigns an internal unique system-defined identifier to each node upon creation.

- **Indexing**:

- Indexes can be created for collections of nodes with specific labels to efficiently retrieve nodes
based on their properties.

- Example:

- Index nodes with EMPLOYEE label by EmpId.

- Index nodes with DEPARTMENT label by Dno.

- Index nodes with PROJECT label by Pno.
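A sketch of node, relationship, and index creation through the official Neo4j Python driver; the connection details and the exact property names are assumptions in the spirit of Figure 24.4, and the index syntax shown is the older `CREATE INDEX ON` form, which differs in newer Neo4j versions.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Create two labeled nodes with properties and a directed, typed relationship.
    session.run(
        "CREATE (e:EMPLOYEE {EmpId: '2', Lname: 'Wong', Fname: 'Franklin'}) "
        "CREATE (p:PROJECT {Pno: 2, Pname: 'ProductY'}) "
        "CREATE (e)-[:WorksOn {Hours: 7.5}]->(p)"
    )
    # Index EMPLOYEE nodes on EmpId for fast property-based lookup.
    session.run("CREATE INDEX ON :EMPLOYEE(EmpId)")

driver.close()
```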

24.6.2 The Cypher Query Language of Neo4j

* General Features

- **High-Level Query Language**: Cypher is a declarative language used for querying Neo4j
databases.

- **Command Capabilities**: Supports creating, finding, deleting, and modifying nodes and
relationships.
* Query Structure

- **Clauses**: A Cypher query is composed of multiple clauses. The result from one clause can
serve as input to the next clause.

* Key Clauses in Cypher (Figure 24.4(c))

- **MATCH**: Finds nodes and relationships based on specified patterns.

- **WHERE**: Adds conditions to filter query results.

- **RETURN**: Specifies what should be returned by the query.

- **ORDER BY**: Sorts the results based on specified properties.

- **LIMIT**: Limits the number of results returned.

- **CREATE**: Used to create nodes and relationships.

- **WITH**: Passes results from one part of a query to the next, often used with aggregation.

- **SET**: Adds or updates properties and labels on nodes.

* Examples of Cypher Queries (Figure 24.4(d))

- **Query 1**: Basic node creation (covered in the previous section).

- **Query 2**: Creation of relationships (covered in the previous section).

- **Query 3**: Returns employees and hours per week for those working on the project with Pno
= 2.

- **Query 4**: Uses `ORDER BY` to return all employees and the projects they work on, sorted
by Ename.

- **Query 5**: Uses `LIMIT` to return only the first 10 results.

- **Query 6**: Illustrates the use of `WITH` and `WHERE` for aggregation and additional
conditions. Returns employees who work on more than two projects and the number of projects
each works on.
- **Query 7**: Similar to Query 5 but returns the nodes and relationships themselves for
visualization.

- **Query 8**: Demonstrates adding properties to a node by adding a Job property to an employee
node.

* Visualization and Schema Flexibility

- **Graph Visualization**: Results can be displayed as a graph using Neo4j's visualization tools.

- **Schema Flexibility**: Cypher queries can add or remove labels and properties, providing
flexibility in data modeling.

These points summarize the functionality and features of the Cypher query language in Neo4j,
highlighting its ability to perform complex queries and updates on graph databases.
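As a concrete illustration of these clauses, in the spirit of Queries 3-5, the sketch below (again through the Python driver; property names such as Lname, Pno, and Hours are assumptions) finds the employees working on project 2 together with their weekly hours, sorted and limited:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = (
    "MATCH (e:EMPLOYEE)-[w:WorksOn]->(p:PROJECT) "    # pattern of nodes and relationships
    "WHERE p.Pno = $pno "                             # condition filtering the matches
    "RETURN e.Lname AS employee, w.Hours AS hours "   # what the query returns
    "ORDER BY employee "                              # sort the results
    "LIMIT 10"                                        # cap the number of results
)

with driver.session() as session:
    for record in session.run(query, pno=2):
        print(record["employee"], record["hours"])

driver.close()
```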
24.6.3 Neo4j Interfaces and Distributed System Characteristics
Neo4j has other interfaces that can be used to create, retrieve, and update nodes and relationships
in a graph database. It also has two main versions: the enterprise edition, which comes with
additional capabilities, and the community edition. We discuss some of the additional features of
Neo4j in this subsection.
■ Enterprise edition vs. community edition. Both editions support the Neo4j graph data model and
storage system, as well as the Cypher graph query language and several other interfaces, including
a high-performance native API, language drivers for several popular programming languages (such
as Java, Python, and PHP), and the REST (Representational State Transfer) API. In addition, both
editions support ACID properties. The enterprise edition supports additional features for enhancing
performance, such as caching, clustering of data, and locking.

■ Graph visualization interface. Neo4j has a graph visualization interface, so that a subset of the
nodes and edges in a database graph can be displayed as a graph. This tool can be used to visualize
query results in a graph representation.

■ Master-slave replication. Neo4j can be configured on a cluster of distributed system nodes
(computers), where one node is designated the master node. The data and indexes are fully
replicated on each node in the cluster. Various ways of synchronizing the data between master and
slave nodes can be configured in the distributed cluster.

■ Caching. A main memory cache can be configured to store the graph data for improved
performance.

■ Logical logs. Logs can be maintained to recover from failures.
