Mod5 CH2
1. Scalability:
There are two kinds of scalability in distributed systems: horizontal and vertical
In NOSQL systems, horizontal scalability is generally used, where the distributed
system is expanded by adding more nodes for data storage and processing as the
volume of data grows.
Vertical scalability refers to expanding the storage and computing power of existing
nodes.
In NOSQL systems, horizontal scalability is employed while the system is operational,
so techniques for distributing the existing data among new nodes without interrupting
system operation are necessary.
2. Availability, Replication and Eventual Consistency:
a. Many applications that use NOSQL systems require continuous system availability.
To accomplish this, data is replicated over two or more nodes in a transparent
manner, so that if one node fails, the data is still available on other nodes.
b. Replication improves data availability and can also improve read performance,
because read requests can often be serviced from any of the replicated data nodes.
However, writes become more expensive because an update must be applied to every
copy of the replicated data items; this can significantly slow down write
performance if serializable consistency is required.
3. Replication Models:
a. Two major replication models are used in NOSQL systems:
i. master-slave
ii. master-master replication.
i. Master-slave replication requires one copy to be the master copy; all write operations
must be applied to the master copy and then propagated to the slave copies, usually using
eventual consistency (the slave copies will eventually be the same as the master copy).
a. For reads, the master-slave paradigm can be configured in various ways.
b. One configuration requires all reads to also be at the master copy, so this would
be similar to the primary site or primary copy methods of distributed
concurrency control, with similar advantages and disadvantages.
c. Another configuration would allow reads at the slave copies but would not
guarantee that the values are the latest writes, since writes to the slave nodes can
be done after they are applied to the master copy.
ii. The master-master replication allows reads and writes at any of the replicas but may not
guarantee that reads at nodes that store different copies see the same values. Different
users may write the same data item concurrently at different nodes of the system, so the
values of the item will be temporarily inconsistent.
A reconciliation method to resolve conflicting write operations of the same data item at
different nodes must be implemented as part of the master-master replication scheme.
4. Sharding of Files:
a. Sharding of files refers to the process of dividing a large file into smaller, more
manageable pieces known as shards (also known as horizontal partitioning). This
technique is commonly used in distributed systems and databases to improve
performance, scalability, and fault tolerance.
b. In many NOSQL applications, files (or collections of data objects) can have many
millions of records (or documents or objects), and these records can be accessed
concurrently by thousands of users. So it is not practical to store the whole file in
one node.
c. The combination of sharding the file records and replicating the shards works in
tandem to improve load balancing as well as data availability
NOSQL systems emphasize performance and flexibility over modeling power and complex
querying. We discuss some of these characteristics next.
NOSQL systems have been characterized into four major categories, with some additional
categories that encompass other types of systems. The most common categorization lists the
following four major categories:
1. Document-based NOSQL systems: These systems store data in the form of documents (for example, JSON-like objects) and provide fast access to documents via the document id or other indexes; MongoDB is a well-known example.
2. NOSQL key-value stores: These systems have a simple data model based on fast access by the key to the value associated with the key.
3. Column-based or wide column NOSQL systems: These systems partition a table by column into column families, as in Google BigTable and Apache Hbase.
4. Graph-based NOSQL systems: Data is represented as graphs, and related nodes can be found by traversing the edges using path expressions; Neo4j is a well-known example.
Additional categories include the following:
5. Hybrid NOSQL systems: These systems have characteristics from two or more of the above
four categories.
6. Object databases:
An object database is a type of database management system (DBMS) that stores data in the form
of objects, making it particularly suitable for object-oriented programming languages. Unlike
traditional relational databases, which store data in tables with rows and columns, object databases
store data as objects, which can encapsulate both data (attributes) and behavior (methods).
o Complex Data Structures: Object databases can store complex data structures
directly, including hierarchical structures, nested objects, arrays, and other non-
primitive data types. This flexibility makes them well-suited for applications with
rich and complex data models.
o Performance: Object databases can offer performance advantages for certain types
of applications, especially those that involve complex data structures and frequent
object manipulations. However, the performance of object databases can vary
depending on factors such as the specific implementation, query complexity, and
data access patterns.
o Usage: Object databases are commonly used in domains where the data model is
inherently object-oriented, such as in software engineering tools, CAD/CAM
systems, multimedia applications, and complex modeling and simulation
applications.
7. XML databases:
XML databases are a type of database management system (DBMS) designed specifically for
storing, querying, and managing XML (eXtensible Markup Language) data. XML is a markup
language commonly used for structuring and representing data in a hierarchical format, making it
suitable for a wide range of applications, including web development, data interchange, and
document storage.
Native XML Storage: XML databases store XML documents natively, meaning that XML
data is stored directly without the need for any intermediate conversion or mapping. This
allows for efficient storage and retrieval of XML documents, preserving their hierarchical
structure and element relationships.
XML Querying: XML databases typically provide query languages or APIs that allow users
to query and manipulate XML data. These query languages are designed to support the
hierarchical structure of XML documents and may include features such as XPath and
XQuery for navigating and selecting XML elements.
Schema Support: XML databases may support XML schema languages such as Document
Type Definitions (DTDs) or XML Schema Definition (XSD), which define the structure,
constraints, and data types of XML documents. Schema validation ensures data integrity
and helps enforce data consistency.
Indexing and Full-Text Search: XML databases often include indexing and full-text search
capabilities to facilitate efficient retrieval of XML documents based on content and
structure. Indexing allows for fast lookup of XML elements and attributes, while full-text
search enables keyword-based searching within XML documents.
Integration with XML Technologies: XML databases are designed to integrate seamlessly
with other XML technologies and standards, such as XSLT (eXtensible Stylesheet
Language Transformations) for transforming XML data, XInclude for including external
XML documents, and XLink for creating hyperlinks between XML documents.
Scalability and Performance: XML databases are optimized for handling large volumes of
XML data and can scale to accommodate growing datasets and user loads. They often
employ techniques such as data partitioning, caching, and parallel processing to improve
performance and scalability.
Application Areas: XML databases are commonly used in various application domains
where XML is the primary data format, such as content management systems, document
repositories, web services, electronic publishing, and scientific data management.
In a system with data replication, concurrency control becomes more complex because there can
be multiple copies of each data item. So if an update is applied to one copy of an item, it must be
applied to all other copies in a consistent manner.
The possibility exists that one copy of an item X is updated by a transaction T1 whereas another
copy is updated by a transaction T2, so two inconsistent copies of the same item exist at two
different nodes in the distributed system.
If two other transactions T3 and T4 want to read X, each may read a different copy of item X.
We saw in Section 23.3 that there are distributed concurrency control methods that do not allow
this inconsistency among copies of the same data item, thus enforcing serializability and hence the
isolation property in the presence of replication. However, these techniques often come with high
overhead, which would defeat the purpose of creating multiple copies to improve performance and
availability in distributed database systems such as NOSQL. In the field of distributed systems,
there are various levels of consistency among replicated data items, from weak consistency to strong
consistency. Enforcing serializability is considered the strongest form of consistency, but it has high
overhead so it can reduce performance of read and write operations and hence adversely affect
system performance.
The CAP theorem, which was originally introduced as the CAP principle, can be used to explain
some of the competing requirements in a distributed system with replication.
The three letters in CAP refer to three desirable properties of distributed systems with replicated
data:
consistency (among the replicated copies of a data item),
availability (of the system for read and write operations), and
partition tolerance (in the face of the nodes in the system being partitioned by a network fault).
Consistency means that the nodes will have the same copies of a replicated data item visible for
various transactions
Availability means that each read or write request for a data item will either be processed
successfully or will receive a message that the operation cannot be completed.
Partition tolerance means that the system can continue operating if the network connecting the
nodes has a fault that results in two or more partitions, where the nodes in each partition can only
communicate among each other.
It is important to note here that the use of the word consistency in CAP and its use in ACID do not
refer to the same concept.
In CAP, the term consistency refers to the consistency of the values in different
copies of the same data item in a replicated distributed system.
In ACID, it refers to the fact that a transaction will not violate the integrity
constraints specified on the database schema.
However, if we consider that the consistency of replicated copies is a specified
constraint, then the two uses of the term consistency would be related.
The CAP theorem states that it is not possible to guarantee all three of the desirable
properties—consistency, availability, and partition tolerance—at the same time in a
distributed system with data replication. If this is the case, then the distributed
system designer would have to choose two properties out of the three to guarantee.
It is generally assumed that in many traditional (SQL) applications, guaranteeing
consistency through the ACID properties is important. On the other hand, in a
NOSQL distributed data store, a weaker consistency level is often acceptable, and
guaranteeing the other two properties (availability, partition tolerance) is important.
Hence, weaker consistency levels are often used in NOSQL systems instead of
guaranteeing serializability.
In particular, a form of consistency known as eventual consistency is often adopted
in NOSQL systems.
24.3 Document-Based NOSQL Systems and MongoDB
MongoDB documents are stored in BSON (Binary JSON) format, which is a variation of
JSON with some additional data types and is more efficient for storage than JSON.
Individual documents are stored in a collection. We will use a simple example based on
our COMPANY database that we used throughout this book.
The operation createCollection is used to create each collection.
For example, the following command can be used to create a collection called project to
hold PROJECT objects from the COMPANY database
o db.createCollection("project", { capped : true, size : 1310720, max : 500 } )
The first parameter “project” is the name of the collection, which is followed by an optional
document that specifies collection options.
In our example, the collection is capped; this means it has upper limits on its storage space
(size) and number of documents (max).
The capping parameters help the system choose the storage options for each collection.
There are other collection options, but we will not discuss them here.
For our example, we will create another document collection called worker to hold
information about the EMPLOYEEs who work on each project;
for example:
o db.createCollection("worker", { capped : true, size : 5242880, max : 2000 } )
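The same capped collections can also be created from a program; the following is a minimal sketch using the pymongo Python driver, assuming a locally running MongoDB server and a database named company (both assumptions, not part of the example above).

```python
# Hypothetical sketch: creating the capped collections from Python via pymongo.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local server
db = client["company"]                             # assumed database name

# Capped collections have an upper limit on storage size (bytes) and document count.
db.create_collection("project", capped=True, size=1310720, max=500)
db.create_collection("worker", capped=True, size=5242880, max=2000)
```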
Each document in a collection has a unique ObjectId field, called _id, which is automatically
indexed in the collection unless the user explicitly requests no index for the _id field.
The value of ObjectId can be specified by the user, or it can be system-generated if the user
does not specify an _id field for a particular document.
System-generated ObjectIds have a specific format, which combines the timestamp when the
object is created (4 bytes, in an internal MongoDB format), the node id (3 bytes), the process id
(2 bytes), and a counter (3 bytes) into a 12-byte Id value.
User-generated ObjectIds can have any value specified by the user as long as it uniquely
identifies the document, so these Ids are similar to primary keys in relational systems.
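For illustration, the sketch below uses the bson package (bundled with pymongo) to show the 12-byte system-generated ObjectId and its embedded creation timestamp; the user-defined _id value shown is hypothetical.

```python
from bson import ObjectId

oid = ObjectId()            # system-generated 12-byte id (format described above)
print(len(oid.binary))      # -> 12 bytes
print(oid.generation_time)  # creation timestamp embedded in the ObjectId (UTC)

# A user-generated _id can be any unique value, similar to a primary key:
project_doc = {"_id": "P1", "Pname": "ProductX"}   # illustrative values only
```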
A collection does not have a schema.
The structure of the data fields in documents is chosen based on how documents will be
accessed and used, and the user can choose a normalized design (similar to normalized relational
tuples) or a denormalized design (similar to XML documents or complex objects).
Interdocument references can be specified by storing in one document the ObjectId or ObjectIds
of other related documents.
Figure 24.1(a) shows a simplified MongoDB document showing some of the data from Figure
5.6 of the COMPANY database example.
In our example, the _id values are user-defined, and the documents whose _id starts with P
(for project) will be stored in the “project” collection, whereas those whose _id starts with W
(for worker) will be stored in the “worker” collection. In Figure 24.1(a), the workers information
is embedded in the project document; so there is no need for the “worker” collection.
This is known as the denormalized pattern, which is similar to creating a complex object or an
XML document. A list of values that is enclosed in square brackets [ … ] within a document
represents a field whose value is an array.
Another option is to use the design in Figure 24.1(b), where worker references are embedded
in the project document, but the worker documents themselves are stored in a separate “worker”
collection. A third option in Figure 24.1(c) would use a normalized design, similar to First
Normal Form relations. The choice of which design option to use depends on how the data will
be accessed. It is important to note that the simple design in Figure 24.1(c) is not the general
normalized design for a many-to-many relationship, such as the one between employees and
projects; rather, we would need three collections for “project”, “employee”, and “works_on”.
In the design in Figure 24.1(c), an EMPLOYEE who works on several projects would be
represented by multiple worker documents with different _id values; each document would
represent the employee as worker for a particular project. This is similar to the design decisions
for XML schema design (see Section 13.6). However, it is again important to note that the
typical document-based system does not have a schema, so the design rules would have to be
followed whenever individual documents are inserted into a collection.
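Since Figure 24.1 is not reproduced here, the following sketch (in Python dictionary notation) illustrates the flavor of the denormalized design versus the design with references; the field names and values are assumptions loosely based on the COMPANY example, not the exact contents of the figure.

```python
# Illustrative documents only; not the exact contents of Figure 24.1.

# (a) Denormalized design: worker information embedded inside the project document.
project_denormalized = {
    "_id": "P1",
    "Pname": "ProductX",
    "Plocation": "Bellaire",
    "Workers": [
        {"Ename": "John Smith", "Hours": 32.5},
        {"Ename": "Joyce English", "Hours": 20.0},
    ],
}

# (b) Design with references: the project stores only the _ids of its worker documents,
#     which live in a separate "worker" collection.
project_with_refs = {
    "_id": "P1",
    "Pname": "ProductX",
    "Plocation": "Bellaire",
    "WorkerIds": ["W1", "W2"],
}
worker_docs = [
    {"_id": "W1", "Ename": "John Smith", "Hours": 32.5},
    {"_id": "W2", "Ename": "Joyce English", "Hours": 20.0},
]
```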
24.3.2 MongoDB CRUD Operations
MongoDB has several CRUD operations, where CRUD stands for create, read, update, and
delete.
Documents can be created and inserted into their collections using the insert operation,
whose format is:
o db.<collection_name>.insert(<document(s)>)
The parameters of the insert operation can include either a single document or an array
of documents, as shown in Figure 24.1(d). The delete operation is called remove, and
the format is:
o db.<collection_name>.remove(<condition>)
The documents to be removed from the collection are specified by a Boolean condition on
some of the fields in the collection documents.
There is also an update operation, which has a condition to select certain documents, and a
$set clause to specify the update.
It is also possible to use the update operation to replace an existing document with another
one but keep the same ObjectId. For read queries, the main command is called find, and the
format is:
o db.<collection_name>.find(<condition>)
General Boolean conditions can be specified as the <condition>, and the documents in the
collection that satisfy the condition are selected for the query result. For a full discussion of the
MongoDB CRUD operations, see the MongoDB online documentation in the chapter references.
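The sketch below shows roughly equivalent CRUD calls through the pymongo driver (insert_one/insert_many, find, update_one with $set, and delete_many in place of the shell's insert, find, update, and remove); the server address, database name, and field values are assumptions.

```python
# Minimal pymongo sketch of the CRUD operations described above.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["company"]
project = db["project"]

# insert: a single document or a list of documents
project.insert_one({"_id": "P1", "Pname": "ProductX", "Plocation": "Bellaire"})
project.insert_many([
    {"_id": "P2", "Pname": "ProductY", "Plocation": "Sugarland"},
    {"_id": "P3", "Pname": "ProductZ", "Plocation": "Houston"},
])

# find: returns the documents that satisfy a Boolean condition
for doc in project.find({"Plocation": "Houston"}):
    print(doc)

# update: a condition to select documents and a $set clause for the new values
project.update_one({"_id": "P1"}, {"$set": {"Plocation": "Houston"}})

# remove (delete): removes the documents that satisfy the condition
project.delete_many({"Plocation": "Houston"})
```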
Most MongoDB updates are atomic if they refer to a single document, but MongoDB also provides
a pattern for specifying transactions on multiple documents.
Since MongoDB is a distributed system, the two-phase commit method is used to ensure atomicity
and consistency of multi-document transactions.
The concept of replica set is used in MongoDB to create multiple copies of the same data set on
different nodes in the distributed system, and it uses a variation of the master-slave approach for
replication.
For example, suppose that we want to replicate a particular document collection C.
A replica set will have one primary copy of the collection C stored in one node N1, and at least
one secondary copy (replica) of C stored at another node N2.
Additional copies can be stored in nodes N3, N4, etc., as needed, but the cost of storage and update
(write) increases with the number of replicas.
The total number of participants in a replica set must be at least three, so if only one secondary copy
is needed, a participant in the replica set known as an arbiter must run on the third node N3.
The arbiter does not hold a replica of the collection but participates in elections to choose a new
primary if the node storing the current primary copy fails.
If the total number of members in a replica set is n (one primary plus i secondaries, for a total of n
= i + 1), then n must be an odd number; if it is not, an arbiter is added to ensure that the election
process works correctly if the primary fails.
In MongoDB replication, all write operations must be applied to the primary copy and then
propagated to the secondaries.
For read operations, the user can choose the particular read preference for their application.
The default read preference processes all reads at the primary copy, so all read and write operations
are performed at the primary node.
In this case, secondary copies are mainly to make sure that the system continues operation if the
primary fails, and MongoDB can ensure that every read request gets the latest document value.
To increase read performance, it is possible to set the read preference so that read requests can be
processed at any replica (primary or secondary); however, a read at a secondary is not guaranteed
to get the latest version of a document because there can be a delay in propagating writes from the
primary to the secondaries.
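As a hedged illustration, a read preference can be chosen per connection or per database handle in pymongo; the replica set name and host names below are assumptions.

```python
# Sketch: choosing a read preference for a replica set connection in pymongo.
from pymongo import MongoClient, ReadPreference

client = MongoClient(
    "mongodb://host1:27017,host2:27017,host3:27017",
    replicaSet="rs0",   # assumed replica set name
)

# Default behavior: all reads go to the primary and see the latest acknowledged writes.
primary_reads = client.get_database("company")

# Allow reads at secondaries for higher read throughput, at the cost of possibly
# reading a slightly stale document.
secondary_reads = client.get_database(
    "company", read_preference=ReadPreference.SECONDARY_PREFERRED
)
```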
Sharding in MongoDB.
When a collection holds a very large number of documents or requires a large storage space, storing
all the documents in one node can lead to performance problems, particularly if there are many user
operations accessing the documents concurrently using various CRUD operations.
Sharding of the documents in the collection—also known as horizontal partitioning— divides the
documents into disjoint partitions known as shards. This allows the system to add more nodes as
needed by a process known as horizontal scaling of the distributed system and to store the shards
of the collection on different nodes to achieve load balancing.
Each node will process only those operations pertaining to the documents in the shard stored at that
node. Also, each shard will contain fewer documents than if the entire collection were stored at one
node, thus further improving performance.
There are two ways in which the documents of a collection can be partitioned into shards:
range partitioning and hash partitioning.
Both require that the user specify a particular document field to be used as the basis for partitioning
the documents into shards.
The partitioning field—known as the shard key in MongoDB—must have two characteristics:
it must exist in every document in the collection, and it must have an index.
The ObjectId can be used, but any other field possessing these two characteristics can also
be used as the basis for sharding.
The values of the shard key are divided into chunks either through range partitioning or hash
partitioning, and the documents are partitioned based on the chunks of shard key values.
For example, if the shard key values range from one to ten million, it is possible to create
ten ranges—1 to 1,000,000; 1,000,001 to 2,000,000; … ; 9,000,001 to 10,000,000—and each
chunk would contain the key values in one range.
Hash partitioning applies a hash function h(K) to each shard key K, and the partitioning of keys
into chunks is based on the hash values
In general, if range queries are commonly applied to a collection (for example, retrieving all
documents whose shard key value is between 200 and 400), then range partitioning is preferred
because each range query will typically be submitted to a single node that contains all the required
documents in one shard.
If most searches retrieve one document at a time, hash partitioning may be preferable because it
randomizes the distribution of shard key values into chunks.
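The toy sketch below contrasts the two chunking strategies for a numeric shard key; the chunk boundaries and hash function are illustrative only and are not MongoDB's internal algorithm.

```python
# Toy sketch: assigning a shard key value to one of ten chunks by range or by hash.
import hashlib

NUM_CHUNKS = 10
MAX_KEY = 10_000_000

def range_chunk(shard_key: int) -> int:
    """Chunk 0 covers 1..1,000,000, chunk 1 covers 1,000,001..2,000,000, and so on."""
    return (shard_key - 1) * NUM_CHUNKS // MAX_KEY

def hash_chunk(shard_key: int) -> int:
    """Hash partitioning randomizes which chunk a key lands in."""
    digest = hashlib.md5(str(shard_key).encode()).hexdigest()
    return int(digest, 16) % NUM_CHUNKS

print(range_chunk(1_500_000))   # -> 1 (the second range)
print(hash_chunk(1_500_000))    # some chunk 0..9, unrelated to the key's order
```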
When sharding is used, MongoDB queries are submitted to a module called the query router, which
keeps track of which nodes contain which shards based on the particular partitioning method used
on the shard keys.
The query (CRUD operation) will be routed to the nodes that contain the shards that hold the
documents that the query is requesting. If the system cannot determine which shards hold the
required documents, the query will be submitted to all the nodes that hold shards of the collection.
Sharding and replication are used together; sharding focuses on improving performance via load
balancing and horizontal scalability, whereas replication focuses on ensuring system availability
when certain nodes fail in the distributed system.
24.4 NOSQL Key-Value Stores
Key-value stores focus on high performance, availability, and scalability by storing data in a
distributed storage system.
The data model used in key-value stores is relatively simple, and in many of these systems, there is
no query language but rather a set of operations that can be used by the application programmers.
The key is a unique identifier associated with a data item and is used to locate this data item rapidly.
The value is the data item itself, and it can have very different formats for different key-value
storage systems.
In some cases, the value is just a string of bytes or an array of bytes, and the application using the
key-value store has to interpret the structure of the data value.
In other cases, some standard formatted data is allowed; for example, structured data rows (tuples)
similar to relational data, or semi-structured data using JSON or some other self-describing data
format.
Different key-value stores can thus store unstructured, semi-structured, or structured data items
The main characteristic of key-value stores is the fact that every value (data item) must be associated
with a unique key, and that retrieving the value by supplying the key must be very fast.
There are many systems that fall under the key-value store label, so rather than provide a lot of
details on one particular system, we will give a brief introductory overview for some of these
systems and their characteristics.
24.4.1 DynamoDB Overview
The DynamoDB system is an Amazon product and is available as part of Amazon’s AWS/SDK
platforms (Amazon Web Services/Software Development Kit). It can be used as part of Amazon’s
cloud computing services for the data storage component.
DynamoDB data model. The basic data model in DynamoDB uses the concepts of tables, items,
and attributes.
A table in DynamoDB does not have a schema; it holds a collection of self-describing items. Each
item will consist of a number of (attribute, value) pairs, and attribute values can be single-valued
or multivalued. So basically, a table will hold a collection of items, and each item is a self-describing
record (or object).
DynamoDB also allows the user to specify the items in JSON format, and the system will convert
them to the internal storage format of DynamoDB. When a table is created, it is required to specify
a table name and a primary key; the primary key will be used to rapidly locate the items in the table.
Thus, the primary key is the key and the item is the value for the DynamoDB key-value store.
The primary key attribute must exist in every item in the table. The primary key can be one of the
following two types:
■ A single attribute. The DynamoDB system will use this attribute to build a hash index on the
items in the table. This is called a hash type primary key. The items are not ordered in storage
by the value of the hash attribute.
■ A pair of attributes. This is called a hash and range type primary key. The primary key will be a
pair of attributes (A, B): attribute A will be used for hashing, and because there will be multiple
items with the same value of A, the B values will be used for ordering the records with the same A
value. A table with this type of key can have additional secondary indexes defined on its attributes.
For example, if we want to store multiple versions of some type of items in a table, we could use
ItemID as hash and Date or Timestamp (when the version was created) as range in a hash and range
type primary key.
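A minimal sketch of creating such a table with the boto3 Python SDK follows; the table name, attribute names, region, and billing mode are assumptions chosen to mirror the versioned-item example.

```python
# Sketch: a DynamoDB table with a hash and range type primary key, via boto3.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

table = dynamodb.create_table(
    TableName="ItemVersions",
    KeySchema=[
        {"AttributeName": "ItemID", "KeyType": "HASH"},     # hash (partition) attribute
        {"AttributeName": "Timestamp", "KeyType": "RANGE"}, # range (sort) attribute
    ],
    AttributeDefinitions=[
        {"AttributeName": "ItemID", "AttributeType": "S"},
        {"AttributeName": "Timestamp", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# Each item is a self-describing set of (attribute, value) pairs keyed by the primary key.
table.put_item(Item={"ItemID": "X100", "Timestamp": "2024-01-01T10:00:00", "Price": 25})
```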
24.4.2 Voldemort Key-Value Distributed Data Store
Voldemort is an open source key-value system with characteristics similar to those of DynamoDB.
Its features include the following:
■ Simple basic operations. A collection of (key, value) pairs is kept in a Voldemort store. In our
discussion, we will assume the store is called s. The basic interface for data storage and retrieval is
very simple and includes three operations: get, put, and delete. The operation s.put(k, v) inserts an
item as a key-value pair with key k and value v. The operation s.delete(k) deletes the item whose
key is k from the store, and the operation v = s.get(k) retrieves the value v associated with key k.
The application can use these basic operations to build its own requirements. At the basic storage
level, both keys and values are arrays of bytes (strings).
■ High-level formatted data values. The values v in the (k, v) items can be specified in JSON
(JavaScript Object Notation), and the system will convert between JSON and the internal storage
format. Other data object formats can also be specified if the application provides the conversion
(also known as serialization) between the user format and the storage format as a Serializer class.
The Serializer class must be provided by the user and will include operations to convert the user
format into a string of bytes for storage as a value, and to convert back a string (array of bytes)
retrieved via s.get(k) into the user format. Voldemort has some built-in serializers for formats other
than JSON.
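Voldemort itself exposes this interface through its Java API; the following is a conceptual Python sketch (not Voldemort code) of a store with put/get/delete and a pluggable JSON serializer.

```python
# Conceptual sketch of the s.put / s.get / s.delete interface with a serializer.
import json

class JsonSerializer:
    def to_bytes(self, obj) -> bytes:
        return json.dumps(obj).encode("utf-8")
    def from_bytes(self, data: bytes):
        return json.loads(data.decode("utf-8"))

class KeyValueStore:
    def __init__(self, serializer):
        self._data = {}        # at the storage level, keys and values are byte strings
        self._ser = serializer
    def put(self, k: str, v) -> None:
        self._data[k.encode()] = self._ser.to_bytes(v)
    def get(self, k: str):
        raw = self._data.get(k.encode())
        return None if raw is None else self._ser.from_bytes(raw)
    def delete(self, k: str) -> None:
        self._data.pop(k.encode(), None)

s = KeyValueStore(JsonSerializer())
s.put("e101", {"Ename": "John Smith", "Salary": 30000})
print(s.get("e101"))
s.delete("e101")
```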
■ Consistent hashing for distributing (key, value) pairs. A variation of the data distribution
algorithm known as consistent hashing is used in Voldemort for data distribution among the nodes
in the distributed cluster of nodes.
A hash function h(k) is applied to the key k of each (k, v) pair, and h(k) determines where the
item will be stored.
The method assumes that h(k) is an integer value, usually in the range 0 to Hmax = 2^n − 1,
where n is chosen based on the desired range for the hash values.
This method is best visualized by considering the range of all possible integer hash values 0 to
Hmax to be evenly distributed on a circle (or ring). The nodes in the distributed system are then
also located on the same ring; usually each node will have several locations on the ring (see Figure
24.2).
The positioning of the points on the ring that represent the nodes is done in a pseudorandom
manner.
An item (k, v) will be stored on the node whose position in the ring follows the position of h(k) on
the ring in a clockwise direction.
In Figure 24.2(a), we assume there are three nodes in the distributed cluster labelled A, B, and C,
where node C has a bigger capacity than nodes A and B.
In a typical system, there will be many more nodes. On the circle, two instances each of A and B
are placed, and three instances of C (because of its higher capacity), in a pseudorandom manner to
cover the circle.
Figure 24.2(a) indicates which (k, v) items are placed in which nodes based on the h(k) values
The h(k) values that fall in the parts of the circle marked as range 1 in Figure 24.2(a) will have their
(k, v) items stored in node A because that is the node whose label follows h(k) on the ring in a
clockwise direction; those in range 2 are stored in node B; and those in range 3 are stored in node
C.
This scheme allows horizontal scalability because when a new node is added to the distributed
system, it can be added in one or more locations on the ring depending on the node capacity.
Only a limited percentage of the (k, v) items will be reassigned to the new node from the existing
nodes based on the consistent hashing placement algorithm.
Also, those items assigned to the new node may not all come from only one of the existing nodes
because the new node can have multiple locations on the ring.
For example, if a node D is added and it has two placements on the ring as shown in Figure 24.2(b),
then some of the items from nodes B and C would be moved to node D.
The items whose keys hash to range 4 on the circle (see Figure 24.2(b)) would be migrated to node
D. This scheme also allows replication by placing the number of specified replicas of an item on
successive nodes on the ring in a clockwise direction.
The sharding is built into the method, and different items in the store (file) are located on different
nodes in the distributed cluster, which means the items are horizontally partitioned (sharded) among
the nodes in the distributed system.
When a node fails, its load of data items can be distributed to the other existing nodes whose labels
follow the labels of the failed node in the ring. And nodes with higher capacity can have more
locations on the ring, as illustrated by node C in Figure 24.2(a), and thus store more items than
smaller-capacity nodes.
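The following is a minimal Python sketch of consistent hashing with multiple ring positions per node; the hash function, node names, and weights are illustrative assumptions that echo Figure 24.2 rather than Voldemort's actual implementation.

```python
# Minimal consistent-hashing sketch: each node gets several pseudorandom positions
# on the ring; an item is stored on the first node position following h(k) clockwise.
import bisect
import hashlib

def h(value: str, bits: int = 32) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** bits)

class ConsistentHashRing:
    def __init__(self, nodes_with_weights):
        # e.g., {"A": 2, "B": 2, "C": 3} -> node C gets more positions (higher capacity)
        self._positions = sorted(
            (h(f"{node}#{i}"), node)
            for node, weight in nodes_with_weights.items()
            for i in range(weight)
        )
        self._points = [p for p, _ in self._positions]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect_right(self._points, h(key)) % len(self._points)
        return self._positions[idx][1]

ring = ConsistentHashRing({"A": 2, "B": 2, "C": 3})
print(ring.node_for("k1"), ring.node_for("k2"))

# Adding a node (e.g., D) only remaps the keys whose hashes fall in D's new ranges.
bigger_ring = ConsistentHashRing({"A": 2, "B": 2, "C": 3, "D": 2})
```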
■ Consistency and versioning. Voldemort uses a method similar to the one developed for
DynamoDB for consistency in the presence of replicas.
Basically, concurrent write operations are allowed by different processes so there could exist two
or more different values associated with the same key at different nodes when items are replicated.
Consistency is achieved when the item is read by using a technique known as versioning and read
repair.
Concurrent writes are allowed, but each write is associated with a vector clock value. When a read
occurs, it is possible that different versions of the same value (associated with the same key) are
read from different nodes.
If the system can reconcile to a single final value, it will pass that value to the read; otherwise, more
than one version can be passed back to the application, which will reconcile the various versions
into one version based on the application semantics and give this reconciled value back to the nodes.
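A tiny vector-clock sketch can make the versioning idea concrete; this is a conceptual illustration with hypothetical node names, not Voldemort's implementation.

```python
# Conceptual vector-clock sketch: each replica increments its own counter on a write;
# clocks that are not ordered indicate concurrent, conflicting versions to reconcile.

def increment(clock: dict, node: str) -> dict:
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def dominates(a: dict, b: dict) -> bool:
    """True if version a is at least as new as b on every node (a supersedes b)."""
    return all(a.get(n, 0) >= c for n, c in b.items())

v1 = increment({}, "N1")      # write applied at node N1 -> {"N1": 1}
v2 = increment(v1, "N2")      # later write at N2        -> {"N1": 1, "N2": 1}
v3 = increment(v1, "N3")      # concurrent write at N3   -> {"N1": 1, "N3": 1}

print(dominates(v2, v1))                      # True: v2 supersedes v1
print(dominates(v2, v3), dominates(v3, v2))   # False, False: concurrent -> reconcile
```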
24.4.3 Examples of Other Key-Value Stores
In this section, we briefly review three other key-value stores. It is important to note that there are
many systems that can be classified in this category, and we can only mention a few of these systems.
Oracle key-value store. Oracle has one of the well-known SQL relational database systems, and it
also offers a system based on the key-value store concept; this system is called the Oracle NoSQL
Database.
Redis key-value cache and store. Redis differs from the other systems discussed here because it
caches its data in main memory to further improve performance. It offers master-slave replication
and high availability, and it also offers persistence by backing up the cache to disk.
Apache Cassandra. Cassandra is a NOSQL system that is not easily categorized into one category;
it is sometimes listed in the column-based NOSQL category (see Section 24.5) or in the key-value
category. It offers features from several NOSQL categories and is used by Facebook as well as many
other customers.
24.5 Column-Based or Wide Column NOSQL Systems
The Google distributed storage system for big data, known as BigTable, is a well-known example
of this class of NOSQL systems, and it is used in many Google applications that require large
amounts of data storage, such as Gmail. BigTable uses the Google File System (GFS) for data
storage and distribution.
An open source system known as Apache Hbase is somewhat similar to Google BigTable, but it
typically uses HDFS (Hadoop Distributed File System) for data storage. HDFS is used in many
cloud computing applications.
We will focus on Hbase in this section as an example of this category of NOSQL systems.
One of the main differences that distinguish column-based systems from key-value stores (see
Section 24.4) is the nature of the key
In column-based systems such as Hbase, the key is multidimensional and so has several
components: typically, a combination of table name, row key, column, and timestamp.
The data model in Hbase organizes data using the concepts of namespaces, tables, column families,
column qualifiers, columns, rows, and data cells.
Data is stored in a self-describing form by associating columns with data values, where data values
are strings.
Hbase also stores multiple versions of a data item, with a timestamp associated with each version,
so versions and timestamps are also part of the Hbase data model
As with other NOSQL systems, unique keys are associated with stored data items for fast access,
but the keys identify cells in the storage system. Because the focus is on high performance when
storing huge amounts of data, the data model includes some storage-related concepts. We discuss
the Hbase data modeling concepts and define the terminology next. It is important to note that the
use of the words table, row, and column is not identical to their use in relational databases, but the
uses are related.
■ Tables and Rows. Data in Hbase is stored in tables, and each table has a table name. Data in a
table is stored as self-describing rows. Each row has a unique row key, and row keys are strings
that must have the property that they can be lexicographically ordered, so characters that do not
have a lexicographic order in the character set cannot be used as part of a row key.
■ Versions and Timestamps. Hbase can keep several versions of a data item, each with an
associated timestamp; the timestamp is system-generated by default, but it is also possible for
the user to define the timestamp value explicitly in a Date format rather than using the system-
generated timestamp.
■ Cells. A cell holds a basic data item in Hbase. The key (address) of a cell is specified by a
combination of (table, rowid, columnfamily, columnqualifier, timestamp). If timestamp is left out,
the latest version of the item is retrieved unless a default number of versions is specified, say the
latest three versions. The default number of versions to be retrieved, as well as the default number
of versions that the system needs to keep, are parameters that can be specified during table creation.
■ Namespaces. A namespace is a collection of tables. A namespace basically specifies a collection
of one or more tables that are typically used together by user applications, and it corresponds to a
database that contains a collection of tables in relational terminology.
- **Create Operation**: Creates a new table and specifies one or more column families associated with that table.
- **Put Operation**: Inserts or updates the value of a single cell, addressed by its table, row key, and columnfamily:columnqualifier (see the sketch after this list).
- **Get Operation**: Retrieves the cells associated with a specific row key, optionally restricted to particular column families or columns.
- **Regions**:
- Regions hold a range of row keys, requiring lexicographical ordering of row keys.
- Each table is divided into regions, and each region is assigned to a region server node in the cluster.
- **Stores**: Within a region, the data of each column family is kept in its own store.
Server Roles
- Hbase runs on a cluster with a master server, which coordinates the cluster and assigns regions to region servers, and multiple region servers, which hold and serve the regions.
- **Apache Zookeeper**: An open source distributed coordination service that Hbase uses to coordinate the master and region servers and to keep track of cluster membership.
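The sketch below shows create/put/get against Hbase through happybase, a third-party Python client that talks to HBase's Thrift server; the host, table name, and column families are assumptions.

```python
# Minimal happybase sketch of Hbase-style create/put/get operations.
import happybase

connection = happybase.Connection("localhost")   # assumed HBase Thrift server

# Create: a table is created with its column families (here, 'personal' and 'work').
connection.create_table("employee", {"personal": dict(), "work": dict()})

table = connection.table("employee")

# Put: writes cell values addressed by (row key, columnfamily:columnqualifier).
table.put(b"row1", {b"personal:Fname": b"John", b"work:Salary": b"30000"})

# Get: retrieves the latest versions of the cells in a row.
print(table.row(b"row1"))
```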
24.6 NOSQL Graph Databases and Neo4j
The data is represented as a graph, which is a collection of vertices (nodes) and edges.
Both nodes and edges can be labeled to indicate the types of entities and relationships they
represent, and it is generally possible to store data associated with both individual nodes and
individual edges.
Many systems can be categorized as graph databases. We will focus our discussion on one particular
system, Neo4j, which is used in many applications.
- **Properties**: Both nodes and relationships can have properties to store data items.
- **Labels**: Nodes can have zero, one, or several labels, grouping them into subsets for querying.
- **Directed Relationships**: Each relationship has a start node, an end node, and a relationship
type.
- **Properties Specification**:
- Defined using a map pattern with “name: value” pairs enclosed in curly brackets, e.g., `{Lname:
'Smith', Fname: 'John', Minit: 'B'}`.
- **Terminology**:
- In graph theory, nodes and relationships are called vertices and edges, respectively.
- **Labels on Nodes**: Nodes can have no label in Neo4j, unlike entities in ER/EER models that
must belong to an entity type.
- **Practical Usage**: Neo4j’s graph model is used for high-performance distributed databases,
whereas the ER/EER model is primarily for database design.
- Created using the Neo4j CREATE command in Cypher, Neo4j’s declarative query language.
- Various ways to create nodes and relationships using Neo4j APIs and scripting interfaces.
- **Figure Reference**:
- **Node Labeling**:
- Node data corresponds to the COMPANY database (Figure 5.6) with modifications, e.g., EmpId
instead of SSN.
- **Multiple Labels**:
- Multiple labels can represent an entity belonging to an entity type and its subclasses in the EER
model but can also serve other purposes.
- **Relationship Creation**: Relationships are also created with the CREATE command by specifying the start node, the relationship type (with any properties), and the end node; see the sketch below.
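A minimal sketch of node and relationship creation through the official neo4j Python driver follows; the connection details, labels, property names, and relationship type are assumptions modeled on the COMPANY example rather than the exact contents of the figure.

```python
# Sketch: creating two nodes and a relationship with Cypher via the neo4j driver.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

create_stmt = """
CREATE (e:EMPLOYEE {Empid: '1', Lname: 'Smith', Fname: 'John', Minit: 'B'})
CREATE (p:PROJECT {Pno: 1, Pname: 'ProductX'})
CREATE (e)-[:WorksOn {Hours: 32.5}]->(p)
"""

with driver.session() as session:
    session.run(create_stmt)
driver.close()
```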
Paths
- **Path Specification**:
- Composed of a start node, followed by one or more relationships, leading to end nodes.
- Similar to path expressions in OQL and XML query languages (XPath and XQuery).
Optional Schema
- **Schema Usage**:
- Optional in Neo4j.
- Features include creating indexes and constraints based on labels and properties.
- Example: Key constraint on a property to ensure unique values for all nodes with a specific label.
- **Node Identifiers**:
- Neo4j assigns an internal unique system-defined identifier to each node upon creation.
- **Indexing**:
- Indexes can be created for collections of nodes with specific labels to efficiently retrieve nodes
based on their properties.
- Example:
* General Features
- **High-Level Query Language**: Cypher is a declarative language used for querying Neo4j
databases.
- **Command Capabilities**: Supports creating, finding, deleting, and modifying nodes and
relationships.
* Query Structure
- **Clauses**: A Cypher query is composed of multiple clauses. The result from one clause can
serve as input to the next clause.
- **WITH**: Passes results from one part of a query to the next, often used with aggregation.
- **Query 3**: Returns employees and hours per week for those working on the project with Pno
= 2.
- **Query 4**: Uses `ORDER BY` to return all employees and the projects they work on, sorted
by Ename.
- **Query 6**: Illustrates the use of `WITH` and `WHERE` for aggregation and additional
conditions. Returns employees who work on more than two projects and the number of projects
each works on.
- **Query 7**: Similar to Query 5 but returns the nodes and relationships themselves for
visualization.
- **Query 8**: Demonstrates adding properties to a node by adding a Job property to an employee
node.
- **Graph Visualization**: Results can be displayed as a graph using Neo4j's visualization tools.
- **Schema Flexibility**: Cypher queries can add or remove labels and properties, providing
flexibility in data modeling.
These points summarize the functionality and features of the Cypher query language in Neo4j,
highlighting its ability to perform complex queries and updates on graph databases.
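To make the query structure concrete, the sketch below runs a Cypher query in the spirit of Query 3 (employees and hours per week on the project with Pno = 2) through the neo4j Python driver; the labels, property names, and connection details are assumptions consistent with the creation sketch above.

```python
# Sketch: a Cypher read query with MATCH / WHERE / RETURN / ORDER BY clauses.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (e:EMPLOYEE)-[w:WorksOn]->(p:PROJECT)
WHERE p.Pno = 2
RETURN e.Lname AS employee, w.Hours AS hours
ORDER BY employee
"""

with driver.session() as session:
    for record in session.run(query):
        print(record["employee"], record["hours"])
driver.close()
```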
24.6.3 Neo4j Interfaces and Distributed System Characteristics
Neo4j has other interfaces that can be used to create, retrieve, and update nodes and relationships
in a graph database. It also has two main versions: the enterprise edition, which comes with
additional capabilities, and the community edition. We discuss some of the additional features of
Neo4j in this subsection.
■ Enterprise edition vs. community edition. Both editions support the Neo4j graph data model and
storage system, as well as the Cypher graph query language, and several other interfaces, including
a high-performance native API, language drivers for several popular programming languages, such
as Java, Python, PHP, and the REST (Representational State Transfer) API. In addition, both
editions support ACID properties. The enterprise edition supports additional features for enhancing
performance, such as caching and clustering of data and locking.
■ Graph visualization interface.
Neo4j has a graph visualization interface, so that a subset of the nodes and edges in a database graph
can be displayed as a graph. This tool can be used to visualize query results in a graph
representation.
■ Caching. A main memory cache can be configured to store the graph data for improved
performance.