Aggregate data models in NoSQL databases make it easy to handle complex, nested records. In
this article, you will learn about aggregate data models in NoSQL databases, their different types,
and their use cases. You will also go through an example of an aggregate data model in NoSQL.
Aggregate means a collection of objects that are treated as a unit. In NoSQL Databases, an
aggregate is a collection of data that interact as a unit. Moreover, these units of data or
aggregates of data form the boundaries for the ACID operations.
Aggregate data models in NoSQL make it easier for the database to manage data storage over a
cluster, as an aggregate or unit can reside on any of the machines. Whenever data is retrieved from
the database, the entire aggregate comes along with it.
Aggregate data models in NoSQL generally don’t support ACID transactions that span multiple
aggregates; atomic updates are guaranteed only within a single aggregate. With the help of
aggregate data models in NoSQL, you can also perform OLAP-style analytical operations on the
database.
The Aggregate Data Models in NoSQL are majorly classified into 4 Data Models:
1. Key-Value Model
The Key-Value Data Model contains a key or an ID used to access or fetch the aggregate
corresponding to that key. In this model, the aggregate is opaque to the database: it is stored and
retrieved as a single blob of data and can only be looked up through its key.
2. Document Model
The Document Data Model allows access to parts of an aggregate. In this model, the data can be
queried and accessed in a flexible way. The database stores and retrieves documents, which can be
XML, JSON, BSON, etc. There are some restrictions on the structure and data types of the
aggregates that can be stored in this model.
3. Column-Family Model
The Column-Family Data Model stores data in rows whose columns are grouped into column
families, so that columns which are commonly accessed together are stored together.
4. Graph-Based Model
Graph-based data models store data in nodes that are connected by edges. These models are
widely used for storing huge volumes of complex aggregates and multidimensional data with
many interconnections.
Now that you have a brief understanding of aggregate data models in NoSQL databases, this
section walks through an example of how to design one. A data model for an e-commerce website
will be used to explain aggregate data models in NoSQL.
This example of the E-Commerce Data Model has two main aggregates – customer and order.
The customer contains data related to billing addresses while the order aggregate consists of
ordered items, shipping addresses, and payments. The payment also contains the billing address.
// in customers
{
  "customer": {
    "id": 1,
    "name": "Martin",
    "billingAddress": [{"city": "Chicago"}],
    "orders": [
      {
        "id": 99,
        "customerId": 1,
        "orderItems": [
          {
            "productId": 27,
            "price": 32.45,
            "productName": "NoSQL Distilled"
          }
        ],
        "shippingAddress": [{"city": "Chicago"}],
        "orderPayment": [
          {
            "ccinfo": "1000-1000-1000-1000",
            "txnId": "abelif879rft",
            "billingAddress": {"city": "Chicago"}
          }
        ]
      }
    ]
  }
}
A key-value database associates a value, which can be anything from a simple string to a
complicated entity, with a key that is used to look it up. It resembles the map object, associative
array, or dictionary found in many programming languages, except that it is stored persistently
and managed by a DBMS.
An efficient and compact index structure is used by the key-value store so that it can rapidly and
reliably locate a value by its key. For example, Redis is a key-value store used to keep lists, maps,
sets, and other simple data structures, as well as primitive types, in a persistent database. By
supporting only a limited number of value types, Redis can expose a very simple interface to
query and manipulate them, and when configured appropriately it is capable of high throughput.
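As a minimal sketch of this kind of key-value interaction, the snippet below uses the redis-py client against a local Redis server; the server location and the key names are assumptions made for illustration, not something prescribed by the text above.

# Minimal key-value usage sketch (assumes a local Redis server and the redis-py client).
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# The value is opaque to the store: here a whole aggregate is serialized as JSON.
customer = {"id": 1, "name": "Martin", "billingAddress": [{"city": "Chicago"}]}
r.set("customer:1", json.dumps(customer))

# Retrieval always goes through the key; the store does not query inside the value.
fetched = json.loads(r.get("customer:1"))
print(fetched["name"])  # -> Martin

# Redis also exposes simple data structures directly, e.g. a list:
r.rpush("recent:orders", 99)
print(r.lrange("recent:orders", 0, -1))  # -> ['99']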
Here are some popular key-value databases which are widely used:
Couchbase: It permits SQL-style querying and text search.
Amazon DynamoDB: One of the most widely used key-value databases, DynamoDB is a
trusted, fully managed database used by a large number of customers. It can easily handle
a large number of requests every day and also provides various security options.
Riak: A distributed key-value database designed for high availability and fault tolerance,
used to develop applications.
Aerospike: An open-source, real-time key-value database built to handle billions of transactions.
Berkeley DB: A high-performance, open-source embedded database that provides scalability.
3.3 RELATIONSHIPS:
Embedded documents: In this approach, related data is stored as nested documents within a
single parent document. For example, in a blog application, a post document may contain an
array of comment documents, with each comment containing a user ID to identify the author.
This approach is useful for data that is heavily read-oriented, and where the related data is
relatively small.
References: In this approach, related data is stored as a separate document and referenced from
the parent document using an ID or other unique identifier. For example, in a blog application, a
post document may contain a user ID that references a separate user document. This approach is
useful when the related data is relatively large, or when multiple documents reference the same
data.
Hybrid approach: This approach combines the benefits of both embedded documents and
references. For example, in a blog application, a post document may contain an array of
comment IDs that reference separate comment documents. When displaying a post with its
comments, the application can retrieve the comments using the IDs and embed them within the
post document for faster access.
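The three approaches can be sketched with plain JSON-style documents, shown here as Python dicts; all field and identifier names (comments, author_id, comment_ids, and so on) are hypothetical and used only to illustrate the shapes of the documents.

# Illustrative blog documents showing the three relationship styles (hypothetical fields).

# 1. Embedded: comments live inside the post document.
post_embedded = {
    "_id": "post1",
    "title": "Schemaless design",
    "comments": [
        {"user_id": "u42", "text": "Nice post"},
        {"user_id": "u7", "text": "Thanks for sharing"},
    ],
}

# 2. References: the post stores only the author's id; the user is a separate document.
post_referenced = {"_id": "post2", "title": "Sharding basics", "author_id": "u42"}
user = {"_id": "u42", "name": "Martin"}

# 3. Hybrid: the post stores comment ids; the application fetches the referenced
#    comment documents and embeds them at read time.
post_hybrid = {"_id": "post3", "title": "Replication", "comment_ids": ["c1", "c2"]}
comments = {
    "c1": {"_id": "c1", "user_id": "u7", "text": "Great overview"},
    "c2": {"_id": "c2", "user_id": "u42", "text": "Agreed"},
}
post_view = dict(post_hybrid, comments=[comments[cid] for cid in post_hybrid["comment_ids"]])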
It's important to note that in NoSQL databases, denormalization is often used to improve
performance by duplicating data across multiple documents. This is because NoSQL databases
typically prioritize query performance over data consistency, meaning that data may be
duplicated or updated in multiple places to improve performance. When modelling relationships
in NoSQL databases, it's important to carefully consider the trade-offs between data duplication,
query performance, and data consistency.
Relations are the crux of any database, and relations in NoSQL databases are handled in a
completely different way compared to an SQL database.
One to one Relation
A one to one relation, as the name suggests, requires one entity to have an exclusive relationship
with another entity and vice versa: one user can have one account associated with him, and one
account can belong to only one user.
One to many Relation
A one to many relation requires one entity to have an exclusive relationship with another entity,
but the other entity can have relations with multiple other entities.
Many to many Relation
A many to many relation doesn’t require any entity to have exclusive relations. Both entities can
have relations with multiple other entities.
The NoSQL (‘not only SQL’) graph database is a technology for data management designed to
handle very large sets of structured, semi-structured or unstructured data. It helps organizations
access, integrate and analyze data from various sources, thus helping them with their big data
and social media analytics.
This makes the graph database much more flexible, dynamic and lower-cost in integrating new
data sources than relational databases.
Compared to relational databases, which handle moderate data velocity coming from one or a few
locations, NoSQL graph databases are able to store, retrieve, integrate and analyze high-velocity
data coming from many locations.
The semantic graph database is a type of NoSQL graph database that is capable of integrating
heterogeneous data from many sources and making links between datasets.
The semantic graph database, also referred to as an RDF triplestore, focuses on the relationships
between entities and is able to infer new knowledge out of existing information. It is a powerful
tool to use in relationship-centered analytics and knowledge discovery.
In addition, the capability to handle massive datasets and the schema-less approach support the
NoSQL semantic graph database usage in real-time big data analytics.
In relational databases, the need to have the schemas defined before adding new information
restricts data integration from new sources because the whole schema needs to be changed
anew.
Because the schema-less NoSQL semantic graph database does not require schema changes every
time a new data source is added, enterprises can integrate data with less effort and cost.
The semantic graph database stands out from the other types of graph databases with its ability to
additionally support rich semantic data schema, the so-called ontologies.
The semantic NoSQL graph database gets the best of both worlds: on the one hand, data is
flexible because it does not depend on the schema. On the other hand, ontologies give the
semantic graph database the freedom and ability to build logical models any way organizations
find it useful for their applications, without having to change the data.
A schemaless database manages information without the need for a blueprint. The onset of
building a schemaless database doesn’t rely on conforming to certain fields, tables, or data model
structures. There is no Relational Database Management System (RDBMS) to enforce any
specific kind of structure. In other words, it’s a non-relational database that can handle any
database type, whether that be a key-value store, document store, in-memory, column-oriented,
or graph data model. NoSQL databases’ flexibility is responsible for the rising popularity of the
schemaless approach, which is often considered more user-friendly than scaling a schema in a
SQL database.
A schemaless database, like MongoDB, does not have these up-front constraints, mapping to a
more ‘natural’ database. Even when sitting on top of a data lake, each document is created with a
partial schema to aid retrieval.
In schemaless databases, information is stored in JSON-style documents which can have varying sets of
fields, with different data types for each field.
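As a small illustration of this, the sketch below stores documents with different fields and types in one collection; it assumes a local MongoDB instance and the pymongo driver, and the database, collection, and field names are made up for the example.

# Sketch: documents with varying fields in one collection (assumes local MongoDB + pymongo).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
readings = client["demo"]["sensor_readings"]  # database/collection names are illustrative

# No schema is declared up front; each document carries its own fields and types.
readings.insert_one({"sensor": "t-100", "temperature_c": 21.5})
readings.insert_one({"sensor": "gps-7", "lat": 42.36, "lon": -71.06, "battery": "low"})

# Queries simply match on whatever fields a document happens to have.
for doc in readings.find({"sensor": "t-100"}):
    print(doc)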
Greater flexibility over data types
By operating without a schema, schemaless databases can store, retrieve, and query any data type
— perfect for big data analytics and similar operations that are powered by unstructured data.
Relational databases apply rigid schema rules to data, limiting what can be stored.
The lack of schema means that your NoSQL database can accept any data type — including
those that you do not yet use. This future-proofs your database, allowing it to grow and change
as your data-driven operations change and mature.
No data truncation
A schemaless database makes almost no changes to your data; each item is saved in its own
document with a partial schema, leaving the raw information untouched. This means that every
detail is always available and nothing is stripped to match the current schema. This is particularly
valuable if your analytics needs to change at some point in the future.
With the ability to process unstructured data, applications built on NoSQL databases are better
able to process real-time data, such as readings and measurements from IoT sensors. Schemaless
databases are also ideal for use with machine learning and artificial intelligence operations,
helping to accelerate automated actions in your business.
With NoSQL, you can use whichever data model is best suited to the job. Graph databases allow
you to view relationships between data points, or you can use traditional wide table views with
an exceptionally large number of columns. You can query, report, and model information
however you choose. And as your requirements grow, you can keep adding nodes to increase
capacity and power.
When a record is saved to a relational database, anything (particularly metadata) that does not
match the schema is truncated or removed. Deleted at write, these details cannot be recovered at
a later point in time.
A view can be “materialized” by storing the tuples of the view in the database. Index structures can
be built on the materialized view. Consequently, database accesses to the materialized view can be
much faster than recomputing the view. A materialized view is like a cache --- a copy of the data that
can be accessed quickly.
If a regular view is a saved query, a materialized view is a saved query plus its results stored as a
table.
In defining a materialized view, we have given the database all the information necessary to
continually maintain the results as underlying data changes. Databases should keep materialized
views updated by default, but this has so far proven impossible to deliver on.
The limitations of materialized views in popular databases discussed above have historically made
them a relatively niche feature, used primarily as a way to cache the results of complex queries that
would bring down the database if run frequently as regular views. But if we set aside historic
limitations and think about the idea of materialized views: They give us the ability to define (using
SQL) any complex transformation of our data, and leave it to the database to maintain the results in a
“virtual” table.
Materialized views for analytics
The extract-load-transform (ELT) pattern where raw data is loaded in bulk into a warehouse and then
transformed via SQL typically relies on alternatives to materialized views for the transform step. In
dbt, these are referred to as materializations. A materialization can use a regular view (where nothing
is cached) or cached tables built from the results of a SELECT query, or an incrementally updated
table where the user is responsible for writing the update strategy. Historically, support for
materialized views in data warehouses has been so bad that SQL modelling services like dbt don’t
even have the syntax to allow users to create them. However, the dbt-materialize adapter allows dbt
users building on Materialize to use materialized views. Here’s the standard advice given to dbt users
on when to use the different types of materializations available to them:
1. If using a view isn’t too slow for your end-users, use a view.
2. If a view gets too slow for your end-users, use a table.
3. If building a table with dbt gets too slow, use incremental models in dbt.
Incrementally updated materialized views can replace the caching and denormalization traditionally
done to “guard” OLTP databases from read-side latency and overload.
They do this by moving the computation work from read to write-side of your database: Instead of
waiting for a query and doing computation to get the answer, we are now asking for the query
upfront and doing the computation to update the results as the writes (creates, updates and deletes)
come in. This inverts the constraints of traditional database architectures, allowing developers to
build data-intensive applications without complex cache invalidation or denormalization.
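A toy sketch of this read-to-write inversion is shown below; it is purely illustrative (an in-memory structure, not the API of Materialize or any other database): instead of recomputing an aggregate on every read, the "view" is updated as each write arrives.

# Toy sketch of an incrementally maintained materialized view (illustrative only).
from collections import defaultdict

orders = []                             # the "base table"
total_by_customer = defaultdict(float)  # the "materialized view"

def insert_order(customer_id, amount):
    """Write path: store the row and incrementally update the view."""
    orders.append({"customer_id": customer_id, "amount": amount})
    total_by_customer[customer_id] += amount

def customer_total(customer_id):
    """Read path: answer straight from the maintained view, no recomputation."""
    return total_by_customer[customer_id]

insert_order(1, 32.45)
insert_order(1, 10.00)
print(customer_total(1))  # 42.45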
Depending on your distribution model, you can get a data store that will give you the ability to handle
larger quantities of data, the ability to process a greater read or write traffic, or more availability in the
face of network slowdowns or breakages. These are often important benefits, but they come at a cost.
Running over a cluster introduces complexity—so it’s not something to do unless the benefits are
compelling. Broadly, there are two paths to data distribution: replication and sharding. Replication takes
the same data and copies it over multiple nodes.
Sharding puts different data on different nodes. Replication and sharding are orthogonal techniques: You
can use either or both of them. Replication comes in two forms: master-slave and peer-to-peer. We will
now discuss these techniques starting at the simplest and working up to the more complex: first single-
server, then master-slave replication, then sharding, and finally peer-to-peer replication.
Single Server
The first and the simplest distribution option is the one we would most often recommend—no distribution
at all. Run the database on a single machine that handles all the reads and writes to the data store. We
prefer this option because it eliminates all the complexities that the other options introduce; it’s easy for
operations people to manage and easy for application developers to reason about. Although a lot of
NoSQL databases are designed around the idea of running on a cluster, it can make sense to use
NoSQL with a single-server distribution model if the data model of the NoSQL store is more suited to the
application.
Graph databases are the obvious category here—these work best in a single-server configuration. If your
data usage is mostly about processing aggregates, then a single-server document or key-value store may
well be worthwhile because it’s easier on application developers. For the rest of this chapter we’ll be
wading through the advantages and complications of more sophisticated distribution schemes. Don’t let
the volume of words fool you into thinking that we would prefer these options. If we can get away
without distributing our data, we will always choose a single-server approach
Sharding
Often, a busy data store is busy because different people are accessing different parts of the dataset. In
these circumstances we can support horizontal scalability by putting different parts of the data onto
different servers—a technique that’s called sharding
Sharding puts different data on separate nodes, each of which does its own reads and writes. In the ideal
case, we have different users all talking to different server nodes. Each user only has to talk to one server,
so gets rapid responses from that server. The load is balanced out nicely between servers—for example, if
we have ten servers, each one only has to handle 10% of the load. Of course the ideal case is a pretty rare
beast.
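A minimal sketch of how a client might route each aggregate to a shard by key is shown below; the node names and the shard_for helper are hypothetical, and a plain hash-modulo scheme is used only to make the idea concrete.

# Minimal sketch of routing aggregates to shards by key (hypothetical helper and node names).
import hashlib

NODES = ["node-a", "node-b", "node-c"]

def shard_for(aggregate_key, nodes=NODES):
    """Deterministically map an aggregate key to one node, so that a whole
    aggregate (e.g. one customer and their orders) always lives together."""
    digest = hashlib.sha1(aggregate_key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

print(shard_for("customer:1"))   # the same key always routes to the same node
print(shard_for("customer:42"))

Real systems typically prefer consistent hashing or range-based partitioning over a plain modulo so that adding or removing nodes does not reshuffle every key.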
In order to get close to it we have to ensure that data that’s accessed together is clumped together on the
same node and that these clumps are arranged on the nodes to provide the best data access. The first part
of this question is how to clump the data up so that one user mostly gets her data from a single server.
This is where aggregate orientation comes in really handy. The whole point of aggregates is that we
design them to combine data that’s commonly accessed together—so aggregates leap out as an obvious
unit of distribution. When it comes to arranging the data on the nodes, there are several factors that can
help improve performance. If you know that most accesses of certain aggregates are based on a physical
location, you can place the data close to where it’s being accessed.
If you have orders for someone who lives in Boston, you can place that data in your eastern US data
center. Another factor is trying to keep the load even. This means that you should try to arrange
aggregates so they are evenly distributed across the nodes which all get equal amounts of the load. This
may vary over time, for example if some data tends to be accessed on certain days of the week—so there
may be domain-specific rules you’d like to use. In some cases, it’s useful to put aggregates together if you
think they may be read in sequence. The Bigtable paper [Chang etc.] described keeping its rows in
lexicographic order and sorting web addresses based on reversed domain names (e.g., com.martinfowler).
This way data for multiple pages could be accessed together to improve processing efficiency
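The reversed-domain row key from the Bigtable example can be shown in a couple of lines (illustrative only):

# Reversed domain names sort related pages next to each other (illustrative).
def row_key(domain):
    return ".".join(reversed(domain.split(".")))

print(sorted(row_key(d) for d in ["blog.martinfowler.com", "martinfowler.com", "example.org"]))
# ['com.martinfowler', 'com.martinfowler.blog', 'org.example']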
Sharding is particularly valuable for performance because it can improve both read and write
performance. Using replication, particularly with caching, can greatly improve read performance but does
little for applications that have a lot of writes. Sharding provides a way to horizontally scale writes.
Sharding does little to improve resilience when used alone. Although the data is on different nodes, a
node failure makes that shard’s data unavailable just as surely as it does for a single-server solution. The
resilience benefit it does provide is that only the users of the data on that shard will suffer; however, it’s
not good to have a database with part of its data missing.
With a single server it’s easier to pay the effort and cost to keep that server up and running; clusters
usually try to use less reliable machines, and you’re more likely to get a node failure. So in practice,
sharding alone is likely to decrease resilience. Despite the fact that sharding is made much easier with
aggregates, it’s still not a step to be taken lightly. Some databases are intended from the beginning to use
sharding, in which case it’s wise to run them on a cluster from the very beginning of development, and
certainly in production.
Other databases use sharding as a deliberate step up from a single-server configuration, in which case it’s
best to start single-server and only use sharding once your load projections clearly indicate that you are
running out of headroom. In any case the step from a single node to sharding is going to be tricky. We
have heard tales of teams getting into trouble because they left sharding to very late, so when they turned
it on in production their database became essentially unavailable because the sharding support consumed
all the database resources for moving the data onto new shards. The lesson here is to use sharding well
before you need to—when you have enough headroom to carry out the sharding.
Master-Slave Replication
With master-slave distribution, you replicate data across multiple nodes. One node is designated as the
master, or primary. This master is the authoritative source for the data and is usually responsible for
processing any updates to that data. The other nodes are slaves, or secondaries. A replication process
synchronizes the slaves with the master
Master-slave replication is most helpful for scaling when you have a read-intensive dataset. You can scale
horizontally to handle more read requests by adding more slave nodes and ensuring that all read requests
are routed to the slaves. You are still, however, limited by the ability of the master to process updates and
its ability to pass those updates on. Consequently it isn’t such a good scheme for datasets with heavy
write traffic, although offloading the read traffic will help a bit with handling the write load.
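The read-scaling idea can be sketched as a small router that sends writes to the master and spreads reads across the slaves; the class and node names below are illustrative and not part of any particular database driver.

# Illustrative sketch: writes go to the master, reads are spread across slaves.
import itertools

class MasterSlaveRouter:
    def __init__(self, master, slaves):
        self.master = master
        self._slave_cycle = itertools.cycle(slaves)  # simple round-robin over slaves

    def node_for_write(self):
        # All updates go to the authoritative master.
        return self.master

    def node_for_read(self):
        # Reads are spread across the slaves to scale read traffic.
        return next(self._slave_cycle)

router = MasterSlaveRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.node_for_write())  # db-master
print(router.node_for_read())   # db-slave-1
print(router.node_for_read())   # db-slave-2

Routing reads to slaves is exactly what introduces the replication inconsistencies discussed later, since a slave may lag behind the master.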
A second advantage of master-slave replication is read resilience: Should the master fail, the slaves can
still handle read requests. Again, this is useful if most of your data access is reads. The failure of the
master does eliminate the ability to handle writes until either the master is restored or a new master is
appointed. However, having slaves as replicates of the master does speed up recovery after a failure of the
master since a slave can be appointed a new master very quickly.
The ability to appoint a slave to replace a failed master means that master-slave replication is useful even
if you don’t need to scale out. All read and write traffic can go to the master while the slave acts as a hot
backup. In this case it’s easiest to think of the system as a single-server store with a hot backup. You get
the convenience of the single-server configuration but with greater resilience— which is particularly
handy if you want to be able to handle server failures gracefully.
In order to get read resilience, you need to ensure that the read and write paths into your application are
different, so that you can handle a failure in the write path and still read. This includes such things as
putting the reads and writes through separate database connections—a facility that is not often supported
by database interaction libraries. As with any feature, you cannot be sure you have read resilience without
good tests that disable the writes and check that reads still occur. Replication comes with some alluring
benefits, but it also comes with an inevitable dark side— inconsistency.
You have the danger that different clients, reading different slaves, will see different values because the
changes haven’t all propagated to the slaves. In the worst case, that can mean that a client cannot read a
write it just made. Even if you use master-slave replication just for hot backup this can be a concern,
because if the master fails, any updates not passed on to the backup are lost. We’ll talk about how to deal
with these issues later
Peer-to-Peer Replication
At one end, we can ensure that whenever we write data, the replicas coordinate to ensure we avoid a
conflict. This can give us just as strong a guarantee as a master, albeit at the cost of network traffic to
coordinate the writes. We don’t need all the replicas to agree on the write, just a majority, so we can still
survive losing a minority of the replica nodes. At the other extreme, we can decide to cope with an
inconsistent write. There are contexts when we can come up with a policy to merge inconsistent writes. In
this case we can get the full performance benefit of writing to any replica. These points are at the ends of
a spectrum where we trade off consistency for availability.
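The "just a majority" requirement can be sketched as a tiny quorum check (purely illustrative):

# Sketch of a majority (quorum) write check (illustrative only).
def write_committed(acks, n_replicas):
    """A write is accepted once more than half of the replicas acknowledge it,
    so the cluster can survive losing a minority of the replica nodes."""
    return acks > n_replicas // 2

print(write_committed(2, 3))  # True: two of three replicas acknowledged
print(write_committed(1, 3))  # False: only a minority has the write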
Using peer-to-peer replication and sharding is a common strategy for column-family databases. In a
scenario like this you might have tens or hundreds of nodes in a cluster with data sharded over them. A
good starting point for peer-to-peer replication is to have a replication factor of 3, so each shard is present
on three nodes. Should a node fail, then the shards on that node will be built on the other nodes
3.9 Consistency
One of the biggest changes from a centralized relational database to a cluster-oriented NoSQL database is
in how you think about consistency. Relational databases try to exhibit strong consistency by avoiding all
the various inconsistencies that we’ll shortly be discussing. Once you start looking at the NoSQL world,
phrases such as “CAP theorem” and “eventual consistency” appear, and as soon as you start building
something you have to think about what sort of consistency you need for your system.
Consistency comes in various forms, and that one word covers a myriad of ways errors can creep into
your life. So we’re going to begin by talking about the various shapes consistency can take. After that
we’ll discuss why you may want to relax consistency (and its big sister, durability).
Update Consistency
We’ll begin by considering updating a telephone number. Coincidentally, Martin and Pramod are looking
at the company website and notice that the phone number is out of date. Implausibly, they both have
update access, so they both go in at the same time to update the number. To make the example interesting,
we’ll assume they update it slightly differently, because each uses a slightly different format. This issue is
called a write-write conflict: two people updating the same data item at the same time. When the writes
reach the server, the server will serialize them—decide to apply one, then the other.
Let’s assume it uses alphabetical order and picks Martin’s update first, then Pramod’s. Without any
concurrency control, Martin’s update would be applied and immediately overwritten by Pramod’s. In this
case Martin’s is a lost update. Here the lost update is not a big problem, but often it is. We see this as a
failure of consistency because Pramod’s update was based on the state before Martin’s update, yet was
applied after it. Approaches for maintaining consistency in the face of concurrency are often described as
pessimistic or optimistic.
A pessimistic approach works by preventing conflicts from occurring; an optimistic approach lets
conflicts occur, but detects them and takes action to sort them out. For update conflicts, the most common
pessimistic approach is to have write locks, so that in order to change a value you need to acquire a lock,
and the system ensures that only one client can get a lock at a time. So Martin and Pramod would both
attempt to acquire the write lock, but only Martin (the first one) would succeed. Pramod would then see
the result of Martin’s write before deciding whether to make his own update.
A common optimistic approach is a conditional update where any client that does an update tests the
value just before updating it to see if it’s changed since his last read. In this case, Martin’s update would
succeed but Pramod’s would fail. The error would let Pramod know that he should look at the value again
and decide whether to attempt a further update. Both the pessimistic and optimistic approaches that we’ve
just described rely on a consistent serialization of the updates.
With a single server, this is obvious—it has to choose one, then the other. But if there’s more than one
server, such as with peer-to-peer replication, then two nodes might apply the updates in a different order,
resulting in a different value for the telephone number on each peer. Often, when people talk about
concurrency in distributed systems, they talk about sequential consistency: ensuring that all nodes apply
the operations in the same order.
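The conditional update described above can be sketched as a compare-and-set on a version counter; the Record class is illustrative and not tied to any specific database.

# Sketch of an optimistic conditional update (compare-and-set on a version), illustrative only.
class Record:
    def __init__(self, value):
        self.value = value
        self.version = 1

    def conditional_update(self, expected_version, new_value):
        """Apply the update only if nobody else has written since our read."""
        if self.version != expected_version:
            return False  # conflict: the caller must re-read and decide what to do
        self.value = new_value
        self.version += 1
        return True

phone = Record("555-0100")
v = phone.version                                         # Martin and Pramod both read version 1
print(phone.conditional_update(v, "555-0101 (Martin)"))   # True: applied
print(phone.conditional_update(v, "555-0102 (Pramod)"))   # False: stale version, update rejected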
There is another optimistic way to handle a write-write conflict—save both updates and record that they
are in conflict. This approach is familiar to many programmers from version control systems, particularly
distributed version control systems that by their nature will often have conflicting commits. The next step
again follows from version control: You have to merge the two updates somehow.
Maybe you show both values to the user and ask them to sort it out—this is what happens if you update
the same contact on your phone and your computer. Alternatively, the computer may be able to perform
the merge itself; if it was a phone formatting issue, it may be able to realize that and apply the new
number with the standard format.
Any automated merge of write-write conflicts is highly domain-specific and needs to be programmed for
each particular case. Often, when people first encounter these issues, their reaction is to prefer pessimistic
concurrency because they are determined to avoid conflicts. While in some cases this is the right answer,
there is always a tradeoff. Concurrent programming involves a fundamental tradeoff between safety
(avoiding errors such as update conflicts) and liveness (responding quickly to clients). Pessimistic
approaches often severely degrade the responsiveness of a system to the degree that it becomes unfit for
its purpose.
This problem is made worse by the danger of errors—pessimistic concurrency often leads to deadlocks,
which are hard to prevent and debug. Replication makes it much more likely to run into write-write
conflicts. If different nodes have different copies of some data which can be independently updated, then
you’ll get conflicts unless you take specific measures to avoid them. Using a single node as the target for
all writes for some data makes it much easier to maintain update consistency. Of the distribution models
we discussed earlier, all but peer-to-peer replication do this.
Read Consistency
Having a data store that maintains update consistency is one thing, but it doesn’t guarantee that readers of
that data store will always get consistent responses to their requests. Let’s imagine we have an order with
line items and a shipping charge. The shipping charge is calculated based on the line items in the order. If
we add a line item, we thus also need to recalculate and update the shipping charge. In a relational
database, the shipping charge and line items will be in separate tables. The danger of inconsistency is that
Martin adds a line item to his order, Pramod then reads the line items and shipping charge, and then
Martin updates the shipping charge. This is an inconsistent read or read-write conflict
We refer to this type of consistency as logical consistency: ensuring that different data items make sense
together. To avoid a logically inconsistent read-write conflict, relational databases support the notion of
transactions. Providing Martin wraps his two writes in a transaction, the system guarantees that Pramod
will either read both data items before the update or both after the update. A common claim we hear is
that NoSQL databases don’t support transactions and thus can’t be consistent.
Such a claim is mostly wrong because it glosses over lots of important details. Our first clarification is that
any statement about lack of transactions usually only applies to some NoSQL databases, in particular the
aggregate-oriented ones. In contrast, graph databases tend to support ACID transactions just the same as
relational databases. Secondly, aggregate-oriented databases do support atomic updates, but only within a
single aggregate.
This means that you will have logical consistency within an aggregate but not between aggregates. So in
the example, you could avoid running into that inconsistency if the order, the delivery charge, and the line
items are all part of a single order aggregate. Of course not all data can be put in the same aggregate, so
any update that affects multiple aggregates leaves open a time when clients could perform an inconsistent
read. The length of time an inconsistency is present is called the inconsistency window. A NoSQL system
may have quite a short inconsistency window:
As one data point, Amazon’s documentation says that the inconsistency window for its SimpleDB service
is usually less than a second. This example of a logically inconsistent read is the classic example that
you’ll see in any book that touches database programming. Once you introduce replication, however, you
get a whole new kind of inconsistency. Let’s imagine there’s one last hotel room for a desirable event.
The hotel reservation system runs on many nodes. Martin and Cindy are a couple considering this room,
but they are discussing this on the phone because Martin is in London and Cindy is in Boston. Meanwhile
Pramod, who is in Mumbai, goes and books that last room. That updates the replicated room availability,
but the update gets to Boston quicker than it gets to London. When Martin and Cindy fire up their
browsers to see if the room is available, Cindy sees it booked and Martin sees it free. This is another
inconsistent read—but it’s a breach of a different form of consistency we call replication consistency
Eventually, of course, the updates will propagate fully, and Martin will see the room is fully booked.
Therefore this situation is generally referred to as eventually consistent, meaning that at any time nodes
may have replication inconsistencies but, if there are no further updates, eventually all nodes will be
updated to the same value. Data that is out of date is generally referred to as stale, which reminds us that a
cache is another form of replication—essentially following the master-slave distribution model. Although
replication consistency is independent from logical consistency, replication can exacerbate a logical
inconsistency by lengthening its inconsistency window. Two different updates on the master may be
performed in rapid succession, leaving an inconsistency window of milliseconds. But delays in
networking could mean that the same inconsistency window lasts for much longer on a slave
The presence of an inconsistency window means that different people will see different things at the same
time. If Martin and Cindy are looking at rooms while on a transatlantic call, it can cause confusion. It’s
more common for users to act independently, and then this is not a problem. But inconsistency windows
can be particularly problematic when you get inconsistencies with yourself. Consider the example of
posting comments on a blog entry. Few people are going to worry about inconsistency windows of even a
few minutes while people are typing in their latest thoughts. Often, systems handle the load of such sites
by running on a cluster and load-balancing incoming requests to different nodes. Therein lies a danger:
You may post a message using one node, then refresh your browser, but the refresh goes to a different
node which hasn’t received your post yet—and it looks like your post was lost. In situations like this, you
can tolerate reasonably long inconsistency windows, but you need read-your-writes consistency, which
means that, once you’ve made an update, you’re guaranteed to continue seeing that update.
One way to get this in an otherwise eventually consistent system is to provide session consistency: Within
a user’s session there is read-your-writes consistency. This does mean that the user may lose that
consistency should their session end for some reason or should the user access the same system
simultaneously from different computers, but these cases are relatively rare. There are a couple of
techniques to provide session consistency. A common way, and often the easiest way, is to have a sticky
session: a session that’s tied to one node (this is also called session affinity). A sticky session allows you
to ensure that as long as you keep read-your-writes consistency on a node, you’ll get it for sessions too.
The downside is that sticky sessions reduce the ability of the load balancer to do its job. Another approach
for session consistency is to use version stamps (“Version Stamps,” p. 61) and ensure every interaction
with the data store includes the latest version stamp seen by a session. The server node must then ensure
that it has the updates that include that version stamp before responding to a request.
Maintaining session consistency with sticky sessions and master-slave replication can be awkward if you
want to read from the slaves to improve read performance but still need to write to the master. One way of
handling this is for writes to be sent to the slave, which then takes responsibility for forwarding them to the
master while maintaining session consistency for its client. Another approach is to switch the session to
the master temporarily when doing a write, just long enough that reads are done from the master until the
slaves have caught up with the update. We’re talking about replication consistency in the context of a data
store, but it’s also an important factor in overall application design. Even a simple database system will
have lots of occasions where data is presented to a user, the user cogitates, and then updates that data. It’s
usually a bad idea to keep a transaction open during user interaction because there’s a real danger of
conflicts when the user tries to make her update, which leads to such approaches as offline locks
3.10 Relaxing Consistency
Trading off consistency is a familiar concept even in single-server relational database systems. Here, our
principal tool to enforce consistency is the transaction, and transactions can provide strong consistency
guarantees. However, transaction systems usually come with the ability to relax isolation levels, allowing
queries to read data that hasn’t been committed yet, and in practice we see most applications relax
consistency down from the highest isolation level (serialized) in order to get effective performance. We
most commonly see people using the read-committed transaction level, which eliminates some read-write
conflicts but allows others. Many systems forgo transactions entirely because the performance impact of
transactions is too high.
We’ve seen this in a couple different ways. On a small scale, we saw the popularity of MySQL during the
days when it didn’t support transactions. Many websites liked the high speed of MySQL and were
prepared to live without transactions. At the other end of the scale, some very large websites, such as
eBay [Pritchett], have had to forgo transactions in order to perform acceptably— this is particularly true
when you need to introduce sharding. Even without these constraints, many application builders need to
interact with remote systems that can’t be properly included within a transaction boundary, so updating
outside of transactions is a quite common occurrence for enterprise applications.
NoSQL has faced criticism for not supporting transactions and ACID, hence it’s believed to be
unable to meet consistency requirements. However, this is not entirely correct because, as we saw
in previous articles, NoSQL does support atomic updates at the aggregate level. But, having said
that, transactions are still necessary, which is why modern solutions are polyglot, and architects
carefully choose databases for specific requirements; the one-size-fits-all era ended long ago.
Transactions do have limitations; even within transactional systems, human interventions are
sometimes required to resolve conflicts, and holding a single session for a long transaction is not
possible. Version stamps are a good option to handle such situations.
A transaction can usually be kept open only for a small window, not for the whole time a client or user
spends making their updates.
For the NoSQL environment, optimistic offline lock is a useful technique. The way to handle this
is by having the database contain some form of mechanism for records to have a version stamp.
Version stamps are commonly used in NoSQL databases and other distributed systems. They help
maintain data consistency and prevent conflicting modifications.
The version stamp changes every time the record is updated. When a record is read, its version stamp is
noted, and the client includes that version stamp with each write.
A version stamp is associated with each data item, such as a record or document, in the database.
It helps determine whether an item has been updated since it was last read. Whenever an item is
updated (modified, inserted, or deleted), its version stamp is also updated. For example, when a
user edits a record, the version stamp changes to reflect the new state of that record. Version
stamps play a crucial role in concurrency control. When multiple processes or users access the
same data concurrently, version stamps allow them to detect conflicts. Before updating an item, a
process can check its version stamp to see if it has changed since it was last read. If the version
stamp differs, it indicates that another update occurred in the meantime, and conflict resolution
strategies can be applied.
There are various ways to maintain this version stamp; let’s explore some of them below:
1. Counter — A counter can be used, which can be incremented when a record is updated.
Counters make it very easy to determine which version of the record is the latest. The flip
side of the coin is that the server needs to generate the counter, and a single primary is
required to ensure no duplication of counters.
2. GUID — GUIDs are large random numbers that are unique. These can be a combination
of dates, hardware info, and other sources of randomness. The advantage of using GUIDs
is that anyone can create them, and because of their randomness, they are unique. The flip
side of the coin in this case is that, since they are so large, they can’t be directly
compared to find the most recent record.
3. Hash — With a big enough hash key size, a content hash (such as SHA-1 or MD5) of the
content can be globally unique. The advantage is that any node will create the same hash
key for the same content, so no duplicates. However, like GUIDs, it’s hard to find the
most recent one because of their large size.
4. Timestamp — Use the timestamp of the last update. Similar to counters,
timestamps are short and easy to compare for recentness. There’s no need to have a single
master node to generate timestamps; multiple nodes can create them, but the clocks
of each node need to be in sync for this to work properly. One node with an out-of-sync
clock can create bad data. There is also a possibility of duplicates if the timestamp
granularity is too coarse: timestamps won’t work well if updates frequently happen
within the same millisecond, as this will create duplicates.
We can use a combination of these techniques. For example, a counter and hash key can be
combined to find unique and recent records. Remember that the choice of version stamp
technique depends on factors like system requirements, scalability, and consistency needs. Each
method has its trade-offs, so choose wisely!
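The four techniques above can be sketched in a few lines of Python (illustrative only); each function returns something that could serve as a version stamp, with the trade-offs described in the list.

# Sketches of the version stamp techniques described above (illustrative only).
import hashlib
import itertools
import time
import uuid

# 1. Counter: easy to compare, but needs a single authority to hand out values.
_counter = itertools.count(1)
def counter_stamp():
    return next(_counter)

# 2. GUID: anyone can generate one, but two GUIDs cannot be ordered by recency.
def guid_stamp():
    return str(uuid.uuid4())

# 3. Content hash: identical content yields the same stamp on any node.
def hash_stamp(content):
    return hashlib.sha1(content).hexdigest()

# 4. Timestamp: short and comparable, but relies on synchronized clocks and
#    enough resolution to avoid duplicates.
def timestamp_stamp():
    return time.time()

record = b'{"name": "Martin", "phone": "555-0101"}'
print(counter_stamp(), guid_stamp(), hash_stamp(record), timestamp_stamp())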
If you receive updates from two nodes, and each one provides a different version, you have the
choice to get the latest version and discard the non-recent version, assuming that the non-recent
version node has not been able to get the update due to a network issue or something else.
Another scenario would be updates that are inconsistent from two nodes; in that case, you would
need a mechanism for reconciliation and conflict resolution.
A commonly used technique in peer-to-peer systems is using a special version stamp called a
vector stamp. Simply put, a vector stamp is a set of counters, one for each node. For example, a
vector stamp for three nodes (node1, node2, node3) would be [node1: 54, node2: 65, node3: 71].
Each time a node goes through an update, it updates its counter. For example, if node1 goes
through an update, the resulting vector stamp would be [node1: 55, node2: 65, node3: 71].
Whenever two nodes communicate, they synchronize their vector stamps. There are several
variations of how this synchronization happens.
Apart from version stamps, other options include vector clocks and version vectors.
A new node can be easily added without invalidating the existing vector stamps.
A vector stamp is an important tool that helps you spot inconsistencies, but it doesn’t resolve them for
you. Conflict resolution depends on the domain and the business decisions you are working with.
This is part of the consistency/availability trade-off: you have to decide whether a network partition
should make your system unavailable, or whether you detect and deal with the inconsistency.
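A vector stamp and its comparison rules can be sketched as follows (illustrative only): each node increments its own counter when it applies an update, stamps are merged when nodes sync, and two stamps conflict when neither dominates the other.

# Sketch of vector stamps: per-node counters, merged on sync, with conflict detection.
def increment(stamp, node):
    """A node applies an update: bump its own counter."""
    out = dict(stamp)
    out[node] = out.get(node, 0) + 1
    return out

def merge(a, b):
    """When two nodes sync, keep the highest counter seen for each node."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def dominates(a, b):
    """True if stamp a has seen everything stamp b has seen."""
    return all(a.get(n, 0) >= count for n, count in b.items())

s = {"node1": 54, "node2": 65, "node3": 71}
s1 = increment(s, "node1")   # update applied on node1
s2 = increment(s, "node2")   # concurrent update applied on node2

# Neither stamp dominates the other, so the updates are concurrent (in conflict)
# and need domain-specific reconciliation.
print(dominates(s1, s2), dominates(s2, s1))   # False False
print(merge(s1, s2))                          # merged counters: node1 55, node2 66, node3 71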