Bda Mod 3

The document provides an overview of NoSQL databases, highlighting their schema-less design, horizontal scalability, and ability to handle large volumes of diverse data types. It contrasts NoSQL with traditional relational databases, emphasizing the advantages and disadvantages of each, as well as the CAP theorem which outlines the trade-offs between consistency, availability, and partition tolerance. Additionally, it discusses various architectural patterns of NoSQL databases, including key-value, column-based, and document-based systems, along with their use cases and limitations.

Uploaded by

1032220135


BIG DATA ANALYTICS niketamoda@gmail.com
UNIT 3: NO SQL
What is NoSQL?
Not Only SQL
A versatile family of database technologies engineered to address specialized data challenges that traditional relational database models cannot efficiently handle.

Schema-less Design
Dynamic data structures that eliminate the need for predefined schemas, allowing developers to iterate quickly and adapt seamlessly to evolving business requirements.

Horizontal Scalability
Architected to distribute workloads across multiple commodity servers,
enabling cost-effective scaling for high-volume applications without requiring
expensive hardware upgrades.
Evolution of Data Storage
1970s-1990s
Relational databases revolutionized enterprise computing with structured
SQL queries, ACID transactions, and normalized data models that became
industry standard.

2000s
The explosive growth of web applications strained traditional RDBMS
capabilities, revealing scalability challenges as data volumes grew
exponentially and global distribution became essential.

2010s-Present
NoSQL databases transformed the landscape with specialized solutions for
diverse workloads, offering horizontal scalability, flexible schemas, and
distributed architectures optimized for cloud environments.
Making Sense of NoSQL_ A guide for managers and the rest of us

NOT ONLY SQL

A DBMS has the following advantages:

• ACID properties
• Designed for all purposes
• Strong consistency, concurrency, and recovery
• Standard query language (SQL)
ACID PROPERTIES

1) Atomicity:
An operation on the data must either execute completely or not execute at all.
It must never stop midway or take effect only partially.
For a transaction, this means that every operation in it is applied, or none of them is.
2) Consistency:
Consistency means that the integrity of the data is always preserved.
Every change must respect the rules defined on the database (constraints, keys, and so on), so a transaction takes the database from one valid state to another.
The database is therefore consistent both before and after the transaction, and the data is always correct.
3) Isolation:
The term 'isolation' means separation.
In a DBMS, isolation is the property that transactions running concurrently do not affect one another.
The intermediate changes of one transaction are not visible to the others until it completes.
When two or more transactions occur simultaneously, the result is the same as if they had run one after another, so consistency is maintained.
4) Durability:
Durability means permanence.
In a DBMS, durability ensures that once an operation has executed successfully, its data becomes permanent in the database.
Committed data must survive even if the system fails or crashes.
If committed data is lost in a failure, the recovery manager is responsible for restoring it. Changes become durable only once they are committed, so the COMMIT command must be used after making changes.
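The atomicity and commit behaviour described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `accounts` table, the account names, and the `transfer` helper are assumptions made for the example and do not come from the slides.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                # Abort: both updates above are undone together.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # would overdraw, so it is rolled back
```

After the failed transfer, neither balance has changed: the debit and the credit are undone as a unit, which is the "all or nothing" behaviour that atomicity requires.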
NOT ONLY SQL

Need

Real-world data requirements change frequently.
Large amounts of data are readily available from Facebook, Google, Twitter, etc.
This data includes user information, social graphs, geographic locations, etc.
To provide quality services to users, we must use technology that can operate on this data.
NoSQL Key Characteristics

Horizontal Scalability
Seamlessly expand capacity by adding more servers to the cluster

Distributed Architecture
Intelligently partitions data across multiple nodes for redundancy and performance

Schema Flexibility
Dynamically evolve data models without downtime or migrations
NoSQL Business Drivers Overview

Volume
Scale effortlessly to accommodate petabytes of structured and unstructured data

Velocity
Ingest and analyze real-time data streams with millisecond response times

Variability
Flexibly store and process heterogeneous data types without fixed schemas

Agility
Quickly adapt database structures to evolving business requirements and market
demands
Business Driver: Volume

2.5 EB: Daily Data Creation
Generated globally each day, equivalent to 2.5 quintillion bytes

40 ZB: Projected Data Volume
Global data sphere will reach 40 zettabytes by 2025

1B+: Active Users
Generating content across major social platforms every minute
Business Driver: Velocity
Real-time Processing
Immediate analysis and action on streaming data without batching delays

Low Latency
Sub-millisecond response times enabling critical business decisions

Continuous Ingestion
Seamlessly handling billions of events per second across distributed
systems
Business Driver: Variability
Structured
Rigidly organized JSON and XML documents with well-defined fields, consistent schemas, and predictable nested hierarchies.

Semi-structured
Flexible data formats with dynamic schemas, optional fields, and extensible attribute sets that adapt to changing business requirements.

Unstructured
Raw content like text documents, media files, and IoT sensor streams that lack inherent organization but contain valuable insights when properly analyzed.
Business Driver: Agility
Rapid Development
Schema-less architecture eliminates migration bottlenecks, accelerating development cycles.

Development teams can implement new features with greater autonomy and reduced database
dependencies.

A/B Testing
Seamlessly maintain multiple concurrent data models without structural conflicts.

Evaluate performance metrics across different implementation strategies with minimal overhead.

MVP Launches
Accelerate time-to-market with flexible, evolution-ready data structures.

Continuously refine products based on customer feedback without database architecture constraints.
Business Scalability with NoSQL

Cloud-Native Architecture
Engineered specifically for cloud environments with auto-scaling capabilities that
respond to traffic demands in real-time.

Horizontal Scaling
Seamlessly distribute workloads across additional servers to achieve linear
performance growth without application downtime.

Cost-Effective Operations
Optimize expenditure with precise resource allocation that automatically scales
up during peak periods and down during low-demand intervals.
NOT ONLY SQL

Advantages of NoSQL

• Good resource scalability
• Lower operational cost
• Supports semi-structured data
• No static schema
• Supports distributed computing
• Faster data processing
• Relatively simple data model

Disadvantages of NoSQL

• No defined standard
• Limited query capabilities
CAP / BREWER’S THEOREM

• It is very important to understand the limitations of NoSQL databases.
• A distributed NoSQL database cannot guarantee both consistency and high availability while the network is partitioned.
• This was first expressed by Eric Brewer in the CAP theorem.
• The CAP theorem states that a database can achieve at most two out of three guarantees: Consistency, Availability, and Partition Tolerance.
• Consistency means that all nodes in the network see the same data at the same time.
• Availability is a guarantee that every request receives a response about whether it succeeded or failed.
• Partition Tolerance is a guarantee that the system continues to operate despite arbitrary message loss or failure of part of the system.
• In other words, even if there is a network outage in the data center and some of the computers are unreachable, the system continues to perform.
• Out of these three guarantees, no system can provide more than two.
• Since network partitions are unavoidable in a distributed system, the tradeoff is always between consistency and availability.
• An RDBMS provides consistency but not partition tolerance.
• MongoDB, HBase, and Redis provide consistency and partition tolerance.
• CouchDB, Cassandra, and Dynamo favor availability and partition tolerance, offering eventual rather than strong consistency.
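The consistency-versus-availability tradeoff can be illustrated with a toy model of two replicas joined by a link that may be cut. This is a minimal sketch, not how any of the databases named above is implemented; the `TwoNodeStore` class and its `"CP"` / `"AP"` modes are hypothetical names chosen for the example.

```python
class Replica:
    """One node holding its own copy of the data."""
    def __init__(self):
        self.data = {}

class TwoNodeStore:
    """Two replicas whose connecting link can be partitioned (hypothetical model)."""
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, node, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write: availability is given up.
                return False
            # Stay available by accepting the write locally: replicas may diverge.
            node.data[key] = value
            return True
        # Healthy network: every replica receives the write.
        self.a.data[key] = value
        self.b.data[key] = value
        return True

cp = TwoNodeStore("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
cp_accepted = cp.write(cp.a, "x", 2)   # rejected, replicas still agree

ap = TwoNodeStore("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap_accepted = ap.write(ap.a, "x", 2)   # accepted, replica b is now stale
```

In the CP store both replicas still hold the old value after the partition, while in the AP store the write succeeds but the two replicas disagree until the partition heals.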
• Let us take a look at various scenarios or architectures of systems to better understand the CAP theorem.
• The first one is an RDBMS, where reading and writing of data happen on the same machine.
• Such systems are consistent but not partition tolerant, because if this machine goes down, there is no backup.
• If one user is modifying a record, others have to wait, which compromises high availability.
• The second diagram is of a system that has two machines.
• Only one machine can accept modifications, while reads can be served from all machines.
• In such systems, modifications flow from that one machine to the rest.
• Such systems are highly available, as there are multiple machines to serve requests.
• Such systems are partition tolerant, because if one machine goes down, other machines are available to take up that responsibility.
• Since it takes time for the data to reach the other machines from node A, the other machines may serve older data.
• This causes inconsistency, though the data eventually reaches all machines and, after a while, things are okay.
• Therefore we call such systems eventually consistent instead of strongly consistent.
• This kind of architecture is found in ZooKeeper and MongoDB.
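Eventual consistency as described here can be sketched with a writable node and a read-only copy that receives changes with a delay. This is a simplified model under stated assumptions: the `LaggingReplicaPair` class is hypothetical, and the delay is simulated by an explicit `replicate()` call rather than a real network.

```python
from collections import deque

class LaggingReplicaPair:
    """Node A accepts writes; node B receives them later (hypothetical model)."""
    def __init__(self):
        self.node_a = {}        # accepts modifications
        self.node_b = {}        # serves reads, may lag behind
        self.pending = deque()  # changes not yet delivered to node B

    def write(self, key, value):
        self.node_a[key] = value
        self.pending.append((key, value))

    def read_from_b(self, key):
        return self.node_b.get(key)

    def replicate(self):
        """Deliver all queued changes to node B, in the order they were written."""
        while self.pending:
            key, value = self.pending.popleft()
            self.node_b[key] = value

pair = LaggingReplicaPair()
pair.write("status", "shipped")
stale = pair.read_from_b("status")   # node B has not seen the write yet
pair.replicate()
fresh = pair.read_from_b("status")   # after replication both nodes agree
```

A reader that hits node B between the write and the replication step sees older data, which is exactly the window of inconsistency the slide describes.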
• In the third design of a storage system, we have one machine, similar to our first diagram, along with its backup.
• Every new change or modification at A in the diagram is propagated to the backup machine B.
• Only one machine interacts with the readers and writers.
• So it is consistent but not highly available. If A goes down, B can take A's place; therefore this system is partition tolerant.
• Examples of such systems are HDFS with its secondary NameNode and relational databases with a regular backup.
COMPARISON BETWEEN SQL AND NOSQL
NoSQL is a non-relational DBMS.
It does not require a fixed schema, avoids joins, and is easy to scale.
NoSQL databases are used for distributed data stores with huge data storage needs.
NoSQL is used for Big Data and real-time web apps, for example by companies like Twitter, Facebook, and Google that collect terabytes of user data every single day.
A traditional RDBMS uses SQL syntax to store and retrieve data for further insights.
A NoSQL database system can store structured, semi-structured, and unstructured data.
NoSQL databases became popular with Internet giants like Google, Facebook, and Amazon, which deal with huge volumes of data.
System response time becomes slow when you use an RDBMS for massive volumes of data.
To resolve this problem, we could "scale up" our systems by upgrading our existing hardware. This process is expensive.
The alternative is to distribute the database load across multiple hosts whenever the load increases. This method is known as "scaling out."
COMPARISON BETWEEN SQL AND NOSQL (May 2015)
SQL | NoSQL
Full form: Structured Query Language | Full form: Not Only SQL
Relational database | Non-relational database
SQL is a declarative query language | No declarative query language
Works on ACID properties: Atomicity, Consistency, Isolation, Durability | Follows the CAP theorem: Consistency, Availability, Partition Tolerance
Structured and organized data | Unstructured and replicable data
Relational database tables are used | Key-value pair storage, column store, document store, graph database
Tightly consistent | Eventually consistent
MySQL, Oracle, MS SQL, PostgreSQL, SQLite, DB2 | MongoDB, Bigtable, Neo4j, CouchDB, Cassandra, HBase
NOSQL DATABASES TIMELINE

• 1998: Carlo Strozzi uses the term NoSQL for his lightweight, open-source relational database
• 2000: Graph database Neo4j is launched
• 2004: Google Bigtable is launched
• 2005: CouchDB is launched
• 2007: The research paper on Amazon Dynamo is released
• 2008: Facebook open-sources the Cassandra project
• 2009: The term NoSQL is reintroduced
FEATURES OF NOSQL
Non-Relational

• NoSQL databases do not follow the relational model
• They do not provide tables with flat, fixed-column records
• They work with self-contained aggregates or BLOBs (binary large objects)
• They do not require object-relational mapping or data normalization
• They omit complex features such as query languages, query planners, referential integrity, joins, and ACID
Schema-Free

• NoSQL databases are either schema-free or have relaxed schemas
• They do not require any definition of the schema of the data
• They allow heterogeneous structures of data in the same domain
There are four main categories of NoSQL databases.
Each category has its own attributes and limitations.
No single database is best for all problems.
You should select a database based on your product needs.
DIFFERENT ARCHITECTURAL PATTERNS IN NOSQL
1. Key-Value Pair Based
This is the simplest kind of NoSQL database.
Data is stored in key/value pairs.
It is designed to store schema-free data.
Each value is stored along with an indexed key.
Example: Cassandra, DynamoDB, Azure Table Storage

IT DEC 2016
Use Case:
This type is generally used when you need fast performance for basic Create, Read, Update, Delete (CRUD) operations.

Example:
• Storing and retrieving session information for web pages
• Storing user profiles and preferences
• Storing shopping cart data for e-commerce
Limitations:
• It may not work for complex queries that attempt to connect multiple relations
• If the data contains many-to-many relationships, a key-value store is likely to show poor performance
Column-based Database
Column-oriented databases work on columns and are based on the Bigtable paper by Google.
They deliver high performance on aggregation queries like SUM, COUNT, AVG, and MIN, as the data is readily available in a column.
Instead of storing data in relational tuples (rows), they store it in cells grouped in columns.
Example:
HBase, Hypertable, Bigtable.

Use Cases:
It is used for storing blogs.
Document-based Database
A document-oriented NoSQL database works on key-value storage in which each value is a document containing complex data.
The document is stored in JSON or XML format. The value is understood by the database and can be queried.
JSON (JavaScript Object Notation) is a lightweight data-interchange format.
Every document has a unique key that is used to store, retrieve, and manage the document.
Example:
Amazon SimpleDB, CouchDB, MongoDB, and Lotus Notes are popular document-oriented DBMSs.

Use Cases:
Used for storing event-logging information and online blogs.
Each document contains information such as the type of document, user ID, post content, and timestamp.

Limitation:
It may be good for blogging but is not well suited to aggregation.
Graph-Based
A graph database stores entities as well as the relations among those entities.
Each entity is stored as a node, with relationships as edges.
An edge represents a relationship between nodes. Every node and edge has a unique identifier.
Compared to a relational database, where tables are loosely connected, a graph database is multi-relational in nature.
Example:
Neo4j, InfiniteGraph, OrientDB, and FlockDB are some popular graph-based databases.

Use Cases:
• Social networking sites, where a graph database can quickly locate friends and friends of friends
• Google Maps, where it is useful for navigation and finding the closest location
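The "friends of friends" use case can be sketched with an in-memory adjacency structure. This is only an illustration of the node-and-edge idea, not the storage engine or query language of any product listed above; `SocialGraph` and its method names are hypothetical.

```python
from collections import defaultdict

class SocialGraph:
    """People are nodes; each friendship is an undirected edge (hypothetical model)."""
    def __init__(self):
        self._friends = defaultdict(set)

    def add_friendship(self, a, b):
        # Store the edge in both directions because friendship is mutual.
        self._friends[a].add(b)
        self._friends[b].add(a)

    def friends(self, person):
        return set(self._friends.get(person, set()))

    def friends_of_friends(self, person):
        """People exactly two hops away: friends of friends who are not already friends."""
        direct = self.friends(person)
        result = set()
        for friend in direct:
            result |= self._friends[friend]
        return result - direct - {person}

graph = SocialGraph()
graph.add_friendship("asha", "bala")
graph.add_friendship("bala", "chitra")
graph.add_friendship("chitra", "dev")
graph.add_friendship("asha", "esha")

suggestions = graph.friends_of_friends("asha")
```

Because each node holds direct references to its neighbours, the lookup only touches the edges around the starting person instead of joining whole tables.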
Database Model | Performance | Scalability | Flexibility
Key-value store database | High | High | High
Column store database | High | High | Moderate
Document store database | High | Variable | High
Graph database | Variable | Variable | High
ADVANTAGES OF NOSQL
1. Growth of Big Data
   1. Big Data is one of the main driving factors of NoSQL for business
   2. Web data has increased exponentially within the last two years
2. Continuous Availability of Data
   1. Hardware failures are possible, but NoSQL is built on a distributed architecture, which is robust
   2. If a data node goes down, the replicated copies are available; if the name node goes down, there is a secondary name node
3. Location Independence
   1. It is the ability to read and write the database from anywhere
4. Flexible Data Models
   1. NoSQL has more flexible, schema-less data models than other systems
5. Better Architecture
   1. NoSQL has a more business-oriented architecture for a particular application
   2. Organizations migrate their data to a NoSQL platform, which allows them to maintain a very high volume of data
6. Analytics and Business Intelligence
   1. Extracting meaningful information from a very high volume of data is a very difficult task for an RDBMS
   2. Modern NoSQL provides integrated data analysis and a better understanding of complex data sets, which facilitates flexible decision-making
IMPLEMENTATION OF KEY VALUE DATABASE
This is the simplest kind of NoSQL database.
It is designed for storing schema-free data.
Data is stored in the form of indexed keys:

Key | Value
1 | ID: 123, First Name: Ganesh
2 | Email: abc@gmail.com, Location: Mumbai, Pin: 401209
3 | Facebook ID: xyz, Password: *******, Name: Tom

Working
The schema-less format of a key-value database suits varied data storage needs.
The key can be auto-generated, while the value can be a string.
A key-value store uses a hash table in which there is a unique key and a pointer to each data item.
A logical group of keys is known as a bucket.
Performance improves because of the cache mechanism.

Read/Write Operations

• Get(key): returns the value associated with the key
• Multi-get(key1, key2, …, keyN): returns the list of values associated with the keys
• Put(key, value): associates the value with the key
• Delete(key): deletes the entry for the key from the data store
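The four operations listed above can be sketched with Python dictionaries, which are themselves hash tables. This is a minimal in-memory illustration, assuming that a bucket is simply a named group of keys; the `KeyValueStore` class and the sample keys are hypothetical.

```python
class KeyValueStore:
    """In-memory key-value store with buckets (hypothetical sketch)."""
    def __init__(self):
        self._buckets = {}  # bucket name -> {key: value}

    def put(self, bucket, key, value):
        """Associate the value with the key inside the given bucket."""
        self._buckets.setdefault(bucket, {})[key] = value

    def get(self, bucket, key):
        """Return the value for the key, or None if it is absent."""
        return self._buckets.get(bucket, {}).get(key)

    def multi_get(self, bucket, *keys):
        """Return the values for several keys, in the order requested."""
        return [self.get(bucket, key) for key in keys]

    def delete(self, bucket, key):
        """Remove the entry for the key; a missing key is ignored."""
        self._buckets.get(bucket, {}).pop(key, None)

store = KeyValueStore()
store.put("users", 1, "ID: 123, First Name: Ganesh")
store.put("users", 2, "Email: abc@gmail.com, Location: Mumbai")
both = store.multi_get("users", 1, 2)
store.delete("users", 1)
after_delete = store.get("users", 1)
```

The store never inspects the value, which is why lookups by key are fast while queries on the contents of a value are not supported.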
Alex Holmes, Hadoop in Practice 2nd Edition Page 28

COLUMN STORE DATA

Instead of storing data in rows, a column store keeps it in cells grouped in columns.
It offers high performance and high scalability.

Working:
• In a column-oriented NoSQL database, data is stored in cells grouped in columns rather than rows
• Reads and writes are done by column
• It offers fast search, access, and aggregation of data
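The difference between row and column layout, and why aggregation benefits from the latter, can be sketched in a few lines. The sample records and the `to_columns` helper are made up for illustration and are not taken from any particular column store.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Mumbai", "amount": 120},
    {"id": 2, "city": "Pune", "amount": 80},
    {"id": 3, "city": "Mumbai", "amount": 50},
]

def to_columns(records):
    """Regroup row records so that all values of one column sit together."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# An aggregate reads a single column and never touches the other fields.
total = sum(columns["amount"])
smallest = min(columns["amount"])
count = len(columns["amount"])
```

With the row layout, the same SUM would have to visit every record and skip over `id` and `city`; with the column layout it scans one contiguous list.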
Data Model:
• Column Family: a single structure that groups columns

Example: HBase, Bigtable, Hypertable
Document Based:
• It is based on the concept of a key-value store where "documents" contain a lot of complex data
• Every document contains a unique key used to retrieve it
• The key is used for managing, storing, and retrieving document-oriented information

Working:
The data is a collection of key-value pairs in which each value is a compressed document.
JSON and XML are commonly used document formats.

Eg: MongoDB, CouchDB
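The document model can be sketched as a key-value store whose values are JSON documents that the store can look inside. This is a toy illustration, not the API of MongoDB or CouchDB; the `DocumentStore` class, its methods, and the sample fields are hypothetical.

```python
import json
import uuid

class DocumentStore:
    """Stores JSON documents under generated unique keys (hypothetical sketch)."""
    def __init__(self):
        self._docs = {}  # unique key -> JSON text

    def insert(self, document):
        """Serialize the document to JSON and return its generated key."""
        doc_id = str(uuid.uuid4())
        self._docs[doc_id] = json.dumps(document)
        return doc_id

    def get(self, doc_id):
        """Retrieve one document by its key, or None if the key is unknown."""
        raw = self._docs.get(doc_id)
        return None if raw is None else json.loads(raw)

    def find(self, **criteria):
        """Return every document whose fields match all the given values."""
        matches = []
        for raw in self._docs.values():
            doc = json.loads(raw)
            if all(doc.get(field) == value for field, value in criteria.items()):
                matches.append(doc)
        return matches

store = DocumentStore()
post_id = store.insert({"type": "post", "userid": 7, "content": "Hello NoSQL"})
store.insert({"type": "event", "userid": 7, "action": "login", "ip": "10.0.0.5"})

posts = store.find(type="post")
by_user = store.find(userid=7)
```

The two documents have different fields yet live in the same store, and queries can filter on fields inside the value, which a plain key-value store cannot do.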

Graph Database:
• Data is stored in a graph: each entity acts as a node, and relationships are stored as links between nodes

Working:
A flexible graphical notation of edges and nodes is used.
Data can be easily transformed from one model to another by using a graph-based NoSQL database.

Eg: Neo4j, Polyglot

NOSQL CASE STUDIES
Amazon DynamoDB

Amazon has the largest e-commerce operations in the world.
Customers across the globe shop 24x7.
Initially, Amazon used RDBMS systems for its shopping and checkout system.
Amazon DynamoDB, a NoSQL database, brought a turning point.
DynamoDB addresses the core RDBMS problems of scalability and partition tolerance.
Developers can store an unlimited amount of data by creating a database table.
DynamoDB stores the table across multiple servers.
DynamoDB is a key-value store NoSQL database.

Salient features of this key-value store are as follows:

Scalable: If the application requirement changes, AWS management console can

Automated storage scaling: More storage can be obtained whenever it is required.
Built-in fault tolerance: DynamoDB automatically replicates data across various nodes.
Flexible: DynamoDB has a schema-free format. Multiple data types can be used.
Efficient indexing: Every item is identified by a primary key. Secondary indexes on non-key attributes are allowed.
Strong consistency: DynamoDB supports strongly consistent reads, which return only the latest value.
Secure: DynamoDB uses cryptography to authenticate users.

Google Bigtable

The motivation for developing Bigtable was to achieve massive scalability, better performance, and the ability to run on commodity hardware.
The volume of Google data is generally in petabytes and is distributed over 100,000 nodes.
Bigtable is a column-based NoSQL database.

MongoDB

MongoDB was designed by Eliot Horowitz.
MongoDB was designed for building large-scale, high-availability, robust systems.
MongoDB moved from relational data to document-based data to provide speed, agility, and schema-less databases.
MongoDB uses a document data model that stores data in JSON documents.

DATATYPES USED IN MONGODB

Data Type | Description
String | The most commonly used datatype for storing data. A string must be valid UTF-8 in MongoDB.
Integer | Stores a numeric value. It can be 32-bit or 64-bit depending on the server you are using.
Boolean | Stores boolean (true/false) values.
Double | Stores floating-point values.
Min/Max Keys | Compares a value against the lowest and highest BSON elements.
Arrays | Stores a list or multiple values under a single key.
Object | Used for embedded documents.
Null | Stores null values.
Symbol | Generally used for languages that use a specific symbol type.
Date | Stores the current date or time in Unix time format. You can specify your own date and time by creating a Date object and passing the day, month, and year into it.
Neo4j

Neo4j is open source and sponsored by Neo Technology.
It is a graph-based NoSQL database implemented in Java and Scala.
Its development started in 2003, and it was made public in 2007.
Neo4j is used by many organizations for scientific research, routing, matchmaking, network management, recommendations, social networks, software analytics, and project management.
“SHARED NOTHING” ARCHITECTURE
A shared-nothing architecture is one in which you have a number of nodes.
These nodes do not share resources such as memory or storage with one another.
By contrast, the alternative, a shared-everything architecture, shares every resource on request.
Advantages of shared-nothing architecture:
• easier scaling
• non-disruptive upgrades
• elimination of a single point of failure
• self-healing capabilities
Scaling becomes simpler when things such as disks are not shared.
For example, scaling up a single shared disk to get more storage space can lead to enormous problems if things do not go well.
On the other hand, if you are using several nodes that do not share the space, scaling up the disk space becomes quite a bit easier.
If the scaling should fail on one of the resources, the others will still continue to do their work normally.
“This architecture is followed by essentially all high-performance, scalable, DBMSs, including Teradata, Netezza, Greenplum, as well as several Morpheus integrations.
It is also used by most of the high-end e-commerce platforms, including Amazon, Akamai, Yahoo, Google, and Facebook.”
Enables Non-disruptive Upgrades
Similar to the scaling advantages, you can use shared-nothing architecture to perform non-disruptive upgrades to your services.
Instead of having a certain amount of downtime while you are upgrading an infrastructure with shared resources, you can upgrade one node at a time.
The redundant copies on the other nodes continue to run, so you do not need to shut everything down for the amount of time it takes to perform the upgrade.
Eliminates Single Point of Failure
With shared systems, a single point of failure can take down your site or app entirely.
As noted, the ability to have separate systems on separate nodes with redundancy can make things much easier while avoiding the disaster of a single failure causing unexpected downtime.
Avoids Unexpected Downtime
Shared-nothing architecture allows for some amount of self-healing, which can be another line of defense against unexpected downtime.
For example, when you have redundant copies of data or databases on different disks, a disk that loses data may be able to recover it when the redundant copies are synced.
Had it instead been a single, shared disk, the data would be lost and downtime would be indefinite.
As you can see, shared-nothing architecture can be very helpful.
DISTRIBUTION MODELS
A key strength of NoSQL is its ability to run databases on a large cluster.
A cluster gives the ability to process greater read or write traffic, or more availability in the face of network slowdowns or breakages.

There are two paths to data distribution: replication and sharding.

• Replication: takes the same data and copies it over multiple nodes.
• Sharding: stores different parts of the data on different machines.

Replication comes in two forms:

• Master-slave
• Peer-to-peer
SHARDING
In a busy data store, different people access different parts of the dataset.
In these circumstances we can support horizontal scalability by putting different parts of the data onto different servers, a technique called sharding.
In the ideal case, the load is balanced out nicely between servers: for example, if we have ten servers, each one only has to handle 10% of the load.
To achieve this, we have to ensure that data that is accessed together is clumped together on the same node and that these clumps are arranged on the nodes to provide the best data access.
The first part of this question is how to clump the data up so that one user mostly gets her data from a single server.
When it comes to arranging the data on the nodes, there are several factors that can help improve performance.
If you know that most accesses of certain aggregates are based on a physical location, you can place the data close to where it is being accessed. If you have orders for someone who lives in Boston, you can place that data in your eastern US data center.
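One common way to spread keys evenly over servers is to hash each key and take the remainder by the number of shards. The slides do not prescribe a particular scheme, so this is an assumed technique shown only for illustration; `shard_for` and the `user-N` keys are hypothetical.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

NUM_SHARDS = 10

# Count how many of 10,000 sample keys land on each of the ten servers.
counts = [0] * NUM_SHARDS
for user_number in range(10_000):
    counts[shard_for(f"user-{user_number}", NUM_SHARDS)] += 1

# The same key always goes to the same shard, so all of one user's data stays together.
home_shard = shard_for("user-42", NUM_SHARDS)
```

Each server ends up with roughly a tenth of the keys, matching the ten-server example above. A simple modulo scheme moves most keys when the number of shards changes, and it ignores physical location, so placement by region (as in the Boston example) would need an explicit mapping instead.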
MASTER-SLAVE REPLICATION
With master-slave distribution, you replicate data across multiple nodes.
One node is designated as the master, or primary.
This master is the authoritative source for the data and is usually responsible for processing any updates to that data. The other nodes are slaves, or secondaries.
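The division of work between master and slaves can be sketched as follows. This is a simplified model: the `MasterSlaveCluster` class is hypothetical, replication is shown as an immediate copy for brevity, and reads are rotated across slaves in round-robin order, which is one possible policy rather than a required one.

```python
import itertools

class MasterSlaveCluster:
    """One master takes all writes; slaves serve reads (hypothetical sketch)."""
    def __init__(self, num_slaves):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]
        self._next_slave = itertools.cycle(range(num_slaves))

    def write(self, key, value):
        # The master is the authoritative copy and processes every update.
        self.master[key] = value
        # Simplification: copy to the slaves immediately instead of asynchronously.
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        """Serve the read from the next slave in rotation; return (slave index, value)."""
        index = next(self._next_slave)
        return index, self.slaves[index].get(key)

cluster = MasterSlaveCluster(num_slaves=3)
cluster.write("cart:7", ["book", "pen"])
served_by = [cluster.read("cart:7")[0] for _ in range(4)]
```

Adding slaves raises read capacity because reads are spread across them, while every write still passes through the single master.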
PEER-TO-PEER REPLICATION
Master-slave replication helps with read scalability but doesn’t help with scalability of writes.
Peer-to-peer replication attacks these problems by not having a master.
All the replicas have equal weight, they can all accept writes, and the loss of any of them doesn’t prevent access to the data store.
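When every replica accepts writes, two peers can receive different values for the same key, so the peers need some rule for reconciling them. The slides do not say which rule is used; the sketch below assumes a simple "highest version wins" policy, and the `Peer` class, the `sync` function, and the version numbers are all hypothetical.

```python
class Peer:
    """A replica that accepts writes directly (hypothetical sketch)."""
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def sync(peers):
    """Merge all peers, keeping the highest version of each key (assumed policy)."""
    merged = {}
    for peer in peers:
        for key, entry in peer.data.items():
            if key not in merged or entry[0] > merged[key][0]:
                merged[key] = entry
    for peer in peers:
        peer.data = dict(merged)

p1, p2 = Peer("p1"), Peer("p2")
p1.write("cart", "book", version=1)   # accepted by the first peer
p2.write("cart", "pen", version=2)    # a later write accepted by the second peer

before = (p1.read("cart"), p2.read("cart"))
sync([p1, p2])
after = (p1.read("cart"), p2.read("cart"))
```

Either peer can go offline without blocking writes on the other, which is the availability benefit described above; the cost is that conflicting writes must be detected and resolved when the peers exchange data.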

DBMS Lecture13 NoSQL
31 pages
Full Stack UNIT3
No ratings yet
Full Stack UNIT3
57 pages
NoSQL
No ratings yet
NoSQL
18 pages
Lecture 1 - NoSQL
No ratings yet
Lecture 1 - NoSQL
31 pages
1504846528session31 NoSQL
No ratings yet
1504846528session31 NoSQL
12 pages
NoSQL Database
No ratings yet
NoSQL Database
8 pages
Untitled Document
No ratings yet
Untitled Document
30 pages
On Introdution To NoSQL
No ratings yet
On Introdution To NoSQL
56 pages
Intro No SQL
No ratings yet
Intro No SQL
44 pages
Module 5 - NoSQL Databases
No ratings yet
Module 5 - NoSQL Databases
33 pages
Module 1
No ratings yet
Module 1
69 pages
Abdms-Unit 2 and Unit 5 Notes
No ratings yet
Abdms-Unit 2 and Unit 5 Notes
10 pages
Nosql What Does It Mean
No ratings yet
Nosql What Does It Mean
8 pages
(Splunk Case Study) (Splunk Case Study)
No ratings yet
(Splunk Case Study) (Splunk Case Study)
16 pages
3.1-8 Windows Server Backup - Copy (Full Permission)
No ratings yet
3.1-8 Windows Server Backup - Copy (Full Permission)
19 pages
WireGuard Setup Guide for IT Pros
No ratings yet
WireGuard Setup Guide for IT Pros
24 pages
Website Vulnerability Scanner Report (Light)
No ratings yet
Website Vulnerability Scanner Report (Light)
6 pages
Android Fire Alarm App Project Report
No ratings yet
Android Fire Alarm App Project Report
88 pages
Assignment Week 6 Group
No ratings yet
Assignment Week 6 Group
8 pages
LabCycle DBD PDF
No ratings yet
LabCycle DBD PDF
11 pages
The Design and Implementation of Visitor Recognition and Reminder System Based On Face Recognition
No ratings yet
The Design and Implementation of Visitor Recognition and Reminder System Based On Face Recognition
4 pages
G - TM - AIX Basic - R1.0
No ratings yet
G - TM - AIX Basic - R1.0
214 pages
FP Growth Algorithm
No ratings yet
FP Growth Algorithm
17 pages
SQL Helper
No ratings yet
SQL Helper
5 pages
Chapter 2 Data Model
No ratings yet
Chapter 2 Data Model
75 pages
Gym Management System
No ratings yet
Gym Management System
27 pages
R3 Corda Resp
No ratings yet
R3 Corda Resp
37 pages
SAP S/4HANA Asset Management Guide
No ratings yet
SAP S/4HANA Asset Management Guide
4 pages
Dbms Chapter 6
No ratings yet
Dbms Chapter 6
40 pages
Kubernetes Roadmap
No ratings yet
Kubernetes Roadmap
9 pages
Java Tag Handling & Patterns Guide
No ratings yet
Java Tag Handling & Patterns Guide
7 pages
Azure Virtual Machines
100% (1)
Azure Virtual Machines
1 page
Salesforce Architect 'S Tuning & Best Practises Guide
No ratings yet
Salesforce Architect 'S Tuning & Best Practises Guide
10 pages
Connect : With Other CRM Customers
No ratings yet
Connect : With Other CRM Customers
44 pages
Inventory Management System Guide
No ratings yet
Inventory Management System Guide
34 pages
Application Access To People Data/Hub Process Flow: Identityaccessrequ Ests@yale - Edu
No ratings yet
Application Access To People Data/Hub Process Flow: Identityaccessrequ Ests@yale - Edu
1 page
Introduction To Django
No ratings yet
Introduction To Django
10 pages
DBMS One Mark Questions-New
75% (8)
DBMS One Mark Questions-New
60 pages
Processing XML With AWS Glue and Databricks Spark
No ratings yet
Processing XML With AWS Glue and Databricks Spark
23 pages
Cyber Security
100% (1)
Cyber Security
36 pages
Rais12 SM CH08
No ratings yet
Rais12 SM CH08
27 pages
JavaScript Basics and Examples
No ratings yet
JavaScript Basics and Examples
20 pages
Software Development Models
No ratings yet
Software Development Models
7 pages