MERN - Unit 2
1. Material UI
Example Usage:
Here’s a simple example of how you can use Material UI to create a
button in a React application:
npm install @mui/material @emotion/react @emotion/styled
import Button from '@mui/material/Button';

function MyApp() {
  return (
    <div>
      <Button variant="contained" color="primary">
        Click Me
      </Button>
    </div>
  );
}

export default MyApp;
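To actually see the button, render MyApp from your entry file. A minimal sketch, assuming a React 18 entry point (index.js) and an HTML page with a root element; the file path is illustrative:

import React from 'react';
import { createRoot } from 'react-dom/client';
import MyApp from './MyApp'; // hypothetical path to the component above

// Mount the app into <div id="root"> in index.html
createRoot(document.getElementById('root')).render(<MyApp />);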
2. AppBar
To use the AppBar component from Material UI in a basic React application, first make sure Material UI is installed (see the command above). Here's a simple example of how to use the AppBar component in a React app:
import { AppBar, Toolbar, Typography, Button } from '@mui/material';

function MyAppBar() {
  return (
    <AppBar position="static">
      <Toolbar>
        <Typography variant="h6" sx={{ flexGrow: 1 }}>
          My Website
        </Typography>
        <Button color="inherit">Login</Button>
      </Toolbar>
    </AppBar>
  );
}

export default MyAppBar;
Now you can include this MyAppBar component in your main App.js or
wherever you want the AppBar to appear in your application:
import MyAppBar from './MyAppBar'; // path assumes MyAppBar lives in its own file

function App() {
  return (
    <div>
      <MyAppBar />
    </div>
  );
}
3. Toolbar
First, you need to install Material UI (and the icons package used below) in your project if you haven't already done so:
npm install @mui/material @emotion/react @emotion/styled @mui/icons-material
import { AppBar, Toolbar, Typography, Button, IconButton } from '@mui/material';
import MenuIcon from '@mui/icons-material/Menu';

function MyToolbar() {
  return (
    <AppBar position="static">
      <Toolbar>
        <IconButton
          size="large"
          edge="start"
          color="inherit"
          aria-label="menu"
          sx={{ mr: 2 }}
        >
          <MenuIcon />
        </IconButton>
        <Typography variant="h6" sx={{ flexGrow: 1 }}>
          My Toolbar
        </Typography>
        <Button color="inherit">Login</Button>
      </Toolbar>
    </AppBar>
  );
}
function App() {
  return (
    <div>
      <MyToolbar />
    </div>
  );
}
4. SQL Transactions
An SQL transaction is a sequence of one or more SQL operations (such as
INSERT, UPDATE, DELETE, etc.) that are executed as a single unit of work.
Transactions are primarily used to ensure data integrity, consistency, and reliability
in a database, particularly in systems where multiple users or processes are
accessing and modifying the database concurrently.
1. BEGIN TRANSACTION: Starts a new transaction; all subsequent statements belong to it until a COMMIT or ROLLBACK.
BEGIN TRANSACTION;
2. COMMIT: Commits the transaction, meaning all the changes made during the
transaction are saved to the database permanently.
COMMIT;
3. ROLLBACK: Reverts the database to the state before the transaction began. This is used
when something goes wrong and you want to cancel the transaction.
ROLLBACK;
4. SAVEPOINT: Allows you to set a point within a transaction to which you can later
rollback.
SAVEPOINT savepoint_name;
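A short sketch of how a savepoint might be used; the tables and values here are illustrative, not from the example below:

BEGIN TRANSACTION;
INSERT INTO orders (id, status) VALUES (42, 'pending');
SAVEPOINT after_order;                -- mark a point we can return to
UPDATE inventory SET qty = qty - 1 WHERE item_id = 7;
ROLLBACK TO SAVEPOINT after_order;    -- undo only the inventory update
COMMIT;                               -- the INSERT is still saved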
Example: transferring 100 from account 1 to account 2 as a single unit of work (the amount is illustrative):
BEGIN TRANSACTION;
UPDATE accounts
SET balance = balance - 100
WHERE account_id = 1;
UPDATE accounts
SET balance = balance + 100
WHERE account_id = 2;
COMMIT;
Common use cases:
Banking systems: Transferring money between accounts involves multiple steps, and all must succeed, or none should.
E-commerce: Managing an order in an online shopping system (e.g., updating inventory,
charging the customer) requires consistency across different operations.
Booking systems: Reserving tickets or appointments must be atomic to avoid double
bookings.
5. Flexible Schema in MongoDB
In MongoDB, we can insert documents with different structures into the same collection without any issues. Here's a basic example:
1. Install MongoDB Driver for Node.js: You can use Node.js to interact with MongoDB. First, install the required MongoDB package:
npm install mongodb
2. Connect and Insert Documents with Different Structures:
const { MongoClient } = require('mongodb');
const client = new MongoClient('mongodb://localhost:27017'); // connection string is illustrative

async function main() {
  try {
    // Connect to MongoDB
    await client.connect();
    console.log("Connected to MongoDB!");

    const users = client.db('test').collection('users');

    // Two documents with completely different structures, same collection
    await users.insertMany([
      { name: 'Alice', age: 30, city: 'New York' },
      { name: 'Bob', profession: 'Software Engineer', skills: ['JavaScript', 'Node.js', 'MongoDB'] }
    ]);

    console.log(await users.find().toArray());
  } finally {
    await client.close();
  }
}

main().catch(console.error);
Explanation:
insertMany() stores two documents with different fields in the same users collection; because MongoDB does not enforce a fixed schema, both are accepted without any prior schema definition.
Output:
[
{ "_id": ObjectId("..."), "name": "Alice", "age": 30, "city": "New York" },
{ "_id": ObjectId("..."), "name": "Bob", "profession": "Software Engineer",
"skills": ["JavaScript", "Node.js", "MongoDB"] }
]
1. Flexibility: You can evolve your application without the need for complex migrations.
2. Heterogeneous Data: Store and query different types of data in the same collection.
3. Schema-less: No need to predefine a schema, making it great for handling evolving and
varied datasets.
This program shows how easily MongoDB can manage dynamic structures and data with different schemas within the same collection.
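Because documents can differ, queries simply match whichever documents contain the queried fields. For example, in the mongo shell (values taken from the output above):

// Matches only documents whose skills array contains "MongoDB" (Bob)
db.users.find({ skills: "MongoDB" })

// Matches only documents with an age greater than 25 (Alice)
db.users.find({ age: { $gt: 25 } })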
6. Indexing in MongoDB
The createIndex() method is used to create an index on a field or set of fields in a collection.
Without an index, MongoDB performs a collection scan (i.e., it scans every document) to find
the required data. Indexes can drastically improve query performance.
Syntax:
db.collection.createIndex({ field: 1 })  // 1 for ascending order, -1 for descending order
Example:
db.users.createIndex({ name: 1 })
Here, MongoDB creates an index for the name field in ascending order. If you search by name,
the query will be faster since MongoDB can directly use the index instead of scanning the entire
collection.
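You can verify that a query actually uses the index with explain(); the query value here is illustrative:

// An IXSCAN stage in the winning plan means the index was used;
// COLLSCAN would indicate a full collection scan
db.users.find({ name: "Alice" }).explain("executionStats")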
The getIndexes() method returns a list of all the indexes for a collection, including the default
_id index (which is created automatically when you create a collection).
Syntax:
db.collection.getIndexes()
Example:
db.users.getIndexes()
The output will include the index on the _id field (which exists by default) and the one you
created on the name field:
[
{ "v": 2, "key": { "_id": 1 }, "name": "_id_", "ns": "test.users" },
{ "v": 2, "key": { "name": 1 }, "name": "name_1", "ns": "test.users" }
]
The dropIndex() method is used to remove an index. Dropping an index means that MongoDB
will no longer use that index for queries, and instead will revert to scanning documents
(collection scan) for operations on that field.
Syntax:
db.collection.dropIndex({ field: 1 })
Example:
db.users.dropIndex({ name: 1 })
This will remove the index from the name field. You can also pass the index name directly:
db.users.dropIndex("name_1")
Complete Example:
1. Create an index on the age field:
db.users.createIndex({ age: 1 })
2. View the indexes on the collection:
db.users.getIndexes()
[
{ "v": 2, "key": { "_id": 1 }, "name": "_id_", "ns": "test.users" },
{ "v": 2, "key": { "age": 1 }, "name": "age_1", "ns": "test.users" }
]
3. Drop the index on the age field:
db.users.dropIndex({ age: 1 })
4. Verify that the index has been removed by running getIndexes() again.
Why Use Indexes:
Speed up Queries: Indexes improve query performance by reducing the number of documents MongoDB needs to scan.
Optimize Sorting: Indexes can also optimize queries that involve sorting.
Reduce Performance Penalty: Without indexes, as collections grow, queries will become slower
since MongoDB must scan more documents.
This shows how to manage indexes effectively in MongoDB, making your queries faster and
more efficient.
Replication:
Replication in MongoDB refers to the process of synchronizing data across multiple servers to
ensure high availability and data redundancy. By having multiple copies of your data (called
replicas), you can maintain data integrity and service continuity, even in the event of hardware
failures, network outages, or system crashes.
1. Replica Set:
o A replica set is a group of MongoDB servers (nodes) that hold the same data.
o A replica set consists of:
Primary Node: The server that accepts all write operations.
Secondary Nodes: Servers that replicate the data from the primary and
serve as backups.
Arbiter (optional): A node that does not hold data but participates in
elections to help choose a new primary during a failover.
2. Replication Process:
o The primary node receives all the write operations (inserts, updates, deletes).
o The secondary nodes replicate the operations from the primary node’s oplog
(operations log) to maintain identical copies of the data.
o Read operations can occur from the primary or secondary nodes, depending on
the application’s configuration.
3. Automatic Failover:
o If the primary node fails (due to hardware issues, network problems, etc.), the
replica set will automatically elect a new primary from the secondary nodes.
o This ensures that your database continues to function without interruption.
4. Data Redundancy:
o By replicating data across multiple nodes, you avoid data loss and increase data
availability. Even if one node goes down, the others can continue serving
requests.
5. Read Scaling:
o In some cases, you can configure MongoDB to allow read operations from
secondary nodes. This helps distribute the read load across multiple nodes,
improving performance in read-heavy applications.
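As a concrete illustration, a three-member replica set can be initiated from the mongo shell roughly as follows; the replica set name and hostnames are placeholders:

rs.initiate({
  _id: "rs0",                                      // replica set name
  members: [
    { _id: 0, host: "mongo1.example.net:27017" },  // eligible to become primary
    { _id: 1, host: "mongo2.example.net:27017" },
    { _id: 2, host: "mongo3.example.net:27017" }
  ]
})
// With a read preference such as "secondaryPreferred", reads can be
// served by secondaries to spread the load (see Read Scaling above).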
Benefits of Replication:
1. High Availability:
o By replicating data across multiple servers, you ensure that your application stays
available even in case of hardware or network failures.
2. Data Redundancy:
o Replication provides multiple copies of your data, minimizing the risk of data
loss.
3. Fault Tolerance:
o In case of primary node failure, MongoDB can automatically promote a
secondary node to primary, ensuring that the database continues to accept write
operations.
4. Disaster Recovery:
o If a catastrophic event occurs at one data center, secondary nodes in different
locations can continue to serve requests and maintain data integrity.
Statement-Based Replication (SBR):
In statement-based replication (SBR), the SQL statements that modify data on the master are replicated to, and re-executed on, the replicas.
How It Works:
The master logs the SQL statement that performs the operation (like INSERT, UPDATE, DELETE,
etc.) into a binary log.
The replica then retrieves these SQL statements from the master and executes them locally.
Example:
UPDATE employees SET salary = 5500 WHERE id = 101;
This SQL statement will be logged on the master and sent to the replicas. The replicas will then execute the same UPDATE statement on their local copies of the employees table.
Advantages:
Simplicity: Since only the SQL statements are replicated, the log size is smaller compared to
binary replication.
Readability: SQL statements are human-readable, making it easier to debug and understand
what changes are being replicated.
Efficiency for Certain Operations: If a large number of rows are affected by a query, only the
statement is replicated, not the individual row changes.
Disadvantages:
Non-Deterministic Statements: Statements that rely on functions such as NOW() or UUID() can produce different results when re-executed on a replica, causing inconsistencies between the master and replicas.
Binary (Row-Based) Replication:
In binary replication, also known as row-based replication (RBR), the actual changes to the
individual rows (rather than the SQL statement) are logged and sent to the replicas. This ensures
that the exact data modifications are applied on both the master and replicas.
How It Works:
The master logs the changes to specific rows in a binary format in its binary log (e.g., "change
row with id = 101 to have salary = 5500").
The replica reads these binary logs and directly applies the row changes to its local data.
Example:
In row-based replication, the change to the specific row (e.g., "set salary to 5500 for id =
101") is recorded in binary form and sent to the replicas. The replicas will then apply this change
directly to the row with id = 101.
Advantages:
Exact Data Replication: Since row changes are replicated exactly as they occur, there are fewer
chances of inconsistencies. This is especially useful for non-deterministic queries.
Deterministic: All row changes are deterministic, meaning they will always result in the same
final state on all replicas.
More Reliable for Complex Operations: For operations like triggers, stored procedures, and
non-deterministic functions (e.g., UUID(), NOW()), row-based replication ensures that the data
is consistent across all nodes.
Disadvantages:
Larger Log Size: Since the actual row changes are replicated, the binary log can be larger,
especially for bulk updates or inserts.
Less Human-Readable: The binary log is not human-readable, making it harder to debug
replication issues.
More Overhead for Small Changes: For small changes or updates affecting only a few rows, the
overhead of logging the exact changes may be more significant than logging the SQL statement
itself.
Some systems use a hybrid approach, where they combine statement-based replication and
row-based replication. This is called mixed replication. In this mode, the database engine
dynamically chooses which replication method to use based on the type of query being executed:
Simple queries (like INSERT, UPDATE, or DELETE that don't involve non-deterministic
operations) might use statement-based replication.
Complex queries (like those involving non-deterministic functions or triggers) might use row-
based replication to ensure consistency.
Comparison: for handling of non-deterministic queries, statement-based replication may cause inconsistencies, while row-based replication ensures consistency.
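In MySQL, for instance, the replication format is selected with the binlog_format server option; a minimal configuration sketch (the file path and server-id are illustrative):

# my.cnf on the master (illustrative)
[mysqld]
server-id     = 1
log-bin       = mysql-bin   # enable the binary log
binlog_format = MIXED       # STATEMENT, ROW, or MIXED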
Auto-Sharding in MongoDB:
Key Concepts:
Shard: A shard is an individual MongoDB instance that stores a portion of the data.
Shard Key: This is the field or set of fields used to determine how data is distributed across
shards. MongoDB splits data into ranges based on this key or uses a hashed value of the key.
Cluster: A sharded cluster consists of several components: shards, query routers (mongos), and
config servers.
Mongos: This is the query router that routes client requests to the appropriate shard(s) based
on the shard key.
Config Servers: These store metadata and information about the shards and their data
distribution.
How Auto-Sharding Works:
1. Data Partitioning:
o When you enable sharding on a MongoDB collection, MongoDB partitions the data
based on the shard key. Data is distributed across the available shards.
o For example, if you shard by the user_id field, MongoDB will distribute documents
with different user_ids across different shards.
2. Automatic Data Distribution:
o As the dataset grows, MongoDB automatically adds more data to the existing shards or
rebalances data across new shards added to the system.
o MongoDB manages the movement of data between shards and automatically handles
routing of queries to the correct shard.
3. Load Balancing:
o MongoDB ensures even distribution of data and queries by monitoring the load on each
shard. If one shard becomes overloaded, MongoDB will migrate chunks of data to less
busy shards to maintain balance.
4. Horizontal Scalability:
o Auto-sharding allows the database to scale horizontally, meaning you can add more
shards as your data grows, instead of upgrading to more powerful (but expensive)
hardware.
Example:
Assume you have a MongoDB collection with 1 million user documents, and you want to shard
the collection by the user_id field:
MongoDB automatically distributes the documents across the available shards based on the
user_id.
When you query for user_id: 1234, MongoDB (via the mongos router) sends the request to
the specific shard that holds that document.
This ensures that no single server is overwhelmed with the load of managing the entire dataset.
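In the mongo shell, this setup might look as follows; the database and collection names are illustrative:

sh.enableSharding("app")                           // allow collections in "app" to be sharded
sh.shardCollection("app.users", { user_id: 1 })    // range-shard by user_id
db.users.find({ user_id: 1234 })                   // via mongos, routed to the owning shard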
Benefits of Auto-Sharding:
Horizontal scalability, automatic load balancing, and even distribution of data and queries across shards, as described above; no single server has to manage the entire dataset.
Integrated Caching in MongoDB:
MongoDB also improves read performance through integrated caching:
1. Storage-Engine Caching:
o MongoDB's default WiredTiger storage engine maintains an internal in-memory cache of data, and the operating system's filesystem cache also keeps recently read file pages in RAM.
o Frequently accessed data is therefore kept in memory (RAM), meaning MongoDB can read from memory instead of performing a slower disk read.
2. In-Memory Storage:
o For frequently accessed queries or indexes, MongoDB keeps the data in the working set,
which is a portion of the dataset that resides in RAM.
o The more frequently a document is accessed, the more likely it will remain in memory,
improving read performance for repetitive queries.
3. Caching Layers:
o Hot Data Cache: MongoDB keeps the most frequently accessed data in memory to
reduce disk I/O and latency.
o Eviction Policy: MongoDB uses an eviction policy that determines when data should be
removed from memory to make room for new data. Data that is no longer frequently
accessed is gradually removed from the cache.
4. Index Caching:
o MongoDB also caches indexes in memory. Since queries typically involve index lookups,
caching indexes in memory can significantly reduce query latency.
Benefits of Integrated Caching:
Reduced Latency: Reads from memory are much faster than reads from disk, leading to reduced query latency.
Automatic Caching: MongoDB automatically manages what data to keep in memory based on
usage patterns, without requiring manual intervention.
Efficient Use of Resources: By utilizing the available system memory efficiently, MongoDB
ensures optimal performance without the need for a separate caching layer like Redis or
Memcached.
Example:
When a query is executed, MongoDB first checks if the required data or index is present in
memory.
o If the data is in memory (cache hit), MongoDB serves the request quickly from RAM.
o If the data is not in memory (cache miss), MongoDB retrieves it from disk and then
stores it in memory for future requests.
This approach ensures faster access to frequently queried data, improving overall read
performance.
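You can observe this behaviour from the shell: serverStatus() exposes WiredTiger cache counters (only some of its many fields are mentioned here):

// Returns statistics such as "bytes currently in the cache" and
// "pages read into cache" (a steadily growing read count suggests cache misses)
db.serverStatus().wiredTiger.cache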
Sharding vs. Caching:
Purpose: Sharding distributes data across multiple servers for scalability; caching improves read performance by storing frequently accessed data in memory.
When to Use: Sharding when your dataset outgrows the capacity of a single server; caching when you need fast access to frequently queried data.
Example: Sharding suits large-scale applications that need distributed storage (e.g., e-commerce, social media); caching suits applications with high read traffic (e.g., dashboards, reporting tools).
Conclusion:
Auto-sharding allows MongoDB to scale horizontally by distributing data across multiple shards,
ensuring that the database can handle large volumes of data and traffic.
Integrated caching optimizes MongoDB's performance by storing frequently accessed data and
indexes in memory, allowing for faster reads and reduced disk I/O.
Together, these features enable MongoDB to handle both large-scale datasets and performance-
intensive queries efficiently.
Aggregation and Scalability in MongoDB:
Aggregation in MongoDB
1. Aggregation Pipeline:
o The aggregation pipeline is a series of stages that process data in a sequence.
o Each stage in the pipeline takes input, transforms it, and passes the output to the next
stage.
o This allows for efficient data transformations and computations.
2. Stages in the Aggregation Pipeline: MongoDB provides several key pipeline stages:
o $match: Filters documents to pass only those that meet specified criteria (similar to
SQL’s WHERE clause).
o $project: Reshapes the documents, allowing you to include or exclude fields and create new fields.
o $group: Groups documents by a key and computes aggregate values for each group (e.g., sums or averages).
o $sort: Sorts the documents by one or more fields.
{ $sort: { totalSales: -1 } }
o $limit and $skip: Limit the number of documents returned or skip a specific number of documents.
{ $limit: 5 }
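Putting these stages together, a pipeline that produces the totalSales figure used in the snippets above might look like this; the orders collection and its fields are illustrative:

db.orders.aggregate([
  { $match: { status: "completed" } },                                // filter, like SQL's WHERE
  { $group: { _id: "$product", totalSales: { $sum: "$amount" } } },   // total sales per product
  { $sort: { totalSales: -1 } },                                      // highest totals first
  { $limit: 5 }                                                       // top five products
])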
Benefits of Aggregation:
Efficiency: The aggregation pipeline allows you to perform complex data analysis in a single
query, reducing the need for multiple database round trips.
Flexibility: You can reshape documents, apply complex transformations, and perform advanced
calculations on your data using a wide variety of operators and stages.
Scalability: Aggregations can be distributed across MongoDB shards (in a sharded cluster) for
large datasets, improving performance for big data applications.
Scalability in MongoDB
MongoDB is designed to scale horizontally, allowing it to handle large volumes of data and high
throughput by distributing data across multiple servers. This makes MongoDB suitable for
applications with big data and high concurrency requirements.
1. Sharding:
o MongoDB scales out by sharding, which partitions a collection's data across multiple servers.
o The shard key determines how the data is distributed. For example, you can shard a
collection by a field like user_id, which distributes documents with different
user_ids across different shards.
o Config servers store metadata about the shards and help manage the distribution of
data.
o Query routers (mongos) route queries to the appropriate shard(s) based on the shard
key.
Example:
// Enable sharding on a database and collection
sh.enableSharding("myDatabase");
sh.shardCollection("myDatabase.myCollection", { "user_id": 1 });
In this example, MongoDB will distribute the data across multiple shards based on the
user_id field.
Conclusion:
Aggregation in MongoDB provides a flexible and powerful way to perform complex data analysis
and transformations.
Scalability is a core strength of MongoDB, allowing it to handle massive datasets and high-
throughput applications through sharding and replication.
The combination of aggregation and scalability ensures that MongoDB can perform efficiently
even with large-scale data and complex queries, making it ideal for big data applications.