ADDIS ABABA UNIVERSITY
COLLEGE OF NATURAL AND COMPUTATIONAL SCIENCES
DEPARTMENT OF COMPUTER SCIENCE
                 Selected Topics in Data Management Systems (CoSc6031)
                                       Lab Project Report
Project Title: Implementation of a Scalable and Distributed E-commerce Database System
using MongoDB Compass
                       1. Solomon Zinabu                           GSE/2578/17
                       2. Sasebih Nega                             GSE/7037/17
                      3. Dawit Kebede                              GSE/4878/17
                      4. Minale Ejigu                              GSE/5402/17
                               Date:    12/04/2025
1. Introduction
This project demonstrates a comprehensive understanding and practical implementation
of key features in MongoDB, using Compass as the primary tool. The e-commerce
dataset comprises five collections: customers, orders, order_items, payments, and
products, each containing 38,279 documents. The objective is to apply and showcase the
following functionalities:
         CRUD operations (Create, Read, Update, Delete)
         Projections
         Indexing
         Sorting and Aggregations
         Replica Set (1 Primary + 2 Secondaries)
         Sharding (Scalable distribution)
         Advanced MongoDB features (Lookups, Array functions, Grouping, Conditional
          logic)
2. Data Collections Overview
Customers Collection
{
    "customer_id": "I74lXDOfoqsp",
    "customer_zip_code_prefix": 6020,
    "customer_city": "goiania",
    "customer_state": "GO"
}
Orders Collection
{
    "order_id": "u6rPMRAYIGig",
    "customer_id": "I74lXDOfoqsp",
    "order_purchase_timestamp": "2017-11-18 12:29:57",
    "order_approved_at": "2017-11-18 12:46:08"
}
Order Items Collection
{
    "order_id": "u6rPMRAYIGig",
    "product_id": "1slxdgbgWFax",
    "seller_id": "3jwvL6ihC45G",
    "price": 24.1,
    "shipping_charges": 20.9
}
Payments Collection
{
    "order_id": "u6rPMRAYIGig",
    "payment_type": "credit_card",
    "payment_installments": 2,
    "payment_value": 155.77
}
Products Collection
{
    "product_id": "1slxdgbgWFax",
    "product_category_name": "toys",
    "product_weight_g": 50,
    "product_length_cm": 16,
    "product_height_cm": 5,
    "product_width_cm": 11
}
3. Functional Implementation Using MongoDB Compass
3.1 CRUD Operations
Each collection supports Create, Read, Update, and Delete operations performed directly
in Compass.
         Insert – Sample records added to each collection.
         Read – Filtered and projected queries to retrieve documents.
         Update – Updates on customer city/state.
         Delete – Removal based on customer_id, order_id, or product_id.
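The Update and Delete operations above can be sketched as the filter and update documents that Compass (or mongosh) would receive. This is a minimal sketch, not the exact documents used in the project: the helper names byCustomerId and cityStateUpdate and the new city value "anapolis" are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helpers: they only build plain filter/update documents.
// In mongosh these would be passed to the real collection methods, e.g.
//   db.customers.updateOne(change.filter, change.update)
//   db.customers.deleteOne(byCustomerId("I74lXDOfoqsp"))

// Filter matching one customer by its business key.
function byCustomerId(customerId) {
  return { customer_id: customerId };
}

// Filter plus $set update that changes a customer's city and state.
function cityStateUpdate(customerId, city, state) {
  return {
    filter: byCustomerId(customerId),
    update: { $set: { customer_city: city, customer_state: state } },
  };
}

// "anapolis" is an illustrative value, not taken from the dataset.
const change = cityStateUpdate("I74lXDOfoqsp", "anapolis", "GO");
console.log(JSON.stringify(change, null, 2));
```

Using $set changes only the named fields and leaves the rest of the document untouched.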
3.2 Projection
Projections limit the fields returned by a query, which reduces the amount of data transferred:
{
    "customer_id": 1,
    "customer_city": 1,
    "_id": 0
}
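To make the effect of this projection concrete, the following plain JavaScript sketch applies an inclusion projection to the sample customer document in memory. It illustrates the behaviour only and is not how MongoDB implements projections; the function name applyInclusionProjection and the _id value 1 are assumptions.

```javascript
// In-memory illustration of an inclusion projection (not MongoDB's implementation).
function applyInclusionProjection(doc, projection) {
  const out = {};
  // _id is returned by default unless it is explicitly excluded with _id: 0.
  if (projection._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

// Sample customer from Section 2; the _id value is hypothetical.
const customer = {
  _id: 1,
  customer_id: "I74lXDOfoqsp",
  customer_zip_code_prefix: 6020,
  customer_city: "goiania",
  customer_state: "GO",
};

const projected = applyInclusionProjection(customer, {
  customer_id: 1,
  customer_city: 1,
  _id: 0,
});
console.log(projected); // only customer_id and customer_city remain
```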
3.3 Indexing
Created indexes on:
         customer_id
         order_id
         product_id
         payment_type
         product_category_name
These indexes speed up searches and aggregations that filter, join, or group on the indexed fields.
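The index list above can be written down as a small plan and turned into shell commands. The report does not say which collection each index was created on or in which direction, so the collection assignments and the ascending order (1) below are assumptions; indexPlan and toShellCommands are hypothetical names.

```javascript
// Assumed mapping of indexed fields to collections (not stated in the report).
const indexPlan = [
  { collection: "customers", keys: { customer_id: 1 } },
  { collection: "orders", keys: { order_id: 1 } },
  { collection: "products", keys: { product_id: 1 } },
  { collection: "payments", keys: { payment_type: 1 } },
  { collection: "products", keys: { product_category_name: 1 } },
];

// Render each entry as the mongosh command that would create the index.
// Inside mongosh the plan could also be applied directly:
//   for (const { collection, keys } of indexPlan)
//     db.getCollection(collection).createIndex(keys);
function toShellCommands(plan) {
  return plan.map(
    ({ collection, keys }) => `db.${collection}.createIndex(${JSON.stringify(keys)})`
  );
}

const indexCommands = toShellCommands(indexPlan);
indexCommands.forEach((command) => console.log(command));
```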
3.4 Aggregations
Customers: Cities with most customers
[ { "$group": { _id: "$customer_city", total: { "$sum": 1 } } }, { "$sort": { total: -1 } } ]
Orders: Monthly order count
[ { "$project": { month: { "$substr": ["$order_purchase_timestamp", 0, 7] } } },
  { "$group": { _id: "$month", count: { "$sum": 1 } } } ]
Payments: Average by type
[ { "$group": { _id: "$payment_type", avg_payment: { "$avg": "$payment_value" } } } ]
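As a cross-check of what the monthly order count pipeline computes, the same logic can be sketched in plain JavaScript over an in-memory array. This is an illustration only, assuming timestamps are stored as "YYYY-MM-DD hh:mm:ss" strings as in Section 2. The function name monthlyOrderCounts is hypothetical, and the second and third sample orders are made up.

```javascript
// In-memory equivalent of: $project the first 7 characters, then $group and count.
function monthlyOrderCounts(orders) {
  const counts = new Map();
  for (const order of orders) {
    // Characters 0..6 of "2017-11-18 12:29:57" are "2017-11".
    const month = order.order_purchase_timestamp.substring(0, 7);
    counts.set(month, (counts.get(month) ?? 0) + 1);
  }
  return [...counts].map(([month, count]) => ({ _id: month, count }));
}

const sampleOrders = [
  { order_id: "u6rPMRAYIGig", order_purchase_timestamp: "2017-11-18 12:29:57" },
  { order_id: "hypothetical-2", order_purchase_timestamp: "2017-11-20 08:00:00" },
  { order_id: "hypothetical-3", order_purchase_timestamp: "2017-12-01 09:15:00" },
];

const monthly = monthlyOrderCounts(sampleOrders);
console.log(monthly);
```

Because the timestamps are strings rather than dates, slicing the first seven characters is enough to obtain a year-month key.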
3.5 Advanced Aggregation & Lookup
Join Orders with Customers:
[ { "$lookup": {
    from: "customers",
    localField: "customer_id",
    foreignField: "customer_id",
    as: "customer_info"
}} ]
Group products by category with average size:
[ { "$group": {
    _id: "$product_category_name",
    avg_weight: { "$avg": "$product_weight_g" },
    avg_length: { "$avg": "$product_length_cm" }
}} ]
Find high-value orders:
[ { "$match": { payment_value: { "$gt": 1000 } } } ]
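The join performed by the $lookup stage above can be illustrated with a small in-memory sketch. It mirrors the observable result, in which each order gains a customer_info array of matching customers, and says nothing about how the server executes the join. The function name lookupCustomers and the unmatched order are hypothetical.

```javascript
// In-memory illustration of $lookup on customer_id (left outer join).
function lookupCustomers(orders, customers) {
  // Group customers by customer_id so each order can find its matches quickly.
  const byId = new Map();
  for (const c of customers) {
    if (!byId.has(c.customer_id)) byId.set(c.customer_id, []);
    byId.get(c.customer_id).push(c);
  }
  // Orders without a matching customer get an empty array, as with $lookup.
  return orders.map((o) => ({ ...o, customer_info: byId.get(o.customer_id) ?? [] }));
}

const customersSample = [
  { customer_id: "I74lXDOfoqsp", customer_city: "goiania", customer_state: "GO" },
];
const ordersSample = [
  { order_id: "u6rPMRAYIGig", customer_id: "I74lXDOfoqsp" },
  { order_id: "hypothetical-order", customer_id: "no-such-customer" },
];

const joined = lookupCustomers(ordersSample, customersSample);
console.log(JSON.stringify(joined, null, 2));
```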
4. Replication & Sharding Setup
4.1 Replication
Replication in MongoDB provides high availability by creating copies of the same data
on multiple servers. For this project, we configured a Replica Set consisting of one
primary and two secondary nodes.
Objective: Ensure data redundancy and availability in case of hardware failure or server
crash.
Replica Set Configuration Steps:
   1. Start MongoDB instances (in separate terminal windows or services):
       mongod --replSet rs0 --port 27017 --dbpath /data/rs0 --bind_ip localhost
       mongod --replSet rs0 --port 27018 --dbpath /data/rs1 --bind_ip localhost
       mongod --replSet rs0 --port 27019 --dbpath /data/rs2 --bind_ip localhost
   2. Initiate Replica Set from any node (done via mongosh):
       rs.initiate({
         _id: "rs0",
         members: [
           { _id: 0, host: "localhost:27017" },
           { _id: 1, host: "localhost:27018" },
           { _id: 2, host: "localhost:27019" }
         ]
       })
   3. Check status using:
       rs.status()
   4. Read/Write Setup in Compass:
          o Use connection string:
            mongodb://localhost:27017,localhost:27018,localhost:27019/?replicaSet=rs0
          o Read Preference: Primary or Secondary
          o Write always routes to Primary
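The connection string used in Compass can also be assembled programmatically. The sketch below builds the same URI and adds the standard readPreference option. The helper name replicaSetUri is hypothetical, and "secondaryPreferred" is just one possible choice, not necessarily the one used in the project.

```javascript
// Build a replica set connection string from its parts.
// URLSearchParams is a global in current Node.js versions and in browsers.
function replicaSetUri(hosts, replicaSet, readPreference) {
  const params = new URLSearchParams({ replicaSet, readPreference });
  return `mongodb://${hosts.join(",")}/?${params.toString()}`;
}

const uri = replicaSetUri(
  ["localhost:27017", "localhost:27018", "localhost:27019"],
  "rs0",
  "secondaryPreferred" // assumed choice: read from a secondary when one is available
);
console.log(uri);
```

Listing all three hosts lets the client discover the current primary even if the first host in the list is down.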
Benefits Observed:
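      Reads can be served from secondaries for load balancing.
      Automatic failover: a new primary is elected if the current primary becomes unavailable.
      Writes acknowledged by a majority of members survive failover and network partitions.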
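The failover behaviour depends on a majority of voting members being reachable. The arithmetic for the three-member set used here can be sketched as follows; the function names majority and faultTolerance are hypothetical.

```javascript
// Number of voting members needed to elect a primary.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of members that can fail while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

const members = 3; // one primary + two secondaries, as configured above
console.log(`majority: ${majority(members)}, tolerated failures: ${faultTolerance(members)}`);
```

With three members, two are needed for an election, so the set keeps a primary through the loss of any single node.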
4.2 Sharding
Sharding enables horizontal scaling by distributing data across multiple machines.
MongoDB uses mongos as a query router and distributes collections based on a shard
key.
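The idea behind a hashed shard key can be shown with a toy sketch: hashing each key and mapping the hash to a shard scatters otherwise sequential keys. This is not MongoDB's actual mechanism. MongoDB uses its own hash function and assigns ranges of hash values to chunks, whereas this sketch uses FNV-1a and a simple modulo. The three-shard cluster and the key format are hypothetical.

```javascript
// Toy 32-bit FNV-1a hash (NOT the hash function MongoDB uses).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned
  }
  return hash;
}

// Map a key to a shard index by taking the hash modulo the shard count.
function shardFor(key, shardCount) {
  return fnv1a(key) % shardCount;
}

// Count how many keys land on each shard.
function distribution(keys, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const key of keys) counts[shardFor(key, shardCount)] += 1;
  return counts;
}

// Hypothetical sequential keys spread over a hypothetical three-shard cluster.
const keys = Array.from({ length: 3000 }, (_, i) => `customer-${i}`);
const counts = distribution(keys, 3);
console.log(counts);
```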
Sharding Configuration Steps:
   1. Start Config Server Replica Set (and initiate it once from mongosh with rs.initiate()):
       mongod --configsvr --replSet configReplSet --port 26017 --dbpath /data/config --bind_ip localhost
   2. Start Shard Replica Set (rs0) (same as replication step above, with --shardsvr added
      to each mongod command)
   3. Start mongos instance:
       mongos --configdb configReplSet/localhost:26017 --port 27020 --bind_ip localhost
   4. Connect to mongos using Compass:
        o Connection String:
            mongodb://localhost:27020
   5. Add Shard (from mongosh):
       sh.addShard("rs0/localhost:27017,localhost:27018,localhost:27019")
   6. Enable Sharding for Database:
       sh.enableSharding("ecommerceDB")
   7. Shard Specific Collections:
       sh.shardCollection("ecommerceDB.customers", { "customer_id": "hashed" })
       sh.shardCollection("ecommerceDB.orders", { "order_id": "hashed" })
   8. Verify Shard Status:
       sh.status()
Benefits Observed:
      Collections evenly distributed across shards
      High scalability with ability to add new shards
      Improved performance on large datasets
5. Read/Write Distribution Testing
Using Compass, read operations were tested against both primary and secondary nodes.
      Read Preference: Secondary
      Writes: always routed to the Primary (MongoDB has no write-preference setting)
      Observations: Compass allows controlling the read preference via connection
       options.
6. Conclusion
This project successfully implements a highly available, scalable e-commerce system using
MongoDB Compass. From core CRUD operations to advanced aggregations, replication,
and sharding, all requirements were fulfilled: data operations through UI-based
interactions in Compass and cluster setup through mongosh, demonstrating complete
mastery of MongoDB in a distributed environment.
Appendix
Tools Used:
      MongoDB (v8.0.5)
      MongoDB Compass (GUI)
      mongosh (for cluster setup only)
Environment:
      Windows 10
      Ports used: 27017 (primary), 27018, 27019 (secondaries), 27020 (mongos), 26017 (config server)
Sample Data Volume:
      38,279 documents per collection