In today's data-driven world, the ability to make informed decisions swiftly based on real-time data is crucial. However, ODB systems may struggle with this due to their sheer volume. As a solution, DW are employed for focused analytical tasks. A DW can be any form of database, typically selected based on specific data needs. Furthermore, data warehouses often include a front-end interface that provides visualizations for better data interpretation. In this project, we will implement three distinct DW for an E-Commerce dataset: one based on SQL and the other two on NoSQL. A unified front-end will serve all three DW. The objective is to evaluate the advantages and disadvantages of each system in handling specific analytical tasks.
The project is organized into several directories, each dedicated to a specific component of the data architecture:
grafana: Contains Docker configurations for running Grafana.mongodb: Contains Docker configurations, scripts, and files for setting up MongoDB DW.neo4j: Contains Docker configurations, scripts, and files for setting up Neo4j DW.sql: Contains Docker configurations, script, and files for setting up SQL ODB and DW.
Navigate to the SQL directory:
cd sqlCreate a virtual environment and install the required dependencies:
pip install -r requirements.txtStart MySQL server and phpMyAdmin:
docker-compose up -dInitialize the database schema:
python3 manage.py --createpulate the database with user, category, product, and event data from CSV files. Alternatively, if you have access to the database dump, use the propagate_db_from_csv.sh bash script for slightly faster execution:
python3 manage.py --insert_users_and_categories
python3 manage.py --insert_products
python3 manage.py --insert_eventsThis process may take several hours, but once completed, the ODB will be fully configured.
Ensure that you are still in the correct directory and the environment is active. Create a new database named dw on your MySQL server, then execute the following script to synchronize it with the ODB:
python3 sync_dw_with_odb.pyNext, create or update the summary tables:
python3 sync_summary_table.pyNavigate to the Neo4j Folder:
cd neo4jInstall the required dependencies:
pip install --no-cache-dir -r requirements.txtStart the neo4j server:
docker compose up -dRun the following script to migrate the DW to neo4j:
python3 sync_neo4j_with_dw.pyThis will take some time. After its complete, add the neo4j dashboard to grafana using the JSON file in the Neo4j folder.
Navigate to the MongoDB Folder:
cd mongodbInstall the required dependencies:
pip install -r requirements.txtStart the MongoDB server:
docker compose up -dMigrate the DW to MongoDB using this script:
python3 migrate_to_mongodb.pyAfter its done loading, add the MongoDB dashboard to grafana using the JSON file in the MongoDB folder.
Navigate to the Grafana folder:
cd grafanaStart the Grafana interface server:
docker compose up -dNow, if you have followed the previous steps, you should be able to create the dashboards. Start by creating the data sources for each DW. Then, import the dashboard files from their respective dirs.