Skip to content

A comprehensive data engineering solution that transforms sales data from PostgreSQL into a star schema, moves it to Azure Synapse Analytics via Blob Storage, and creates Power BI dashboards for business insights.

Notifications You must be signed in to change notification settings

ShawonSimon/SuperStore-Sales-Data-Engineering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SuperStore-Sales-Analysis

Project Overview

This data engineering project implements an end-to-end data pipeline for sales analytics, transforming raw transactional data from a PostgreSQL OLTP database into actionable business insights using a modern cloud-based data warehouse and visualization solution.

Dataset

The original dataset was obtained from the GTS.AI website. It contains 9993 sales transactions that occurred from 2019 to 2022. This dataset encompasses a wide range of information, including order specifics, geographical data, and product-related data. There are no missing values or any irrelevant data types and values

Technology Stack:

  • Database: PostgreSQL
  • ETL Orchestration: Apache Airflow
  • Cloud Storage: Azure Blob Storage
  • Data Warehouse: Azure Synapse Analytics
  • BI Tool: Power BI
  • Local Development Environment: Windows Subsystem for Linux (WSL)

Project Architecture

Project Architecture

  1. Source Database: PostgreSQL stores transactional sales data.
  2. ETL Process:
    • Extract: Data fetched from PostgreSQL.
    • Transform: Data cleaning, deduplication, Standardizing data formats and creation of fact/dimension tables (star schema).
    • Load: Processed data uploaded to Azure Blob Storage.
  3. Data Warehousing:
    • Data moved to Azure Synapse Analytics.
    • Dedicated SQL pool (shawonDSQL) tables populated with cleaned data.
    • Analytical queries run for insights.
  4. Visualization: Power BI dashboards for presenting KPIs and trends.
  5. Orchestration: Airflow DAG automates the ETL and data movement processes.

Screenshots

Airflow DAG runs Data extracted successfully files in WSL file system synapse workspace role assignment for blob storage access Data analysis

About

A comprehensive data engineering solution that transforms sales data from PostgreSQL into a star schema, moves it to Azure Synapse Analytics via Blob Storage, and creates Power BI dashboards for business insights.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published