
SRI SHANMUGHA COLLEGE OF ENGINEERING AND TECHNOLOGY

(An Autonomous Institution)


Pullipalayam, Morur (Po.), Sankari (Tk.), Salem (Dt.) - 637 304.

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ACADEMIC YEAR: 2023-2024

EVEN SEMESTER

LAB RECORD

(REGULATION-2021)

CCS341–DATA WAREHOUSING LABORATORY

NAME:

REGISTER NUMBER:

YEAR / SEMESTER:

BRANCH:
SRI SHANMUGHA COLLEGE OF ENGINEERING AND TECHNOLOGY
(An Autonomous Institution)
Pullipalayam, Morur (Po.), Sankari (Tk.), Salem (Dt.) - 637 304.

RECORD NOTE BOOK

REGISTER NUMBER : ……………………………………………

Certified that this is a bonafide record of the practical work done by Mr./Ms. ........................................ of the ............................................ Semester, ............................ Branch during the Academic Year ….......................................... in the ……………………………………………… Laboratory.

Submitted for the University Practical Examination held on ………………………..

STAFF-IN-CHARGE HEAD OF THE DEPARTMENT

INTERNAL EXAMINER EXTERNAL EXAMINER


Vision and Mission of the Institute

VISION
To be an institute of repute in all fields of education by implementing the best
practices akin to global standards for fostering domain knowledge and developing
research attitude among students to make them globally competent.
MISSION
• Achieving excellence in Teaching & Learning process using state-of-the-art resources.
• Extending opportunity to upgrade faculty knowledge and skills.
• Implementing the best student training practices to meet the requirements of the industrial scenario of the state.
• Motivating faculty and students to undertake research activities with real-time applications.

Vision and Mission of the Department

VISION
To create a conducive environment for the development of academic and innovative technocrats who are employable at the global level, socially responsible and professionally competent to sustain challenges.
MISSION
M1: To develop competent and quality IT professionals by imparting state-of-the-art technology learning methodologies.

M2: To promote Industry-Institution relationships that help students become more employable and better citizens who solve societal issues.

M3: To constantly upgrade faculty qualifications in cutting-edge technologies to achieve a high standing in technical and research areas.
PROGRAMME OUTCOMES
The UG program in Information Technology will prepare students to attain:
PO1 Engineering knowledge: Apply the knowledge of mathematics, science,
engineering fundamentals and an engineering specialization to the solution of
complex engineering problems.
PO2 Problem analysis: Identify, formulate, review research literature, and
analyze complex engineering problems reaching substantiated conclusions using
first principles of mathematics, natural sciences and engineering sciences.
PO3 Design/development of solutions: Design solutions for complex
engineering problems and design system components or processes that meet the
specified needs with appropriate consideration for the public health, safety,
cultural, societal and environmental considerations.
PO4 Conduct investigations of complex problems: Use research-based
knowledge and research methods including design of experiments, analysis, and
interpretation of data and synthesis of the information to provide valid
conclusions.
PO5 Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools, including prediction and modeling, to complex engineering activities with an understanding of the limitations.
PO6 The engineer and society: Apply reasoning informed by the contextual
knowledge to assess societal, health, safety, legal, cultural issues and the
consequent responsibilities relevant to the professional engineering practice.
PO7 Environment and sustainability: Understand the impact of professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.
PO8 Ethics: Apply ethical principles, commit to professional ethics,
responsibilities and norms of the engineering practice.
PO9 Individual and team work: Function effectively as an individual, as a
member or leader in diverse teams and in multidisciplinary settings.
PO10 Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11 Project management and finance: Demonstrate knowledge, understanding
of the engineering and management principles and apply these to one’s own work,
as a member and leader in a team, to manage projects and in multidisciplinary
environments.
PO12 Life-long learning: Recognize the need for, and have the preparation and ability to engage in, independent and life-long learning in the broadest context of technological change.
PROGRAM EDUCATIONAL OBJECTIVES (PEO)

PEO 1: To ensure graduates will be proficient and core competent by utilizing the
fundamental knowledge of Basic Sciences, Engineering, Mathematics and Information
Technology for the applications of various Engineering and Technology disciplines.

PEO 2: To enable graduates to think logically and have the capacity to understand social, business and environment-based hardware and software issues by designing optimal solutions through lifelong learning.

PEO 3: To enable graduates to gain employment in organizations and establish themselves as professionals by applying their technical skills to solve real world problems and meet the diversified needs of industry, academia and research.

PROGRAM SPECIFIC OBJECTIVES (PSOs)

1. To create, select, and apply appropriate techniques, resources, modern engineering and IT tools including prediction and modelling to complex engineering activities with an understanding of the limitations.
2. To manage complex IT projects with consideration of the human, financial,
ethical and environmental factors and an understanding of risk management
processes, and operational and policy implications.
COURSE OBJECTIVES

• To know the details of the data warehouse architecture.
• To understand OLAP technology.
• To understand the partitioning strategy.
• To differentiate various schemas.
• To understand the roles of the process manager and system manager.

COURSE OUTCOMES

CO1: Design data warehouse architecture for various problems
CO2: Apply OLAP technology
CO3: Analyze the partitioning strategy
CO4: Critically analyze the differentiation of various schemas for a given problem
CO5: Frame the roles of the process manager and system manager
TABLE OF CONTENTS

S.NO    DATE    NAME OF THE EXPERIMENT    PAGE NO.    MARKS    SIGN
EX.NO: 1
DATA EXPLORATION AND INTEGRATION WITH WEKA
DATE:

AIM:

To explore data and integrate with WEKA.

ALGORITHM:

1. Download and install Weka. You can find it here:
   http://www.cs.waikato.ac.nz/ml/weka/downloading.html
2. Open the WEKA tool and select the Explorer option.
3. A new window opens, which consists of different tabs (Preprocess, Classify, Associate, etc.).
4. In the Preprocess tab, click the "Open file" option.
5. Go to C:\Program Files\Weka-3-6\data to find the existing .arff datasets.
Click on any dataset to load the data; the data will then be displayed as shown below.

Load each dataset and observe the following:

Here we have taken the IRIS.arff dataset as a sample for making all the observations below.

i) List the attribute names and their types:

There are 5 attributes, with the following datatypes, in the loaded dataset (IRIS.arff): sepallength – Numeric, sepalwidth – Numeric, petallength – Numeric, petalwidth – Numeric, class – Nominal.

ii) Number of records in each dataset:

There are 150 records (instances) in total in the dataset (IRIS.arff).

iii) Identify the class attribute (if any):
There is one class attribute which consists of 3 labels.
They are:
1. Iris-setosa
2. Iris-versicolor
3. Iris-virginica

iv) Plot Histogram:

v) Determine the number of records for each class:


There is one class attribute (150 records), which consists of 3 labels. They are shown below:
1. Iris-setosa - 50 records
2. Iris-versicolor – 50 records

3. Iris-virginica – 50 records

vi) Visualize the data in various dimensions:

RESULT:

Thus the data exploration and integration with WEKA was executed successfully.

EX.NO: 2
APPLY WEKA TOOLS FOR DATA VALIDATION
DATE:

AIM:

To apply the WEKA tool for data validation.

STEPS:

1. Load the dataset (Iris-2D.arff) into the WEKA tool.
2. Go to the Classify tab; in the left-hand navigation bar we can see different classification algorithms under the Rules section.
3. Select the JRip (if-then rules) algorithm and click the Start option with the "Use training set" test option enabled.
4. We then get the detailed accuracy by class, consisting of the F-measure, TP rate, FP rate, Precision and Recall values, and the Confusion Matrix, as represented below.

Using Cross-Validation Strategy with 10 folds:


Here, we enabled the cross-validation test option with 10 folds and clicked the Start button, as represented below.

Using Cross-Validation Strategy with 20 folds:
Here, we enabled the cross-validation test option with 20 folds and clicked the Start button, as represented below.

Comparing the above cross-validation results, the error rate is lower with 20 folds (97.3% correctly classified instances) than with 10 folds (94.6%).

RESULT:

Thus the WEKA tool for data validation was applied successfully.

EX.NO: 3
PLAN THE ARCHITECTURE FOR REAL TIME APPLICATION
DATE:

AIM:

To plan the architecture for a real-time application using Weka. Weka is a popular machine learning library that provides various algorithms for data mining and predictive modelling, so several factors need to be considered.

HERE ARE THE STEPS TO PLAN THE ARCHITECTURE:

1. Define the problem: Clearly understand the problem you are trying to solve
with your real-time application. Identify the specific tasks and goals you want
to achieve using Weka.

2. Data collection and preprocessing: Determine the data sources and collect the
required data for your application. Preprocess the data to clean, transform, and
prepare it for analysis using Weka. This may involve tasks like data cleaning,
feature selection, normalization, and handling missing values.

3. Choose the appropriate Weka algorithms: Weka offers a wide range of machine learning algorithms. Select the algorithms that are suitable for your problem and data. Consider factors like the type of task (classification, regression, clustering), the size of the dataset, and the computational requirements.

4. Real-time data streaming: If your application requires real-time data processing, you need to set up a mechanism to stream the data continuously. This can be done using technologies like Apache Kafka, Apache Flink, or Apache Storm. Ensure that the data streaming infrastructure is integrated with Weka for seamless processing.

5. Model training and evaluation: Train the selected Weka algorithms on your
training dataset. Evaluate the performance of the models using appropriate
evaluation metrics like accuracy, precision, recall, or F1-score. Fine-tune the
models if necessary.

6. Integration and deployment: Integrate the trained models into your real-time
application. This may involve developing APIs or microservices to expose the
models' functionality. Ensure that the application can handle real-time requests
and provide predictions or insights in a timely manner.

7. Monitoring and maintenance: Set up monitoring mechanisms to track the
performance of your real-time application. Monitor the accuracy and
performance of the models over time. Update the models periodically to adapt
to changing data patterns or to improve performance.
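
On the data side of such an architecture, steps 2 and 4 above are often backed by a staging area in the warehouse. The following SQL is a minimal, hypothetical sketch (the table and column names are assumptions, not part of the Weka procedure): it deduplicates raw events arriving from the streaming layer and drops rows with missing readings before they are exported as a training dataset for Weka.

-- Hypothetical staging table fed by the streaming layer (e.g. a Kafka consumer).
CREATE TABLE staging_events (
    event_id    INT,
    sensor_id   INT,
    reading     DECIMAL(10,2),   -- NULL when the source failed to report a value
    received_at TIMESTAMP
);

-- Cleaned table from which the Weka training dataset is exported.
CREATE TABLE clean_events (
    event_id    INT PRIMARY KEY,
    sensor_id   INT,
    reading     DECIMAL(10,2),
    received_at TIMESTAMP
);

-- Keep exactly one row per event_id and discard rows with missing readings.
-- MIN() is used only to pick a deterministic value when duplicates disagree.
INSERT INTO clean_events (event_id, sensor_id, reading, received_at)
SELECT event_id,
       MIN(sensor_id),
       MIN(reading),
       MIN(received_at)
FROM staging_events
WHERE reading IS NOT NULL
GROUP BY event_id;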

Remember to document your architecture design and implementation decisions for future reference. Regularly review and update your architecture as your application evolves and new requirements arise.

RESULT:

Thus the architecture for a real-time application was planned successfully.

EX.NO: 4
WRITE THE QUERY FOR SCHEMA DEFINITION
DATE:

AIM:

To write the query for schema definition.

ALGORITHM:

1. Create a new table called users with fields for ID, username, email, password
hash, and timestamp.
2. Insert some sample user records into the users table.
3. Query the table to find all users who registered within the last 24 hours.
4. (Optional): Drop the table when done, for cleanup.

PROGRAM:

-- Step 1: Create the users table

CREATE TABLE users (


user_id INTEGER PRIMARY KEY AUTOINCREMENT,
username VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Step 2: Insert sample users

INSERT INTO users (username, email, password_hash, created_at) VALUES


('alice', 'alice@example.com', 'hash1', CURRENT_TIMESTAMP),
('bob', 'bob@example.com', 'hash2', CURRENT_TIMESTAMP),
('charlie', 'charlie@example.com', 'hash3', DATETIME('now', '-2 days'));

-- Step 3: Select users created in the last 24 hours

SELECT username, email, created_at


FROM users
WHERE created_at >= DATETIME('now', '-1 day');

-- Step 4 (Optional): Drop the table to clean up

DROP TABLE users;

OUTPUT:

username   email                created_at
alice      alice@example.com    2025-05-03 05:20:12
bob        bob@example.com      2025-05-03 05:20:12

RESULT:

Thus the schema definition was written and executed successfully.

EX.NO: 5
DESIGN DATA WAREHOUSE FOR REAL TIME APPLICATION
DATE:

AIM:

To design a data warehouse for real-time applications.

ALGORITHM:

1. Create a dim_customer table to store customer information.


2. Create a dim_product table to store product details.
3. Create a dim_time table to store date and time information.
4. Create a fact_order table to store order transactions linked to the dimension
tables.
5. Insert sample data into the customer, product, and time dimension tables.
6. Insert sample order data into the fact table.
7. Query the database to find total sales by each customer.
8. Query to get daily order volumes and total revenue.
9. Query to find total sales grouped by product category.
10. Use these queries for real-time reporting and business insights.

PROGRAM:

-- Step 1: Create dimension tables

CREATE TABLE dim_customer (


customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
customer_region VARCHAR(50)
);

CREATE TABLE dim_product (


product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
price DECIMAL(10, 2)
);

CREATE TABLE dim_time (


time_id INT PRIMARY KEY,
date DATE,
day_of_week VARCHAR(10),
month INT,
year INT

);

-- Step 2: Create fact table

CREATE TABLE fact_order (


order_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
time_id INT,
quantity INT,
total_amount DECIMAL(10, 2),

FOREIGN KEY (customer_id) REFERENCES dim_customer(customer_id),


FOREIGN KEY (product_id) REFERENCES dim_product(product_id),
FOREIGN KEY (time_id) REFERENCES dim_time(time_id)
);

-- Step 3: Insert sample data into dimension tables

INSERT INTO dim_customer VALUES


(1, 'Alice', 'North'),
(2, 'Bob', 'South');

INSERT INTO dim_product VALUES


(101, 'Laptop', 'Electronics', 1000.00),
(102, 'Phone', 'Electronics', 500.00);

INSERT INTO dim_time VALUES


(1, '2025-05-01', 'Thursday', 5, 2025),
(2, '2025-05-02', 'Friday', 5, 2025);

-- Step 4: Insert data into fact table

INSERT INTO fact_order VALUES


(1001, 1, 101, 1, 2, 2000.00), -- Alice bought 2 Laptops
(1002, 2, 102, 2, 1, 500.00); -- Bob bought 1 Phone

-- Step 5: Example Queries

-- Total sales by customer


SELECT dc.customer_name, SUM(fo.total_amount) AS total_sales
FROM fact_order fo
JOIN dim_customer dc ON fo.customer_id = dc.customer_id
GROUP BY dc.customer_name;

-- Daily order volume
SELECT dt.date, COUNT(*) AS orders_placed, SUM(fo.total_amount) AS revenue
FROM fact_order fo
JOIN dim_time dt ON fo.time_id = dt.time_id
GROUP BY dt.date
ORDER BY dt.date;

-- Sales by product category


SELECT dp.category, SUM(fo.total_amount) AS total_sales
FROM fact_order fo
JOIN dim_product dp ON fo.product_id = dp.product_id
GROUP BY dp.category;

OUTPUT:

customer_name total_sales
Alice 2000
Bob 500

date orders_placed revenue


2025-05-01 1 2000
2025-05-02 1 500

category total_sales
Electronics 2500

RESULT:

Thus the data warehouse for real-time applications was designed and executed successfully.

EX.NO: 6
ANALYZE THE DIMENSIONAL MODELING
DATE:

AIM:

To analyze the dimensional modeling.

ALGORITHM:

1. Identify the business process to model (e.g., product sales).


2. Define the fact table to store measurable data like quantity and revenue.
3. Define the dimension tables for customers, products, and time.
4. Create all tables with primary and foreign keys.
5. Insert sample records into dimension tables.
6. Insert records into the fact table referencing the dimension keys.
7. Write queries to analyze sales by customer, product, or time.

PROGRAM:

-- Step 1: Create Dimension Tables

CREATE TABLE dim_customer (


customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
region VARCHAR(50)
);

CREATE TABLE dim_product (


product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
price DECIMAL(10, 2)
);

CREATE TABLE dim_time (


time_id INT PRIMARY KEY,
date DATE,
day_of_week VARCHAR(10),
month INT,
year INT
);

-- Step 2: Create Fact Table

CREATE TABLE fact_sales (

sale_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
time_id INT,
quantity INT,
total_amount DECIMAL(10, 2),

FOREIGN KEY (customer_id) REFERENCES dim_customer(customer_id),


FOREIGN KEY (product_id) REFERENCES dim_product(product_id),
FOREIGN KEY (time_id) REFERENCES dim_time(time_id)
);

-- Step 3: Insert Sample Data

INSERT INTO dim_customer VALUES


(1, 'Alice', 'North'),
(2, 'Bob', 'South');

INSERT INTO dim_product VALUES


(101, 'Laptop', 'Electronics', 1200.00),
(102, 'Smartphone', 'Electronics', 600.00);

INSERT INTO dim_time VALUES


(1, '2025-05-01', 'Thursday', 5, 2025),
(2, '2025-05-02', 'Friday', 5, 2025);

-- Step 4: Insert Data into Fact Table

INSERT INTO fact_sales VALUES


(1, 1, 101, 1, 1, 1200.00),
(2, 2, 102, 2, 2, 1200.00);

-- Step 5: Example Analytical Queries

-- Total revenue per customer


SELECT dc.customer_name, SUM(fs.total_amount) AS total_spent
FROM fact_sales fs
JOIN dim_customer dc ON fs.customer_id = dc.customer_id
GROUP BY dc.customer_name;

-- Revenue by product category


SELECT dp.category, SUM(fs.total_amount) AS revenue
FROM fact_sales fs
JOIN dim_product dp ON fs.product_id = dp.product_id
GROUP BY dp.category;

-- Daily sales summary
SELECT dt.date, COUNT(*) AS total_orders, SUM(fs.total_amount) AS
total_revenue
FROM fact_sales fs
JOIN dim_time dt ON fs.time_id = dt.time_id
GROUP BY dt.date
ORDER BY dt.date;

OUTPUT:

customer_name total_spent
Alice 1200
Bob 1200

category      revenue
Electronics   2400

date          total_orders   total_revenue
2025-05-01    1              1200
2025-05-02    1              1200

RESULT:

Thus the dimensional modeling was analysed and executed successfully.

EX.NO: 7
CASE STUDY USING OLAP
DATE:

AIM:

To perform a case study using OLAP.

INTRODUCTION:

In this case study, we will explore how Online Analytical Processing (OLAP)
technology was implemented in a retail data warehousing environment to improve data
analysis capabilities and support decision-making processes. The case study will focus
on a fictional retail company, XYZ Retail, and the challenges they faced in managing
and analyzing their vast amounts of transactional data.

BACKGROUND:

XYZ Retail is a large chain of stores with locations across the country. The
company has been experiencing rapid growth in recent years, leading to an increase in
the volume of data generated from sales transactions, inventory management, customer
interactions, and other operational activities. The existing data management system
was struggling to keep up with the demand for timely and accurate data analysis,
hindering the company's ability to make informed business decisions.

CHALLENGES:

1. Lack of real-time data analysis: The existing data warehouse system was
unable to provide real-time insights into sales trends, inventory levels, and
customer preferences.
2. Limited scalability: The data warehouse infrastructure was reaching its limits
in terms of storage capacity and processing power, making it difficult to handle
the growing volume of data.
3. Complex data relationships: The data stored in the warehouse was highly
normalized, making it challenging to perform complex queries and analyze data
across multiple dimensions.

SOLUTION:

To address these challenges, XYZ Retail decided to implement an OLAP solution as part of their data warehousing strategy. OLAP technology allows for multidimensional analysis of data, enabling users to easily slice and dice information across various dimensions such as time, product categories, geographic regions, and customer segments.

IMPLEMENTATION:

1. Data modeling: The data warehouse was redesigned using a star schema model,
which simplifies data relationships and facilitates OLAP cube creation.
2. OLAP cube creation: OLAP cubes were created to store pre-aggregated data
for faster query performance. The cubes were designed to support various
dimensions and measures relevant to the retail business.
3. Reporting and analysis: Business users were trained on how to use OLAP
tools to create ad-hoc reports, perform trend analysis, and drill down into
detailed data.
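
The pre-aggregation described in step 2 can be illustrated in plain SQL. The query below is only a sketch: it reuses the star-schema tables defined in Ex. No. 6 (fact_sales, dim_product, dim_time) as stand-ins for XYZ Retail's cube, and it assumes a database engine that supports GROUP BY ROLLUP (for example PostgreSQL, Oracle or SQL Server; SQLite does not support ROLLUP).

-- Sketch of an OLAP-style aggregation over the Ex. No. 6 star schema.
-- ROLLUP produces category totals per month, monthly totals per year and a
-- grand total in a single pass, which is the kind of result an OLAP cube pre-computes.
SELECT dt.year,
       dt.month,
       dp.category,
       SUM(fs.total_amount) AS total_sales
FROM fact_sales fs
JOIN dim_product dp ON fs.product_id = dp.product_id
JOIN dim_time dt ON fs.time_id = dt.time_id
GROUP BY ROLLUP (dt.year, dt.month, dp.category)
ORDER BY dt.year, dt.month, dp.category;

Slicing and dicing then corresponds to adding WHERE conditions (for example, restricting to one year) or swapping the dimensions used in the ROLLUP.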

RESULTS:

1. Improved data analysis: With OLAP technology in place, XYZ Retail was able to perform complex analyses on sales data, identify trends, and make informed decisions based on real-time insights.
2. Faster query performance: OLAP cubes enabled faster query performance
compared to traditional relational databases, allowing users to retrieve data
more efficiently.
3. Enhanced decision-making: The ability to analyze data across multiple
dimensions helped XYZ Retail gain a deeper understanding of their business
operations and customer behavior, leading to more strategic decision-making.

CONCLUSION:

By leveraging OLAP technology in their data warehousing environment, XYZ Retail was able to overcome the challenges of managing and analyzing vast amounts of data. The implementation of OLAP not only improved data analysis capabilities but also empowered business users to make informed decisions based on real-time insights. This case study demonstrates the value of OLAP in enhancing data analysis and decision-making processes in a retail environment.

RESULT:

Thus the case study using OLAP was completed successfully.

EX.NO: 8
CASE STUDY USING OTLP
DATE:

AIM:

To perform a case study using OTLP.

INTRODUCTION:

This case study explores the implementation of the Operational Data Layer
Pattern (OTLP) in a data warehousing environment to improve data integration,
processing, and analytics capabilities. The case study focuses on a fictional company,
Tech Solutions Inc., and how they leveraged OTLP to enhance their data warehousing
operations.

BACKGROUND:

Tech Solutions Inc. is a technology consulting firm that provides IT solutions to various clients. The company collects a vast amount of data from different sources, including customer interactions, sales transactions, and operational activities. The existing data warehouse infrastructure was struggling to handle the growing volume of data and provide real-time insights for decision-making.

CHALLENGES:

1. Data silos: Data from different sources were stored in separate silos, making it
difficult to integrate and analyze data effectively.
2. Real-time data processing: The existing data warehouse was not capable of
processing real-time data streams, leading to delays in data analysis and
decision-making.
3. Scalability: The data warehouse infrastructure was reaching its limits in terms
of storage capacity and processing power, hindering the company's ability to
scale with the growing data volume.

SOLUTION:

To address these challenges, Tech Solutions Inc. decided to implement the OTLP pattern in their data warehousing environment. OTLP combines elements of both Operational Data Store (ODS) and Traditional Data Warehouse (TDW) architectures to enable real-time data processing, data integration, and analytical capabilities.

IMPLEMENTATION:

1. Data integration: Tech Solutions Inc. integrated data from various sources into
the operational data layer, where data transformations and cleansing processes
were applied.
2. Real-time processing: The OTLP architecture allowed for real-time data
processing, enabling the company to analyze streaming data and generate
insights in near real-time.
3. Analytics and reporting: Business users were provided with self-service
analytics tools to create ad-hoc reports, perform trend analysis, and gain
actionable insights from the integrated data.
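
Steps 1 and 2 above can be sketched in SQL as an operational-layer table that captures events as they arrive, with a small summary table refreshed in the same transaction so that reports always see a consistent state. All table and column names below are hypothetical and are not taken from the case study; the upsert syntax shown is SQLite-style (other engines use MERGE or their own ON CONFLICT variants).

-- Hypothetical operational-data-layer table for incoming client interactions.
CREATE TABLE ods_interaction (
    interaction_id INT PRIMARY KEY,
    client_name    VARCHAR(100),
    channel        VARCHAR(50),     -- e.g. 'email', 'phone'
    recorded_at    TIMESTAMP
);

-- Hypothetical summary table used by the reporting layer.
CREATE TABLE rpt_daily_interactions (
    report_date       DATE PRIMARY KEY,
    interaction_count INT
);

-- Capture a new event and refresh the daily summary atomically.
BEGIN TRANSACTION;

INSERT INTO ods_interaction VALUES
    (5001, 'Acme Corp', 'email', CURRENT_TIMESTAMP);

INSERT INTO rpt_daily_interactions (report_date, interaction_count)
VALUES (DATE('now'), 1)
ON CONFLICT (report_date)
DO UPDATE SET interaction_count = interaction_count + 1;

COMMIT;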

RESULTS:

1. Improved data integration: The OTLP architecture facilitated seamless integration of data from multiple sources, breaking down data silos and enabling a unified view of the company's operations.
2. Real-time analytics: With OTLP in place, Tech Solutions Inc. was able to
analyze streaming data in real-time, allowing for faster decision-making and
response to market trends.
3. Scalability: The OTLP architecture provided scalability to handle the growing
volume of data, ensuring that the company's data warehousing operations
could support future growth.

CONCLUSION:

By implementing the Operational Data Layer Pattern (OTLP) in their data warehousing environment, Tech Solutions Inc. was able to overcome the challenges of data silos, real-time data processing, and scalability. The adoption of OTLP not only improved data integration and analytics capabilities but also empowered business users to make informed decisions based on real-time insights. This case study highlights the benefits of leveraging OTLP in enhancing data warehousing operations for improved business outcomes.

RESULT:

Thus the case study using OTLP was completed successfully.

EX.NO: 9
IMPLEMENTATION OF WAREHOUSE TESTING
DATE:

AIM:

To implement warehouse testing.

ALGORITHM:

1. Create a staging table to hold raw sales data (like customer, product, quantity,
price, and date).
2. Insert sample sales data into the staging table.
3. Create dimension tables to store unique customers, products, and dates with
IDs.
4. Create a fact table to store actual sales records linked to dimension tables using
IDs.
5. Insert data into dimension tables for known customers, products, and dates.
6. Load data into the fact table by mapping customer, product, and date from
staging to the corresponding IDs, and calculate total amount (quantity × price).
7. Run tests to check if:
• The number of rows matches between staging and fact tables.
• Total amounts are calculated correctly.
• All foreign key links work (joins with dimension tables).
• Sales totals per customer are accurate.

PROGRAM:

-- STEP 1: Drop tables if they exist (optional for clean run)

DROP TABLE IF EXISTS fact_sales;


DROP TABLE IF EXISTS dim_customer;
DROP TABLE IF EXISTS dim_product;
DROP TABLE IF EXISTS dim_time;
DROP TABLE IF EXISTS staging_sales;

-- STEP 2: Create Staging Table (Source System Simulation)

CREATE TABLE staging_sales (


sale_id INT,
customer_name VARCHAR(100),
product_name VARCHAR(100),
quantity INT,
price DECIMAL(10,2),
sale_date DATE
);

-- Insert sample data into staging
INSERT INTO staging_sales VALUES
(1, 'Alice', 'Laptop', 2, 1000.00, '2025-05-01'),
(2, 'Bob', 'Phone', 1, 600.00, '2025-05-02');

-- STEP 3: Create Dimension Tables

CREATE TABLE dim_customer (


customer_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);

CREATE TABLE dim_product (


product_id INT PRIMARY KEY,
product_name VARCHAR(100),
price DECIMAL(10,2)
);

CREATE TABLE dim_time (


time_id INT PRIMARY KEY,
sale_date DATE
);

-- STEP 4: Create Fact Table

CREATE TABLE fact_sales (


sale_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
time_id INT,
quantity INT,
total_amount DECIMAL(10,2),
FOREIGN KEY (customer_id) REFERENCES dim_customer(customer_id),
FOREIGN KEY (product_id) REFERENCES dim_product(product_id),
FOREIGN KEY (time_id) REFERENCES dim_time(time_id)
);

-- STEP 5: Populate Dimension Tables

INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');


INSERT INTO dim_product VALUES (101, 'Laptop', 1000.00), (102, 'Phone',
600.00);
INSERT INTO dim_time VALUES (1, '2025-05-01'), (2, '2025-05-02');

-- STEP 6: ETL - Load Fact Table from Staging

INSERT INTO fact_sales (


sale_id, customer_id, product_id, time_id, quantity, total_amount
)
VALUES
(1, 1, 101, 1, 2, 2 * 1000.00),
(2, 2, 102, 2, 1, 1 * 600.00);

-- STEP 7: TESTING

-- Test 1: Row count match


SELECT
(SELECT COUNT(*) FROM staging_sales) AS staging_count,
(SELECT COUNT(*) FROM fact_sales) AS fact_count;

-- Test 2: Validate total amount


SELECT fs.sale_id, fs.quantity, fs.total_amount,
dp.price,
fs.quantity * dp.price AS expected_amount
FROM fact_sales fs
JOIN dim_product dp ON fs.product_id = dp.product_id;

-- Test 3: Foreign key joins


SELECT fs.sale_id, dc.customer_name, dp.product_name, dt.sale_date
FROM fact_sales fs
JOIN dim_customer dc ON fs.customer_id = dc.customer_id
JOIN dim_product dp ON fs.product_id = dp.product_id
JOIN dim_time dt ON fs.time_id = dt.time_id;

-- Test 4: Sales summary per customer


SELECT dc.customer_name, SUM(fs.total_amount) AS total_sales
FROM fact_sales fs
JOIN dim_customer dc ON fs.customer_id = dc.customer_id
GROUP BY dc.customer_name;

OUTPUT:

staging_count fact_count
2 2

sale_id quantity total_amount price expected_amount


1 2 2000 1000 2000
2 1 600 600 600

sale_id customer_name product_name sale_date


1 Alice Laptop 2025-05-01
2 Bob Phone 2025-05-02

customer_name total_sales
Alice 2000
Bob 600

RESULT:

Thus the implementation of warehouse testing was executed successfully.
