DWDM UNIT-2
Summary Table: Components Of Data Mining Architecture
Component            Description
Data Sources         Raw input data repositories
Data Warehouse       Centralized data storage for analysis
Data Preprocessing   Cleaning, transforming, and selecting data
Data Mining Engine   Core engine that applies mining algorithms
Pattern Evaluation   Filters and evaluates interesting patterns
Knowledge Base       Domain knowledge and metadata support
User Interface       Interaction layer for users
                            Market Basket Analysis
Market Basket Analysis is a data mining technique used to uncover purchase patterns in a
retail setting. It works by analyzing the combinations of products that are bought together.
Through a careful study of the purchases made by customers in a supermarket, this
technique identifies patterns of frequently purchased item combinations. Companies can
use the results to design deals, offers, and sales, and data mining techniques help to
carry out this analysis efficiently.
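A minimal sketch of the idea, using hypothetical baskets: count how often each pair of items appears together and compute its support (the fraction of baskets containing the pair). The transactions and item names here are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is the items in one customer's basket.
transactions = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "butter"},
]

# Count how often each pair of items is bought together.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of all baskets that contain the pair.
support = {pair: n / len(transactions) for pair, n in pair_counts.items()}
print(support[("bread", "milk")])  # bread and milk co-occur in 2 of 4 baskets -> 0.5
```

Real market basket analysis scales this pair-counting idea up with algorithms such as Apriori, which prune the search over larger item combinations.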
Types of Market Basket Analysis
There are three types of Market Basket Analysis:
   1. Descriptive Market Basket Analysis: This sort of analysis looks for patterns and
      connections that exist between the components of a market basket. It is mostly
      used to understand consumer behavior, including which products are purchased
      in combination and what the most typical item combinations are. By revealing
      which products are frequently bought together, descriptive market basket
      analysis helps retailers place products in their stores more profitably.
   2. Predictive Market Basket Analysis: Market basket analysis that predicts future
      purchases based on past purchasing patterns is known as predictive market
      basket analysis. In this sort of analysis, machine learning algorithms analyze
      large volumes of data to predict which products are most likely to be bought
      together in the future. Predictive market basket analysis helps retailers make
      data-driven decisions about which products to carry, how to price them, and
      how to optimize shop layouts.
   3. Differential Market Basket Analysis: Differential market basket analysis
      compares two sets of market basket data to identify variations between them.
      It is commonly used to compare the behavior of different customer segments,
      or the behavior of the same customers over time. With its help, retailers can
      respond to shifting consumer behavior by adjusting their marketing and sales
      tactics.
Benefits of Market Basket Analysis
   1. Enhanced Customer Understanding
   2. Improved Inventory Management
   3. Better Pricing Strategies
   4. Sales Growth
Parallel Processors and Cluster Systems in Data Warehouse Process
Technology
In data warehouse technology, Parallel Processors and Cluster Systems play a
crucial role in enhancing performance, scalability, and reliability.
Parallel Processing involves dividing large tasks into smaller sub-tasks and
executing them simultaneously across multiple processors. This significantly
speeds up data loading, querying, and analysis in data warehouses. Parallel
processing systems include:
•   Shared Memory Systems, where all processors share a global memory.
•   Shared Nothing Systems, where each processor has its own memory and disk,
    suitable for large-scale data warehousing.
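The divide-execute-merge pattern behind parallel processing can be sketched in a few lines. This is only an illustration of the concept (using Python threads on invented sales data), not a real shared-nothing warehouse: each worker aggregates its own partition, and the partial results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact-table column of sales amounts to be aggregated.
sales = list(range(1, 1001))

def partial_sum(chunk):
    # Each worker aggregates only its own partition (shared-nothing style).
    return sum(chunk)

# Divide the task into four partitions and execute them concurrently.
chunks = [sales[i::4] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, chunks))

# Merge the partial results into the final answer.
total = sum(partials)
print(total)  # 500500, identical to a serial sum
```

A real parallel warehouse applies the same split-and-merge idea across separate processors or nodes, each with its own memory and disk.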
Cluster Systems consist of multiple interconnected computers (nodes) that work
together as a single system. These are cost-effective and offer high availability. If
one node fails, others can take over, ensuring uninterrupted operations. Clusters
support load balancing and failover mechanisms, which are essential for
managing large volumes of data in real time.
Both technologies allow data warehouses to handle massive datasets efficiently,
support complex queries, and scale horizontally by adding more processors or
nodes. They are essential for modern Business Intelligence (BI) and Big Data
analytics platforms.
In summary, parallel processors and cluster systems form the backbone of high-
performance data warehouses, enabling fast, reliable, and scalable data
processing.
Warehousing Software and Warehouse Schema Design in Data Warehouse
Process Technology
Warehousing Software provides the tools needed to build, manage, and access a
data warehouse. It supports data integration, extraction, transformation, loading
(ETL), query processing, and reporting. Popular warehousing software includes
Microsoft SQL Server, Oracle Warehouse Builder, Informatica, and Snowflake. These
tools ensure efficient data storage, real-time access, and business intelligence
support.
Warehouse Schema Design refers to how data is logically structured in the
warehouse. It affects query performance and data organization. The main types of
schemas are:
•   Star Schema: Central fact table linked to multiple dimension tables. Simple and
    fast for queries.
•   Snowflake Schema: Extension of star schema with normalized dimensions.
    Saves storage but more complex.
•   Galaxy (Fact Constellation) Schema: Contains multiple fact tables sharing
    dimension tables. Used for complex applications.
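A star schema can be sketched concretely with Python's built-in sqlite3 module. The table and column names below are invented for illustration: one central fact table (fact_sales) joined to two dimension tables, queried with a typical star-join aggregation.

```python
import sqlite3

# In-memory sketch of a star schema: one fact table, two dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Bread'), (2, 'Milk');
INSERT INTO dim_store   VALUES (1, 'Delhi');
INSERT INTO fact_sales  VALUES (1, 1, 40.0), (2, 1, 25.0), (1, 1, 40.0);
""")

# A typical star-join query: total sales per product name.
rows = con.execute("""
SELECT p.name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON f.product_id = p.product_id
GROUP BY p.name ORDER BY p.name
""").fetchall()
print(rows)  # [('Bread', 80.0), ('Milk', 25.0)]
```

A snowflake schema would further normalize the dimension tables (e.g., splitting dim_store into store and city tables), trading query simplicity for reduced redundancy.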
Effective schema design ensures efficient data retrieval, minimizes redundancy, and
improves performance.
In summary, warehousing software handles the technical operations of the
warehouse, while schema design structures the data logically for optimized access
and analysis. Both are essential for a functional and scalable data warehouse
system.
a. Differentiate between:
(i) Min-Max Normalization vs Z-score Normalization
Aspect       Min-Max Normalization                            Z-score Normalization
Definition   Scales data to a fixed range, usually [0, 1]     Transforms data based on mean and standard deviation
Formula      X' = (X - X_min) / (X_max - X_min)               Z = (X - μ) / σ
Use Case     When the data range is known and fixed           When the data distribution needs to be standardized
Example      If X = 80, min = 50, max = 100                   If X = 80, μ = 70, σ = 10
             ⇒ X' = (80 - 50)/(100 - 50) = 0.6                ⇒ Z = (80 - 70)/10 = 1.0
(ii) Binary Data Variables vs Nominal Data Variables
Aspect       Binary Variables                            Nominal Variables
Definition   Variables with only two values (0 or 1)     Categorical variables with more than two values
Values       True/False, Yes/No, Male/Female             Red, Blue, Green; Apple, Orange, Banana
Example      Gender: Male (1), Female (0)                Color: Red, Green, Blue