DWDM UNIT-2
Summary Table: Components Of Data Mining Architecture
Component            Description
Data Sources         Raw input data repositories
Data Warehouse       Centralized data storage for analysis
Data Preprocessing   Cleaning, transforming, and selecting data
Data Mining Engine   Core engine that applies mining algorithms
Pattern Evaluation   Filters and evaluates interesting patterns
Knowledge Base       Domain knowledge and metadata support
User Interface       Interaction layer for users
                            Market Basket Analysis
Market Basket Analysis is a data mining technique used to uncover purchase patterns in a
retail setting. It works by analyzing the combinations of products that are bought together.
Through a careful study of the purchases made by customers in a supermarket, this
technique identifies patterns of frequently purchased item combinations. Companies can
use the results to design deals, offers, and sales, and data mining techniques help to
carry out this analysis efficiently.
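A minimal sketch of the idea, using hypothetical baskets: count how often each pair of items appears together and compute its support (the fraction of baskets containing the pair). The transactions and item names here are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is the items in one customer's basket.
transactions = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "butter"},
]

# Count how often each pair of items is bought together.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of all baskets that contain the pair.
support = {pair: n / len(transactions) for pair, n in pair_counts.items()}
print(support[("bread", "milk")])  # bread and milk co-occur in 2 of 4 baskets -> 0.5
```

Real market basket analysis scales this pair-counting idea up with algorithms such as Apriori, which prune the search over larger item combinations.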
Types of Market Basket Analysis
There are three types of Market Basket Analysis:
   1. Descriptive Market Basket Analysis: This sort of analysis looks for patterns and
      connections that exist between the components of a market basket. It is mostly
      used to understand consumer behavior, including which products are purchased
      in combination and what the most typical item combinations are. By revealing
      which products are frequently bought together, descriptive market basket
      analysis helps retailers place products in their stores more profitably.
   2. Predictive Market Basket Analysis: Market basket analysis that predicts future
      purchases based on past purchasing patterns is known as predictive market
      basket analysis. In this sort of analysis, machine learning algorithms analyze
      large volumes of data to predict which products are most likely to be bought
      together in the future. Predictive market basket analysis helps retailers make
      data-driven decisions about which products to carry, how to price them, and
      how to optimize shop layouts.
   3. Differential Market Basket Analysis: Differential market basket analysis
      compares two sets of market basket data to identify variations between them.
      It is commonly used to compare the behavior of different customer segments,
      or the behavior of the same customers over time. With its help, retailers can
      respond to shifting consumer behavior by adjusting their marketing and sales
      tactics.
Benefits of Market Basket Analysis
   1. Enhanced Customer Understanding
   2. Improved Inventory Management
   3. Better Pricing Strategies
   4. Sales Growth
Parallel Processors and Cluster Systems in Data Warehouse Process
Technology
In data warehouse technology, Parallel Processors and Cluster Systems play a
crucial role in enhancing performance, scalability, and reliability.
Parallel Processing involves dividing large tasks into smaller sub-tasks and
executing them simultaneously across multiple processors. This significantly
speeds up data loading, querying, and analysis in data warehouses. Parallel
processing systems include:
•   Shared Memory Systems, where all processors share a global memory.
•   Shared Nothing Systems, where each processor has its own memory and disk,
    suitable for large-scale data warehousing.
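The divide-execute-merge pattern behind parallel processing can be sketched in a few lines. This is only an illustration of the concept (using Python threads on invented sales data), not a real shared-nothing warehouse: each worker aggregates its own partition, and the partial results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact-table column of sales amounts to be aggregated.
sales = list(range(1, 1001))

def partial_sum(chunk):
    # Each worker aggregates only its own partition (shared-nothing style).
    return sum(chunk)

# Divide the task into four partitions and execute them concurrently.
chunks = [sales[i::4] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, chunks))

# Merge the partial results into the final answer.
total = sum(partials)
print(total)  # 500500, identical to a serial sum
```

A real parallel warehouse applies the same split-and-merge idea across separate processors or nodes, each with its own memory and disk.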
Cluster Systems consist of multiple interconnected computers (nodes) that work
together as a single system. These are cost-effective and offer high availability. If
one node fails, others can take over, ensuring uninterrupted operations. Clusters
support load balancing and failover mechanisms, which are essential for
managing large volumes of data in real time.
Both technologies allow data warehouses to handle massive datasets efficiently,
support complex queries, and scale horizontally by adding more processors or
nodes. They are essential for modern Business Intelligence (BI) and Big Data
analytics platforms.
In summary, parallel processors and cluster systems form the backbone of high-
performance data warehouses, enabling fast, reliable, and scalable data
processing.
Warehousing Software and Warehouse Schema Design in Data Warehouse
Process Technology
Warehousing Software provides the tools needed to build, manage, and access a
data warehouse. It supports data integration, extraction, transformation, loading
(ETL), query processing, and reporting. Popular warehousing software includes
Microsoft SQL Server, Oracle Warehouse Builder, Informatica, and Snowflake. These
tools ensure efficient data storage, real-time access, and business intelligence
support.
Warehouse Schema Design refers to how data is logically structured in the
warehouse. It affects query performance and data organization. The main types of
schemas are:
•   Star Schema: Central fact table linked to multiple dimension tables. Simple and
    fast for queries.
•   Snowflake Schema: Extension of star schema with normalized dimensions.
    Saves storage but more complex.
•   Galaxy (Fact Constellation) Schema: Contains multiple fact tables sharing
    dimension tables. Used for complex applications.
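A star schema can be sketched concretely with Python's built-in sqlite3 module. The table and column names below are invented for illustration: one central fact table (fact_sales) joined to two dimension tables, queried with a typical star-join aggregation.

```python
import sqlite3

# In-memory sketch of a star schema: one fact table, two dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Bread'), (2, 'Milk');
INSERT INTO dim_store   VALUES (1, 'Delhi');
INSERT INTO fact_sales  VALUES (1, 1, 40.0), (2, 1, 25.0), (1, 1, 40.0);
""")

# A typical star-join query: total sales per product name.
rows = con.execute("""
SELECT p.name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON f.product_id = p.product_id
GROUP BY p.name ORDER BY p.name
""").fetchall()
print(rows)  # [('Bread', 80.0), ('Milk', 25.0)]
```

A snowflake schema would further normalize the dimension tables (e.g., splitting dim_store into store and city tables), trading query simplicity for reduced redundancy.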
Effective schema design ensures efficient data retrieval, minimizes redundancy, and
improves performance.
In summary, warehousing software handles the technical operations of the
warehouse, while schema design structures the data logically for optimized access
and analysis. Both are essential for a functional and scalable data warehouse
system.
a. Differentiate between:
(i) Min-Max Normalization vs Z-score Normalization
Aspect       Min-Max Normalization                            Z-score Normalization
Definition   Scales data to a fixed range, usually [0, 1]     Transforms data based on mean and standard deviation
Formula      X' = (X - X_min) / (X_max - X_min)               Z = (X - μ) / σ
Use Case     When the data range is known and fixed           When the data distribution needs to be standardized
Example      If X = 80, min = 50, max = 100                   If X = 80, μ = 70, σ = 10
             ⇒ X' = (80 - 50)/(100 - 50) = 0.6                ⇒ Z = (80 - 70)/10 = 1.0
(ii) Binary Data Variables vs Nominal Data Variables
Aspect       Binary Variables                            Nominal Variables
Definition   Variables with only two values (0 or 1)     Categorical variables with more than two values
Values       True/False, Yes/No, Male/Female             Red, Blue, Green; Apple, Orange, Banana
Example      Gender: Male (1), Female (0)                Color: Red, Green, Blue