UNIT 3: Mining Association Rules in Large Databases
1. Association Rule Mining
Description:
Association rule mining is a technique used in data mining to find relationships between
variables in large databases. It identifies if-then rules that show how the occurrence of one
item is related to the occurrence of another item.
Application:
Retail: To find patterns such as "customers who buy bread are also likely to buy butter."
Example:
Rule: If a customer buys diapers, then they also buy beer.
Support: 1% of all transactions include both items.
Confidence: 50% of the transactions that include diapers also include beer.
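The two measures above can be computed directly from a list of transactions. The following is a minimal sketch, not a library routine: the helper names `support` and `confidence` and the four toy transactions are assumptions made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Toy data, assumed for illustration only.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

# Rule: diapers -> beer
print(support(transactions, {"diapers", "beer"}))       # 1 of 4 transactions
print(confidence(transactions, {"diapers"}, {"beer"}))  # 1 of 2 diaper transactions
```

Support is measured against all transactions, while confidence is measured only against the transactions that contain the "if" part of the rule.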
2. Mining Methods
Description:
Different algorithms and techniques are used to discover association rules, such as Apriori,
FP-Growth, and Eclat.
Application:
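Apriori: Finds frequent item sets level by level, generating candidates and pruning any whose subsets are not frequent.
FP-Growth: Compresses the transactions into an FP-tree and mines it without candidate generation; usually faster than Apriori.
Eclat: Uses a vertical layout (a list of transaction IDs per item) and a depth-first search to find frequent item sets.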
Example:
Apriori: It starts with frequent single items and extends them one item at a time, keeping only the item sets that still meet the minimum support.
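The level-by-level idea behind Apriori can be sketched in a few lines. This is a simplified illustration rather than an optimized implementation; the function name `apriori`, the 0.6 threshold, and the toy baskets are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Keep only the candidates whose support reaches the threshold.
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    # Level 1: single items.
    items = {item for t in transactions for item in t}
    level = frequent({frozenset([i]) for i in items})
    all_frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent


# Toy baskets, assumed for illustration only.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
frequent_itemsets = apriori(baskets, min_support=0.6)
for itemset, s in sorted(frequent_itemsets.items(), key=lambda kv: len(kv[0])):
    print(sorted(itemset), s)
```

The prune step is what makes the approach practical: an item set can only be frequent if all of its subsets are frequent, so most candidates are discarded before their support is counted.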
3. Mining Multi-Level Association Rules
Description:
These rules are discovered at multiple levels of abstraction, such as finding patterns at
category levels like electronics -> computers -> laptops.
Application:
E-commerce: To find buying patterns across different product categories and
subcategories.
Example:
Rule: If a customer buys electronics, they are likely to buy a computer, and if they buy a
computer, they might buy a laptop.
Diagram:
A tree structure where the root is a broad category (electronics) and branches to
subcategories (computers, laptops).
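One common way to mine at several levels is to add each item's ancestor categories to the transaction and then measure support at every level. The sketch below assumes a small hypothetical taxonomy (`PARENT`) and made-up transactions; the helper names are illustrative only.

```python
# Hypothetical taxonomy: child -> parent (assumed for illustration).
PARENT = {
    "laptop": "computers",
    "desktop": "computers",
    "computers": "electronics",
    "phone": "electronics",
}


def ancestors(item):
    """Yield every ancestor of `item` up to the root category."""
    while item in PARENT:
        item = PARENT[item]
        yield item


def extend_with_ancestors(transaction):
    """Add all higher-level categories of each purchased item."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


# Toy transactions recorded at the lowest level of the hierarchy.
raw = [{"laptop"}, {"desktop"}, {"phone"}, {"laptop", "phone"}]
extended = [extend_with_ancestors(t) for t in raw]

print(support(extended, {"laptop"}))       # low-level item
print(support(extended, {"computers"}))    # mid-level category
print(support(extended, {"electronics"}))  # top-level category
```

Support can only grow when moving up the tree, which is why a lower minimum support is often used at the more specific levels.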
4. Multi-Dimensional Association Rule Mining
Description:
Association rules that involve more than one dimension or attribute, such as time, location,
or customer demographic.
Application:
Market Basket Analysis: To find patterns like customers buying certain products in certain
seasons or regions.
Example:
Rule: If a customer from New York buys a coat in December, they are likely to buy gloves.
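A simple way to handle several dimensions is to turn each (dimension, value) pair into an item, so that location and time can appear in a rule alongside products. The records, field names, and helper functions below are assumptions made for this sketch.

```python
# Hypothetical records with three dimensions (assumed toy data).
records = [
    {"city": "New York", "month": "December", "bought": {"coat", "gloves"}},
    {"city": "New York", "month": "December", "bought": {"coat"}},
    {"city": "New York", "month": "July", "bought": {"sunglasses"}},
    {"city": "Miami", "month": "December", "bought": {"coat", "gloves"}},
    {"city": "New York", "month": "December", "bought": {"coat", "gloves"}},
]


def to_items(record):
    """Flatten a record into (dimension, value) pairs so that every
    dimension can appear in a rule like an ordinary item."""
    items = {("city", record["city"]), ("month", record["month"])}
    items.update(("buys", product) for product in record["bought"])
    return items


def confidence(transactions, antecedent, consequent):
    """Share of transactions matching the antecedent that also match
    the consequent (assumes at least one transaction matches)."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)


transactions = [to_items(r) for r in records]
rule_lhs = {("city", "New York"), ("month", "December"), ("buys", "coat")}
rule_rhs = {("buys", "gloves")}
print(confidence(transactions, rule_lhs, rule_rhs))
```

After this encoding, any single-dimension mining method can be reused unchanged on the flattened transactions.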
5. Mining Correlation Analysis
Description:
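Finding item sets that are statistically correlated, meaning they occur together more (or less) often than would be expected if they were independent, rather than merely co-occurring.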
Application:
Healthcare: To find correlations between symptoms and diseases.
Example:
Rule: Patients with symptom A are likely to have disease B.
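Lift is one widely used measure for telling correlation apart from plain co-occurrence. The sketch below is illustrative only: the function name `lift` and the patient records are assumptions, not real clinical data.

```python
def lift(transactions, a, b):
    """lift(A, B) = support(A and B) / (support(A) * support(B)).

    lift > 1: A and B occur together more often than if independent.
    lift = 1: A and B look independent.
    lift < 1: A and B occur together less often than if independent.
    """
    n = len(transactions)
    a, b = set(a), set(b)
    s_a = sum(1 for t in transactions if a <= t) / n
    s_b = sum(1 for t in transactions if b <= t) / n
    s_ab = sum(1 for t in transactions if (a | b) <= t) / n
    return s_ab / (s_a * s_b)


# Toy patient records (assumed): each set lists the observed findings.
patients = [
    {"symptom_A", "disease_B"},
    {"symptom_A", "disease_B"},
    {"symptom_A"},
    {"disease_B"},
    set(),
    set(),
    set(),
    set(),
]
value = lift(patients, {"symptom_A"}, {"disease_B"})
print(value)  # greater than 1 here, so the two are positively correlated
```

A rule can have high confidence and still have a lift near 1 when the consequent is simply very common, which is why confidence alone can be misleading.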
6. Constraint-Based Association Mining
Description:
Mining association rules with certain constraints to make the process efficient and
relevant, such as constraints on item sets’ size, support, or confidence.
Application:
Inventory Management: To find association rules that only involve items with high sales.
Example:
Constraint: Only consider items that appear in at least 5% of transactions (a minimum support constraint).
7. Multidimensional Association Rules from Relational Databases (DBS) and Data
Warehouses (DWS)
Description:
Finding association rules in databases and data warehouses that have multiple
dimensions and large volumes of data.
Application:
Business Intelligence: To analyze sales data across different dimensions like time,
geography, and product lines.
Example:
Rule: Sales of product X in the eastern region increase in the first quarter.
Diagram:
A cube representing a data warehouse with dimensions like time, location, and product.
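Each cell of such a cube is an aggregate over one combination of dimension values. The sketch below mimics this with plain Python; the fact-table rows, the dimension names, and the `roll_up` helper are hypothetical and stand in for what a warehouse query would return.

```python
from collections import defaultdict

# Hypothetical fact-table rows: (product, region, quarter, units_sold).
sales = [
    ("X", "East", "Q1", 120),
    ("X", "East", "Q2", 80),
    ("X", "West", "Q1", 60),
    ("Y", "East", "Q1", 40),
    ("X", "East", "Q1", 30),
]

DIMENSIONS = ("product", "region", "quarter")


def roll_up(rows, keep):
    """Sum units over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)


by_cell = roll_up(sales, ("product", "region", "quarter"))
by_region = roll_up(sales, ("region",))

print(by_cell[("X", "East", "Q1")])  # units of product X, East, first quarter
print(by_cell[("X", "East", "Q2")])  # same product and region, next quarter
print(by_region)
```

Comparing neighbouring cells, such as the two quarters printed above, is the kind of evidence behind a rule about product X in the eastern region.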
8. Correlation Analysis
Description:
Measuring how strongly two variables vary together (for example, with a correlation coefficient), rather than only counting how often they co-occur.
Application:
Finance: To find correlations between stock prices and economic indicators.
Example:
Rule: There is a high correlation between the stock price of company A and the overall
market index.
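For numeric variables such as prices, the Pearson correlation coefficient is a standard measure. The sketch below computes it by hand; the function name `pearson` and the price series are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up daily closing values (assumed for illustration only).
stock_a = [10.0, 11.0, 12.0, 11.5, 13.0]
market_index = [100.0, 104.0, 109.0, 107.0, 113.0]

r = pearson(stock_a, market_index)
print(round(r, 3))  # close to +1 means the two series move together
```

The coefficient ranges from -1 to +1; values near 0 indicate no linear relationship.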
9. Constraint-Based Association Mining (repeated, covered earlier)
Description:
Using specific criteria (constraints) to focus on the most relevant patterns in data mining.
Types of Constraints:
1. Data Constraints: Limit the data scope (e.g., by time or location).
2. Rule Constraints: Define conditions for the rules (e.g., must include certain items).
3. Length Constraints: Set the minimum/maximum number of items in itemsets.
4. Support Constraints: Ensure itemsets appear frequently enough.
5. Confidence Constraints: Ensure rules have high reliability.
Applications and Examples:
1. Retail:
- Constraint: Only holiday season transactions.
- Rule: If a customer buys gift wrap, they are likely to buy a greeting card (support > 5%,
confidence > 70%).
2. Healthcare:
- Constraint: Patients with chronic conditions.
- Rule: If a patient reports fatigue, they likely have anemia (confidence > 80%).
3. Finance:
- Constraint: Transactions over $500.
- Rule: If a transaction happens at an ATM, another is likely within an hour (support > 3%,
confidence > 60%).
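The constraint types listed above can be applied as a filter over already mined rules. This is a minimal sketch under assumptions: the `Rule` class, the `satisfies` helper, and the rule measures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float


def satisfies(rule, *, min_support=0.0, min_confidence=0.0,
              max_length=None, must_include=frozenset()):
    """Check support, confidence, length, and rule (item) constraints."""
    items = rule.antecedent | rule.consequent
    if rule.support < min_support or rule.confidence < min_confidence:
        return False
    if max_length is not None and len(items) > max_length:
        return False
    return set(must_include) <= items


# Hypothetical mined rules with made-up measures.
rules = [
    Rule(frozenset({"gift wrap"}), frozenset({"greeting card"}), 0.06, 0.75),
    Rule(frozenset({"gift wrap"}), frozenset({"ribbon"}), 0.02, 0.80),
    Rule(frozenset({"candles"}), frozenset({"greeting card"}), 0.07, 0.40),
]

kept = [
    r for r in rules
    if satisfies(r, min_support=0.05, min_confidence=0.70,
                 must_include={"gift wrap"})
]
print(kept)
```

Data constraints (for example, holiday season transactions only) are usually applied earlier, by filtering the transactions before mining. Filtering afterwards, as here, is the simplest approach; pushing the constraints into the mining step avoids generating rules that would be discarded anyway.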
10. Mining Frequent Patterns
Description:
Identifying patterns that appear frequently within a dataset.
Application:
Retail: To find items that are frequently bought together.
Example:
Rule: Bread and butter are frequently bought together.
Diagram:
A bar graph showing the frequency of different item sets.
11. Mining Various Kinds of Association Rules
Description:
Mining not only traditional association rules but also negative association rules (items that
rarely co-occur), temporal rules (time-based), etc.
Application:
Retail: To find items that are rarely bought together.
Example:
Rule: Customers who buy coffee rarely buy tea.
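A negative rule such as "coffee, therefore not tea" can be scored by the share of coffee baskets that contain no tea. The helper name `negative_confidence` and the baskets below are assumptions made for this sketch.

```python
def negative_confidence(transactions, a, b):
    """Confidence of the negative rule A -> not B: the share of
    transactions containing A that do not contain B."""
    with_a = [t for t in transactions if a in t]
    without_b = [t for t in with_a if b not in t]
    return len(without_b) / len(with_a)


# Toy baskets, assumed for illustration only.
baskets = [
    {"coffee", "sugar"},
    {"coffee", "milk"},
    {"coffee", "tea"},
    {"coffee"},
    {"tea", "lemon"},
    {"tea"},
]
print(negative_confidence(baskets, "coffee", "tea"))  # 3 of 4 coffee baskets
```

This equals 1 minus the confidence of the positive rule coffee -> tea, so a negative rule is strong exactly when the corresponding positive rule is weak.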