Module1 1 Introduction

The document outlines a syllabus for a course on Data Mining and Warehousing, covering topics such as data preprocessing, association rules, classification, clustering, and data mining functionalities. It emphasizes the importance of data mining in extracting knowledge from large datasets and includes references to core texts and additional resources. The document also details the stages of the knowledge discovery process and various types of data that can be mined.

Uploaded by

donmathew666666


CP1444: DATA MINING & WAREHOUSING
SYLLABUS
Module I:
Introduction: Data, Information, Knowledge, KDD, types of data for mining,
technologies for mining, issues in data mining, data mining functionalities/tasks. Data
preprocessing overview, Data cleaning, Data integration, Data reduction, Data transformation and
discretization. Data Warehouses: basic concepts, Data Mart, Databases vs. Data Warehouses, Data
Warehouses vs. Data Marts, OLTP vs. OLAP, OLAP operations/functions, OLAP Multi-Dimensional
Models: Data cubes, Star, Snowflake, and Fact constellation data models.
Module II:
Association rules: Market Basket Analysis, Frequent Itemsets, Closed Itemsets, and Association
Rules, Frequent Itemset Mining Methods: Apriori Algorithm: Finding Frequent Itemsets by
Confined Candidate Generation, Generating Association Rules from Frequent Itemsets, Improving
the Efficiency of Apriori.
Module III:

Classification: Basic Concepts, Decision Tree Induction, Bayesian Classification, Rule-Based
Classification, Classification by Backpropagation, Support Vector Machines, Associative Classification,
Lazy Learners

Module IV:

Clustering: Cluster analysis: definition and requirements, Characteristics of clustering techniques, Types
of data in cluster analysis, Overview of Basic Clustering Methods, Partitioning methods: K-Means and
K-Medoids methods, Outlier detection: definition and types of outliers, Outlier Detection Methods:
Supervised, Semi-Supervised, and Unsupervised Methods, Statistical Methods, Proximity-Based
Methods, and Clustering-Based Methods (basic concepts only)

CORE TEXT

1. Jiawei Han, Micheline Kamber & Jian Pei, Data Mining: Concepts and
Techniques, 3rd Edition, Morgan Kaufmann, 2011

https://myweb.sabanciuniv.edu/rdehkharghani/files/2016/02/The-Morgan-Kaufmann-Series-in-Data-Management-Systems-Jiawei-Han-Micheline-Kamber-Jian-Pei-Data-Mining.-Concepts-and-Techniques-3rd-Edition-Morgan-Kaufmann-2011.pdf

ADDITIONAL REFERENCES

2. Sunitha Tiwari & Neha Chaudary, Data Mining and Warehousing, Dhanpat
Rai & Co.
Reference: Chapter 1

https://hanj.cs.illinois.edu/bk3/bk3_slidesindex.htm

WHAT IS DATA MINING?

• To refer to the mining of gold from rocks or sand, we say gold mining
instead of rock or sand mining.
• Analogously, data mining should have been more appropriately named
“knowledge mining from data,” which is unfortunately somewhat long.
• Data mining refers to extracting or mining knowledge from large
amounts of data.

WHAT IS DATA MINING?

• It is the computational process of discovering patterns in large data sets involving
methods at the intersection of artificial intelligence, machine learning, statistics, and
database systems.
• The overall goal of the data mining process is to extract information from a data set
and transform it into an understandable structure for further use.
• The key properties of data mining are
• Automatic discovery of patterns
• Prediction of likely outcomes
• Creation of actionable information
• Focus on large datasets and databases

Data

• Data is a collection of facts, such as numbers, words, measurements,
observations, or just descriptions of things.
• Data can be qualitative or quantitative.
• Qualitative data is descriptive information (it describes something)
• Quantitative data is numerical information (numbers)

Information
• Raw data is collected, and the outcome of processing this raw data is
information.
• Information is data that has been processed, organized, and presented
in a specific context so that it serves a purpose.
• Information cannot exist without data, and most information is
expressed in measurable units such as quantity or time. Data and
information also differ in many ways. For information to be useful,
the processed data should have the following characteristics:

• Time – Information should be available whenever it is required.
• Accuracy – Information should be factual and organized; only then can it serve
its purpose.
• Completeness – Information should be finite and consistent.

• Some examples of information:

1. Information about transportation systems, such as train schedules.
2. Geographical information, such as directions.
3. Payslips.
4. Bank passbooks.
5. Printed documents.

Knowledge
• Knowledge is information that has been processed, organized, or
structured in some way, or put into practice in some way.
• Knowledge is familiarity with and awareness of people, places,
events, ideas, issues, ways of doing things, or anything else,
gathered through learning, perceiving, or discovering.
• It is the state of knowing something with cognizance through the
understanding of concepts, study and experience.

Why do we need Data Mining?

The volume of information we must handle is increasing every day, coming from business
transactions, scientific data, sensor data, pictures, videos, etc. So, we need a system that is
capable of extracting the essence of the available information and that can automatically
generate reports, views, or summaries of data for better decision-making.

Why is Data Mining used in Business?

Data mining is used in business to make better managerial decisions by:

1. Automatically summarizing data.
2. Extracting the essence of the stored information.
3. Discovering patterns in raw data.
KNOWLEDGE DISCOVERY FROM DATA, OR KDD

• KDD (Knowledge Discovery in Databases) is a process that involves
the extraction of useful, previously unknown, and potentially
valuable information from large datasets.

• The KDD process in data mining typically involves the following
steps:
KNOWLEDGE DISCOVERY FROM DATA, OR KDD
I. Data cleaning (to remove noise and inconsistent data)
II. Data integration (where multiple data sources may be combined)
III. Data selection (where data relevant to the analysis task are retrieved from the database)
IV. Data transformation (where data are transformed and consolidated into forms appropriate
for mining by performing summary or aggregation operations)
V. Data mining (an essential process where intelligent methods are applied to extract data
patterns)
VI. Pattern evaluation (to identify the truly interesting patterns representing knowledge based
on interestingness measures)
VII. Knowledge presentation (where visualization and knowledge representation techniques
are used to present mined knowledge to users)

Steps I to IV are different forms of data preprocessing.

The term data mining is often used to refer to the entire knowledge discovery process.

PYQ: What is data mining? Outline the stages in the knowledge discovery process. (1
1. Selection: Select a relevant subset of the data for analysis.

2. Pre-processing: Clean and transform the data to make it ready for analysis. This may include tasks such
as data normalization, missing value handling, and data integration.

3. Transformation: Transform the data into a format suitable for data mining, such as a matrix or a graph.

4. Data Mining: Apply data mining techniques and algorithms to the data to extract useful information and
insights. This may include tasks such as clustering, classification, association rule mining, and anomaly
detection.

5. Interpretation: Interpret the results and extract knowledge from the data. This may include tasks such as
visualizing the results, evaluating the quality of the discovered patterns and identifying relationships and
associations among the data.

6. Evaluation: Evaluate the results to ensure that the extracted knowledge is useful, accurate, and
meaningful.

7. Deployment: Use the discovered knowledge to solve the business problem and make decisions.
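
The seven stages above can be sketched end to end on a toy data set. This is only a minimal illustration under stated assumptions, not a prescribed implementation: the records, the field layout, the function names, and the threshold of 120 are all hypothetical, and the "mining" step is reduced to a simple per-city average.

```python
from collections import defaultdict

# Hypothetical raw records: (customer_id, city, purchase_amount).
raw = [
    (1, "Kochi", 120.0),
    (2, "kochi ", 80.0),    # inconsistent spelling
    (3, "Delhi", None),     # missing value
    (4, "Delhi", 200.0),
    (5, "Delhi", 100.0),
]

def select(records):
    # 1. Selection: keep only the fields relevant to the analysis.
    return [(city, amount) for _cust_id, city, amount in records]

def preprocess(records):
    # 2. Pre-processing: drop missing values and normalize city names.
    return [(city.strip().title(), amount)
            for city, amount in records if amount is not None]

def transform(records):
    # 3. Transformation: reshape into a city -> list of amounts mapping.
    grouped = defaultdict(list)
    for city, amount in records:
        grouped[city].append(amount)
    return dict(grouped)

def mine(grouped):
    # 4. Data mining: a deliberately simple pattern, the average per city.
    return {city: sum(vals) / len(vals) for city, vals in grouped.items()}

def evaluate(patterns, threshold):
    # 5-6. Interpretation and evaluation: keep only the patterns that
    # pass a (hypothetical) interestingness threshold.
    return {city: avg for city, avg in patterns.items() if avg > threshold}

patterns = mine(transform(preprocess(select(raw))))
interesting = evaluate(patterns, threshold=120.0)

# 7. Deployment: here, simply report the result to a decision maker.
for city, avg in interesting.items():
    print(f"High-spending city: {city} (average {avg:.2f})")
```

Each function corresponds to one stage, so a stage can be replaced (for example, swapping the average for a clustering algorithm) without touching the others.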
WHAT KINDS OF DATA CAN BE MINED?

• The most basic forms of data for mining applications are database
data, data warehouse data and transactional data.

• Data mining can also be applied to other forms of data (e.g., data
streams, ordered/sequence data, graph or networked data, spatial
data, text data, multimedia data, and the WWW).

WHAT KINDS OF DATA CAN BE MINED?

I. Database Data
• A database system, also called a database management system (DBMS), consists of a
collection of interrelated data, known as a database, and a set of software programs to
manage and access the data.
• A relational database is a collection of tables, each of which is assigned a unique name.
Each table consists of a set of attributes (columns or fields) and usually stores a large set
of tuples (records or rows).
• When mining relational databases, we can go further by searching for trends or data
patterns. For example, data mining systems can analyze customer data to predict the
credit risk of new customers based on their income, age, and previous credit
information.
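
As a rough illustration of a relational table and of looking for a trend in it, the sketch below uses Python's built-in `sqlite3` module. The `customer` table, its columns, and the four rows are hypothetical, and a grouped average is only a first step toward the kind of credit-risk prediction described above, not a predictive model.

```python
import sqlite3

# An in-memory relational database with one table (relation).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "cust_id INTEGER PRIMARY KEY, income REAL, age INTEGER, defaulted INTEGER)"
)

# Each tuple (row) describes one customer; the values are made up.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        (1, 25000.0, 23, 1),
        (2, 60000.0, 41, 0),
        (3, 30000.0, 35, 1),
        (4, 80000.0, 52, 0),
    ],
)

# Compare average income of customers who defaulted (1) and who did not (0).
rows = conn.execute(
    "SELECT defaulted, AVG(income) FROM customer "
    "GROUP BY defaulted ORDER BY defaulted"
).fetchall()
avg_income = dict(rows)
conn.close()

print(avg_income)
```

In this toy data the customers who defaulted have a lower average income, which is the sort of pattern a mining system would then test on far larger tables.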
WHAT KINDS OF DATA CAN BE MINED?

II. Data Warehouses

• A data warehouse is a repository of information collected from multiple sources,
stored under a unified schema, and usually residing at a single site.
• Data warehouses are constructed via a process of data cleaning, data integration,
data transformation, data loading, and periodic data refreshing.
• A data warehouse is usually modeled by a multidimensional data structure, called a
data cube, in which each dimension corresponds to an attribute or a set of attributes
in the schema, and each cell stores the value of some aggregate measure such as
count or sum(sales_amount).
• A data cube provides a multidimensional view of data and allows the
precomputation and fast access of summarized data.
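
The idea of a cube cell holding an aggregate such as sum(sales_amount) can be sketched in a few lines. The dimension names, the fact rows, and the `cuboid` helper below are hypothetical; a real warehouse would precompute and store these aggregates rather than recompute them on demand.

```python
from collections import defaultdict

# Hypothetical dimensions and fact rows: (quarter, city, item, sales_amount).
DIMENSIONS = ("quarter", "city", "item")
facts = [
    ("Q1", "Kochi", "laptop", 1000),
    ("Q1", "Kochi", "printer", 200),
    ("Q1", "Delhi", "laptop", 1500),
    ("Q2", "Kochi", "laptop", 800),
]

def cuboid(rows, keep):
    """Aggregate sum(sales_amount) over the dimensions named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    cells = defaultdict(int)
    for row in rows:
        key = tuple(row[p] for p in positions)
        cells[key] += row[-1]  # the last field is the measure
    return dict(cells)

by_quarter = cuboid(facts, ("quarter",))
by_quarter_city = cuboid(facts, ("quarter", "city"))
grand_total = cuboid(facts, ())  # no dimensions kept: a single total cell

print(by_quarter)
print(by_quarter_city)
print(grand_total)
```

Each choice of dimensions gives a different summarized view of the same facts, which is what "multidimensional view" refers to.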
WHAT KINDS OF DATA CAN BE MINED?

III. Transactional Data

• Each record in a transactional database captures a transaction, such as a customer’s purchase, a flight booking,
or a user’s clicks on a web page.
• A transaction typically includes a unique transaction identity number (trans_ID) and a list of the items making up
the transaction, such as the items purchased in the transaction.

IV. Other Kinds of Data

• Time-related or sequence data (e.g., historical records, stock exchange data, and time-series and biological
sequence data),
• data streams (e.g., video surveillance and sensor data, which are continuously transmitted),
• spatial data (e.g., maps),
• engineering design data (e.g., the design of buildings, system components, or integrated circuits),
• hypertext and multimedia data (including text, image, video, and audio data),
• graph and networked data (e.g., social and information networks), and
• the Web (a huge, widely distributed information repository made available by the Internet).
WHAT KINDS OF PATTERNS CAN BE MINED?
DATA MINING FUNCTIONALITIES (ESSAY)
1. characterization and discrimination
2. the mining of frequent patterns, associations, and correlations
3. classification and regression
4. clustering analysis and
5. outlier analysis

• Data mining functionalities are used to specify the kinds of patterns to be found in data
mining tasks.
• In general, such tasks can be classified into two categories: descriptive and predictive.
• Descriptive mining tasks characterize properties of the data in a target data set.
• Predictive mining tasks perform induction on the current data in order to make predictions.
DATA MINING FUNCTIONALITIES

I. Class/Concept Description: Characterization and Discrimination

• Data entries can be associated with classes or concepts.

• For example, classes of items for sale include computers and printers, and concepts
of customers include bigSpenders and budgetSpenders.
• It can be useful to describe individual classes and concepts in summarized, concise,
and yet precise terms. Such descriptions of a class or a concept are called
class/concept descriptions.
• These descriptions can be derived using
• Data Characterization − This refers to summarizing the data of the class under study, which is called the
target class.
• Data Discrimination − This refers to comparing the general features of the target class with those of one or
more predefined contrasting classes.
DATA MINING FUNCTIONALITIES

II. Mining Frequent Patterns, Associations, and Correlations

• Frequent patterns are patterns that occur frequently in data.
• There are many kinds of frequent patterns, including
• frequent itemsets
• frequent subsequences
• frequent substructures.
• A frequent itemset typically refers to a set of items that often appear together in a transactional
data set; for example, milk and bread are frequently bought together in grocery stores by
many customers.
• A frequently occurring subsequence, such as the pattern that customers tend to purchase first a
laptop, followed by a digital camera, and then a memory card, is a (frequent) sequential pattern.
• A substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that may be
combined with itemsets or subsequences. If a substructure occurs frequently, it is called a
(frequent) structured pattern.
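
The milk-and-bread example can be made concrete by counting how many transactions contain each pair of items. The sketch below is a brute-force count, not the Apriori algorithm from Module II; the five transactions and the minimum support of 3 are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional data: each set is one customer's basket.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
]
MIN_SUPPORT = 3  # a pair is "frequent" if it appears in at least 3 baskets

# Count every 2-item combination in every basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep only the pairs that meet the minimum support.
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= MIN_SUPPORT}
print(frequent_pairs)
```

Sorting each basket before forming pairs ensures that (bread, milk) and (milk, bread) are counted as the same itemset.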
DATA MINING FUNCTIONALITIES

III. Classification and Regression for Predictive Analysis

Classification is the process of finding a model (or function) that describes and distinguishes
data classes or concepts.

The model is derived based on the analysis of a set of training data (i.e., data objects for which the
class labels are known). The model is used to predict the class label of objects for which the class label
is unknown.
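
A minimal sketch of this train-then-predict idea follows. It uses a nearest-centroid rule, chosen here only because it is short; it is not one of the methods listed in the syllabus, and the features, labels, and function names are hypothetical.

```python
def train(samples):
    """Build the model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose class mean is closest to `features`."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: squared_distance(model[label]))

# Training data with known class labels: (income in thousands, age).
training = [
    ((20.0, 25.0), "risky"),
    ((30.0, 35.0), "risky"),
    ((70.0, 45.0), "safe"),
    ((90.0, 55.0), "safe"),
]
model = train(training)

# Objects whose class label is unknown.
label_a = predict(model, (28.0, 30.0))
label_b = predict(model, (85.0, 50.0))
print(label_a, label_b)
```

The same two-phase shape (fit a model on labeled data, then apply it to unlabeled objects) holds for decision trees, Bayesian classifiers, and the other methods in Module III.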

Regression is used to predict missing or unavailable numerical data values
rather than (discrete) class labels.

Regression analysis is a statistical methodology that is most often used for
numeric prediction.
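
As a small worked example of numeric prediction, the sketch below fits a straight line by ordinary least squares and uses it to predict an unseen value. The data points and the `fit_line` helper are hypothetical; the points were chosen to lie exactly on y = 2x + 1 so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Known (x, y) observations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Predict the unavailable numeric value at x = 5.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```

Unlike classification, the output is a continuous number rather than one of a fixed set of class labels.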

DATA MINING FUNCTIONALITIES

IV. Cluster Analysis

The objects are clustered or grouped based on the principle of maximizing the
intraclass similarity and minimizing the interclass similarity.

That is, clusters of objects are formed so that objects within a cluster have
high similarity in comparison to one another, but are rather dissimilar to
objects in other clusters.
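
The grouping principle can be illustrated with a bare-bones k-means loop on one-dimensional data. This is a simplified sketch: the points, the starting centers, and the fixed iteration count are hypothetical choices, and k-means itself is covered properly in Module IV.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center
    and moving each center to the mean of its assigned points."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[1.0, 12.0])
print(centers)
print(clusters)
```

Points inside each resulting cluster are close to one another and far from the points in the other cluster. No class labels are used at any stage.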

DATA MINING FUNCTIONALITIES

V. Outlier Analysis

A data set may contain objects that do not comply with the general behavior
or model of the data.
These data objects are outliers.
Many data mining methods discard outliers as noise or exceptions.
In some applications (e.g., fraud detection) the rare events can be more
interesting than the more regularly occurring ones. The analysis of outlier data
is referred to as outlier analysis or anomaly mining.

E.g., outlier analysis may uncover fraudulent usage of credit cards by detecting
purchases of unusually large amounts for a given account number in
comparison to regular charges incurred by the same account.
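
The credit card example can be sketched with a simple statistical rule: flag any charge that lies far from the account's median, measured in units of the median absolute deviation (MAD). The charges and the cutoff `K = 5` are hypothetical, and this is only one of many possible outlier tests.

```python
from statistics import median

# Hypothetical charges on a single account.
charges = [40.0, 55.0, 38.0, 60.0, 45.0, 2500.0, 52.0]
K = 5  # assumed cutoff: how many MADs from the median counts as unusual

center = median(charges)

# Median absolute deviation: a spread measure that a single
# extreme value cannot inflate the way it inflates the mean.
mad = median(abs(c - center) for c in charges)

outliers = [c for c in charges if abs(c - center) > K * mad]
print(center, mad, outliers)
```

The median and MAD are used instead of the mean and standard deviation because the very charge being tested would otherwise distort the baseline it is compared against.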
