Introduction

Data mining, also known as knowledge discovery from data (KDD), involves extracting useful patterns from large datasets, addressing the challenge of overwhelming data growth. The process includes steps like data cleaning, integration, selection, transformation, mining, evaluation, and presentation. Various data types can be mined, and functionalities include classification, regression, clustering, and outlier analysis, with applications spanning business intelligence, finance, and health informatics.

Uploaded by

yijac51850

Data Mining
• Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit, previously unknown
and potentially useful) patterns or knowledge from huge amounts
of data.
• Data mining: a misnomer?
• Alternative names
• Knowledge discovery (mining) in databases (KDD),
• knowledge extraction,
• data/pattern analysis,
• data archeology,
• information harvesting,
• business intelligence, etc.
• Note: KDD is also a well-known venue name. TKDD (ACM Transactions on
Knowledge Discovery from Data) is a well-known ACM journal, and ACM
SIGKDD (Special Interest Group on Knowledge Discovery and Data Mining)
has held its KDD conference every year since 1995.
Why Data Mining?
• The Explosive Growth of Data: from terabytes to petabytes.
• Data collection and data availability.
• Automated data collection tools, database systems, Web,
computerized society.
• Major sources of abundant data
• Business: Web, e-commerce, transactions, stocks, …
• Science: Remote sensing, bioinformatics, scientific simulation, …
• Society and everyone: news, digital cameras, YouTube, …
• We are drowning in data, but starving for knowledge!
• “Necessity is the mother of invention” (attributed to Plato): data mining
is the automated analysis of massive data sets.
Why Data Mining?
• The fast-growing, tremendous amount of data, collected and stored
in large and numerous data repositories, has far exceeded our human
ability for comprehension without powerful tools.

Data Rich but Information Poor

Businesses have so much data but lack the knowledge, skills, and tools to make the most of it.
Data Mining
• Data mining—searching for knowledge
(interesting patterns) in data.
• Many people treat data mining as a
synonym for another popularly used
term, knowledge discovery from data,
or KDD.
• Others view data mining as
merely an essential step in the process
of knowledge discovery.
Data Mining
• The knowledge discovery process is an
iterative sequence of the following steps:
1. Data cleaning: to remove noise and
inconsistent data.
2. Data integration: multiple data sources
may be combined.
3. Data selection: data relevant to the
analysis task are retrieved from the
database.
4. Data transformation: data are
transformed and consolidated into forms
appropriate for mining by performing
summary or aggregation operations.
5. Data mining: intelligent methods are
applied to extract data patterns.
6. Pattern evaluation: to identify the truly
interesting patterns representing
knowledge based on interestingness
measures.
7. Knowledge presentation: visualization
and knowledge representation
techniques are used to present mined
knowledge to users.
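The seven steps above can be illustrated end to end with a small Python sketch. Everything in it is an assumption made for illustration: the two hypothetical "stores", the record layout, the helper names (`clean`, `integrate`, `select`, `transform`, `mine`, `evaluate`, `present`), the use of pair counting as the mining step, and a 50% support threshold as the interestingness measure. Real KDD work is iterative and much richer than one pass through these functions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records from two data sources (invented for illustration).
store_a = [
    {"tid": 1, "items": ["milk", "bread"]},
    {"tid": 2, "items": ["milk", "bread", "eggs"]},
    {"tid": 3, "items": None},                 # incomplete record (noise)
]
store_b = [
    {"tid": 4, "items": ["Milk ", "bread"]},   # inconsistent formatting
    {"tid": 5, "items": ["eggs", "tea"]},
    {"tid": 6, "items": ["tea"]},              # single-item basket
]

def clean(records):
    """1. Data cleaning: drop incomplete records, normalize item names."""
    return [{"tid": r["tid"], "items": [i.strip().lower() for i in r["items"]]}
            for r in records if r["items"]]

def integrate(*sources):
    """2. Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records, min_items=2):
    """3. Data selection: keep records relevant to the task
    (here, baskets with at least two items)."""
    return [r for r in records if len(r["items"]) >= min_items]

def transform(records):
    """4. Data transformation: consolidate each basket into a sorted tuple."""
    return [tuple(sorted(set(r["items"]))) for r in records]

def mine(baskets):
    """5. Data mining: count how often each pair of items co-occurs."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(basket, 2))
    return counts

def evaluate(pair_counts, n_baskets, min_support=0.5):
    """6. Pattern evaluation: keep pairs whose support meets a threshold."""
    return {pair: c / n_baskets for pair, c in pair_counts.items()
            if c / n_baskets >= min_support}

def present(patterns):
    """7. Knowledge presentation: render patterns as readable text."""
    return [f"{a} & {b}: support {s:.0%}" for (a, b), s in sorted(patterns.items())]

baskets = transform(select(integrate(clean(store_a), clean(store_b))))
for line in present(evaluate(mine(baskets), len(baskets))):
    print(line)  # bread & milk: support 75%
```

Cleaning is what lets "Milk " from one source and "milk" from the other be counted as the same item. Without that step, the bread/milk pair would not reach the threshold.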
Data Mining
• Data mining plays an essential role in the knowledge discovery
process.
• Data mining is the process of discovering interesting patterns and
knowledge from large amounts of data.
What Kinds of Data Can Be Mined?
• Data mining can be applied to any kind of data as long as the data are
meaningful for a target application.
• The most basic forms of Data:
• Database Data
• Data warehouse Data
• Transactional Data
• Other forms of Data
• Data Streams
• Ordered/Sequence Data
• Graph or Networked Data
• Spatial Data
• Text Data
• Multimedia Data
• WWW
Data Mining Functionalities and Tasks
• There are a number of data mining functionalities:
• Mining of frequent patterns, associations, and correlations
• Classification and Regression
• Clustering Analysis
• Outlier Analysis
• Data mining functionalities are used to specify the kinds of patterns
to be found in data mining tasks.
• Data mining tasks can be classified into two categories:
1) Descriptive: Descriptive mining tasks characterize properties of
the data in a target data set.
2) Predictive: Predictive mining tasks perform induction on the
current data in order to make predictions.
Data Mining Functionalities and Tasks

• Predictive: use some variables to predict unknown or future values of
other variables.
• Descriptive: find human-interpretable patterns that describe the data.
Mining Frequent Patterns, Associations, and Correlations
• Frequent patterns are patterns that occur frequently in data.
• There are many kinds of frequent patterns, including:
• Frequent itemsets,
• Frequent subsequences (or sequential patterns), and
• Frequent substructures.
• A frequent itemset typically refers to a set of items that often appear
together in a transactional data set.
• For example, milk and bread are frequently bought together in
grocery stores by many customers.
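To make the idea of a frequent itemset concrete, the sketch below counts every itemset in a handful of hypothetical grocery transactions and keeps those that appear in at least a minimum number of transactions. This is plain brute-force enumeration, not Apriori or any other published algorithm. The data, the `frequent_itemsets` name, and the count threshold are all assumptions for illustration.

```python
from itertools import combinations

# Hypothetical grocery transactions (not from the text).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
]

def frequent_itemsets(transactions, min_support_count, max_size=3):
    """Return {itemset: count} for every itemset of size 1..max_size that
    occurs in at least min_support_count transactions (brute force)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support_count:
                result[candidate] = count
    return result

print(frequent_itemsets(transactions, min_support_count=3))
# {('bread',): 4, ('eggs',): 3, ('milk',): 4, ('bread', 'milk'): 3}
```

Enumerating every candidate grows exponentially with the number of distinct items. Practical miners avoid this by pruning, for example by never extending an itemset that is already infrequent.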
Mining Frequent Patterns, Associations, and Correlations
• A frequently occurring subsequence, such as the pattern that customers
tend to purchase first a laptop, followed by a digital camera, and then a
memory card, is a (frequent) sequential pattern.
• A substructure can refer to different structural forms (e.g., graphs,
trees, or lattices) that may be combined with itemsets or
subsequences. If a substructure occurs frequently, it is called a
(frequent) structured pattern. Example: social network analysis.
Here are some simplified examples of interactions:

Interaction 1: A -> B -> C (A is connected to B, and B is connected to C)

Interaction 2: D -> E -> F (D is connected to E, and E is connected to F)

Interaction 3: G -> H -> I -> J (G is connected to H, H is connected to I, and I is connected to J)

Interaction 4: A -> B -> C -> D (A is connected to B, B is connected to C, and C is connected to D)

Upon analyzing these interactions, we might find that the substructure "A -> B -> C" (a chain of three individuals connected sequentially) is a frequent
substructure, as it appears in more than one interaction (Interactions 1 and 4).
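The interaction example can be checked mechanically. The sketch below treats each interaction as a simple chain and counts, for every contiguous three-person chain, how many interactions contain it. Restricting substructures to contiguous chains is a simplifying assumption. General frequent-substructure mining works on arbitrary graphs and needs dedicated graph-mining algorithms. The name `count_chains` and the threshold of 2 are illustrative choices.

```python
from collections import Counter

# The four interactions from the example, written as ordered chains.
interactions = [
    ["A", "B", "C"],
    ["D", "E", "F"],
    ["G", "H", "I", "J"],
    ["A", "B", "C", "D"],
]

def count_chains(chains, length=3):
    """Support of each contiguous chain of `length` nodes: the number of
    interactions that contain it (counted at most once per interaction)."""
    support = Counter()
    for chain in chains:
        windows = {tuple(chain[i:i + length]) for i in range(len(chain) - length + 1)}
        support.update(windows)
    return support

support = count_chains(interactions)
frequent = {sub: n for sub, n in support.items() if n >= 2}
print(frequent)  # {('A', 'B', 'C'): 2}
```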
Mining Frequent Patterns, Associations, and Correlations
• Association analysis: Suppose that, as a marketing manager at
AllElectronics, you want to know which items are frequently
purchased together (i.e., within the same transaction). An example of
such a rule, mined from the AllElectronics transactional database, is
buys(X, "computer") → buys(X, "software") [support = 1%, confidence = 50%]
• A support of 1% means that 1% of all transactions contain both a
computer and software. A confidence of 50% means that if a customer
buys a computer, there is a 50% chance that they also buy software.
• Association rules are discarded as uninteresting if they do not satisfy
both a minimum support threshold and a minimum confidence
threshold.
• Additional analysis can be performed to uncover interesting statistical
correlations between associated attribute–value pairs.
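Support and confidence for a rule such as buys(X, "computer") → buys(X, "software") can be computed directly from transactions. The sketch below uses eight hypothetical transactions that are not AllElectronics data, so its numbers differ from the 1%/50% quoted above. The helper names are assumptions for illustration.

```python
# Hypothetical transactions, invented for illustration (not AllElectronics data).
transactions = [
    {"computer", "software"},
    {"computer"},
    {"computer", "software", "printer"},
    {"printer"},
    {"computer", "printer"},
    {"software"},
    {"computer", "software"},
    {"camera"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(A and B together) / support(A)."""
    base = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / base if base else 0.0

def is_strong(antecedent, consequent, transactions, min_sup, min_conf):
    """Keep a rule only if it meets BOTH the support and confidence thresholds."""
    return (support(antecedent | consequent, transactions) >= min_sup
            and confidence(antecedent, consequent, transactions) >= min_conf)

rule = ({"computer"}, {"software"})
print(support(rule[0] | rule[1], transactions))                    # 0.375
print(confidence(*rule, transactions))                             # 0.6
print(is_strong(*rule, transactions, min_sup=0.25, min_conf=0.5))  # True
print(is_strong(*rule, transactions, min_sup=0.25, min_conf=0.7))  # False
```

Three of the eight transactions contain both items (support 3/8). Five contain a computer, so confidence is 3/5. Raising the minimum confidence to 70% is enough to discard the rule.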
Mining Frequent Patterns, Associations, and Correlations
• Correlation: a correlation rule is measured not only by its support and
confidence but also by the correlation between itemsets A and B.
• Common correlation measures include lift and the chi-square (χ²)
statistic.
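The two measures can be sketched from a 2×2 contingency table of transaction counts. Lift divides the probability that a transaction contains both A and B by the product of their individual probabilities. A value of 1 indicates independence, above 1 positive correlation, and below 1 negative correlation. The chi-square statistic sums (observed − expected)² / expected over the four cells, where each expected count assumes independence. The counts below are illustrative assumptions, and the function names are hypothetical.

```python
# Illustrative transaction counts for itemsets A and B (assumed, not from the text).
n_total = 10_000
n_a = 6_000     # transactions containing A
n_b = 7_500     # transactions containing B
n_ab = 4_000    # transactions containing both A and B

def lift(n_ab, n_a, n_b, n_total):
    """P(A and B) / (P(A) * P(B)); 1 means A and B are independent."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))

def chi_square(n_ab, n_a, n_b, n_total):
    """Pearson chi-square statistic for the 2x2 table of A versus B."""
    observed = {
        (True, True): n_ab,
        (True, False): n_a - n_ab,
        (False, True): n_b - n_ab,
        (False, False): n_total - n_a - n_b + n_ab,
    }
    row = {True: n_a, False: n_total - n_a}    # marginal counts for A
    col = {True: n_b, False: n_total - n_b}    # marginal counts for B
    total = 0.0
    for (a, b), obs in observed.items():
        expected = row[a] * col[b] / n_total   # count expected under independence
        total += (obs - expected) ** 2 / expected
    return total

print(round(lift(n_ab, n_a, n_b, n_total), 2))        # 0.89
print(round(chi_square(n_ab, n_a, n_b, n_total), 1))  # 555.6
```

In this example, the rule A → B has 40% support and about 67% confidence, so it would pass typical thresholds. However, a lift below 1 shows that A and B are negatively correlated, which illustrates why support and confidence alone can be misleading.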
Classification and Regression
• Classification is the process of finding a model (or function) that
describes and distinguishes data classes or concepts.
• The model is derived based on the analysis of a set of training data
(i.e., data objects for which the class labels are known).
• The model is used to predict the class label of objects for which the
class label is unknown.
• The derived model may be represented in various forms, such as
classification rules (i.e., IF-THEN rules), decision trees, mathematical
formulae, neural networks.
• There are many other methods for constructing classification models,
such as naïve Bayesian classification, support vector machines
(SVM), and k-nearest-neighbor classification.
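As one concrete instance of a classification model, the sketch below implements k-nearest-neighbor classification from scratch. The model is "learned" simply by storing the labeled training data. A new object receives the majority label of its k closest training objects. The training points, the "low"/"high" labels, Euclidean distance, and k = 3 are all assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (feature vector, class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.0), "low"),
    ((6.0, 6.0), "high"),
    ((7.0, 5.5), "high"),
    ((6.5, 7.0), "high"),
]

def knn_predict(x, training, k=3):
    """Predict the class label of x by majority vote among its k nearest
    training objects, using Euclidean distance."""
    neighbors = sorted(training, key=lambda pair: math.dist(x, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict((1.2, 1.5), training))  # low
print(knn_predict((6.2, 6.1), training))  # high
```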
Classification and Regression
• Classification predicts categorical (discrete, unordered) labels, and
regression models continuous-valued functions.
• Regression is used to predict missing or unavailable numerical data
values rather than (discrete) class labels.
• Regression analysis is a statistical methodology that is most often
used for numeric prediction.
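In contrast with the class labels above, regression predicts a number. A minimal sketch of ordinary least-squares linear regression with one predictor is shown below. The data points, for example years of experience versus salary in thousands, are hypothetical, and `fit_line` is an illustrative name.

```python
# Hypothetical (x, y) pairs, e.g. years of experience vs. salary in $1000s.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [30.0, 35.0, 40.0, 45.0, 50.0]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = w0 + w1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    w0 = mean_y - w1 * mean_x
    return w0, w1

w0, w1 = fit_line(xs, ys)
print(w0, w1)          # 25.0 5.0
print(w0 + w1 * 6.0)   # 55.0  (predicted value for an unseen x = 6)
```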
Cluster Analysis
• Unlike classification and regression, which analyze class-labeled
(training) data sets, clustering analyzes data objects without
consulting class labels.
• In many cases, class-labeled data may simply not exist at the
beginning.
• Clustering can be used to generate class labels for a group of data.
• The objects are clustered or grouped based on the principle of
maximizing the intraclass similarity and minimizing the interclass
similarity.
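The clustering principle can be sketched with plain k-means, which is one common method and is not the only one. The algorithm alternates between two steps. First, it assigns each point to its nearest centroid. Second, it moves each centroid to the mean of its assigned points. Over the iterations, points in the same cluster end up close together and clusters end up far apart. The resulting cluster numbers serve as generated class labels. The points, k = 2, and the naive choice of the first k points as initial centroids are assumptions. Practical implementations choose initial centroids more carefully.

```python
import math

# Hypothetical unlabeled 2-D points (no class labels are consulted).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]

def kmeans(points, k, iterations=10):
    """Plain k-means. Initial centroids are simply the first k points."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        centroids = new_centroids
    return labels, centroids

labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```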
Outlier Analysis
• A data set may contain objects that do not comply with the general
behavior or model of the data. These data objects are outliers.
• Many data mining methods discard outliers as noise or exceptions.
However, in some applications (e.g., fraud detection), the rare events
can be more interesting than the more regularly occurring ones.
• The analysis of outlier data is referred to as outlier analysis or
anomaly mining.
• Outliers may be detected using statistical tests that assume a
distribution or probability model for the data, or using distance
measures, where objects that are remote from any other cluster are
considered outliers.
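A simple distance-based variant of outlier detection is sketched below. A point is flagged when fewer than `min_neighbors` other points lie within `radius` of it. Points in a dense group have many close neighbors, while remote points have few or none. The data, the radius, and the neighbor threshold are illustrative assumptions. Choosing these parameters well is data-dependent.

```python
import math

# Hypothetical 2-D measurements; the last point is far from the rest.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8), (5.0, 5.0)]

def distance_outliers(points, radius, min_neighbors):
    """Flag a point as an outlier when fewer than `min_neighbors` other
    points lie within `radius` of it (Euclidean distance)."""
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(1 for j, q in enumerate(points)
                        if j != i and math.dist(p, q) <= radius)
        if neighbors < min_neighbors:
            outliers.append(p)
    return outliers

print(distance_outliers(points, radius=1.0, min_neighbors=2))  # [(5.0, 5.0)]
```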
Which Technologies Are Used?
• Data mining has incorporated many techniques from other domains
such as statistics, machine learning, pattern recognition, database
and data warehouse systems, information retrieval, visualization,
algorithms, high performance computing, and many application
domains.
• Data mining adopts techniques from many domains.
Machine Learning
• Machine learning investigates how computers can learn based on
data.
• Classic machine learning problems that are closely related to data
mining include:
▪ Supervised learning
▪ Unsupervised learning
▪ Semi-supervised learning
▪ Active learning
Machine Learning: Supervised Learning
• Supervised learning is basically a synonym for classification.
• The supervision in the learning comes from the labeled examples in
the training data set.
Machine Learning: Unsupervised Learning
• Unsupervised learning is essentially a synonym for clustering.
• The learning process is unsupervised since the input examples are not
class labeled.
• We may use clustering to discover classes within the data.
Machine Learning: Semi-supervised Learning
• Semi-supervised learning is a class of machine learning techniques
that make use of both labeled and unlabeled examples when learning
a model.
• In one approach, labeled examples are used to learn class models and
unlabeled examples are used to refine the boundaries between
classes.
Machine Learning: Active Learning
• In active learning, users play an active role in the learning process.
• An active learning approach can ask a user (e.g., a domain expert) to
label an example, which may be from a set of unlabeled examples or
synthesized by the learning program.
• The goal is to optimize the model quality by actively acquiring
knowledge from human users, given a constraint on how many
examples they can be asked to label.
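The active learning loop can be simulated with a toy example. The sketch uses uncertainty sampling, which is one common query strategy. The learner repeatedly asks the "expert" to label the unlabeled example closest to its current decision boundary, stopping when a fixed labeling budget runs out. Several parts are assumptions: the one-dimensional pool, the hidden rule used by the simulated `oracle` (x ≥ 5), the threshold classifier, and the seeding with the two extreme examples.

```python
# Hypothetical pool of unlabeled 1-D examples.
pool = [0.5, 1.0, 2.0, 3.0, 4.0, 4.5, 5.5, 6.2, 7.0, 8.0, 9.0]

def oracle(x):
    """Stands in for the human expert. The true rule (x >= 5) is an
    assumption of this simulation and is hidden from the learner."""
    return x >= 5.0

def fit_threshold(labeled):
    """Place the decision threshold midway between the largest negative
    and the smallest positive labeled example."""
    neg = max(x for x, y in labeled if not y)
    pos = min(x for x, y in labeled if y)
    return (neg + pos) / 2

def active_learn(pool, budget):
    """Query at most `budget` labels, always choosing the most uncertain example."""
    lo, hi = min(pool), max(pool)
    labeled = [(lo, oracle(lo)), (hi, oracle(hi))]  # seed: assumed one of each class
    unlabeled = [x for x in pool if x not in (lo, hi)]
    for _ in range(budget):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: the example nearest the boundary is least certain.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled), labeled

threshold, labeled = active_learn(pool, budget=3)
print(threshold)  # 5.0
```

With only three queries (4.5, then 7.0, then 5.5), the learned threshold separates every example in the pool correctly. Labeling the whole pool would have taken nine queries beyond the two seeds.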
Applications
• Data mining has many successful applications, such as:
• Business Intelligence
• Web Search
• Bioinformatics
• Health Informatics
• Finance
• Digital Libraries
• Digital Governments
