02 DM BI Data Mining

The document discusses the importance of data mining in uncovering hidden patterns and insights from large datasets, facilitating data-driven decision-making, and enhancing customer understanding. It outlines the Knowledge Discovery in Data (KDD) process, which includes steps like data cleaning, integration, selection, transformation, mining, evaluation, and presentation. Additionally, it highlights various data mining functionalities, applications, and the challenges faced in the field.

Uploaded by

batch0406sem

Data Mining

and
Business Intelligence
Data Mining
Module 2
Created/Adopted/Modified for
Data Mining and Business Intelligence – MCA II Semester
Vidya Vikas Institute of Engineering & Technology
Mysore
2023-24
GPD
Why Data Mining?
 Data Mining

 Enables knowledge discovery by uncovering hidden patterns and insights in large datasets.
 Supports data-driven decision-making across various domains.
 Enhances customer understanding and enables personalized marketing.
 Helps detect fraud and manage risks effectively.
 Facilitates market and competitive analysis for strategic decision-making.
 Optimizes business processes by identifying inefficiencies.
 Contributes to scientific research and advancements across diverse fields.
What Is Data Mining?
Data Mining (Knowledge Discovery from Data)
Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
Knowledge Discovery (KDD)
KDD Process
 This is a view from typical database systems and data warehousing communities
 Data mining plays an essential role in the knowledge discovery process
[Figure: KDD process. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Knowledge Discovery from Data (KDD) Process
 Many people treat data mining as a synonym for another
popularly used term, knowledge discovery from data, or KDD,
while others view data mining as merely an essential step in the
process of knowledge discovery.
 The knowledge discovery process is an iterative sequence of the
following 7 steps:
 1. Data cleaning (to remove noise and inconsistent data)
 2. Data integration (where multiple data sources may be
combined)
 3. Data selection (where data relevant to the analysis task are
retrieved from the database)
 4. Data transformation (where data are transformed and
consolidated into forms appropriate for mining by performing
summary or aggregation operations)
 5. Data mining (an essential process where intelligent methods
are applied to extract data patterns)
 6. Pattern evaluation (to identify the truly interesting patterns
representing knowledge based on interestingness measures)
 7. Knowledge presentation (where visualization and knowledge
representation techniques are used to present mined
knowledge to users)
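As a rough illustration, the seven steps can be sketched as a tiny Python pipeline. The two source lists, the field names, and the threshold below are hypothetical, and each function is only a stand-in for the much richer methods a real system would use at that step.

```python
from collections import defaultdict

# Hypothetical raw records from two sources (all values are made up).
store_a = [
    {"customer": "c1", "item": "milk", "amount": 2.0},
    {"customer": "c2", "item": "bread", "amount": None},  # noisy record
]
store_b = [
    {"customer": "c1", "item": "bread", "amount": 1.5},
    {"customer": "c3", "item": "milk", "amount": 2.5},
    {"customer": "c3", "item": "pen", "amount": 0.5},
]


def clean(records):
    """Step 1, data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """Step 2, data integration: combine several sources into one list."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def select(records, items):
    """Step 3, data selection: keep only task-relevant records."""
    return [r for r in records if r["item"] in items]


def transform(records):
    """Step 4, data transformation: aggregate total amount per item."""
    totals = defaultdict(float)
    for r in records:
        totals[r["item"]] += r["amount"]
    return dict(totals)


def mine(totals):
    """Step 5, data mining: a trivial 'pattern', items ranked by total."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def evaluate(patterns, min_total):
    """Step 6, pattern evaluation: keep patterns above a threshold."""
    return [(item, total) for item, total in patterns if total >= min_total]


def present(patterns):
    """Step 7, knowledge presentation: format patterns for the user."""
    return [f"{item}: {total:.2f}" for item, total in patterns]


data = integrate(clean(store_a), clean(store_b))
relevant = select(data, {"milk", "bread"})
report = present(evaluate(mine(transform(relevant)), min_total=2.0))
print(report)
```

In practice the process is iterative, so the output of a later step often sends the analyst back to an earlier one.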
Data Mining or KDD ?
 The term data mining is often used to refer to the entire knowledge
discovery process.
 Therefore, we adopt a broad view of data mining functionality:

 Data mining is the process of discovering interesting patterns and knowledge from large amounts of data.
 The data sources can include databases, data warehouses, the Web,
other information repositories, or data that are streamed into the
system dynamically.
Data Mining in Business Intelligence
[Figure: pyramid of layers, listed here from top to bottom; the potential to support business decisions increases towards the top]
 Decision Making (End User)
 Data Presentation: Visualization Techniques (Business Analyst)
 Data Mining: Information Discovery (Data Analyst)
 Data Exploration: Statistical Summary, Querying, and Reporting
 Data Preprocessing/Integration, Data Warehouses (DBA)
 Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
KDD Process: A View from ML and Statistics
[Figure: Input Data → Data Pre-Processing → Data Mining → Post-Processing]
 Data Pre-Processing: data integration, normalization, feature selection, dimension reduction
 Data Mining: pattern discovery, classification, clustering, outlier analysis, …
 Post-Processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization
 This is a view from typical machine learning and statistics communities

Data Mining: On What Kinds of Data?
 Data Mining is performed on different kinds of data :

1.Database
2.Data Warehouse
3.Transactional Database
4.Other Kinds of Data
 1. Database-oriented data sets and applications
 Relational database, Object-relational databases,
Heterogeneous databases and legacy databases
Data Mining: On What Kinds of Data?
 2. Data Warehouse - A data warehouse is usually
modeled by a multidimensional data structure, called a
data cube, in which each dimension corresponds to an
attribute or a set of attributes in the schema, and each
cell stores the value of some aggregate measure such as
count or sum(sales amount).
 A data cube provides a multidimensional view of data
and allows the precomputation and fast access of
summarized data.
Data Mining: On What Kinds of Data?
 2. Data Warehouse
[Figure: data cube example]
 OLAP (OnLine Analytical Processing) operations
make use of background knowledge regarding
the domain of the data being studied to
allow the presentation of data at
different levels of abstraction.
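A data cube and a roll-up can be sketched in a few lines of Python. The sales facts, city names, and the `build_cube` helper below are hypothetical; the sketch only shows that each cell stores an aggregate (here, sum of sales) and that rolling up by dropping a dimension gives a more abstract view of the same data.

```python
from collections import defaultdict

# Hypothetical sales facts: (city, quarter, item, sales_amount).
DIMENSIONS = ("city", "quarter", "item")
facts = [
    ("Mysore", "Q1", "phone", 100),
    ("Mysore", "Q1", "laptop", 300),
    ("Mysore", "Q2", "phone", 150),
    ("Bengaluru", "Q1", "phone", 200),
    ("Bengaluru", "Q2", "laptop", 400),
]


def build_cube(rows, dims):
    """Return {cell coordinates: sum(sales)} over the chosen dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    cube = defaultdict(int)
    for row in rows:
        cell = tuple(row[p] for p in positions)
        cube[cell] += row[3]  # aggregate measure: sum of sales amount
    return dict(cube)


base_cube = build_cube(facts, ("city", "quarter", "item"))
# Roll-up by dimension reduction: drop "item", then drop "quarter".
by_city_quarter = build_cube(facts, ("city", "quarter"))
by_city = build_cube(facts, ("city",))

print(by_city_quarter)
print(by_city)
```

A warehouse would typically precompute such aggregates so that summarized views can be served quickly; this sketch recomputes them on demand for clarity.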
Data Mining: On What Kinds of Data?
 3. A transactional database captures a transaction, such as a
customer’s purchase, a flight booking, or a user’s clicks on a web
page.
 A transaction typically includes a unique transaction identity
number (trans ID) and a list of the items making up the transaction,
such as the items purchased in the transaction.
 A transactional database may have additional tables, which contain other
information related to the transactions, such as item description, information
about the salesperson or the branch, and so on.
Data Mining: On What Kinds of Data?
 4. Other kinds of data : Advanced data sets and advanced
applications
 Data streams and sensor data
 Time-series data, temporal data, sequence data (incl. bio-
sequences)
 Structure data, graphs, social networks and information
networks
 Spatial data and spatio-temporal data
 Multimedia database, Text databases, The World-Wide Web
What Kinds of Patterns Can Be Mined?
 There are a number of data mining functionalities.
 These include (1) characterization and discrimination; the (2) mining of
frequent patterns, associations, and correlations; (3) classification and
regression; (4) clustering analysis; and (5) outlier analysis.
 Data mining functionalities are used to specify the kinds of patterns to be
found in data mining tasks
 Such tasks can be classified into two categories:

 Descriptive mining tasks characterize properties of the data in a target data set.
 Predictive mining tasks perform induction on the current data in order to make predictions.
Data Mining Functionalities (Tasks)
 Descriptive Mining

 Goal: Find human-interpretable patterns that describe the data.
 Example: Which products are often bought together?
 Predictive Mining
 Goal: Use some variables (observations from the past) to predict unknown or future values of other variables.
 Example: Will a person click an online advertisement, given her browsing history?
Data Mining & Machine Learning
 Machine Learning Terminology

 Descriptive Mining roughly corresponds to Unsupervised Learning
 Predictive Mining roughly corresponds to Supervised Learning
 Machine learning investigates how computers can learn (or improve their performance) based on data.
 A main research area is for computer programs to automatically
learn to recognize complex patterns and make intelligent decisions
based on data.
Data Mining & Machine Learning
 Data Mining Tasks & Applications :

1.Classification [Predictive]
2.Cluster Analysis [Descriptive]
3.Association Analysis [Descriptive]
4.Regression Analysis [Predictive]
5.Sequential Pattern Discovery [Descriptive]
6.Deviation Detection (Anomaly Detection) [Predictive]
Data Mining Functions: (1) Characterization and Discrimination
 Data characterization is a summarization of the general characteristics or features of a
target class of data.
 For example : To study the characteristics of software products with sales that increased
by 10% in the previous year.
 The data cube-based OLAP roll-up operation can be used
to perform user-controlled data summarization along a
specified dimension.
 Data discrimination is a comparison of the general features
of the target class data objects against the general features
of objects from one or multiple contrasting classes.
 For example, a user may want to compare the general features
of software products with sales that increased by 10% last
year against those with sales that decreased by at least 30%
during the same period.
Data Mining Functions: (2) Pattern Discovery
 Frequent patterns (or frequent itemsets)
 What items are frequently purchased together in your Supermarket?
 Association and Correlation Analysis

 A typical association rule

 Nail polish ⇒ Eyeliner [0.5%, 75%] (support, confidence)
 Are strongly associated items also strongly correlated?
Data Mining Functions: (3) Classification & Regression
 Classification and label prediction. Supervised Learning
 Construct models (functions) based on some training examples
 Describe and distinguish classes or concepts for future prediction
 Ex. 1. Classify countries based on (climate)
 Ex. 2. Classify cars based on (mileage)
 Predict some unknown class labels
 Typical methods
 Decision trees, naïve Bayesian classification,
support vector machines, neural networks, rule-based classification, pattern-based
classification, logistic regression, …
 Typical applications:
 Credit card fraud detection, direct marketing, classifying stars, diseases, web-pages,
…
Data Mining Functions: (3) Classification & Regression
 A decision tree is a flowchart-like tree structure, where each node denotes a test on an
attribute value, each branch represents an outcome of the test, and tree leaves
represent classes or class distributions.
 A neural network is a collection of neuron-like processing units with weighted
connections between the units.
 Regression analysis is a statistical methodology that is most often used for numeric
prediction.
 Regression models continuous-valued functions.
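The difference between predicting a class label and predicting a number can be sketched as follows. The decision tree is written by hand rather than learned from data, the attribute names and the income threshold of 80 (in thousands) are assumptions made for illustration, and the regression uses ordinary least squares on made-up points.

```python
def classify(record):
    """Hand-written decision tree: each `if` is a test on one attribute."""
    if record["refund"] == "Yes":              # root node: test on refund
        return "No"                            # leaf: predicted class label
    if record["marital_status"] == "Married":  # inner node
        return "No"
    # Assumed split point on income (in thousands), for illustration only.
    return "Yes" if record["taxable_income"] >= 80 else "No"


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x, slope, intercept):
    """Numeric prediction from the fitted line."""
    return slope * x + intercept


# Classification: the output is a class label.
label = classify(
    {"refund": "No", "marital_status": "Single", "taxable_income": 90}
)

# Regression: the output is a continuous value.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
forecast = predict(5, slope, intercept)
print(label, slope, intercept, forecast)
```

A real classifier would learn the tests and thresholds from training examples; the point here is only the shape of the two kinds of model.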
Data Mining Functions: (4) Cluster Analysis
 Unsupervised learning (i.e., Class
label is unknown)
 Group data to form new categories
(i.e., clusters), e.g., cluster houses to
find distribution patterns
 Principle: Maximizing intra-class
similarity & minimizing interclass
similarity
 Many methods and applications
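One common clustering method is k-means, which the slide does not name; the sketch below is a minimal one-dimensional version, assuming made-up values and hand-picked starting centers. Each value is assigned to its nearest center, and each center then moves to the mean of its cluster.

```python
def kmeans_1d(values, centers, iterations=10):
    """Minimal 1-D k-means: assign to nearest center, then recompute means."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical values with two obvious groups and no class labels.
values = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centers, clusters = kmeans_1d(values, centers=[1.0, 10.0])
print(centers, clusters)
```

Values inside a cluster end up close to each other and far from the other cluster, which is the similarity principle stated above.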
Data Mining Functions: (5) Outlier Analysis
 Outlier analysis
 Outlier: A data object that does not comply with the general
behavior of the data
 Noise or exception?―One person’s garbage could be another
person’s treasure
 Methods: by-product of clustering or regression analysis, …
 Useful in fraud detection, rare events analysis
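As a small illustration, the sketch below flags outliers with a z-score test. This is a simple statistical method swapped in for illustration, not the clustering or regression by-product mentioned above, and the values and the threshold of 2.0 are assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` std deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts; one value clearly does not fit.
amounts = [10, 12, 11, 13, 12, 11, 95]
outliers = zscore_outliers(amounts)
print(outliers)
```

Whether the flagged value is noise to discard or an exception worth investigating (for example, possible fraud) depends on the application.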
Data Mining Functions: (6) Time and Ordering:
Sequential Pattern, Trend and Evolution Analysis
 Sequence, trend and evolution analysis
 Trend, time-series, and deviation analysis
 e.g., regression and value prediction
 Sequential pattern mining
 e.g., buy digital camera, then buy large memory cards
 Periodicity analysis
 Motifs and biological sequence analysis
 Approximate and consecutive motifs
 Similarity-based analysis
 Mining data streams
 Ordered, time-varying, potentially infinite, data streams
Data Mining Functions: (7) Structure and Network Analysis
 Graph mining
 Finding frequent subgraphs (e.g., chemical compounds), trees (XML),
substructures (web fragments)
 Information network analysis
 Social networks: actors (objects, nodes) and relationships (edges)
 e.g., author networks in CS, terrorist networks
 Multiple heterogeneous networks
 A person could be in multiple information networks: friends, family, classmates, …
 Links carry a lot of semantic information: Link mining
 Web mining
 Web is a big information network: from PageRank to Google
 Analysis of Web information networks
 Web community discovery, opinion mining, usage mining, …
Is All Mined Knowledge Interesting?
 A data mining system has the potential to generate thousands or even millions
of patterns, or rules.
 1. Are all of the patterns interesting?
 No. Only a small fraction would actually be of interest to a given user.
 2. What makes a pattern interesting?
 A pattern is interesting if it is (1) easily understood by humans, (2) valid on
new or test data with some degree of certainty, (3) potentially useful, and (4)
novel.
 A pattern is also interesting if it validates a hypothesis that the user sought to
confirm.

An interesting pattern represents knowledge.

Is All Mined Knowledge Interesting?
Objective measures of pattern interestingness :
 Consider X ⇒ Y

 Support : the % of transactions from a transaction database that the given rule
satisfies. This is the probability P(X ∪ Y ),
where X ∪ Y indicates that a transaction contains both X and Y , that is, the union of
itemsets X and Y.
support(X ⇒ Y ) = P(X ∪ Y )
 Confidence : assesses the degree of certainty of the detected association. This is
conditional probability P(Y | X), that is, the probability that a transaction containing
X also contains Y.
confidence(X ⇒ Y ) = P(Y | X)
 Example: Idly Rice ⇒ Uddina Bele [0.4%, 80%] (support, confidence)
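The two measures can be computed directly from a transaction database. The sketch below uses five small market-basket transactions as illustrative data; the function names are not from any library.

```python
# Illustrative market-basket transactions, one set of items per transaction.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(x, y, db):
    """Fraction of transactions that contain every item of X and of Y."""
    both = x | y  # union of the two itemsets
    return sum(1 for t in db if both <= t) / len(db)


def confidence(x, y, db):
    """Among transactions containing X, the fraction that also contain Y."""
    count_x = sum(1 for t in db if x <= t)
    if count_x == 0:
        raise ValueError("X does not occur in any transaction")
    count_xy = sum(1 for t in db if (x | y) <= t)
    return count_xy / count_x


rule_support = support({"Milk"}, {"Coke"}, transactions)
rule_confidence = confidence({"Milk"}, {"Coke"}, transactions)
print(f"Milk => Coke [{rule_support:.0%}, {rule_confidence:.0%}]")
```

Here Milk and Coke appear together in 3 of 5 transactions (support 60%), and 3 of the 4 transactions containing Milk also contain Coke (confidence 75%).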
Is All Mined Knowledge Interesting?
 3. Can a data mining system generate all of the interesting
patterns?
 Refers to the completeness of a data mining algorithm.
 It is unrealistic and inefficient for data mining systems to generate
all possible patterns.
 Association rule mining is an example where the use of constraints
and interestingness measures can ensure the completeness of
mining.
Is All Mined Knowledge Interesting?
 4. Can the system generate only the interesting ones?
 This is an optimization problem in data mining.
 It is highly desirable for data mining systems to generate only
interesting patterns.
 Progress has been made in this direction; however, such
optimization remains a challenging issue in data mining.
Data Mining: Which Technologies are Used?
Why Confluence of Multiple Disciplines?
 Tremendous amount of data
 Algorithms must be scalable to handle big data
 High-dimensionality of data
 Micro-array may have tens of thousands of dimensions
 High complexity of data
 Data streams and sensor data
 Time-series data, temporal data, sequence data
 Structure data, graphs, social and information networks
 Spatial, spatiotemporal, multimedia, text and Web data
 Software programs, scientific simulations
 New and sophisticated applications
Applications of Data Mining
 Web page analysis: classification, clustering, ranking
 Collaborative analysis & recommender systems
 Basket data analysis to targeted marketing
 Biological and medical data analysis
 Data mining and software engineering
 Data mining and text analysis
 Data mining and social and information network analysis
 Built-in (invisible data mining) functions in Google, MS, Yahoo!, LinkedIn, Facebook, …
 Major dedicated data mining systems/tools
 SAS, MS SQL-Server Analysis Manager, Oracle Data Mining Tools
Major Issues in Data Mining (1)
 Mining Methodology
 Mining various and new kinds of knowledge
 Mining knowledge in multi-dimensional space
 Data mining: An interdisciplinary effort
 Boosting the power of discovery in a networked environment
 Handling noise, uncertainty, and incompleteness of data
 Pattern evaluation and pattern- or constraint-guided mining
 User Interaction
 Interactive mining
 Incorporation of background knowledge
 Presentation and visualization of data mining results
Major Issues in Data Mining (2)
 Efficiency and Scalability
 Efficiency and scalability of data mining algorithms
 Parallel, distributed, stream, and incremental mining methods
 Diversity of data types
 Handling complex types of data
 Mining dynamic, networked, and global data repositories
 Data mining and society
 Social impacts of data mining
 Privacy-preserving data mining
 Invisible data mining
Summary
 Data mining: Discovering interesting patterns and knowledge from massive amounts of data
 A natural evolution of science and information technology, in great demand, with wide
applications
 A KDD process includes data cleaning, data integration, data selection, transformation, data
mining, pattern evaluation, and knowledge presentation
 Mining can be performed on a variety of data
 Data mining functionalities: characterization, discrimination, association, classification,
clustering, trend and outlier analysis, etc.
 Data mining technologies and applications
 Major issues in data mining
Data,
Types of Data,
Datasets
Data

 Data sets differ in a number of ways

– Quantitative or Qualitative (Nominal, ordinal, interval, ratio)
– Binary or Discrete or Continuous
– Asymmetric, ordered, sequential, time-series, etc….
 The type of data determines which tools and techniques can be
used to analyze that data.

What is Data?

 Attributes
A Data Set is a collection of data objects. Data objects are described by their attributes.
 An attribute is a property or characteristic of an object that may vary from one object to another or from one time to another
– Examples: eye color of a person, temperature, etc.
 A collection of attributes describes an object

Example data set (columns are attributes, rows are objects):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Data Objects and Attributes

 Data Objects : other names :
– Record, point, vector, pattern, event, case, sample, entity.
 Attributes : other names :
– Variable, characteristic, field, feature, dimension.
Attribute Values

 Attribute values are numbers or symbols assigned to an attribute for a particular object
 Distinction between attributes and attribute values
– Same attribute can be mapped to different attribute values
 Example: height can be measured in feet or meters
– Different attributes can be mapped to the same set of values
 Example: Attribute values for ID and age are integers
– But properties of an attribute can be different from the properties of the values used to represent the attribute
Attributes and Measurement

 A Measurement Scale is a rule (function) that associates a numerical or symbolic value with an attribute of an object.
 The process of measurement is the application of a measurement scale to associate a value with a particular attribute of a specific object.
Types of Attributes or Levels of Measurements
 Nominal : You can categorize your data by labelling them in mutually exclusive groups, but there is no order between the categories.
– Examples: ID numbers, eye color, zip codes, city of birth, gender, ethnicity, car brands, marital status
 Ordinal : You can categorize and rank your data in an order, but you cannot say anything about the intervals between the rankings.
– Examples: rankings (e.g., taste of potato chips on a scale from 1-10), grades, height {tall, medium, short}, top 5 Olympic medallists, language ability (e.g., beginner, intermediate, fluent), Likert-type questions (e.g., very dissatisfied to very satisfied)
 Interval : You can categorize, rank, and infer equal intervals (differences are meaningful) between neighboring data points, but there is no true zero point.
– Examples: calendar dates, temperatures in Celsius or Fahrenheit, test scores (e.g., IQ or exams).
 Ratio : You can categorize, rank, and infer equal intervals between neighboring data points, and there is a true zero point, so ratios are meaningful.
– Examples: temperature in Kelvin, length, counts, elapsed time (e.g., time to run a race), height, age, weight.
Properties of Attribute Values

 The type of an attribute depends on which of the following properties/operations it possesses:
– Distinctness: = ≠
– Order: < >
– Differences are meaningful : + -
– Ratios are meaningful : * /
 Each attribute type adds one property to the previous type:
– Nominal attribute : distinctness
– Ordinal attribute : distinctness & order
– Interval attribute : distinctness, order & meaningful differences
– Ratio attribute : all 4 properties/operations
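The cumulative structure of the four attribute types can be written down as a lookup table. The dictionary and the `is_meaningful` helper below are hypothetical names chosen for this sketch; the temperature figures show why a ratio computed on an interval scale should not be trusted.

```python
# Which properties each attribute type possesses (cumulative, per the list above).
PROPERTIES = {
    "nominal": {"distinctness"},
    "ordinal": {"distinctness", "order"},
    "interval": {"distinctness", "order", "difference"},
    "ratio": {"distinctness", "order", "difference", "ratio"},
}


def is_meaningful(attribute_type, operation):
    """True if the operation is meaningful for the given attribute type."""
    return operation in PROPERTIES[attribute_type]


# Celsius is an interval scale: 20 degrees is not "twice as hot" as 10.
celsius_ratio = 20.0 / 10.0
# Kelvin is a ratio scale: the same two temperatures differ by only about 3.5%.
kelvin_ratio = (20.0 + 273.15) / (10.0 + 273.15)

print(is_meaningful("interval", "ratio"), is_meaningful("ratio", "ratio"))
print(celsius_ratio, round(kelvin_ratio, 4))
```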
Attribute types: description, examples, and operations

Categorical (Qualitative)
– Nominal
 Description: Nominal attribute values only distinguish. (=, ≠)
 Examples: zip codes, employee ID numbers, eye color, sex: {male, female}
 Operations: mode, entropy, contingency correlation, χ² test
– Ordinal
 Description: Ordinal attribute values also order objects. (<, >)
 Examples: hardness of minerals, {good, better, best}, grades, street numbers
 Operations: median, percentiles, rank correlation, run tests, sign tests

Numeric (Quantitative)
– Interval
 Description: For interval attributes, differences between values are meaningful. (+, -)
 Examples: calendar dates, temperature in Celsius or Fahrenheit
 Operations: mean, standard deviation, Pearson's correlation, t and F tests
– Ratio
 Description: For ratio variables, both differences and ratios are meaningful. (*, /)
 Examples: temperature in Kelvin, monetary quantities, counts, age, mass, length, current
 Operations: geometric mean, harmonic mean, percent variation

This categorization of attributes is due to S. S. Stevens
Types of Attributes by Number of Values

 Discrete Attribute
– Has only a finite or countably infinite set of values
– Examples: zip codes, counts, or the set of words in a collection of documents
– Often represented as integer variables.
– Note: binary attributes are a special case of discrete attributes
 Continuous Attribute
– Has real numbers as attribute values
– Examples: temperature, height, or weight.
– Practically, real values can only be measured and represented using a finite
number of digits.
– Continuous attributes are typically represented as floating-point variables.

Asymmetric Attributes
 Only presence (a non-zero attribute value) is regarded as important. This is typical of sparse data:
 Words present in documents
 Items present in customer transactions
 Example : A student object has an attribute recording whether the student has arrears (back papers). The possible values are 1 (true) and 0 (false).
– Only a few students will have arrears, so it is meaningful to compare students on the 1 values and ignore the 0 values, since most students will have 0.
 Consider only the presence of an attribute, not its absence.
Asymmetric Attributes
 Binary attributes where only non-zero values are important are called Asymmetric Binary Attributes.
– Useful in Association Analysis.
 Asymmetric Discrete and Asymmetric Continuous attributes are also possible.
 If we met a friend in the grocery store, would we ever say the following?
“I see our purchases are very similar since we didn’t buy most of the same things.”
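The grocery-store remark can be made concrete with two standard similarity measures, neither of which is named on the slides: the simple matching coefficient counts shared absences as agreement, while the Jaccard coefficient ignores them. The two binary vectors below are illustrative.

```python
def simple_matching(a, b):
    """Fraction of positions where the two binary vectors agree (0-0 or 1-1)."""
    agreements = sum(1 for x, y in zip(a, b) if x == y)
    return agreements / len(a)


def jaccard(a, b):
    """Shared presences divided by positions where either vector has a 1."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 0.0


# Each position is one item in the store; 1 means the shopper bought it.
shopper_p = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
shopper_q = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]

smc = simple_matching(shopper_p, shopper_q)
jac = jaccard(shopper_p, shopper_q)
print(smc, jac)
```

The two shoppers have no purchased item in common, yet simple matching reports 0.7 because both skipped the same seven items. Jaccard reports 0.0, which is the sensible answer for asymmetric binary attributes.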
Types of data sets
 There are many different types of Data Sets, and new types keep popping up. We will group the types into 3 groups.
● Record
● Graph
● Ordered
 But first, we will look at some characteristics that apply to data sets :
● Dimensionality
● Sparsity
● Resolution
Types of Data Sets
Important Characteristics of Data
● Dimensionality (number of attributes that objects in the dataset possess)
 Data with a small number of dimensions tends to be qualitatively different from moderate or high dimensional data.
 High dimensional data brings a number of challenges.
 Curse of Dimensionality : Difficulty associated with analysing high dimensional data.
● Sparsity
 Only presence counts, as with asymmetric attributes
 Advantage : Only non-zero values need to be stored and processed
● Size
 Type of analysis may depend on size of data
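The storage and processing advantage of sparsity can be sketched with a dictionary that keeps only non-zero entries. The two term-count vectors and the helper names below are illustrative assumptions.

```python
def to_sparse(vector):
    """Keep only the non-zero entries as {position: value}."""
    return {i: v for i, v in enumerate(vector) if v != 0}


def sparse_dot(a, b):
    """Dot product that only visits positions stored in the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[i] for i, value in a.items() if i in b)


# Illustrative term-count vectors for two documents.
doc1 = [3, 0, 5, 0, 2, 6, 0, 2, 0, 2]
doc2 = [0, 7, 0, 2, 1, 0, 0, 3, 0, 0]

sparse1 = to_sparse(doc1)
sparse2 = to_sparse(doc2)
print(sparse1, sparse2, sparse_dot(sparse1, sparse2))
```

With real document data, where a vocabulary has many thousands of terms and each document uses only a few of them, the saving is far larger than in this ten-term example.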
Types of Data Sets
Important Characteristics of Data
● Resolution
 Properties of data are different at different levels of resolution
 Patterns depend on the scale
● Perspective
Types of data sets
 Record
– Data Matrix
– Sparse Data Matrix (Document Data)
– Transaction Data
 Graph
– Data with Relationship among Objects (WWW)
– Data with Objects that are Graphs (Molecule)
 Ordered
– Spatial Data
– Temporal Data
– Sequential Data
– Genetic Sequence Data

Types of Data Sets: (1) Record Data
 Relational records
 Relational tables, highly structured
 Data matrix, e.g., numerical matrix, crosstabs
 Transaction data

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

 Document data: Term-frequency vector (matrix) of text documents

            team  coach  play  ball  score  game  win  lost  timeout  season
Document 1     3      0     5     0      2     6    0     2        0       2
Document 2     0      7     0     2      1     0    0     3        0       0
Document 3     0      1     0     0      1     2    2     0        3       0
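A term-frequency matrix can be built from raw text in a few lines. The three short documents below are made up, and splitting on whitespace is a deliberate simplification of real tokenization.

```python
from collections import Counter


def term_frequency_matrix(documents):
    """Return (vocabulary, rows), one row of term counts per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)  # a missing term counts as 0
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Hypothetical documents.
documents = [
    "team wins game",
    "coach calls timeout",
    "team lost game game",
]
vocabulary, rows = term_frequency_matrix(documents)
print(vocabulary)
for row in rows:
    print(row)
```

Each document becomes one record and each term one attribute, which is how text is turned into record data.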

Types of Data Sets: (2) Graphs and Networks
 Transportation network

 World Wide Web

 Molecular Structures

 Social or information networks

Types of Data Sets: (3) Ordered Data
 Video data: sequence of images

 Temporal data: time-series

 Sequential Data: transaction sequences

 Genetic sequence data
Ordered Data – Time Series Data
 A special type of sequential data. Each record is a time series – a
series of measurements taken over time.
 Temporal Auto-correlation (if 2 measurements are close in time,
then their values are often similar).
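Temporal auto-correlation can be measured with the lag-1 autocorrelation coefficient. The sketch below uses the usual sample estimate; the two series are made up to contrast a smooth trend with values that jump back and forth.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mu = sum(series) / n
    variance_sum = sum((x - mu) ** 2 for x in series)
    if variance_sum == 0:
        raise ValueError("series is constant")
    covariance_sum = sum(
        (series[t] - mu) * (series[t + lag] - mu) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


# Hypothetical measurements taken at consecutive times.
smooth = [1, 2, 3, 4, 5, 6, 7, 8]       # neighbours in time are similar
alternating = [1, 8, 1, 8, 1, 8, 1, 8]  # neighbours in time are dissimilar

r_smooth = autocorrelation(smooth)
r_alternating = autocorrelation(alternating)
print(r_smooth, r_alternating)
```

A value near +1 means measurements close in time tend to be similar; a negative value means they tend to differ.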

Types of Data Sets: Spatial, image and multimedia Data
 Spatial data: maps
 Image data
 Video data
Spatial Data
 Objects have spatial attributes (like positions or areas).
 Spatial auto-correlation : Objects that are physically close tend to be similar in other ways also.
 Spatio-Temporal Data
[Figure: Average Monthly Temperature of land and ocean]
Handling Non-Record Data

 Most data mining algorithms are designed for record data or its
variations.
 What to do with non-record data ?

– Extract features from data objects and use these features to create a
record corresponding to each object.
 This works well for some cases.
 In some other cases, this type of representation does not capture all
information about the data.
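Feature extraction from a graph can be sketched as follows. The chosen features (node count, edge count, maximum and average degree) and the function name are assumptions made for illustration; the second example shows how two different graphs can collapse to the same record.

```python
def graph_to_record(edges):
    """Summarize an undirected graph, given as (u, v) edge pairs, as a record."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    num_nodes = len(degree)
    return {
        "num_nodes": num_nodes,
        "num_edges": len(edges),
        "max_degree": max(degree.values(), default=0),
        "avg_degree": 2 * len(edges) / num_nodes if num_nodes else 0.0,
    }


# A water-like molecule: one oxygen bonded to two hydrogens.
water = [("O", "H1"), ("O", "H2")]
print(graph_to_record(water))

# Two structurally different graphs: one 6-cycle versus two triangles.
six_cycle = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
two_triangles = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)]
print(graph_to_record(six_cycle) == graph_to_record(two_triangles))
```

The last line prints `True`: the record is identical even though one graph is connected and the other is not, which is exactly the kind of information such a representation can lose.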

91 pages
SQL Subqueries and Table Management
No ratings yet
SQL Subqueries and Table Management
185 pages
Oracle PostgreSQL DBA Resume
No ratings yet
Oracle PostgreSQL DBA Resume
4 pages
NetBackup Blueprint Guide
No ratings yet
NetBackup Blueprint Guide
55 pages
Prometheus vs Zabbix: Monitoring Tools Comparison
No ratings yet
Prometheus vs Zabbix: Monitoring Tools Comparison
2 pages
Csv-Files-Using-Pandas-With-Examples/ Read CSV Files Using Pandas - With Examples
No ratings yet
Csv-Files-Using-Pandas-With-Examples/ Read CSV Files Using Pandas - With Examples
6 pages
DP-900 Exam
No ratings yet
DP-900 Exam
44 pages
Module-2-Data Visualisation
No ratings yet
Module-2-Data Visualisation
30 pages
Information Extraction Survey
No ratings yet
Information Extraction Survey
117 pages
Python MySQL CRUD Guide
No ratings yet
Python MySQL CRUD Guide
2 pages
DBMS Assignment-1
No ratings yet
DBMS Assignment-1
3 pages
ADBMS: Assignment - 05: Snowflake Schema in Data Warehouse
No ratings yet
ADBMS: Assignment - 05: Snowflake Schema in Data Warehouse
5 pages
Ib Past Paper Biology
100% (3)
Ib Past Paper Biology
9 pages
