0% found this document useful (0 votes)

69 views52 pages

Data Mining for Computer Science Students

The document discusses data mining including its motivation, definition, applications, process, functionalities, and types of data used. Data mining aims to extract useful patterns from large datasets and can be used for applications like market analysis, risk analysis, and fraud detection.

Uploaded by

jaspreet.e16339

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

69 views52 pages

Data Mining for Computer Science Students

Uploaded by

jaspreet.e16339

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 52

DATA MINING

UNIT-I : INTRODUCTION

DR. P. BALAMURUGAN, ASST. PROFESSOR

DEPARTMENT OF COMPUTERE SCIENCE
GOVERNMENT ARTS COLLEGE, COIMBATORE-18
spbalamurugan@rediffmail.com
Outline
• Motivation: Why data mining?
• What is data mining?
• Data Mining: On what kind of data?
• Data mining functionality
• Are all the patterns interesting?
• Classification of data mining systems
• Data Mining Task Primitives
• Integration of data mining system with a DB and DW System
• Major issues in data mining

2
Why Data Mining?
• The Explosive Growth of Data: from terabytes(10004) to yottabytes(10008)
– Data collection and data availability
• Automated data collection tools, database systems, web
– Major sources of abundant data
• Business: Web, e-commerce, transactions, stocks, …
• Science: bioinformatics, scientific simulation, medical research …
• Society and everyone: news, digital cameras, …
• Data rich but information poor!
– What does those data mean?
– How to analyze data?

• Data mining — Automated analysis of massive data sets

3
Evolution of Database Technology

4
What Is Data Mining?
• Data mining (knowledge discovery from data)
– Extraction of interesting (non-trivial, implicit, previously unknown and

potentially useful) patterns or knowledge from huge amount of data

– Data mining: a misnomer?
• Alternative names
– Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.

5
Potential Applications
• Data analysis and decision support
– Market analysis and management
• Target marketing, customer relationship management (CRM),
market basket analysis, cross selling, market segmentation
– Risk analysis and management
• Forecasting, customer retention, improved underwriting, quality
control, competitive analysis
– Fraud detection and detection of unusual patterns (outliers)
• Other Applications
– Text mining (news group, email, documents) and Web mining
– Stream data mining
– Bioinformatics and bio-data analysis

6
Ex.: Market Analysis and Management
• Where does the data come from?—Credit card transactions, loyalty cards,
discount coupons, customer complaint calls, surveys …
• Target marketing
– Find clusters of “model” customers who share the same characteristics:
interest, income level, spending habits, etc.,
• E.g. Most customers with income level 60k – 80k with food expenses
$600 - $800 a month live in that area
– Determine customer purchasing patterns over time
• E.g. Customers who are between 20 and 29 years old, with income of
20k – 29k usually buy this type of CD player
• Cross-market analysis—Find associations/co-relations between product sales,
& predict based on such association
– E.g. Customers who buy computer A usually buy software B

7
Ex.: Market Analysis and Management (2)
• Customer requirement analysis
– Identify the best products for different customers
– Predict what factors will attract new customers
• Provision of summary information
– Multidimensional summary reports
• E.g. Summarize all transactions of the first quarter from three different
branches
Summarize all transactions of last year from a particular branch
Summarize all transactions of a particular product
– Statistical summary information
• E.g. What is the average age for customers who buy product A?
• Fraud detection
– Find outliers of unusual transactions
• Financial planning
– Summarize and compare the resources and spending 8
Knowledge Discovery (KDD) Process

9
KDD Process: Several Key Steps
• Learning the application domain
– relevant prior knowledge and goals of application
• Identifying a target data set: data selection
• Data processing
 Data cleaning remove noise and inconsistent data
 Data integration multiple data sources maybe combined
 Data selection data relevant to the analysis task are retrieved from database
 Data transformation transformed/consolidated into forms appropriate for mining
 Data mining extract data patterns
 Pattern evaluation identify the truly interesting patterns
 Knowledge presentation mined knowledge is presented to the user with
visualization or representation techniques
• Use of discovered knowledge

10
Data Mining and Business Intelligence

Increasing potential
to support
business decisions End User
Decision
Making

Data Presentation Business

Analyst
Visualization Techniques
Data Mining Data
Information Discovery Analyst

Data Exploration
Statistical Summary, Querying, and Reporting

Data Preprocessing/Integration, Data Warehouses

DBA
Data Sources
Paper, Files, Web documents, Scientific experiments, Database Systems
11
A typical DM System Architecture
• Database, data warehouse, WWW or other information
repository (store data)
• Database or data warehouse server (fetch and
combine data)
• Knowledge base (turn data into meaningful groups
according to domain knowledge)
• Data mining engine (perform mining tasks)
• Pattern evaluation module (find interesting patterns)
• User interface (interact with the user)

12
A typical DM System Architecture (2)

13
Confluence of Multiple Disciplines

Database
Technology Statistics

Information Machine
Science Data Mining Learning

Visualization Other
Disciplines

• Not all “Data Mining System” performs true data mining

 machine learning system, statistical analysis (small amount of data)
 Database system (information retrieval, deductive querying…)
14
On What Kinds of Data?
• Database-oriented data sets and applications
– Relational database, data warehouse, transactional database
• Advanced data sets and advanced applications
– Object-Relational Databases
– Temporal Databases, Sequence Databases, Time-Series databases
– Spatial Databases and Spatiotemporal Databases
– Text databases and Multimedia databases
– Heterogeneous Databases and Legacy Databases
– Data Streams
– The World-Wide Web

15
Data Warehouses
• A repository of information collected from multiple sources, stored
under a unified schema, and that usually resides at a single site.
• Constructed via a process of data cleaning, data integration, data
transformation, data loading and periodic data refreshing.

16
Data Warehouses (2)
• Data are organized around major subjects, e.g. customer, item, supplier and
activity.
• Provide information from a historical perspective (e.g. from the past 5 – 10
years)
• Typically summarized to a higher level (e.g. a summary of the transactions
per item type for each store)
• User can perform drill-down or roll-up operation to view the data at different
degrees of summarization

17
Data Warehouses (3)

18
Transactional Databases
• Consists of a file where each record represents a transaction
• A transaction typically includes a unique transaction ID and a list of the items
making up the transaction.

• Either stored in a flat file or unfolded into relational tables

• Easy to identify items that are frequently sold together

19
Data Mining Functionalities
 Concept/Class Description: Characterization and Discrimination
 Data can be associated with classes or concepts.
E.g. classes of items – computers, printers, …
concepts of customers – bigSpenders, budgetSpenders, …
 How to describe these items or concepts?
 Descriptions can be derived via Data characterization – summarizing the
general characteristics of a target class of data.
E.g. summarizing the characteristics of customers who spend more than
$1,000 a year at AllElectronics.
Result can be a general profile of the customers, such as 40 – 50 years old,
employed, have excellent credit ratings. `

20
Data Mining Functionalities
 Data discrimination – comparing the target class with one or a set of
comparative classes
– E.g. Compare the general features of software products whole sales
increase by 10% in the last year with those whose sales decrease
by 30% during the same period
 Or both of the above
 Mining Frequent Patterns, Associations and Correlations
– Frequent itemset: a set of items that frequently appear together in a
transactional data set (e.g. milk and bread)
– Frequent subsequence: a pattern that customers tend to purchase product
A, followed by a purchase of product B

21
Data Mining Functionalities
– Association Analysis: find frequent patterns
• E.g. a sample analysis result – an association rule:
buys(X, “computer”) => buys(X, “software”)
[support = 1%, confidence = 50%]
(if a customer buys a computer, there is a 50% chance that she will buy
software. 1% of all of the transactions under analysis showed that
computer and software are purchased together. )
• Associations rules are discarded as uninteresting if they do not satisfy
both a minimum support threshold and a minimum confidence threshold.
– Correlation Analysis: additional analysis to find statistical correlations between
associated pairs

22
Data Mining Functionalities
 Classification and Prediction
– Classification
• The process of finding a model that describes and distinguishes the
data classes or concepts, for the purpose of being able to use the
model to predict the class of objects whose class label is unknown.
• The derived model is based on the analysis of a set of training data
(data objects whose class label is known).
• The model can be represented in classification (IF-THEN) rules, decision
trees, neural networks, etc.
– Prediction
• Predict missing or unavailable numerical data values

23
Data Mining Functionalities

24
Data Mining Functionalities (2)
• Cluster Analysis
– Class label is unknown: group data to form new classes
– Clusters of objects are formed based on the principle of maximizing intra-
class similarity & minimizing interclass similarity
• E.g. Identify homogeneous subpopulations of customers. These clusters
may represent individual target groups for marketing.

25
Data Mining Functionalities (2)
• Outlier Analysis
– Data that do no comply with the general behavior or model.
– Outliers are usually discarded as noise or exceptions.
– Useful for fraud detection.
• E.g. Detect purchases of extremely large amounts
• Evolution Analysis
– Describes and models regularities or trends for objects whose
behavior changes over time.
• E.g. Identify stock evolution regularities for overall stocks and for the
stocks of particular companies.

26
PATTERN
 A pattern means that data are correlated that they have a relationship.
 Patterns are predictable.
 When you have a lack of pattern, you have true randomness.
 When you find a pattern, you can have a good idea when or where something
will happen before it actually happens.
 A pattern is a series of data that repeats in a recognizable way. It can be
identified in the history of the asset being evaluated or other assets with
similar characteristics.

27
FREQUENT PATTERN
 Frequent patterns, as the name suggests, are patterns that occur frequently
in data.
 Frequent patterns are item sets, subsequences, or substructures that appear in
a data set with frequency no less than a user-specified threshold.
 For example, a set of items, such as milk and bread, that appear frequently
together in a transaction data set, is a frequent item set.
 A subsequence, such as buying first a PC, then a digital camera, and then a
memory card, if it occurs frequently in a shopping history database, is a
(frequent) sequential pattern.

28
FREQUENT PATTERN
 A substructure can refer to different structural forms, such as subgraphs
subtrees, or sub-lattices, which may be combined with item sets or
subsequences.
 If a substructure occurs frequently, it is called a (frequent) structured
pattern. Mining frequent patterns leads to the discovery of interesting
associations and correlations within data.

29
Are All of the Patterns Interesting?
• Data mining may generate thousands of patterns: Not all of them
are interesting
• A pattern is interesting if it is
– easily understood by humans
– valid on new or test data with some degree of certainty,
– potentially useful
– novel
– validates some hypothesis that a user seeks to confirm
• An interesting measure represents knowledge !

30
Are All of the Patterns Interesting?
• Objective measures
– Based on statistics and structures of patterns, e.g., support, confidence, etc.
(Rules that do not satisfy a threshold are considered uninteresting.)
• Subjective measures
– Reflect the needs and interests of a particular user.
• E.g. A marketing manager is only interested in characteristics of
customers who shop frequently.
– Based on user’s belief in the data.
• e.g., Patterns are interesting if they are unexpected, or can be used for
strategic planning, etc
• Objective and subjective measures need to be combined.

31
Are All of the Patterns Interesting?
• Find all the interesting patterns: Completeness
– Unrealistic and inefficient
– User-provided constraints and interestingness measures should be used
• Search for only interesting patterns: An optimization problem
– Highly desirable
– No need to search through the generated patterns to identify truly
interesting ones.
– Measures can be used to rank the discovered patterns according their
interestingness.

32
Classification of data mining systems
 Data mining is an interdisciplinary field, the confluence of a set of disciplines.

 Because of the diversity of disciplines contributing to datamining, datamining

research is expected to generate a large variety of data mining systems.

33
Classification of data mining systems

Database
Technology Statistics

Information Machine
Science Data Mining Learning

Visualization Other
Disciplines

34
Classification of data mining systems
• Classification according to Database
– Relational, data warehouse, transactional, stream, object-oriented/
relational, spatial, time-series, text data, multi-media, heterogeneous,
WWW
• Classification according to Knowledge
– Characterization, discrimination, association, classification, clustering,
outlier analysis, etc.
– Multiple/integrated functions and mining at multiple levels(High level data,
Raw level data, Multiple level data)
• Classification according to Techniques utilized
– Degree of user interaction involved (Autonomous system, Query driven
system, interactive exploratory system) or methods of data analysis involved
(Database-oriented, data warehouse, machine learning, statistics,
visualization, etc.)

35
Classification of data mining systems
• Classification according to Applications adapted
- Retail, telecommunication, banking, fraud analysis, DNA, stock
market analysis, text mining, Web mining, etc.
- Different applications often require the integration of application specific
methods. Therefore, a generic, all-purpose data mining system may not
fit domain-specific mining tasks.

36
Data Mining Task Primitives
• Each user will have a data mining task in mind, that is, some form of data analysis

that he or she would like to have performed.

• A data mining task can be specified in the form of a data mining query, which is

input to the data mining system.

• A data mining query is defined in terms of data mining task primitives.

• How to construct a data mining query?

– The primitives allow the user to interactively communicate with the data

mining system during discovery to direct the mining process, or examine the

findings

37
Data Mining Task Primitives
– The primitives specify:

(1) The set of task-relevant data – which portion of the database to be used

– Database or data warehouse name

– Database tables or data warehouse cubes

– Condition for data selection

– Relevant attributes or dimensions

– Data grouping criteria

38
Data Mining Task Primitives
– The primitives specify:

(2) The kind of knowledge to be mined – what DB functions to be performed

– Characterization
– Discrimination
– Association
– Classification/prediction
– Clustering
– Outlier analysis
– Other data mining tasks

39
Data Mining Task Primitives
(3) The background knowledge to be used – what domain knowledge,

concept hierarchies, etc.

(4) Interestingness measures and thresholds – support, confidence, etc.

(5) Visualization methods – what form to display the result, e.g. rules,

tables, charts, graphs, …

40
DMQL
• DMQL – Data Mining Query Language
– Designed to incorporate these primitives
– Allow user to interact with DM systems
– Providing a standardized language like SQL

41
Why Data Mining Query Language?
• Automated vs. query-driven?
– Finding all the patterns autonomously in a database?—unrealistic because
the patterns could be too many but uninteresting
• Data mining should be an interactive process
– User directs what to be mined
• Users must be provided with a set of primitives to be used to
communicate with the data mining system
• Incorporating these primitives in a data mining query language
– More flexible user interaction
– Foundation for design of graphical user interface
– Standardization of data mining industry and practice

42
Major Issues in Data Mining
• Mining different kinds of knowledge
– DM should cover a wide spectrum of data analysis and knowledge
discovery tasks
– Enable to use the database in different ways
– Require the development of numerous data mining techniques
• Interactive mining of knowledge at multiple levels of abstraction
– Difficult to know exactly what will be discovered
– Allow users to focus the search, refine data mining requests
• Incorporation of background knowledge
– Guide the discovery process
– Allow discovered patterns to be expressed in concise terms and different
levels of abstraction

43
Major Issues in Data Mining
• Data mining query languages and ad hoc data mining
– High-level query languages need to be developed
– Should be integrated with a DB/DW query language
• Presentation and visualization of results
– Knowledge should be easily understood and directly usable
– High level languages, visual representations or other expressive forms
– Require the DM system to adopt the above techniques
• Handling noisy or incomplete data
– Require data cleaning methods and data analysis methods that can handle
noise
• Pattern evaluation – the interestingness problem
– How to develop techniques to access the interestingness of discovered
patterns, especially with subjective measures based on user beliefs or
expectations

44
Major Issues in Data Mining
• Performance Issues
– Efficiency and scalability
• Huge amount of data
• Running time must be predictable and acceptable
– Parallel, distributed and incremental mining algorithms
• Divide the data into partitions and processed in parallel
• Incorporate database updates without having to mine the entire data
again from scratch

45
Major Issues in Data Mining
• Diversity of Database Types
– Other database that contain complex data objects, multimedia data,
spatial data, etc.
– Expect to have different DM systems for different kinds of data
– Heterogeneous databases and global information systems
• Web mining becomes a very challenging and fast-evolving field in
data mining.

46
ASSOCIATION AND CORRELATIONS
• Associations and Correlations are called relationships among item sets. Ther
e are some other interesting relationships used to find frequent item sets.

47
FREQUENT ITEM SET MINING METHODS
Apriori Algorithm
 Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for mining
frequent itemsets for Boolean association rules.
 It uses prior knowledge of frequent itemset properties.
 Apriori employs an iterative approach known as a level-wise search,
where k-itemsets are used to explore (k+1)-itemsets.
 Apriori property: All nonempty subsets of a frequent itemset must also be
frequent.
 Apriori Property will help to improve the efficiency and reduce the search space.

48
FREQUENT ITEM SET MINING METHODS

Demonstration – Finding Frequent Item sets

 Sample - Transactional data for an AllElectronics branch.
TID List of item IDs
T100 I1, I2, I5
T200 I2,I4
T300 I2,I3
T400 I1,I2,I4
T500 I1,I3
T600 I2,I3
T700 I1,I3
T800 I1,I2,I3,I5
T900 I1,I2,I3

49
FREQUENT ITEM SET MINING METHODS
Demonstration – Finding Frequent Item sets

1) Scan database (D) for count of each candidate(Ci). Compare candidate

support count with minimum support count and find Li, where i=1.
(C1) (C1) (L1)
Itemset Itemset Support Count Itemset Support Count
{I1} {I1} 6 {I1} 6
{I2} {I2} 7 {I2} 7
{I3} {I3} 6 {I3} 6
{I4} {I4} 2 {I4} 2
{I5} {I5} 2 {I5} 2

50
FREQUENT ITEM SET MINING METHODS
Demonstration – Finding Frequent Item sets
1) Generate C2 candidate from L1 and scan D for counting each candidate.
Compare Candidate support with minimum support and find L2
(C2) (C2) (L2)
Itemset Itemset Support Count Itemset Support Count
{I1,I2} {I1,I2} 4 {I1,I2} 4
{I1,I3} {I1,I3} 4 {I1,I3} 4
{I1,I4} {I1,I4} 1 {I1,I5} 2
{I1,I5} {I1,I5} 2 {I2,I3} 4
{I2,I3} {I2,I3} 4 {I2,I4} 2
{I2,I4} {I2,I4} 2 {I2,I5} 2
{I2,I5} {I2,I5} 2
{I3,I4} {I3,I4} 0
{I3,I5} {I3,I5} 1
{I4,I5} {I4,I5} 0
51
FREQUENT ITEM SET MINING METHODS
Demonstration – Finding Frequent Item sets
1) Generate C3 candidate from L2 and scan D for counting each candidate.
Compare Candidate support with minimum support and find L3
(C3) (C3) (L3)
Itemset Itemset Support Count Itemset Support Count
{I1,I2,I3} {I1,I2,I3} 2 {I1,I2,I3} 2
{I1,I2,I5} {I1,I2,I5} 2 {I1,I2,I5} 2
{I1,I2,I4} {I1,I2,I4} 1
{I1,I3,I5} {I1,I3,I5} 1
{I2,I3,I4} {I2,I3,I4} 0
{I2,I3,I5} {I2,I3,I5} 1
{I2,I4,I5} {I2,I4,I5} 0

Data Mining and Datawarehousing CS-303
No ratings yet
Data Mining and Datawarehousing CS-303
34 pages
Data Mining: M.P.Geetha, Department of CSE, Sri Ramakrishna Institute of Technology, Coimbatore
100% (1)
Data Mining: M.P.Geetha, Department of CSE, Sri Ramakrishna Institute of Technology, Coimbatore
115 pages
Unit 1
No ratings yet
Unit 1
59 pages
Chap 1
No ratings yet
Chap 1
45 pages
Data Mining Concepts and Applications
No ratings yet
Data Mining Concepts and Applications
38 pages
Data Mining for Analysts & Businesses
No ratings yet
Data Mining for Analysts & Businesses
15 pages
Unit 1 DM
No ratings yet
Unit 1 DM
62 pages
Unit I Dbmi
No ratings yet
Unit I Dbmi
35 pages
4 - Data Mining & Preprocessing - L - 11,12,13,14,15,16
No ratings yet
4 - Data Mining & Preprocessing - L - 11,12,13,14,15,16
80 pages
Unit-1 A
No ratings yet
Unit-1 A
47 pages
DWM Unit II
No ratings yet
DWM Unit II
76 pages
Unit-1 DWDM
No ratings yet
Unit-1 DWDM
20 pages
Datamining 1
No ratings yet
Datamining 1
30 pages
DWDM Notes
No ratings yet
DWDM Notes
59 pages
Module 1
No ratings yet
Module 1
41 pages
Introduction To Data Mining
No ratings yet
Introduction To Data Mining
46 pages
DM Module 1
No ratings yet
DM Module 1
13 pages
Unit 1 and 2
No ratings yet
Unit 1 and 2
145 pages
Data Mining: Knowledge Discovery in Databases
No ratings yet
Data Mining: Knowledge Discovery in Databases
21 pages
Data Mining Notes
100% (1)
Data Mining Notes
45 pages
CH 1
No ratings yet
CH 1
66 pages
Data Mining Unit 1
No ratings yet
Data Mining Unit 1
39 pages
Archana Data Mining
No ratings yet
Archana Data Mining
24 pages
4 - Data Mining & Preprocessing - L - 11,12,13,14,15,16
No ratings yet
4 - Data Mining & Preprocessing - L - 11,12,13,14,15,16
100 pages
DWDM
No ratings yet
DWDM
30 pages
Introduction To Data Mining 1604
No ratings yet
Introduction To Data Mining 1604
32 pages
Bca DM Unit I
No ratings yet
Bca DM Unit I
20 pages
Data Mining (Introduction)
No ratings yet
Data Mining (Introduction)
31 pages
CSM6404 DM L1
No ratings yet
CSM6404 DM L1
29 pages
Why We Need Data Mining?
No ratings yet
Why We Need Data Mining?
39 pages
Data Mining Introduction
No ratings yet
Data Mining Introduction
32 pages
3-OLAP Operations-13!08!2021 (13-Aug-2021) Material I 13-Aug-2021 Data Mining - Introductory Slides
No ratings yet
3-OLAP Operations-13!08!2021 (13-Aug-2021) Material I 13-Aug-2021 Data Mining - Introductory Slides
37 pages
Major Issues in Data Mining
75% (4)
Major Issues in Data Mining
45 pages
Unit-1 Notes
No ratings yet
Unit-1 Notes
24 pages
Chapter 1 Data Mining Lecture Note
No ratings yet
Chapter 1 Data Mining Lecture Note
31 pages
ICS 2408 Lecture 1 Introduction
No ratings yet
ICS 2408 Lecture 1 Introduction
32 pages
Module 3
No ratings yet
Module 3
187 pages
Chap 1
No ratings yet
Chap 1
32 pages
Data Mining:: Dr. Hany Saleeb
No ratings yet
Data Mining:: Dr. Hany Saleeb
37 pages
Introduction
No ratings yet
Introduction
46 pages
Comprehensive Guide to Data Mining
No ratings yet
Comprehensive Guide to Data Mining
32 pages
Lecture 1.1.1 1.1.2
No ratings yet
Lecture 1.1.1 1.1.2
32 pages
Unit-1 PPT Dma
No ratings yet
Unit-1 PPT Dma
83 pages
Data Mining - IMT Nagpur-Manish
No ratings yet
Data Mining - IMT Nagpur-Manish
82 pages
Introduction To Data Mining-Week1
No ratings yet
Introduction To Data Mining-Week1
43 pages
UNIT-1 Introduction To Data Mining
No ratings yet
UNIT-1 Introduction To Data Mining
29 pages
Prof. Chandan Singhavi
No ratings yet
Prof. Chandan Singhavi
86 pages
Data Mining Unit I Notes
No ratings yet
Data Mining Unit I Notes
24 pages
Data Mining: Concepts and Techniques
No ratings yet
Data Mining: Concepts and Techniques
25 pages
Data Mining
No ratings yet
Data Mining
88 pages
Dmi Unit 1
No ratings yet
Dmi Unit 1
8 pages
Data Mining & Warehousing Guide
No ratings yet
Data Mining & Warehousing Guide
21 pages
Chapter - 1
No ratings yet
Chapter - 1
22 pages
FALLSEM2025 26 - VL - ISWE209L - 00100 - TH - 2025 07 31 - Course Material For Module 1
No ratings yet
FALLSEM2025 26 - VL - ISWE209L - 00100 - TH - 2025 07 31 - Course Material For Module 1
31 pages
Important Questions Unit-1
No ratings yet
Important Questions Unit-1
20 pages
02-Introduction To Data Mining
No ratings yet
02-Introduction To Data Mining
40 pages
Unit 1: Data Warehousing & Data Mining
No ratings yet
Unit 1: Data Warehousing & Data Mining
54 pages
L - 1 Data Mining
No ratings yet
L - 1 Data Mining
17 pages
Unit 1 DMDW
No ratings yet
Unit 1 DMDW
57 pages
Java API and Framework Overview
No ratings yet
Java API and Framework Overview
9 pages
6.823 Computer System Architecture: Victim Cache
No ratings yet
6.823 Computer System Architecture: Victim Cache
2 pages
Resume Vishu 18082003
No ratings yet
Resume Vishu 18082003
1 page
Database Solutions for Vet Clinics
No ratings yet
Database Solutions for Vet Clinics
21 pages
Confidence Interval 2
No ratings yet
Confidence Interval 2
3 pages
Unit 3 Storage Strategies Indices B-Trees Hashing
No ratings yet
Unit 3 Storage Strategies Indices B-Trees Hashing
12 pages
Cloud - Senior SQL Server Database Administrator - SagarPatil - 12yrs - Updated
No ratings yet
Cloud - Senior SQL Server Database Administrator - SagarPatil - 12yrs - Updated
3 pages
Data Mining Proteomes
No ratings yet
Data Mining Proteomes
4 pages
Data Warehouse Design Guide
No ratings yet
Data Warehouse Design Guide
11 pages
Symantec DLP 14.6 Release Notes
No ratings yet
Symantec DLP 14.6 Release Notes
49 pages
Practical Manual - Relational Database Management
No ratings yet
Practical Manual - Relational Database Management
5 pages
Bab 03 Mitigate Threats Using Microsoft Defender For Cloud
No ratings yet
Bab 03 Mitigate Threats Using Microsoft Defender For Cloud
45 pages
Random Forest
No ratings yet
Random Forest
25 pages
100 Most Difficult Data Analyst Interview Q&A
No ratings yet
100 Most Difficult Data Analyst Interview Q&A
26 pages
Postgres Notes
No ratings yet
Postgres Notes
158 pages
SAP BO Architecture Guide
No ratings yet
SAP BO Architecture Guide
14 pages
Dam Unit - V
No ratings yet
Dam Unit - V
19 pages
Fundamentals of Database Systems 6th Edition Test Bank Available Instantly
No ratings yet
Fundamentals of Database Systems 6th Edition Test Bank Available Instantly
325 pages
CCW331 BA IAT 1 Set 1 & Set 2 Questions
No ratings yet
CCW331 BA IAT 1 Set 1 & Set 2 Questions
19 pages
Irs Unit 5 PDF
No ratings yet
Irs Unit 5 PDF
24 pages
POSNET Serial Protocol Ver.015
No ratings yet
POSNET Serial Protocol Ver.015
39 pages
Beginning C - Through Game Prog - Michael Dawson
50% (2)
Beginning C - Through Game Prog - Michael Dawson
11 pages
M.sc. (Computer Science) Sem-I Feb.24 Nosql Technologies (Pd-16-Feb-24) (PC-57751)
No ratings yet
M.sc. (Computer Science) Sem-I Feb.24 Nosql Technologies (Pd-16-Feb-24) (PC-57751)
1 page
B.Tech. 3rd Yr CSE (AIML) 2022 23
No ratings yet
B.Tech. 3rd Yr CSE (AIML) 2022 23
33 pages
G12-Cs-Practical QP, Ak
No ratings yet
G12-Cs-Practical QP, Ak
16 pages
Exam Questions
No ratings yet
Exam Questions
12 pages
Practical File1 3
No ratings yet
Practical File1 3
51 pages
BMC Remedy Action Request System 9.1.00 PDF
No ratings yet
BMC Remedy Action Request System 9.1.00 PDF
4,838 pages
Oracle 12c Migration Performance Fix
No ratings yet
Oracle 12c Migration Performance Fix
5 pages
Deep Learning For Safety in Construction
No ratings yet
Deep Learning For Safety in Construction
12 pages

Data Mining for Computer Science Students

Uploaded by

Data Mining for Computer Science Students

Uploaded by

DATA MINING

DR. P. BALAMURUGAN, ASST. PROFESSOR

• Data mining — Automated analysis of massive data sets

potentially useful) patterns or knowledge from huge amount of data

Data Presentation Business

Data Preprocessing/Integration, Data Warehouses

• Not all “Data Mining System” performs true data mining

• Either stored in a flat file or unfolded into relational tables

 Because of the diversity of disciplines contributing to datamining, datamining

research is expected to generate a large variety of data mining systems.

that he or she would like to have performed.

input to the data mining system.

• A data mining query is defined in terms of data mining task primitives.

• How to construct a data mining query?

– Database or data warehouse name

– Database tables or data warehouse cubes

– Condition for data selection

– Relevant attributes or dimensions

– Data grouping criteria

(2) The kind of knowledge to be mined – what DB functions to be performed

concept hierarchies, etc.

(4) Interestingness measures and thresholds – support, confidence, etc.

tables, charts, graphs, …

Demonstration – Finding Frequent Item sets

1) Scan database (D) for count of each candidate(Ci). Compare candidate

You might also like