Introduction To Data Mining

Data mining is the process of analyzing large datasets to identify patterns and extract valuable information for various applications such as business strategy enhancement, decision-making, and fraud detection. It involves techniques like classification, clustering, and regression, and is closely related to machine learning and statistics. The data mining process includes stages such as data pre-processing, exploratory data analysis, data selection, and knowledge discovery.

Uploaded by

sayeedsamia963

Introduction to Data mining

What is data mining? Examples.

Data mining is the process of searching and analyzing a large batch of raw data in order to
identify patterns and extract useful information.
Data mining is used to explore large data volumes to find patterns and insights that can be used
for specific purposes. These purposes might include improving sales and marketing, optimizing
manufacturing, detecting fraud, and enhancing security.
Why data mining?

Data mining is important because it helps organizations and individuals extract useful insights
from large amounts of data.

Enhances Business Strategies

• Companies use it for customer segmentation and personalized marketing.
• Helps in inventory management, pricing strategies, and customer retention.
Improves Decision-Making
• Businesses can make data-driven decisions instead of relying on intuition.
• Governments and healthcare industries use it to improve public services.
Medical and Scientific Discoveries
• Used in disease prediction, drug discovery, and patient care improvements.
• Helps in genome research and bioinformatics.

Fraud Detection and Risk Management

• Banks and financial institutions use data mining to detect unusual transactions.
• Helps in identifying cybersecurity threats.
Discover Patterns and Trends
• Helps identify hidden patterns and correlations in data.
• Helps predict customer behavior, fraud, and market trends.
Automation and AI Development
• Supports machine learning models by providing meaningful data patterns.
• Used in recommendation systems like those of Netflix and Amazon.
Competitive Advantage
• Helps businesses stay ahead of competitors by understanding market trends.
• Assists in predicting future sales and customer needs.
What is (not) Data Mining?
What is not Data Mining?
-Look up phone number in phone directory
-Query a Web search engine for information about “Amazon”
What is Data Mining?

-Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the
Boston area)
-Group together similar documents returned by a search engine according to their context (e.g.
Amazon rainforest, Amazon.com)
-SPL ->DSA-I

Origins of Data Mining:

-Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
-Traditional techniques may be unsuitable due to
  -Enormity (extreme scale) of data
  -High dimensionality (large number of features or attributes) of data
  -Heterogeneous, distributed nature of data
Related technologies:

• Machine learning: Machine learning (ML) is a type of artificial intelligence (AI) that
allows computers to learn and improve from data without being explicitly programmed. It
uses algorithms to analyze data, identify patterns, and make decisions.
• OLAP: Online analytical processing (OLAP) is a technology that analyzes large amounts
of data quickly. It is used in business intelligence, decision support, and reporting.
• Statistics: Statistics is a branch of applied mathematics that involves the collection,
description, analysis, and inference of conclusions from quantitative data.
• DBMS: A database management system (DBMS) is a software system for creating
and managing databases. A DBMS enables end users to create, protect, read, update,
and delete data in a database. It also manages security, data integrity, and concurrency for
databases.

Data Mining Tasks

Prediction Methods- Use some variables to predict unknown or future values of other variables
Description Methods- Find human-interpretable patterns that describe the data
Classification [Predictive]
-Classification involves assigning data into predefined categories based on specific attributes.
For example, using algorithms trained on labeled data, emails can be classified as 'spam' or
'not spam'.
-By contrast, clustering groups data by similarity without predefined labels (see Clustering below).
Regression [Predictive]
-Predict the value of a continuous-valued variable based on the values of other variables
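The spam example can be sketched in a few lines. This is a minimal illustration only: it uses a Naive Bayes classifier with Laplace smoothing, which is one common choice and not a method prescribed by these notes, and the four training messages and the names `train` and `classify` are invented for the example.

```python
import math
from collections import Counter

# Hypothetical labeled training data (label, text); invented for illustration.
TRAINING = [
    ("spam", "win cash prize now"),
    ("spam", "cheap prize click now"),
    ("not spam", "meeting agenda for monday"),
    ("not spam", "lunch on monday"),
]

def train(examples):
    """Count words per class and messages per class."""
    word_counts = {}
    doc_counts = Counter()
    for label, text in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    """Return the class with the highest log-probability (Laplace smoothing)."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the class prior, then add one term per word in the message.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in text.split():
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, doc_counts = train(TRAINING)
print(classify("win a prize now", word_counts, doc_counts))   # spam
print(classify("agenda for lunch", word_counts, doc_counts))  # not spam
```

The key point is that the categories ('spam', 'not spam') are fixed in advance and learned from labeled examples, which is what separates classification from clustering.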
Examples:
– Predicting sales amounts of new product based on advertising expenditure.
– Predicting wind velocities as a function of temperature, humidity, air pressure,
etc.
– Time series prediction of stock market indices
Types of regression:
Linear -> age and height
Nonlinear -> population growth over time
Logistic -> presence of diabetes based on risk factors
Deviation Detection [Predictive]
-Detect significant deviations from normal behavior, e.g. credit card fraud detection and network
intrusion detection
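A very simple way to flag a deviation is a z-score rule: mark any value that lies more than a chosen number of standard deviations from the mean. The sketch below assumes this rule, a threshold of 2, and made-up transaction amounts; real fraud detection systems use far richer models.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical card transaction amounts; one is far larger than the rest.
amounts = [20, 25, 22, 19, 24, 21, 23, 500]
print(find_outliers(amounts))  # [500]
```

Note that a single extreme value also inflates the mean and standard deviation, so on small samples a robust statistic such as the median may be a better baseline.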
Clustering [Descriptive]
Given a set of data points, each having a set of attributes, and a similarity measure among
them, find clusters such that
– Data points in one cluster are more similar to one another.
– Data points in separate clusters are less similar to one another.
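One widely used algorithm that follows this definition is k-means, with Euclidean distance as the similarity measure. The sketch below is an assumption-laden illustration, not the method of these notes: the points are invented, and the first k points are used as starting centroids purely to keep the run deterministic.

```python
import math

def k_means(points, k, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    centroids = list(points[:k])  # simplistic deterministic start (assumption)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D data points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = k_means(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels are supplied here; the two groups emerge only from the distances between points.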
Association Rule Discovery [Descriptive]
Find rules that predict the occurrence of an item based on the occurrences of other items.
- Customers who buy bread are likely to also buy milk.
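The strength of a rule such as bread -> milk is usually measured by its support (the fraction of all transactions containing both items) and its confidence (the fraction of bread transactions that also contain milk). The sketch below computes both on five invented shopping baskets; the function names are chosen for this example.

```python
# Hypothetical shopping baskets, one set of items per transaction.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    return sum(consequent <= t for t in with_antecedent) / len(with_antecedent)

rule_support = support({"bread", "milk"}, TRANSACTIONS)
rule_confidence = confidence({"bread"}, {"milk"}, TRANSACTIONS)
print(rule_support, rule_confidence)  # 0.6 0.75
```

In this toy data, three of five baskets contain both items and three of the four bread baskets contain milk.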

Sequential Pattern Discovery [Descriptive]

Find rules that predict strong sequential dependencies among different events. Event
occurrences in the patterns are governed by timing constraints.
– People who buy DVD players tend to buy DVDs in the period immediately following the
purchase
– Candy sales peak before Halloween (timing constraints)
– Athletic Apparel Store:
(Shoes) (Racket, Racketball) --> (Sports_Jacket)
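The DVD example can be checked against purchase histories by counting how often the second event follows the first within a time window. The sketch below is only an illustration of that timing constraint: the histories, the 7-day window, and the function names are all assumptions made for the example.

```python
# Hypothetical purchase histories: customer -> list of (day, item).
HISTORIES = {
    "c1": [(1, "dvd_player"), (3, "dvd")],
    "c2": [(2, "dvd_player"), (40, "dvd")],
    "c3": [(5, "dvd"), (6, "dvd_player")],
    "c4": [(1, "dvd_player"), (2, "cable"), (6, "dvd")],
}

def followed_within(history, first, then, max_gap_days):
    """True if `then` is bought after `first`, at most max_gap_days later."""
    for day_a, item_a in history:
        if item_a != first:
            continue
        for day_b, item_b in history:
            if item_b == then and 0 < day_b - day_a <= max_gap_days:
                return True
    return False

def rule_confidence(histories, first, then, max_gap_days):
    """Among customers who bought `first`, the fraction who then bought `then` in time."""
    buyers = [h for h in histories.values() if any(item == first for _, item in h)]
    if not buyers:
        return 0.0
    hits = sum(followed_within(h, first, then, max_gap_days) for h in buyers)
    return hits / len(buyers)

confidence = rule_confidence(HISTORIES, "dvd_player", "dvd", max_gap_days=7)
print(confidence)  # 0.5
```

Customer c2 bought a DVD too late and c3 bought it before the player, so only c1 and c4 support the pattern.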
What is Machine Learning?
Machine learning (ML) is a type of artificial intelligence (AI) that allows machines to learn and improve
from experience.
Types of relationship
-Nonlinear relationship
-Linear relationship

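Math for coefficients and least-squares equation (experience vs. salary)
Straight-line regression using the method of least squares. Table 1 shows a set of paired data where x is
the number of years of work experience of a college graduate and y is the corresponding salary of the
graduate (in thousands of dollars). Estimate the equation of the least-squares line. Also, calculate the
salary y for 10 years of experience.

Solution:
Mean value of x, $\bar{x} = 9.1$, and mean value of y, $\bar{y} = 55.4$.
We know that,
$$w_1 = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2} = 3.5$$

Also we know,
$$w_0 = \bar{y} - w_1 \bar{x} = 55.4 - (3.5)(9.1) \approx 23.6$$

Now, $y = w_0 + w_1 x = 23.6 + 3.5x$. For $x = 10$: $y = 23.6 + 3.5 \times 10 = 58.6$, i.e. $58.6 \times 1000 = \$58{,}600$.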
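The calculation can be reproduced in code. Important assumption: Table 1 is not reproduced in these notes, so the ten (x, y) pairs below are assumed values, chosen because they give exactly the stated means of 9.1 and 55.4 and match a commonly used textbook data set; treat them as illustrative rather than as the original table.

```python
# ASSUMED data (Table 1 is missing from the notes): years of experience and
# salary in thousands of dollars. They reproduce the stated means 9.1 and 55.4.
years = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
salary = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

def least_squares(x, y):
    """Return (w0, w1) for the least-squares line y = w0 + w1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    w1 = sxy / sxx               # slope
    w0 = y_mean - w1 * x_mean    # intercept
    return w0, w1

w0, w1 = least_squares(years, salary)
prediction = w0 + w1 * 10  # salary (in $1000s) for 10 years of experience
print(f"w1 = {w1:.2f}, w0 = {w0:.2f}, y(10) = {prediction:.2f}")
```

With full precision the slope is about 3.54 and the intercept about 23.2, so the prediction is about 58.58. The hand calculation rounds the slope to 3.5 before computing the intercept, which is why it reports 23.6 and 58.6.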

ML vs Data mining
Data mining is used on an existing dataset (like a data warehouse) to find patterns. Machine learning, on
the other hand, is trained on a 'training' data set, which teaches the computer how to make sense of
data, and then to make predictions about new data sets.

Data Mining | Machine Learning
Extracts useful information from large amounts of data | Introduces algorithms from data as well as from past experience
Used to understand the data flow | Teaches the computer to learn and understand from the data flow
Huge databases with unstructured data | Existing data as well as algorithms
Data mining abstract from the data warehouse | Machine learning reads machine
Clustering, association rule mining, outlier detection | Regression, classification, clustering, deep learning
Strong domain knowledge is often required | Domain knowledge is helpful, but not always necessary

Stages of the Data mining Process:

Four main stages:
1. Data Pre-processing: Data preprocessing is the process of preparing data for machine learning and other data
analysis. It involves:
• Data cleaning: Fixing errors, removing duplicates, and handling missing values
• Data transformation: Scaling, normalizing, and encoding categorical variables
• Data reduction: Selecting relevant features and reducing dimensionality
• Data integration: Combining data from multiple sources
• Data formatting: Ensuring consistent data types and structures
• Data validation: Checking for errors, ensuring data consistency, and verifying that transformations were
applied correctly
2. Exploratory Data Analysis: Exploratory Data Analysis (EDA) is an analysis approach that identifies
general patterns in the data. These patterns include outliers and features of the data that might be unexpected. EDA
is an important first step in any data analysis.
3. Data Selection: Data selection is the process of choosing the right data type, source, and collection
instruments for a project. It's a crucial step before data collection.
Purpose
• To ensure that the data is relevant, accurate, and aligned with the project's goals
• To answer research questions
• To train or evaluate a machine learning model
4. Knowledge Discovery: Knowledge discovery is the process of extracting useful knowledge from data.
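Two of the pre-processing steps from stage 1, data cleaning and data transformation, can be sketched with the standard library alone. The records, field names, and the choice of mean imputation and min-max scaling are assumptions made for this illustration; other strategies are equally valid.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 25, "city": "Boston"},
    {"age": None, "city": "Austin"},
    {"age": 35, "city": "Boston"},
    {"age": 25, "city": "Boston"},  # exact duplicate of the first record
]

def clean(records):
    """Data cleaning: drop exact duplicates, fill missing ages with the mean age."""
    unique = []
    for record in records:
        if record not in unique:
            unique.append(record)
    known = [r["age"] for r in unique if r["age"] is not None]
    mean_age = sum(known) / len(known)
    return [{**r, "age": mean_age if r["age"] is None else r["age"]} for r in unique]

def transform(records):
    """Data transformation: min-max scale age to [0, 1], one-hot encode city."""
    ages = [r["age"] for r in records]
    lo, hi = min(ages), max(ages)
    cities = sorted({r["city"] for r in records})
    rows = []
    for r in records:
        scaled = (r["age"] - lo) / (hi - lo) if hi > lo else 0.0
        one_hot = [1 if r["city"] == c else 0 for c in cities]
        rows.append([scaled] + one_hot)
    return rows

features = transform(clean(raw))
print(features)  # [[0.0, 0, 1], [0.5, 1, 0], [1.0, 0, 1]]
```

Each output row holds the scaled age followed by one column per city (Austin, Boston), which is the numeric form most mining algorithms expect.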

Data Mining Techniques:

• Classification
• Clustering
• Regression
• Association rules
• Sequential patterns
• Artificial neural networks
• Outlier detection
• Prediction
• Genetic algorithms

Prediction is a data mining technique that involves using models to predict future outcomes. This
is called predictive data mining.

Genetic algorithms (GAs) are a data mining technique that can be used to classify data and solve
optimization problems. GAs are based on natural evolution and can be used to find optimal
solutions.
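The evolutionary idea behind a genetic algorithm can be illustrated on a toy optimization problem: find a bit string with as many 1s as possible. Everything in this sketch is an assumption chosen for illustration (the problem, the population size, the mutation rate, the fixed random seed); it shows selection, crossover, and mutation, not a production method.

```python
import random

def fitness(bits):
    """Toy objective: the number of 1s in the bit string."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Return (best individual, best fitness per generation)."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = population[:2]  # elitism: carry the two best over unchanged
        while len(next_gen) < pop_size:
            # Selection: parents come from the fitter half of the population.
            p1, p2 = rng.sample(population[: pop_size // 2], 2)
            # Crossover: splice the parents at a random cut point.
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, history

best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the best individuals are always carried over, the best fitness can never decrease from one generation to the next, which mirrors the "survival of the fittest" idea from natural evolution.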
