Data Mining

This document provides an overview of data mining, including its definition, implementation process, techniques, examples, challenges and tools. It describes data mining as the process of discovering patterns in large data sets. The implementation process involves business understanding, data preparation, modelling, evaluation and deployment. Common techniques include classification, clustering, regression, association rule mining and prediction. Examples show how data mining can be used for customer profiling and analyzing credit card usage. Challenges include needing skilled experts and integrating diverse data sources.

Uploaded by

admin ker

Data Mining Tutorial: Process, Techniques, Tools, Examples

What is Data Mining?
Data mining is the search for hidden, valid, and potentially useful patterns in huge data sets. It is all about discovering unsuspected, previously unknown relationships in the data.

It is a multi-disciplinary skill that draws on machine learning, statistics, AI, and database technology.

The insights derived from data mining can be used for marketing, fraud detection, scientific discovery, and more.

Data mining is also called knowledge discovery, knowledge extraction, data/pattern analysis, information harvesting, etc.

In this tutorial, you will learn:

• What is Data Mining?
• Types of Data
• Data Mining Implementation Process
• Business understanding
• Data understanding
• Data preparation
• Data transformation
• Modelling
• Data Mining Techniques
• Challenges of Implementing Data Mining
• Data Mining Examples
• Data Mining Tools
• Benefits of Data Mining
• Disadvantages of Data Mining
• Data Mining Applications

Types of Data
Data mining can be performed on the following types of data:

• Relational databases
• Data warehouses
• Advanced databases and information repositories
• Object-oriented and object-relational databases
• Transactional and spatial databases
• Heterogeneous and legacy databases
• Multimedia and streaming databases
• Text databases
• Web data (text mining and Web mining)

Data Mining Implementation Process

Let's study the data mining implementation process in detail.

Business understanding:
In this phase, business and data mining goals are established.

• First, understand the business and client objectives. Define what your client wants (which, many times, even they do not know themselves).
• Take stock of the current data mining scenario. Factor resources, assumptions, constraints, and other significant factors into your assessment.
• Using the business objectives and the current scenario, define your data mining goals.
• A good data mining plan is very detailed and should be developed to accomplish both the business and the data mining goals.

Data understanding:
In this phase, a sanity check is performed on the data to verify that it is appropriate for the data mining goals.

• First, data is collected from the multiple data sources available in the organization.
• These data sources may include multiple databases, flat files, or data cubes. Issues such as object matching and schema integration can arise during the data integration process. It is a complex and tricky process, because data from various sources is unlikely to match easily. For example, table A contains an entity named cust_no, whereas table B contains an entity named cust-id.
• It is therefore difficult to determine whether both of these objects refer to the same value. Metadata should be used here to reduce errors in the data integration process.
• The next step is to explore the properties of the acquired data. A good way to explore the data is to answer the data mining questions (decided in the business understanding phase) using query, reporting, and visualization tools.
• Based on the results of the queries, the data quality should be ascertained. Missing data, if any, should be acquired.

Data preparation:
In this phase, the data is made production ready.

The data preparation process consumes about 90% of the project's time.

The data from different sources should be selected, cleaned, transformed, formatted, anonymized, and constructed (if required).

Data cleaning is the process of "cleaning" the data by smoothing noisy data and filling in missing values.

For example, in a customer demographics profile, age data may be missing. The data is then incomplete, and the gaps should be filled. In some cases, there may be outliers; for instance, an age with a value of 300. Data can also be inconsistent; for instance, the name of a customer differs between tables.
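As a minimal sketch of the cleaning steps just described, the snippet below treats implausible ages as missing and fills the gaps with the median of the valid values. The cut-off of 120 and the choice of median imputation are assumptions made for this example, not rules from the tutorial.

```python
from statistics import median

MAX_PLAUSIBLE_AGE = 120  # assumed cut-off for this sketch

def clean_ages(ages):
    """Treat implausible ages as missing, then fill gaps with the median."""
    # Step 1: outliers (e.g. an age of 300) become missing values.
    checked = [a if a is not None and 0 <= a <= MAX_PLAUSIBLE_AGE else None
               for a in ages]
    # Step 2: impute missing values with the median of the valid ages.
    valid = [a for a in checked if a is not None]
    fill = median(valid)
    return [a if a is not None else fill for a in checked]

raw_ages = [34, None, 45, 300, 29]   # made-up sample with a gap and an outlier
cleaned = clean_ages(raw_ages)
print(cleaned)
```

Other fill strategies (mean, most frequent value, or acquiring the real value from the source system) fit the same structure; only the computation of `fill` changes.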

Data transformation operations change the data to make it useful for data mining. The following transformations can be applied.

Data transformation:
Data transformation operations contribute to the success of the mining process.

Smoothing: Removes noise from the data.

Aggregation: Summary or aggregation operations are applied to the data. For example, weekly sales data is aggregated to calculate monthly and yearly totals.

Generalization: Low-level data is replaced by higher-level concepts with the help of concept hierarchies. For example, the city is replaced by the county.

Normalization: Attribute data is scaled up or down so that it falls within a specified range. For example, the data should fall in the range -2.0 to 2.0 after normalization.

Attribute construction: New attributes are constructed from the given set of attributes and added to it when they are helpful for data mining.

The result of this process is a final data set that can be used in modelling.
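Two of the transformations above, aggregation and normalization, can be sketched in a few lines. The function names and sample figures are illustrative assumptions; min-max scaling is one common way to normalize into a fixed range such as -2.0 to 2.0, not the only one.

```python
from collections import defaultdict

def aggregate_monthly(weekly_sales):
    """Sum (month, amount) weekly records into monthly totals."""
    totals = defaultdict(float)
    for month, amount in weekly_sales:
        totals[month] += amount
    return dict(totals)

def min_max_scale(values, new_min=-2.0, new_max=2.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant column: map to the midpoint
        return [(new_min + new_max) / 2 for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]

# Made-up weekly sales, each tagged with the month it belongs to.
weekly = [("2024-01", 100.0), ("2024-01", 150.0), ("2024-02", 200.0)]
monthly = aggregate_monthly(weekly)
scaled = min_max_scale([10, 20, 30])
print(monthly, scaled)
```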

Modelling:
In this phase, mathematical models are used to determine data patterns.

• Based on the business objectives, suitable modelling techniques should be selected for the prepared data set.
• Create a test scenario to check the quality and validity of the model.
• Run the model on the prepared data set.
• The results should be assessed by all stakeholders to make sure that the model can meet the data mining objectives.
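One common way to set up such a test scenario is to hold back part of the prepared data and measure the model only on that held-out part. The sketch below assumes a toy one-feature data set and a toy threshold model (fit_threshold is a hypothetical helper); it illustrates the hold-out idea rather than any specific tool's workflow.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and split off a held-out test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(rows):
    """Toy model: split halfway between the two class means."""
    zeros = [x for x, y in rows if y == 0]
    ones = [x for x, y in rows if y == 1]
    cut = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: int(x >= cut)

def accuracy(model, rows):
    """Fraction of (feature, label) rows the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

# Made-up data: one feature x, label 1 when x >= 50.
rows = [(x, int(x >= 50)) for x in range(0, 100, 5)]
train_rows, test_rows = train_test_split(rows)
model = fit_threshold(train_rows)        # fitted on training rows only
test_accuracy = accuracy(model, test_rows)
print(f"held-out accuracy: {test_accuracy:.2f}")
```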

Evaluation:
In this phase, the identified patterns are evaluated against the business objectives.

• The results generated by the data mining model should be evaluated against the business objectives.
• Gaining business understanding is an iterative process. In fact, new business requirements may be raised during this phase because of what data mining reveals.
• A go or no-go decision is taken on moving the model to the deployment phase.

Deployment:
In the deployment phase, you ship your data mining discoveries to everyday business operations.

• The knowledge or information discovered during the data mining process should be made easy to understand for non-technical stakeholders.
• A detailed deployment plan for shipping, maintaining, and monitoring the data mining discoveries is created.
• A final project report is created with the lessons learned and key experiences during the project. This helps to improve the organization's business policy.

Data Mining Techniques

1. Classification:
This analysis is used to retrieve important and relevant information about data and metadata. This data mining method assigns data items to different classes.
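As one possible illustration of classification, the sketch below uses a k-nearest-neighbours vote. The choice of algorithm, the two features (age, income in thousands), and the labels are assumptions made for this example; the tutorial does not prescribe a specific classifier.

```python
from collections import Counter
from math import dist

def knn_predict(training, point, k=3):
    """Label `point` by majority vote among its k nearest training rows."""
    nearest = sorted(training, key=lambda row: dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training rows: ((age, income in thousands), spending class).
training = [((25, 30), "low"), ((30, 35), "low"), ((28, 40), "low"),
            ((50, 90), "high"), ((55, 85), "high"), ((60, 95), "high")]

prediction = knn_predict(training, (52, 88))
print(prediction)
```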

2. Clustering:
Clustering analysis is a data mining technique for identifying data items that are similar to each other. This process helps to understand the differences and similarities between the data.
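A minimal sketch of clustering, assuming k-means on a single numeric attribute with hand-picked starting centroids. Both the algorithm choice and the sample spending figures are assumptions for illustration.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on numbers, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

spend = [10, 12, 11, 95, 100, 105]   # made-up monthly spend per customer
centers, groups = kmeans_1d(spend, centroids=[0.0, 50.0])
print(centers, groups)
```

Unlike classification, no labels are supplied here; the two groups emerge from the similarity of the values alone.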

3. Regression:
Regression analysis is the data mining method of identifying and analyzing the relationship between variables. It is used to estimate the likely value of a specific variable, given the values of other variables.
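The simplest case, one predictor and a straight-line relationship, can be sketched with ordinary least squares. The variable names and figures are made up, and the data is deliberately exact (sales = 2 * spend + 1) so the fitted line is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]          # exactly 2 * spend + 1
slope, intercept = fit_line(ad_spend, sales)

# Use the fitted line to estimate sales for an unseen spend level.
estimate = slope * 5 + intercept
print(slope, intercept, estimate)
```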

4. Association Rules:
This data mining technique helps to find associations between two or more items. It discovers hidden patterns in the data set.
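Association rules are usually judged by two measures, support and confidence. The sketch below computes both for a rule "bread implies milk" over a handful of made-up shopping baskets; the measures are standard, while the function names and data are assumptions for this example.

```python
def support(transactions, items):
    """Fraction of transactions containing every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here bread and milk appear together in 2 of 4 baskets, and 2 of the 3 baskets containing bread also contain milk.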

5. Outlier detection:
This type of data mining technique refers to the observation of data items in the data set which do not match an expected pattern or expected behavior. This technique can be used in a variety of domains, such as intrusion detection and fraud or fault detection. Outlier detection is also called outlier analysis or outlier mining.
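A minimal sketch of outlier detection, assuming a z-score rule: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The threshold of 2.0 and the transaction amounts are assumptions for illustration; real fraud or intrusion detection uses far richer features.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:                 # all values identical: nothing stands out
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

amounts = [50, 52, 49, 51, 48, 500]   # made-up card transactions
suspicious = zscore_outliers(amounts)
print(suspicious)
```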

6. Sequential Patterns:
This data mining technique helps to discover or identify similar patterns or trends in transaction data over a certain period.
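As a very reduced sketch of the idea, the snippet below counts how many customer purchase histories contain one item followed later by another. Counting ordered pairs is an assumption made to keep the example short; full sequential pattern mining handles longer patterns and time windows.

```python
from collections import Counter
from itertools import combinations

def ordered_pair_counts(sequences):
    """Count how many sequences contain each ordered pair (earlier, later)."""
    counts = Counter()
    for seq in sequences:
        # combinations() keeps input order, so `a` always precedes `b` in seq;
        # set() ensures a pair is counted once per sequence.
        counts.update(set(combinations(seq, 2)))
    return counts

# Made-up purchase histories, one list per customer, in time order.
histories = [["phone", "case", "charger"],
             ["phone", "charger"],
             ["case", "phone"]]

pairs = ordered_pair_counts(histories)
print(pairs.most_common(1))
```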

7. Prediction:
Prediction uses a combination of the other data mining techniques, such as trends, sequential patterns, clustering, and classification. It analyzes past events or instances in the right sequence to predict a future event.

Challenges of Implementing Data Mining:

• Skilled experts are needed to formulate the data mining queries.
• Overfitting: When the training data set is small, a model may fit it too closely and fail to fit future states.
• Data mining needs large databases, which are sometimes difficult to manage.
• Business practices may need to be modified to make use of the information uncovered.
• If the data set is not diverse, data mining results may not be accurate.
• Integrating the information needed from heterogeneous databases and global information systems can be complex.

Data mining Examples:

Example 1:

Consider the marketing head of a telecom service provider who wants to increase revenues from long-distance services. For a high ROI on his sales and marketing efforts, customer profiling is important. He has a vast pool of customer information such as age, gender, income, and credit history, but it is impossible to determine the characteristics of people who prefer long-distance calls through manual analysis. Using data mining techniques, he may uncover patterns between heavy long-distance callers and their characteristics.

For example, he might learn that his best customers are married females between the ages of 45 and 54 who make more than $80,000 per year. Marketing efforts can then be targeted at this demographic.

Example 2:

A bank wants to find new ways to increase revenues from its credit card operations. It wants to check whether usage would double if fees were halved.

The bank has multiple years of records on average credit card balances, payment amounts, credit limit usage, and other key parameters. It creates a model to check the impact of the proposed new business policy. The results show that cutting fees in half for a targeted customer base could increase revenues by $10 million.

Data Mining Tools

The following are two popular data mining tools widely used in industry.

R language:

R is an open-source language and environment for statistical computing and graphics. It offers a wide variety of statistical techniques (including classical statistical tests, time-series analysis, and classification) and graphical techniques. It also offers an effective data handling and storage facility.

Oracle Data Mining:

Oracle Data Mining, popularly known as ODM, is a component of the Oracle Advanced Analytics option of Oracle Database. This data mining tool allows data analysts to generate detailed insights and make predictions. It helps predict customer behavior, develop customer profiles, and identify cross-selling opportunities.

Benefits of Data Mining:

• Data mining techniques help companies obtain knowledge-based information.
• Data mining helps organizations make profitable adjustments in operation and production.
• Data mining is a cost-effective and efficient solution compared to other statistical data applications.
• Data mining helps with the decision-making process.
• It facilitates the automated prediction of trends and behaviors as well as the automated discovery of hidden patterns.
• It can be implemented in new systems as well as on existing platforms.
• It is a speedy process that makes it easy for users to analyze huge amounts of data in less time.

Disadvantages of Data Mining

• There is a chance that companies may sell useful information about their customers to other companies for money. For example, American Express has sold data on its customers' credit card purchases to other companies.
• Many data mining analytics software packages are difficult to operate and require advanced training to work with.
• Different data mining tools work in different ways because of the different algorithms employed in their design. Selecting the correct data mining tool is therefore a very difficult task.
• Data mining techniques are not perfectly accurate, so they can cause serious consequences in certain conditions.

Data Mining Applications

Application: Usage

Communications: Data mining techniques are used in the communication sector to predict customer behavior […]

Insurance: Data mining helps insurance companies to price their products profitably and promote […]

Education: Data mining helps educators access student data, predict achievement levels, and […] attention, for example students who are weak in maths.

Manufacturing: With the help of data mining, manufacturers can predict wear and tear of production […] reduce them to minimize downtime.

Banking: Data mining helps the finance sector to get a view of market risks and manage regulatory […] to decide whether to issue credit cards, loans, etc.

Retail: Data mining techniques help retail malls and grocery stores identify and arrange most […] store owners to come up with offers that encourage customers to increase their […]

Service Providers: Service providers such as the mobile phone and utility industries use data mining to predict the […] analyze billing details, customer service interactions, and complaints made to the company to […] incentives.

E-Commerce: E-commerce websites use data mining to offer cross-sells and up-sells through their […] use data mining techniques to get more customers into their e-commerce store.

Supermarkets: Data mining allows supermarkets to develop rules to predict whether their shoppers were likely […] they could find women customers who are most likely pregnant. They can start targeting […] on.

Crime Investigation: Data mining helps crime investigation agencies to deploy police workforce (where is a […] at a border crossing, etc.

Bioinformatics: Data mining helps to mine biological data from massive data sets gathered in biology and […]

Summary:
• Data mining is all about explaining the past and predicting the future through analysis.
• Data mining helps to extract information from huge sets of data. It is the procedure of mining knowledge from data.
• The data mining process includes business understanding, data understanding, data preparation, modelling, evaluation, and deployment.
• Important data mining techniques are classification, clustering, regression, association rules, outlier detection, sequential patterns, and prediction.
• The R language and Oracle Data Mining are prominent data mining tools.
• Data mining techniques help companies obtain knowledge-based information.
• The main drawback of data mining is that many analytics software packages are difficult to operate and require advanced training to work with.
• Data mining is used in diverse industries such as communications, insurance, education, manufacturing, banking, retail, service providers, e-commerce, supermarkets, and bioinformatics.

How It Works
Data mining, as a composite discipline, represents a variety of methods or
techniques used in different analytic capabilities that address a gamut of
organizational needs, ask different types of questions and use varying levels of
human input or rules to arrive at a decision.

Descriptive Modeling: It uncovers shared similarities or groupings in historical data to determine reasons behind success or failure, such as categorizing customers by product preferences or sentiment. Sample techniques include:

Clustering: Grouping similar records together.

Anomaly detection: Identifying multidimensional outliers.

Association rule learning: Detecting relationships between records.

Principal component analysis: Detecting relationships between variables.

Affinity grouping: Grouping people with common interests or similar goals (e.g., people who […]

Predictive Modeling: This modeling goes deeper to classify events in the future or estimate unknown outcomes, for example using credit scoring to determine an individual's likelihood of repaying a loan. Predictive modeling also helps uncover insights for things like customer churn, campaign response, or credit defaults. Sample techniques include:

Regression: A measure of the strength of the relationship between one dependent variable and […]

Neural networks: Computer programs that detect patterns, make predictions, and learn.

Decision trees: Tree-shaped diagrams in which each branch represents a probable occurrence.

Support vector machines: Supervised learning models with associated learning algorithms.

Prescriptive Modeling: Prescriptive modeling looks at internal and external variables and constraints to recommend one or more courses of action, for example determining the best marketing offer to send to each customer.

With the growth in unstructured data from the web, comment fields, books, email, PDFs, audio, and other text sources, the adoption of text mining as a related discipline to data mining has also grown significantly. You need the ability to successfully parse, filter, and transform unstructured data in order to include it in predictive models for improved prediction accuracy.

In the end, you should not look at data mining as a separate, standalone entity, because pre-processing (data preparation, data exploration) and post-processing (model validation, scoring, model performance monitoring) are equally essential.

Sample techniques for prescriptive modeling include:

Predictive analytics plus rules: Developing if/then rules from patterns and predicting outcomes.

Marketing optimization: Simulating the most advantageous media mix in real time for the […]

Data Mart vs. Data Warehouse
Data mart vs. data warehouse: what is the difference? Discover why the old question of how to structure the data warehouse is no longer relevant.
A data mart is a subset of a data warehouse oriented to a specific business line. Data marts
contain repositories of summarized data collected for analysis on a specific section or unit
within an organization, for example, the sales department.

A data warehouse is a large centralized repository of data that contains information from
many sources within an organization. The collated data is used to guide business decisions
through analysis, reporting, and data mining tools.
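The relationship between the two can be sketched with an in-memory SQLite database. The table, view, and column names below are hypothetical, and a real data mart is usually a separate store rather than a view; the point is only that the mart holds summarized data for one business line drawn from the wider warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table: detailed rows from several departments.
conn.execute(
    "CREATE TABLE warehouse_orders (dept TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_orders VALUES (?, ?, ?)",
    [("sales", "north", 120.0), ("sales", "north", 80.0),
     ("sales", "south", 50.0), ("support", "north", 10.0)],
)

# The "data mart": summarized data for one business line (sales) only.
conn.execute("""
    CREATE VIEW sales_mart AS
    SELECT region, SUM(amount) AS total_amount, COUNT(*) AS order_count
    FROM warehouse_orders
    WHERE dept = 'sales'
    GROUP BY region
""")

mart_rows = conn.execute(
    "SELECT region, total_amount, order_count FROM sales_mart ORDER BY region"
).fetchall()
print(mart_rows)
```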

Data Mart and Data Warehouse Comparison

Data Mart
• Focus: A single subject or functional organization area
• Data Sources: Relatively few sources linked to one line of business
• Size: Less than 100 GB
• Normalization: No preference between a normalized and denormalized structure
• Decision Types: Tactical decisions pertaining to particular business lines and ways of doing things
• Cost: Typically from $10,000 upwards
• Setup Time: 3-6 months
• Data Held: Typically summarized data

Data Warehouse
• Focus: Enterprise-wide repository of disparate data sources
• Data Sources: Many external and internal sources from different areas of an organization
• Size: 100 GB minimum but often in the range of terabytes for large organizations
• Normalization: Modern warehouses are mostly denormalized for quicker data querying and read performance
• Decision Types: Strategic decisions that affect the entire enterprise
• Cost: Varies but often greater than $100,000; for cloud solutions costs can be dramatically lower as organizations pay per use
• Setup Time: At least a year for on-premises warehouses; cloud data warehouses are much quicker to set up
• Data Held: Raw data, metadata, and summary data

Inmon vs. Kimball

Two data warehouse pioneers, Bill Inmon and Ralph Kimball, differ in their views on how data warehouses should be designed from the organization's perspective.

Bill Inmon's approach favours a top-down design in which the data warehouse is the
centralized data repository and the most important component of an organization's data
systems.

The Inmon approach first builds the centralized corporate data model, and the data warehouse
is seen as the physical representation of this model. Dimensional data marts related to
specific business lines can be created from the data warehouse when they are needed.

In the Inmon model, data in the data warehouse is integrated, meaning the data warehouse is
the source of the data that ends up in the different data marts. This ensures data integrity and
consistency across the organization.

Ralph Kimball's data warehouse design starts with the most important business processes. In this approach, an organization creates data marts that aggregate relevant data around subject-specific areas. The data warehouse is then the combination of the organization's individual data marts.

This is in contrast to Inmon's approach, which creates data marts based on information in the warehouse. As Kimball said in 1997, “the data warehouse is nothing more than the union of all data marts.”*

* Quoted from Kimball's book, "The Data Warehouse Lifecycle Toolkit".

Data Marts vs. Centralized Data Warehouse: Use Cases
The following use cases highlight some examples of when to use each approach to data
warehousing.

Data Marts Use Cases

• Marketing analysis and reporting favor a data mart approach because these activities are typically performed in a specialized business unit and do not require enterprise-wide data.
• A financial analyst can use a finance data mart to carry out financial reporting.

Centralized Data Warehouse Use Cases

• A company considering an expansion needs to incorporate data from a variety of data sources across the organization to come to an informed decision. This requires a data warehouse that aggregates data from sales, marketing, store management, customer loyalty, supply chains, etc.
• Many factors drive profitability at an insurance company. An insurance company reporting on its profits needs a centralized data warehouse to combine information from its claims department, sales, customer demographics, investments, and other areas.

Are Data Marts Still Relevant in a Cloud Architecture?
Organizations that want to make data-driven decisions are faced with a challenge: when should they use data marts versus data warehouses to analyze and report on the data they collect?

Data marts can guide tactical decisions at a departmental level while data warehouses guide
high-level strategic business decisions by providing a consolidated view of all organizational
data.

There are two approaches to this challenge that reflect the classic Bill Inmon versus Ralph Kimball debate:

• The first approach, based on Bill Inmon's opinion, is to build the data warehouse as the centralized repository of all enterprise data, from which data marts can be created later on to serve particular departmental needs.
• The second approach, in line with Ralph Kimball's thoughts, is to initially create separate data marts that hold aggregate data on the most important business processes, before merging these data marts into a data warehouse later on.

Data warehouses provide a convenient, single repository for all enterprise data, but the cost of implementing such a system on-site is much greater than building data marts. On-premises data warehouse systems also take a significant length of time to build.

However, cloud-based data warehouse services have made data warehouses much easier and quicker to set up, and cheaper to run, which negates the need for a “start small” approach that recommends starting with data marts and merging them later on into a data warehouse.

Since cloud-based data warehouse services are cost-effective, scalable, and extremely
accessible, organizations of all sizes can leverage cloud infrastructure and build a centralized
data warehouse first.

