Build the Models
To build a model, the data should be clean and its content properly understood.
The components of model building are as follows:
a) Selection of model and variable
b) Execution of model
c) Model diagnostics and model comparison
Model and Variable Selection
Consider model performance and whether the project meets all of its requirements:
1. Must the model be easy to implement?
2. Does the model need to be easy to explain?
3. How difficult is the model to build and maintain?
Commercial tools
1. SAS Enterprise Miner
2. SPSS Modeler
3. MATLAB
4. Alpine Miner
Open Source tools:
1. R and PL/R
2. Octave
3. WEKA: WEKA can be executed within Java code.
4. Python
5. SQL in-database analytics
Model Execution
Various programming languages are used to implement models.
For model execution, Python provides libraries such as StatsModels and Scikit-learn, which implement several of the most popular techniques.
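As a minimal illustration of the kind of technique these libraries implement, the following plain-Python sketch fits an ordinary least squares line to invented data points (a real project would call StatsModels or Scikit-learn instead):

```python
# Simple ordinary least squares for y = a + b*x with one predictor.
# Illustrates the kind of model fitting StatsModels or Scikit-learn
# perform; the data points below are invented for the example.

def fit_ols(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = covariance(x, y) / variance(x)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x  # intercept
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]   # roughly y = 2x
a, b = fit_ols(xs, ys)
print(round(a, 2), round(b, 2))
```

The fitted slope comes out close to 2, matching the way the sample data was generated.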
Model Diagnostics and Model Comparison
Try to build multiple models and then select the best one based on multiple criteria.
Working with a holdout sample helps the user pick the best-performing model.
• In the holdout method, the data is split into two different datasets, labelled a
training and a testing dataset. This can be a 60/40, 70/30 or 80/20 split. This
technique is called the hold-out validation technique.
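A minimal sketch of the hold-out split in plain Python; the 70/30 ratio and the seed are arbitrary choices for the example:

```python
import random

def holdout_split(rows, train_fraction=0.7, seed=42):
    """Shuffle rows and split them into a training and a testing set.

    A 70/30 split is used by default; 60/40 or 80/20 work the same
    way. The seed makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train, test = holdout_split(data)
print(len(train), len(test))    # 70 30
```

The model is then fitted on the training set and scored on the testing set it has never seen.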
PRESENTING FINDINGS AND BUILDING APPLICATIONS
• The team delivers final reports, briefings, code and technical documents.
In addition, the team may run a pilot project to implement the models in a
production environment.
• The last stage of the data science process is where the user's soft skills will be
most useful.
• This stage involves presenting your results to the stakeholders and industrializing
your analysis process for repetitive reuse and integration with other tools.
Data Mining
• Data mining refers to extracting or mining knowledge from large amounts of
data.
• It is a process of discovering interesting patterns or knowledge from a large
amount of data stored in databases, data warehouses or other information
repositories.
Reasons for using data mining:
1. Knowledge discovery: To identify invisible correlations and patterns in the
database.
2. Data visualization: To find a sensible way of displaying data.
3. Data correction: To identify and correct incomplete and inconsistent data.
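A small sketch of the data-correction idea; the record layout and correction rules below are invented for illustration:

```python
# Hypothetical example: flag incomplete records and normalise an
# inconsistently spelled field.
records = [
    {"name": "Asha", "dept": "CS", "year": 1},
    {"name": "Ravi", "dept": "cs", "year": 1},    # inconsistent case
    {"name": "Meena", "dept": None, "year": 2},   # incomplete
]

def correct(records, default_dept="UNKNOWN"):
    cleaned = []
    for rec in records:
        rec = dict(rec)  # copy, leave the source record untouched
        # Data correction: fill missing values and unify spelling.
        rec["dept"] = (rec["dept"] or default_dept).upper()
        cleaned.append(rec)
    return cleaned

print(correct(records))
```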
Functions of Data Mining
• Different functions of data mining are characterization, association and
correlation analysis, classification, prediction, clustering analysis and
evolution analysis.
1. Characterization is a summarization of the general characteristics or
features of a target class of data. For example, the characteristics of students
can be summarized to produce a profile of all first-year engineering students
at the university.
2. Association is the discovery of association rules showing attribute-value
conditions that occur frequently together in a given set of data.
3. Classification differs from prediction. Classification constructs a set of
models that describe and distinguish data classes, whereas prediction builds a
model to estimate missing data values.
4. Clustering can also support taxonomy formation, that is, the organization of
observations into a hierarchy of classes that group similar events together.
5. Data evolution analysis describes and models regularities for objects whose
behaviour changes over time.
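The association function above (item 2) can be sketched with a toy example: counting attribute values that frequently occur together in a set of market baskets. The baskets and the support threshold are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy market-basket data: each set is one transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]

# Count how often each pair of items appears together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of baskets containing the pair; keep the
# frequent pairs only.
min_support = 0.5
frequent = {p: c / len(baskets)
            for p, c in pair_counts.items()
            if c / len(baskets) >= min_support}
print(frequent)
```

Here only the pair (bread, milk) reaches the 0.5 support threshold, since it appears in three of the five baskets.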
Data mining tasks can be classified into two categories: descriptive and
predictive.
Predictive Mining Tasks
To make predictions, predictive mining tasks perform inference on the current
data. Predictive analysis provides answers to queries about the future.
Descriptive Mining Tasks
Descriptive mining tasks provide a depiction or "summary view" of facts and
figures in an understandable format, to either inform the user or prepare the
data for further analysis.
Architecture of a Typical Data Mining System
• Based on the user's data request, the data warehouse server is responsible for
fetching the relevant data.
• Knowledge base is helpful in the whole data mining process. It might be
useful for guiding the search or evaluating the interestingness of the result
patterns. The knowledge base might even contain user beliefs and data from
user experiences that can be useful in the process of data mining.
• The data mining engine is the core component of any data mining system. It
consists of a number of modules for performing data mining tasks including
association, classification, characterization, clustering, prediction, time-series
analysis etc.
• The pattern evaluation module is mainly responsible for the measure of
interestingness of the pattern by using a threshold value. It interacts with the
data mining engine to focus the search towards interesting patterns.
• The graphical user interface module communicates between the user and the
data mining system. This module helps the user use the system easily and
efficiently without knowing the real complexity behind the process.
• When the user specifies a query or a task, this module interacts with the data
mining system and displays the result in an easily understandable manner.
Data Warehousing
Data warehousing is the process of constructing and using a data warehouse.
A data warehouse is constructed by integrating data from multiple
heterogeneous sources that support analytical reporting, structured and/or ad
hoc queries and decision making.
Data warehousing involves data cleaning, data integration and data
consolidation.
A data warehouse usually stores many months or years of data to support
historical analysis. The data in a data warehouse is typically loaded through an
extraction, transformation and loading (ETL) process from multiple data
sources.
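A minimal sketch of the ETL process described above, assuming a SQLite database stands in for the warehouse; the source rows and the table layout are invented:

```python
import sqlite3

# Toy ETL run: extract rows from two hypothetical sources with
# different shapes, transform them into one consistent layout, and
# load them into a warehouse table (SQLite plays the warehouse).
source_a = [("2023-01-05", "widget", 10)]                       # tuples
source_b = [{"day": "2023-01-06", "item": "gadget", "qty": 4}]  # dicts

def etl(conn):
    conn.execute("CREATE TABLE sales (day TEXT, item TEXT, qty INTEGER)")
    rows = list(source_a)                                         # extract
    rows += [(r["day"], r["item"], r["qty"]) for r in source_b]   # transform
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)  # load
    conn.commit()

conn = sqlite3.connect(":memory:")
etl(conn)
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 2
```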
• Databases and data warehouses are related but not the same.
• A database is a way to record and access information from a single source.
• A data warehouse is a way to store historical information from multiple
sources, allowing you to analyse and report on related data.
Goals of data warehousing:
1. To help reporting as well as analysis.
2. Maintain the organization's historical information.
3. Be the foundation for decision making.
Characteristics of Data Warehouse
1. Subject-oriented: data are organized around the major subjects of the
enterprise.
2. Integrated: data from multiple heterogeneous sources are combined
consistently.
3. Non-volatile: once loaded, data is read but not updated or deleted.
4. Time-variant: data is kept over long periods to support historical analysis.
Multitier Architecture of Data Warehouse
a) Single-tier architecture.
b) Two-tier architecture.
c) Three-tier architecture (Multi-tier architecture).
• Single-tier warehouse architecture focuses on creating a compact data set and
minimizing the amount of data stored.
• Two-tier warehouse structures separate the physically available resources from
the warehouse itself. This architecture is most commonly used in small
organizations.
• The bottom tier is the database of the warehouse, where the cleansed and
transformed data is loaded. The bottom tier is a warehouse database server.
• The middle tier is the application layer giving an abstracted view of the
database. It arranges the data to make it more suitable for analysis. This is
done with an OLAP server, implemented using the ROLAP or MOLAP model.
• The top tier is the front-end of an organization's overall business intelligence
suite. The top-tier is where the user accesses and interacts with data via
queries, data visualizations and data analytics tools.
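The middle-tier roll-up that an OLAP server performs can be sketched as a relational GROUP BY, which is essentially what a ROLAP server issues against the warehouse database. SQLite stands in for the warehouse here, and the table and figures are invented:

```python
import sqlite3

# ROLAP-style roll-up: aggregate relational warehouse rows with
# GROUP BY. Table layout and sales figures are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", 2022, 100.0), ("East", 2023, 150.0), ("West", 2022, 80.0)],
)

# Roll sales up by region, across all years.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('East', 250.0), ('West', 80.0)]
```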