
Build the Models

To build the model, the data must be clean and its content properly understood.

The components of model building are as follows:

a) Model and variable selection

b) Model execution

c) Model diagnostics and model comparison

Model and Variable Selection


Consider model performance and whether the project meets all the requirements:

1. Must the model be easy to implement?

2. Does the model need to be easy to explain?

3. How difficult will the model be to implement and maintain?

Commercial tools:
1. SAS Enterprise Miner
2. SPSS Modeler
3. MATLAB
4. Alpine Miner

Open Source tools:

1. R and PL/R

2. Octave

3. WEKA: WEKA can be executed within Java code.

4. Python

5. SQL (in-database analytics)

Model Execution

Various programming languages are used to implement the model.

For model execution, Python provides libraries such as StatsModels and Scikit-learn. These packages implement several of the most popular techniques.
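
As a minimal sketch of model execution (the synthetic data, coefficients and random seed below are illustrative assumptions, not from any real project), fitting a linear regression with Scikit-learn looks roughly like this:

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data: two predictors and a noisy linear response.
rng = np.random.default_rng(seed=42)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression()
model.fit(X, y)                       # estimate coefficients from the data
print(model.coef_, model.intercept_)  # inspect the fitted model

StatsModels offers a similar workflow (e.g. statsmodels.api.OLS) when detailed statistical summaries of the fit are needed.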

Model Diagnostics and Model Comparison

Try to build multiple models and then select the best one based on multiple criteria.
Working with a holdout sample helps the user pick the best-performing model.

• In the holdout method, the data is split into two different datasets labeled as a training dataset and a testing dataset. This can be a 60/40, 70/30 or 80/20 split. This technique is called the hold-out validation technique.
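
As a minimal sketch of hold-out validation (the built-in iris dataset and the decision tree are illustrative choices, not prescribed by the text), a 70/30 split with Scikit-learn looks like this:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out 30% of the data for testing (a 70/30 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier().fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out 30%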

PRESENTING FINDINGS AND BUILDING APPLICATIONS

• The team delivers final reports, briefings, code and technical documents.

• In addition, the team may run a pilot project to implement the models in a production environment.

• The last stage of the data science process is where the user's soft skills will be most useful.

• It involves presenting your results to the stakeholders and industrializing your analysis process for repetitive reuse and integration with other tools.

Data Mining

• Data mining refers to extracting or mining knowledge from large amounts of data.

• It is a process of discovering interesting patterns or knowledge from a large amount of data stored in databases, data warehouses or other information repositories.
Reasons for using data mining:

1. Knowledge discovery: To identify invisible correlations and patterns in the database.

2. Data visualization: To find a sensible way of displaying data.

3. Data correction: To identify and correct incomplete and inconsistent data, as in the sketch below.
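
As a minimal data-correction sketch with pandas (the column names and values are hypothetical), incomplete and inconsistent records can be fixed like this:

import pandas as pd

df = pd.DataFrame({
    "age":  [25, None, 37, 25],                  # incomplete: missing value
    "city": ["Pune", "pune", "Mumbai", "Pune"],  # inconsistent casing
})

df["age"] = df["age"].fillna(df["age"].median())  # fill missing ages
df["city"] = df["city"].str.title()               # normalize inconsistent text
df = df.drop_duplicates()                         # remove duplicate rows
print(df)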

Functions of Data Mining

• Different functions of data mining are characterization, association and correlation analysis, classification, prediction, clustering analysis and evolution analysis.

1. Characterization is a summarization of the general characteristics or features of a target class of data. For example, the characteristics of students can be summarized, generating a profile of all first-year engineering students at the university.

2. Association is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data (a support/confidence sketch follows this list).

3. Classification differs from prediction: classification constructs a set of models that describe and distinguish data classes, while prediction builds a model to predict some missing data values.

4. Clustering can also support taxonomy formation, that is, the organization of observations into a hierarchy of classes that group similar events together.

5. Data evolution analysis describes and models regularities for objects whose behaviour changes over time.
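
As a minimal sketch of association analysis (the market-basket transactions below are hypothetical), the support and confidence of a single rule can be computed directly:

# Hypothetical transactions; computes support and confidence
# for the rule {bread} -> {butter}.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]

n = len(transactions)
bread = sum(1 for t in transactions if "bread" in t)
both = sum(1 for t in transactions if {"bread", "butter"} <= t)

support = both / n          # fraction of transactions with both items
confidence = both / bread   # P(butter | bread)
print(f"support={support:.2f}, confidence={confidence:.2f}")

Libraries such as mlxtend automate this over all frequent itemsets (the Apriori algorithm), but the two measures above are the core of every association rule.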

Data mining tasks can be classified into two categories: descriptive and
predictive.
Predictive Mining Tasks
To make predictions, predictive mining tasks perform inference on the current data.

Predictive analysis provides answers to queries about the future.

Descriptive Mining Task


Descriptive mining tasks provide a depiction or "summary view" of facts and figures in an understandable format, either to inform or to prepare data for further analysis.
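
As a minimal descriptive sketch (the student records below are hypothetical), a per-group summary with pandas is a typical "summary view":

import pandas as pd

students = pd.DataFrame({
    "dept":  ["CS", "CS", "IT", "IT", "CS"],
    "marks": [72, 85, 64, 70, 91],
})

# Describe the data without predicting anything:
# counts, averages and maxima per department.
summary = students.groupby("dept")["marks"].agg(["count", "mean", "max"])
print(summary)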

Architecture of a Typical Data Mining System

• Data warehouse server: Based on the user's data request, the data warehouse server is responsible for fetching the relevant data.

• Knowledge base is helpful in the whole data mining process. It might be useful for guiding the search or evaluating the interestingness of the result patterns. The knowledge base might even contain user beliefs and data from user experiences that can be useful in the process of data mining.

• The data mining engine is the core component of any data mining system. It
consists of a number of modules for performing data mining tasks including
association, classification, characterization, clustering, prediction, time-series
analysis etc.

• The pattern evaluation module is mainly responsible for measuring the interestingness of the patterns by using a threshold value. It interacts with the data mining engine to focus the search towards interesting patterns.

• The graphical user interface module communicates between the user and the
data mining system. This module helps the user use the system easily and
efficiently without knowing the real complexity behind the process.

• When the user specifies a query or a task, this module interacts with the data
mining system and displays the result in an easily understandable manner.

Data Warehousing

Data warehousing is the process of constructing and using a data warehouse.

A data warehouse is constructed by integrating data from multiple heterogeneous sources; it supports analytical reporting, structured and/or ad hoc queries and decision making.

Data warehousing involves data cleaning, data integration and data consolidation.

A data warehouse usually stores many months or years of data to support historical analysis. The data in a data warehouse is typically loaded through an extraction, transformation and loading (ETL) process from multiple data sources.
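
As a minimal ETL sketch (the file name "sales_export.csv", the column names and the table "fact_sales" are all hypothetical), loading data into a warehouse table might look like this:

import sqlite3
import pandas as pd

# Extract: read raw data from a (hypothetical) operational export.
raw = pd.read_csv("sales_export.csv")

# Transform: clean and reshape for historical analysis.
raw["order_date"] = pd.to_datetime(raw["order_date"])
raw = raw.dropna(subset=["amount"])

# Load: append the cleaned rows into the warehouse table.
con = sqlite3.connect("warehouse.db")
raw.to_sql("fact_sales", con, if_exists="append", index=False)
con.close()

A production ETL pipeline would add scheduling, logging and error handling, but the extract-transform-load structure stays the same.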

• Databases and data warehouses are related but not the same.

• A database is a way to record and access information from a single source.

• A data warehouse is a way to store historical information from multiple sources to allow you to analyse and report on related data.

Goals of data warehousing:

1. To support reporting as well as analysis.

2. Maintain the organization's historical information.

3. Be the foundation for decision making.

Characteristics of Data Warehouse

1. Subject-oriented: data are organized around major subjects such as customers, products and sales.

2. Integrated
3. Non-volatile
4. Time-variant

Multitier Architecture of Data Warehouse

a) Single-tier architecture.

b) Two-tier architecture.

c) Three-tier architecture (Multi-tier architecture).

• Single-tier warehouse architecture focuses on creating a compact data set and minimizing the amount of data stored.
• Two-tier warehouse structures physically separate the available resources from the warehouse itself. This architecture is most commonly used in small organizations.

• The bottom tier is the database of the warehouse, where the cleansed and
transformed data is loaded. The bottom tier is a warehouse database server.

• The middle tier is the application layer giving an abstracted view of the database. It arranges the data to make it more suitable for analysis. This is done with an OLAP server, implemented using the ROLAP or MOLAP model (a small aggregation sketch follows this list).

• The top tier is the front-end of an organization's overall business intelligence suite. The top tier is where the user accesses and interacts with data via queries, data visualizations and data analytics tools.
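
As a minimal sketch of the kind of roll-up aggregation an OLAP layer performs (the sales records below are hypothetical), a pandas pivot table summarizes detail rows into a region-by-quarter view:

import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "amount":  [100, 150, 80, 120],
})

# Roll the detail rows up into a region-by-quarter summary "cube".
cube = sales.pivot_table(values="amount", index="region",
                         columns="quarter", aggfunc="sum")
print(cube)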
