
INTRODUCTION TO PREDICTIVE ANALYTICS

UNIT-1
Introduction: Overview of Predictive Analytics, Setting Up the Problem; Predictive Analytics
Processing Steps: CRISP-DM, Defining Data for Predictive Modeling, Defining the Target
Variable, Defining Measures of Success for Predictive Models, Doing Predictive Modeling Out of
Order

1.1 Introduction to Predictive Analytics

Predictive Analytics is a branch of advanced analytics that uses historical data, statistical
algorithms, and machine learning techniques to estimate the likelihood of future outcomes.

Key Objectives

• Forecast future trends or behaviors
• Inform decision-making
• Optimize operations
• Identify risks and opportunities

Examples

Domain        Application
Finance       Credit scoring, fraud detection
Marketing     Customer churn prediction, targeted advertising
Healthcare    Disease outbreak prediction, patient readmission risk
Supply Chain  Demand forecasting, inventory optimization


Core Components

1. Data Collection – Acquiring structured or unstructured historical data
2. Data Preparation – Cleaning, transforming, and feature engineering
3. Model Selection – Choosing appropriate algorithms (e.g., regression, decision trees, neural networks)
4. Model Training – Fitting the model to historical data
5. Evaluation – Measuring performance using metrics like RMSE, accuracy, AUC
6. Deployment – Integrating the model into decision-making systems
7. Monitoring – Tracking model performance over time
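
These components map onto just a few lines of scikit-learn. A minimal sketch of components 2–6 (synthetic data stands in for a real collected dataset):

# Minimal end-to-end sketch; make_classification stands in for a
# collected and prepared historical dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)  # collection/preparation

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)             # model selection
model.fit(X_train, y_train)                                 # model training

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.3f}")                               # evaluation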

1.2 Setting Up the Predictive Analytics Problem

1. Define the Objective

Clarify the business or research goal:

"What are we trying to predict?"

Example: Predict customer churn within 3 months.

2. Identify the Target Variable (Label)

This is the output the model will predict.

• Continuous → Regression (e.g., sales forecast)
• Categorical → Classification (e.g., churn yes/no)
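
In practice the target often has to be constructed from raw records. A minimal sketch for the churn example above (the column names and the 90-day inactivity rule are illustrative assumptions):

# Construct a binary churn label: 1 if the customer has been inactive
# for more than 90 days at the cutoff date, else 0. Toy data.
import pandas as pd

df = pd.DataFrame({"customer_id": [1, 2],
                   "last_purchase_date": pd.to_datetime(["2023-12-20", "2023-08-01"])})
cutoff = pd.Timestamp("2024-01-01")

days_inactive = (cutoff - df["last_purchase_date"]).dt.days
df["churn"] = (days_inactive > 90).astype(int)   # categorical target: 1 = churned
print(df)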

3. Gather and Explore the Data

• Collect relevant historical data
• Use exploratory data analysis (EDA) to detect:
  o Trends and patterns
  o Outliers or anomalies
  o Missing values
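
These checks are quick to run in pandas. A sketch (the toy column stands in for real data):

# Quick EDA checks: distributions, missing values, outliers.
import pandas as pd

df = pd.DataFrame({"amount": [10.0, 12.5, 11.0, 300.0, None, 9.5]})

print(df.describe())        # trends: mean, std, quartiles
print(df.isna().sum())      # missing values per column

# Flag outliers with the common 1.5 * IQR rule
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
print(df[(df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)])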

4. Feature Selection and Engineering

• Select or create input variables (features) that best explain or relate to the target.
• Techniques include:
  o One-hot encoding
  o Binning
  o Normalization
  o Interaction terms
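
Each listed technique takes one or two lines with pandas and scikit-learn. A sketch (the toy columns are illustrative):

# One example per technique on a toy DataFrame.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"region": ["north", "south", "north"],
                   "age": [23, 41, 67],
                   "tenure": [1.0, 4.0, 9.0],
                   "usage": [20.0, 15.0, 5.0]})

df = pd.get_dummies(df, columns=["region"])                   # one-hot encoding
df["age_bin"] = pd.cut(df["age"], bins=[0, 25, 45, 65, 120])  # binning
df[["tenure", "usage"]] = StandardScaler().fit_transform(df[["tenure", "usage"]])  # normalization
df["tenure_x_usage"] = df["tenure"] * df["usage"]             # interaction term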

5. Choose a Modeling Approach

Match the problem type with an appropriate model:

Problem Type    Example Models
Regression      Linear Regression, Random Forest Regressor
Classification  Logistic Regression, SVM, Decision Trees
Time Series     ARIMA, Prophet, LSTM

6. Define Success Criteria

Establish evaluation metrics to assess performance:

• Classification: Accuracy, Precision, Recall, F1-score, ROC-AUC
• Regression: RMSE, MAE, R²
• Time Series: MAPE, RMSE, SMAPE
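
Most of these metrics are available directly in scikit-learn. A sketch (the small arrays stand in for real predictions from a fitted model):

# Computing the listed classification and regression metrics.
from sklearn.metrics import (accuracy_score, f1_score, roc_auc_score,
                             mean_absolute_error, mean_squared_error, r2_score)

# Classification: true labels, hard predictions, predicted probabilities
y_true, y_pred, y_score = [0, 1, 1, 0], [0, 1, 0, 0], [0.2, 0.9, 0.4, 0.1]
print(accuracy_score(y_true, y_pred), f1_score(y_true, y_pred),
      roc_auc_score(y_true, y_score))

# Regression: RMSE is the square root of MSE
yt, yp = [3.0, 5.0, 2.5], [2.8, 5.3, 2.9]
print(mean_squared_error(yt, yp) ** 0.5, mean_absolute_error(yt, yp), r2_score(yt, yp))
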
7. Validation Strategy

Split data to test generalizability:

• Train/Test Split
• K-Fold Cross-Validation
• Time-based Validation (for time series)
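
All three strategies exist in scikit-learn. A sketch (synthetic data stands in for a real dataset):

# Hold-out split, K-fold, and time-based validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, KFold, TimeSeriesSplit

X, y = make_classification(n_samples=100, random_state=0)

# 1. Simple hold-out split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 2. K-fold cross-validation (k = 5)
for train_idx, test_idx in KFold(n_splits=5, shuffle=True).split(X):
    pass  # fit on X[train_idx], evaluate on X[test_idx]

# 3. Time-based validation: training folds always precede test folds
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    pass  # no shuffling, so temporal order is preserved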

Summary

Topic                 Key Point
Predictive Analytics  Uses data to predict future outcomes
Main Steps            Data → Features → Model → Evaluate → Deploy
Problem Setup         Define goal → Identify target → Choose features/model → Evaluate
Metrics               Vary by task (accuracy for classification, RMSE for regression)


1.3 Predictive Analytics Processing Steps

Flowchart

[Start]
  ↓
[Define the Problem]
  ↓
[Collect Data]
  ↓
[Prepare Data (Cleaning & Preprocessing)]
  ↓
[Exploratory Data Analysis (EDA)]
  ↓
[Feature Selection & Engineering]
  ↓
[Split the Data (Train/Test/Validation)]
  ↓
[Select Model]
  ↓
[Train the Model]
  ↓
[Evaluate the Model]
  ↓   (if performance is poor → return to Feature Selection & Engineering)
[Deploy the Model]
  ↓
[Monitor & Maintain the Model]
  ↓
[End / Repeat if Needed]

1. Define the Problem

Goal: Translate a business need into a predictive task

• What do we want to predict?
• How will the prediction be used?

Examples:

• Predict loan defaults (classification)
• Forecast monthly sales (regression)

Next: You need data that relates to the outcome you're predicting.

2. Collect Data

Goal: Gather historical data containing relevant features and the target variable

Sources may include:

• Databases (SQL, NoSQL)
• APIs
• Spreadsheets
• Web scraping

Important: Ensure data quality, coverage, and relevance.

Next: Raw data must be cleaned and prepared for analysis.

3. Prepare Data (Preprocessing)

Goal: Clean and format the data for modeling


Tasks include:
• Handle missing values
• Remove duplicates
• Correct inconsistent formats
• Convert dates to standard formats

Outcome: Structured, clean dataset ready for exploration.
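
All four tasks take one line each in pandas. A sketch (the toy records are illustrative):

# Cleaning tasks on a toy DataFrame.
import pandas as pd

df = pd.DataFrame({"income": [50000.0, None, 42000.0, 42000.0],
                   "state": [" ny", "CA ", "ca", "ca"],
                   "signup_date": ["2023-01-05", "2023-02-05", "2023-03-01", "2023-03-01"]})

df["income"] = df["income"].fillna(df["income"].median())  # handle missing values
df["state"] = df["state"].str.strip().str.upper()          # correct inconsistent formats
df["signup_date"] = pd.to_datetime(df["signup_date"])      # convert dates to a standard type
df = df.drop_duplicates()                                  # remove duplicates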

Next: Explore patterns, trends, and relationships in the data.

4. Exploratory Data Analysis (EDA)

Goal: Understand the data’s structure, trends, and relationships


Techniques:

• Summary statistics (mean, std, median)
• Correlation analysis
• Visualizations: histograms, boxplots, scatterplots

Use EDA to:

• Spot outliers
• Detect multicollinearity
• Guide feature engineering

Next: Choose/create variables to train your model.

5. Feature Selection & Engineering

Goal: Build powerful input variables for the model


Techniques include:

• Feature selection (filter, wrapper, embedded)
• Encoding (One-Hot, Label Encoding)
• Normalization/standardization
• Creating interaction or aggregated features

Outcome: An optimized feature set that improves model accuracy.
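
As a complement to the engineering examples in 1.2, here is a sketch of filter-based selection: keep the features with the highest ANOVA F-score relative to the target (k=5 is an arbitrary choice for illustration):

# Filter-based feature selection with SelectKBest.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_selected = SelectKBest(f_classif, k=5).fit_transform(X, y)
print(X_selected.shape)   # (200, 5)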

Next: Split the dataset to train and test the model properly.

6. Split the Data

Goal: Divide data to avoid overfitting and estimate generalization


Common splits:

• Training set: to train the model
• Validation set: to tune hyperparameters
• Test set: to evaluate the final model

Alternative: K-Fold Cross-Validation for smaller datasets
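
A common way to get all three sets is two successive splits; the 60/20/20 proportions below are a typical choice, not a fixed rule:

# 60% train, 20% validation, 20% test via two train_test_split calls.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)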

Next: Choose a model suited to your problem and data.

7. Select Model

Goal: Choose the best algorithm for the task


Examples:

• Classification: Logistic Regression, Decision Tree, SVM
• Regression: Linear Regression, Random Forest Regressor
• Time Series: ARIMA, Prophet, LSTM

Next: Train the model on the training data.


8. Train the Model

Goal: Fit the selected model to the training data


May involve:

• Parameter tuning
• Feature scaling
• Handling imbalance (e.g., SMOTE)

Result: A trained model ready for evaluation.
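
A sketch of the imbalance case: SMOTE (from the imbalanced-learn package) oversamples the minority class, and is applied to the training data only so the test set stays untouched:

# Oversample the minority class with SMOTE, then fit a classifier.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_train, y_train = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_res, y_res)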

Next: Evaluate performance using test/validation data.

9. Evaluate the Model

Goal: Measure how well the model performs


Metrics vary by task:

Type            Metrics
Classification  Accuracy, F1-score, ROC-AUC
Regression      RMSE, MAE, R²
Time Series     MAPE, SMAPE

Outcome: Decide if the model meets the required performance.

If good: move to deployment.
If poor: revisit steps 5–8 (feature engineering, model tuning).

10. Deploy the Model

Goal: Use the model in real-world applications


Options include:

• API integration
• Dashboard interfaces
• Embedded in software systems

Result: The model starts making live or batch predictions.
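
A minimal sketch of the API option using Flask; the saved model file name and the input format are assumptions for illustration:

# Serve a previously saved model behind a /predict endpoint.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")   # assumed: a model saved after training

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]   # e.g. [[5.1, 3.5, 1.4, 0.2]]
    return jsonify({"prediction": model.predict(features).tolist()})

if __name__ == "__main__":
    app.run()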

Next: Monitor performance in production.

11. Monitor & Maintain the Model

Goal: Ensure long-term model reliability


Tasks:

• Track performance metrics over time
• Detect data drift or concept drift
• Retrain with new data as needed

Reason: Models degrade over time due to changing patterns.
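
One common drift check is the Population Stability Index (PSI) between the training distribution and live data for a single feature. A sketch (the PSI > 0.2 rule of thumb is a common convention, not a universal standard):

# Population Stability Index between a reference and a live sample.
import numpy as np

def psi(expected, actual, bins=10):
    edges = np.histogram_bin_edges(expected, bins=bins)
    e = np.histogram(expected, edges)[0] / len(expected) + 1e-6
    a = np.histogram(np.clip(actual, edges[0], edges[-1]), edges)[0] / len(actual) + 1e-6
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0, 1, 10_000)   # distribution seen at training time
live_feature = rng.normal(0.5, 1, 1_000)   # shifted distribution in production
print(psi(train_feature, live_feature))    # > 0.2 is commonly read as significant drift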


1.4 CRISP-DM (Cross-Industry Standard Process for Data Mining)

CRISP-DM is a structured, six-phase framework for organizing and executing data science and
predictive analytics projects. It’s industry-neutral, iterative, and designed to support the full
lifecycle of a data mining project.

PROCESS

[1. Business Understanding]
  ↓
[2. Data Understanding]
  ↓
[3. Data Preparation]
  ↓
[4. Modeling]
  ↓
[5. Evaluation]
  ↓
[6. Deployment]
  ↓
[→ Iterate if necessary or update over time]

1. Business Understanding

Understand the project objectives and requirements from a business perspective.

• Define business goals (e.g., reduce churn, increase sales)
• Assess the current situation
• Translate business needs into a data science problem

2. Data Understanding

Collect initial data and begin to get familiar with it.

• Identify data sources
• Describe and explore data
• Detect data quality issues
• Perform initial EDA (exploratory data analysis)

3. Data Preparation

Build the dataset to feed into modeling tools.

• Clean data (handle missing values, outliers)
• Format data (types, encodings)
• Create features (feature engineering)
• Merge data from different sources

4. Modeling

Apply machine learning or statistical modeling techniques.

• Select modeling techniques (e.g., classification, regression)
• Train models on prepared data
• Tune model parameters
• Evaluate training performance

5. Evaluation

Assess the model to ensure it meets business objectives.

• Evaluate model performance using metrics
• Review whether the model satisfies the original business goals
• Decide on next steps (refine the model, go to deployment, or revisit earlier steps)

6. Deployment

Implement the model into a real-world environment.

• Deploy to production systems (e.g., via API, dashboard)
• Create documentation
• Set up a monitoring and maintenance plan
• Deliver a final report to stakeholders

Iterative Loops: You may go back and forth between steps. For example:

• Poor evaluation results → return to Modeling or Data Preparation
• New business goals → return to Business Understanding

Summary

Phase                   Description                           Output
Business Understanding  Understand the business problem       Project goals, data mining goals
Data Understanding      Collect & explore the data            Initial insights, data quality report
Data Preparation        Clean, select, and engineer features  Modeling-ready dataset
Modeling                Build predictive models               Trained models, tuned parameters
Evaluation              Assess models against business goals  Model performance report
Deployment              Implement the solution                Live model, documentation, feedback loop
