Telecom Churn Prediction Report

The document provides a project report on predictive modeling of a telecom customer churn dataset. It includes: 1) The objective is to do EDA, check for missing values/outliers, and build models to predict customer churn. 2) Assumptions for logistic regression include little multicollinearity and a categorical dependent variable. 3) Libraries are loaded and the data is explored. There are no missing values. Basic EDA finds the churn rate is 14.5% and identifies variables correlated with churn.

Uploaded by

Shreya Garg

Project Report

-by Vipul Malpani

Predictive Modelling- Telecom Customer Churn Dataset

1. Project Objective:-
a. To perform EDA on the given data set
b. To check the data set for missing values, outliers and multicollinearity
c. To build a model that predicts whether a customer will cancel their service in
the future, using different predictive modelling techniques

2. Assumptions:-
a. Logistic regression requires little or no multicollinearity among the
independent variables, i.e. the independent variables should not be
too highly correlated with each other.
b. The dependent variable should be categorical (here binary: churn or no churn).
c. The predicted outcome is therefore always a category.

3. Environment setup and library installation

#Loading the libraries and dataset
library(readxl)
library(corrplot)
library(psych)
library(ggplot2)
library(RColorBrewer)
library(caTools)
library(car)
library(data.table)
library(ROCR)
library(class)
library(funModeling)
library(tidyverse)
library(Hmisc)
library(ineq)
library(caret)
library(e1071)

celldata <- read_excel("~/Downloads/cellphoneData.xlsx")

##View(Cellphone)

#analysing and treating data

str(celldata)

## Classes 'tbl_df', 'tbl' and 'data.frame': 3333 obs. of 11 variables:

## $ Churn : num 0 0 0 0 0 0 0 0 0 0 ...
## $ AccountWeeks : num 128 107 137 84 75 118 121 147 117 141 ...
## $ ContractRenewal: num 1 1 1 0 0 0 1 0 1 0 ...
## $ DataPlan : num 1 1 0 0 0 0 1 0 0 1 ...
## $ DataUsage : num 2.7 3.7 0 0 0 0 2.03 0 0.19 3.02 ...
## $ CustServCalls : num 1 1 0 2 3 0 3 0 1 0 ...
## $ DayMins : num 265 162 243 299 167 ...
## $ DayCalls : num 110 123 114 71 113 98 88 79 97 84 ...
## $ MonthlyCharge : num 89 82 52 57 41 57 87.3 36 63.9 93.2 ...
## $ OverageFee : num 9.87 9.78 6.06 3.1 7.42 ...
## $ RoamMins : num 10 13.7 12.2 6.6 10.1 6.3 7.5 7.1 8.7 11.2 ...

dim(celldata)

## [1] 3333 11

attach(celldata)
boxplot(celldata)
4. Basic Treatment of Data
#saving the data into another dataset for backup
celldata_n = celldata

##factorising the categorical variables

celldata$Churn=as.factor(celldata$Churn)
celldata$ContractRenewal=as.factor(celldata$ContractRenewal)
celldata$DataPlan=as.factor(celldata$DataPlan)
summary(celldata)

## Churn AccountWeeks ContractRenewal DataPlan DataUsage

## 0:2850 Min. : 1.0 0: 323 0:2411 Min. :0.0000
## 1: 483 1st Qu.: 74.0 1:3010 1: 922 1st Qu.:0.0000
## Median :101.0 Median :0.0000
## Mean :101.1 Mean :0.8165
## 3rd Qu.:127.0 3rd Qu.:1.7800
## Max. :243.0 Max. :5.4000
## CustServCalls DayMins DayCalls MonthlyCharge
## Min. :0.000 Min. : 0.0 Min. : 0.0 Min. : 14.00
## 1st Qu.:1.000 1st Qu.:143.7 1st Qu.: 87.0 1st Qu.: 45.00
## Median :1.000 Median :179.4 Median :101.0 Median : 53.50
## Mean :1.563 Mean :179.8 Mean :100.4 Mean : 56.31
## 3rd Qu.:2.000 3rd Qu.:216.4 3rd Qu.:114.0 3rd Qu.: 66.20
## Max. :9.000 Max. :350.8 Max. :165.0 Max. :111.30
## OverageFee RoamMins
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 8.33 1st Qu.: 8.50
## Median :10.07 Median :10.30
## Mean :10.05 Mean :10.24
## 3rd Qu.:11.77 3rd Qu.:12.10
## Max. :18.19 Max. :20.00

str(celldata)

## Classes 'tbl_df', 'tbl' and 'data.frame': 3333 obs. of 11 variables:

## $ Churn : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ AccountWeeks : num 128 107 137 84 75 118 121 147 117 141 ...
## $ ContractRenewal: Factor w/ 2 levels "0","1": 2 2 2 1 1 1 2 1 2 1 ...
## $ DataPlan : Factor w/ 2 levels "0","1": 2 2 1 1 1 1 2 1 1 2 ...
## $ DataUsage : num 2.7 3.7 0 0 0 0 2.03 0 0.19 3.02 ...
## $ CustServCalls : num 1 1 0 2 3 0 3 0 1 0 ...
## $ DayMins : num 265 162 243 299 167 ...
## $ DayCalls : num 110 123 114 71 113 98 88 79 97 84 ...
## $ MonthlyCharge : num 89 82 52 57 41 57 87.3 36 63.9 93.2 ...
## $ OverageFee : num 9.87 9.78 6.06 3.1 7.42 ...
## $ RoamMins : num 10 13.7 12.2 6.6 10.1 6.3 7.5 7.1 8.7 11.2 ...
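#basic EDA
basic_eda <- function(data)
{
  print(summary(data))   # print explicitly: values are not auto-printed inside a function
  df_status(data)        # zeros, NAs, types and unique counts (funModeling)
  freq(data)             # frequency tables and plots for the factor columns
  plot_num(data)         # histograms of the numeric columns
  #profiling_num(data)
  hist(data)             # histograms for a whole data frame (method provided by Hmisc)
  describe(data)
}
attach(celldata)         # re-attach so the factor columns are visible by name

basic_eda(celldata)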

## variable q_zeros p_zeros q_na p_na q_inf p_inf type unique

## 1 Churn 2850 85.51 0 0 0 0 factor 2
## 2 AccountWeeks 0 0.00 0 0 0 0 numeric 212
## 3 ContractRenewal 323 9.69 0 0 0 0 factor 2
## 4 DataPlan 2411 72.34 0 0 0 0 factor 2
## 5 DataUsage 1813 54.40 0 0 0 0 numeric 174
## 6 CustServCalls 697 20.91 0 0 0 0 numeric 10
## 7 DayMins 2 0.06 0 0 0 0 numeric 1667
## 8 DayCalls 2 0.06 0 0 0 0 numeric 119
## 9 MonthlyCharge 0 0.00 0 0 0 0 numeric 656
## 10 OverageFee 1 0.03 0 0 0 0 numeric 1024
## 11 RoamMins 18 0.54 0 0 0 0 numeric 162
From the above result we can see that Churn is zero for 85.51 percent of the
accounts, so the churn rate is 14.5%: 483 of 3333 customers have churned.
ContractRenewal equals 1 for 90.31 percent of the accounts, which is
very high compared to the share of zeros (9.69 percent).

A similar trend, with a majority of ones, can be seen here.

Here we can also see that most of the numeric variables are approximately normally
distributed. DataUsage (zero for 54.40 percent of accounts) and CustServCalls (a count
with only 10 distinct values) are clear exceptions.
Histograms of all the variables

5. Plots for each variable:-

ggplot(celldata, aes(Churn)) + geom_bar(fill="blue")

#Account Week
One would expect a decreasing churn rate with the increase in the time
(account weeks) of an account, but it does not seem to be the case. There is
no clear trend visible.

#Contract Renewal

table(Churn, ContractRenewal)
## ContractRenewal
## Churn 0 1
## 0 186 2664
## 1 137 346

(137/(186+137))

## [1] 0.4241486

Clearly, there is a good probability (approx 42%) of an account churning if
the contract has not been renewed.
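
The same calculation can be done for both renewal groups at once. A minimal base-R sketch that rebuilds the table from the counts printed above (the object names `churn_by_renewal` and `churn_rate` are introduced here for illustration only):

```r
# Counts copied from the table(Churn, ContractRenewal) output above
churn_by_renewal <- matrix(
  c(186, 137, 2664, 346),
  nrow = 2,
  dimnames = list(Churn = c("0", "1"), ContractRenewal = c("0", "1"))
)

# Column-wise proportions: share of churners within each renewal group
churn_rate <- prop.table(churn_by_renewal, margin = 2)["1", ]
print(round(churn_rate, 4))
```

With these counts the churn rate is about 42.4% when the contract has not been renewed and about 11.5% when it has.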

#Data Plan

The probability of an account churning is higher if the account has not
subscribed to a data plan.

#Data Usage
Clearly, maximum churn is in the 0-0.5 data usage category.

#CustServCall
Clearly, the churn rate increases significantly if a user makes more than 4
customer service calls.

#DayMins
The churn rate increases if the monthly average daytime minutes are greater
than 245.

#DayCalls
No clear pattern can be observed here.

#MonthlyCharge
The churn rate is observed to be highest if the monthly bill is between 64 and 74.

#OverageFee
No clear observation can be made from this.

#RoamMins
Note: no clear observation can be made from this.
#check missing values

anyNA(celldata)

## [1] FALSE

No values are missing, so missing-value treatment is not required.

#Outliers check
boxplot(celldata,horizontal = TRUE)
Note: There are outliers in the dataset, but the assignment did not ask for them to be
treated; otherwise the KNN impute method would have been used to treat them.
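
The models in the following sections are fitted on a data frame called `train` and evaluated on `test`, but the report does not show how these were created. The row counts in the later outputs (2333 training rows and 1000 test rows, each with roughly 14.5% churners) are consistent with a stratified 70:30 split. A minimal base-R sketch of such a split on a small simulated data frame follows; the helper `stratified_split` and the simulated columns are assumptions for illustration, not the author's code (the report loads `caTools`, whose `sample.split` is a common way to do this).

```r
# Hypothetical helper: pick training rows so each class keeps its proportion
stratified_split <- function(y, train_fraction = 0.7) {
  in_train <- logical(length(y))
  for (level in unique(as.character(y))) {
    idx <- which(y == level)
    n_train <- round(train_fraction * length(idx))
    in_train[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  in_train
}

set.seed(12345)
# Simulated stand-in for celldata (not the real data)
toy <- data.frame(
  Churn = factor(rep(c(0, 1), times = c(850, 150))),
  DayMins = rnorm(1000, mean = 180, sd = 54)
)

in_train <- stratified_split(toy$Churn, train_fraction = 0.7)
train_toy <- toy[in_train, ]
test_toy <- toy[!in_train, ]

# Both parts keep the 15% churn share of the simulated data
print(table(train_toy$Churn))
print(table(test_toy$Churn))
```

Stratifying on the outcome keeps the churn rate the same in both parts, which matters when only about one customer in seven churns.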

5.2 Collinearity:-
#LOGISTIC REGRESSION
set.seed(12345)
model1 <- glm(Churn ~ ., data= train, family=binomial)
summary(model1)

##
## Call:
## glm(formula = Churn ~ ., family = binomial, data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9954 -0.5164 -0.3475 -0.2119 2.9881
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.3368705 0.6574349 -8.118 4.75e-16 ***
## AccountWeeks 0.0011308 0.0016600 0.681 0.495737
## ContractRenewal1 -1.9892428 0.1699581 -11.704 < 2e-16 ***
## DataPlan1 -1.0422186 0.6527079 -1.597 0.110319
## DataUsage -0.7635666 2.3185753 -0.329 0.741909
## CustServCalls 0.5319410 0.0473461 11.235 < 2e-16 ***
## DayMins -0.0016103 0.0391549 -0.041 0.967195
## DayCalls -0.0008953 0.0033234 -0.269 0.787635
## MonthlyCharge 0.0806978 0.2301505 0.351 0.725865
## OverageFee -0.0283545 0.3926032 -0.072 0.942425
## RoamMins 0.0928157 0.0266561 3.482 0.000498 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1930.4 on 2332 degrees of freedom
## Residual deviance: 1531.2 on 2322 degrees of freedom
## AIC: 1553.2
##
## Number of Fisher Scoring iterations: 6

vif(model1)

## AccountWeeks ContractRenewal DataPlan DataUsage

## 1.004962 1.049342 14.890864 1668.816467
## CustServCalls DayMins DayCalls MonthlyCharge
## 1.071380 987.216745 1.005758 3013.929613
## OverageFee RoamMins
## 211.337513 1.199179

Note: Multicollinearity has inflated the VIF values of the
correlated variables, making the model unreliable.
#As a rule of thumb, a VIF value greater than 5 indicates problematic
multicollinearity, so we will remove, step by step, the variables whose VIF value is greater than 5.
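
To make the VIF idea concrete: for a predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing that predictor on all the other predictors. A minimal base-R sketch on simulated data is given below. The variable names and the helper `vif_manual` are hypothetical, and the numbers are not those of the telecom data. For a `glm`, `car::vif` works from the fitted model's coefficient covariance matrix, so its values need not match this linear-model version exactly.

```r
set.seed(1)
n <- 500
# Simulated predictors (not the real telecom data)
day_mins <- rnorm(n, 180, 54)
data_usage <- rexp(n, 1)
roam_mins <- rnorm(n, 10, 3)
# monthly_charge is almost a linear combination of two other predictors
monthly_charge <- 0.2 * day_mins + 10 * data_usage + rnorm(n, sd = 1)

predictors <- data.frame(day_mins, data_usage, roam_mins, monthly_charge)

# Hypothetical helper: classic VIF, 1 / (1 - R^2), one predictor at a time
vif_manual <- function(df) {
  sapply(names(df), function(target) {
    others <- setdiff(names(df), target)
    fit <- lm(reformulate(others, response = target), data = df)
    1 / (1 - summary(fit)$r.squared)
  })
}

vif_all <- vif_manual(predictors)
vif_reduced <- vif_manual(predictors[, c("day_mins", "data_usage", "roam_mins")])
print(round(vif_all, 2))
print(round(vif_reduced, 2))
```

In this simulation the VIF of every variable involved in the near-linear relationship is inflated, and dropping the derived variable brings the remaining values back close to 1, which is the same pattern the report sees after removing MonthlyCharge.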

#remove MonthlyCharge
model2 <- glm(Churn ~.-MonthlyCharge, data= train, family=binomial)
summary(model2)

##
## Call:
## glm(formula = Churn ~ . - MonthlyCharge, family = binomial, data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9942 -0.5171 -0.3465 -0.2109 2.9827
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.3006340 0.6491593 -8.165 3.20e-16 ***
## AccountWeeks 0.0011226 0.0016594 0.677 0.498711
## ContractRenewal1 -1.9883365 0.1698888 -11.704 < 2e-16 ***
## DataPlan1 -1.0477052 0.6522055 -1.606 0.108185
## DataUsage 0.0456863 0.2209013 0.207 0.836152
## CustServCalls 0.5316024 0.0473523 11.227 < 2e-16 ***
## DayMins 0.0121123 0.0012723 9.520 < 2e-16 ***
## DayCalls -0.0008644 0.0033235 -0.260 0.794787
## OverageFee 0.1089775 0.0272701 3.996 6.44e-05 ***
## RoamMins 0.0926841 0.0266490 3.478 0.000505 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1930.4 on 2332 degrees of freedom
## Residual deviance: 1531.3 on 2323 degrees of freedom
## AIC: 1551.3
##
## Number of Fisher Scoring iterations: 6

vif(model2)

## AccountWeeks ContractRenewal DataPlan DataUsage

## 1.004781 1.049038 14.875167 15.159742
## CustServCalls DayMins DayCalls OverageFee
## 1.070753 1.042789 1.004988 1.019429
## RoamMins
## 1.199067

#now removing DataUsage

model3 <- glm(Churn ~.-MonthlyCharge-DataUsage, data= train, family=binomial)
summary(model3)

##
## Call:
## glm(formula = Churn ~ . - MonthlyCharge - DataUsage, family = binomial,
## data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9968 -0.5181 -0.3465 -0.2112 2.9759
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.3190778 0.6430461 -8.272 < 2e-16 ***
## AccountWeeks 0.0011291 0.0016593 0.680 0.496202
## ContractRenewal1 -1.9892584 0.1698531 -11.712 < 2e-16 ***
## DataPlan1 -0.9177473 0.1707087 -5.376 7.61e-08 ***
## CustServCalls 0.5311885 0.0472969 11.231 < 2e-16 ***
## DayMins 0.0121141 0.0012722 9.522 < 2e-16 ***
## DayCalls -0.0008638 0.0033232 -0.260 0.794906
## OverageFee 0.1088689 0.0272578 3.994 6.50e-05 ***
## RoamMins 0.0948722 0.0244771 3.876 0.000106 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1930.4 on 2332 degrees of freedom
## Residual deviance: 1531.3 on 2324 degrees of freedom
## AIC: 1549.3
##
## Number of Fisher Scoring iterations: 5

vif(model3)

## AccountWeeks ContractRenewal DataPlan CustServCalls

## 1.004348 1.048383 1.020221 1.068665
## DayMins DayCalls OverageFee RoamMins
## 1.042708 1.005033 1.019067 1.010847

#Now the VIF value of every variable is less than 5. AccountWeeks and DayCalls are
insignificant variables in model3, so we will remove AccountWeeks and
DayCalls step by step from our final model to check whether they affect the
AIC and the residual deviance.

model4 <- glm(Churn ~.-MonthlyCharge-DataUsage-AccountWeeks, data= train,
              family=binomial)
summary(model4)

##
## Call:
## glm(formula = Churn ~ . - MonthlyCharge - DataUsage - AccountWeeks,
## family = binomial, data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9990 -0.5155 -0.3463 -0.2110 2.9737
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.1962737 0.6162506 -8.432 < 2e-16 ***
## ContractRenewal1 -1.9933451 0.1697498 -11.743 < 2e-16 ***
## DataPlan1 -0.9145091 0.1705327 -5.363 8.20e-08 ***
## CustServCalls 0.5312599 0.0472621 11.241 < 2e-16 ***
## DayMins 0.0121098 0.0012720 9.521 < 2e-16 ***
## DayCalls -0.0008059 0.0033222 -0.243 0.808321
## OverageFee 0.1081960 0.0272360 3.973 7.11e-05 ***
## RoamMins 0.0946130 0.0244739 3.866 0.000111 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1930.4 on 2332 degrees of freedom
## Residual deviance: 1531.8 on 2325 degrees of freedom
## AIC: 1547.8
##
## Number of Fisher Scoring iterations: 5

model5 <- glm(Churn ~.-MonthlyCharge-DataUsage-AccountWeeks-DayCalls, data=
              train, family=binomial)
summary(model5)

##
## Call:
## glm(formula = Churn ~ . - MonthlyCharge - DataUsage - AccountWeeks -
## DayCalls, family = binomial, data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0026 -0.5143 -0.3463 -0.2115 2.9666
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.278539 0.515106 -10.247 < 2e-16 ***
## ContractRenewal1 -1.992118 0.169637 -11.743 < 2e-16 ***
## DataPlan1 -0.913216 0.170453 -5.358 8.43e-08 ***
## CustServCalls 0.531395 0.047257 11.245 < 2e-16 ***
## DayMins 0.012101 0.001271 9.519 < 2e-16 ***
## OverageFee 0.108411 0.027220 3.983 6.81e-05 ***
## RoamMins 0.094548 0.024470 3.864 0.000112 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1930.4 on 2332 degrees of freedom
## Residual deviance: 1531.9 on 2326 degrees of freedom
## AIC: 1545.9
##
## Number of Fisher Scoring iterations: 5

vif(model5)

## ContractRenewal DataPlan CustServCalls DayMins

## 1.046460 1.018063 1.068819 1.041792
## OverageFee RoamMins
## 1.016316 1.010383

#step(model1)


Note: Model 5 is built with the 6 significant variables, none of which shows
notable multicollinearity (all VIF values are close to 1).

The AIC of model5, 1545.9, is the lowest of the five models.
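
The AIC values quoted in the summaries can be checked by hand. For a logistic regression on a 0/1 response the saturated model has log-likelihood zero, so the residual deviance equals -2 times the log-likelihood and AIC = residual deviance + 2k, where k is the number of estimated coefficients including the intercept. A short sketch using only figures printed in the summaries above (the data frame `aic_check` is introduced here for illustration):

```r
# Residual deviance, coefficient count and AIC as printed for model1..model5
aic_check <- data.frame(
  model = paste0("model", 1:5),
  residual_deviance = c(1531.2, 1531.3, 1531.3, 1531.8, 1531.9),
  n_coefficients = c(11, 10, 9, 8, 7),
  reported_aic = c(1553.2, 1551.3, 1549.3, 1547.8, 1545.9)
)

# AIC = residual deviance + 2 * number of coefficients
aic_check$recomputed_aic <- aic_check$residual_deviance + 2 * aic_check$n_coefficients
print(aic_check)
```

The residual deviance barely moves (1531.2 to 1531.9) while four terms are dropped, which is why the AIC falls by almost 2 for every removed term.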

Explanatory Power of odds

round(exp(coef(model5)),2)

## (Intercept) ContractRenewal1 DataPlan1 CustServCalls

## 0.01 0.14 0.40 1.70
## DayMins OverageFee RoamMins
## 1.01 1.11 1.10

Note: Rounded off values
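
Each value above is an odds ratio: exponentiating a logistic-regression coefficient gives the factor by which the odds of churn are multiplied for a one-unit increase in that predictor, with the other predictors held fixed. The sketch below reproduces the rounded values from the model5 coefficients printed earlier and also expresses them as percentage changes in the odds (the vector name `model5_coef` is introduced for illustration):

```r
# Coefficients copied from summary(model5); intercept omitted
model5_coef <- c(
  ContractRenewal1 = -1.992118,
  DataPlan1 = -0.913216,
  CustServCalls = 0.531395,
  DayMins = 0.012101,
  OverageFee = 0.108411,
  RoamMins = 0.094548
)

odds_ratio <- exp(model5_coef)
percent_change_in_odds <- 100 * (odds_ratio - 1)
print(round(cbind(odds_ratio, percent_change_in_odds), 2))
```

For example, each additional customer service call multiplies the odds of churn by about 1.70 (a 70% increase), while a renewed contract multiplies them by about 0.14 (roughly an 86% reduction).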

Classification/Prediction based on threshold value (train)

Pred.model5=predict(model5,type = "response",newdata=train)
summary(Pred.model5)

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## 0.001599 0.041391 0.084945 0.144878 0.186781 0.984946

summary(train$Churn)

## 0 1
## 1995 338

plot(train$Churn,Pred.model5)

Note: a natural first choice of threshold is 0.5, which
means those who have a probability greater than 0.5 are classified as
Churned (customer will end service) and the rest are classified as Not
churned (customer will continue service). The code below uses 0.20 instead,
the threshold finally adopted after the ROC analysis.
Pred.model5.factor=ifelse(Pred.model5<0.20,0,1)

6.Confusion Matrix
TRAIN Model

confusionMatrix(table(Actual=train$Churn,Pred.model5.factor))
## Confusion Matrix and Statistics
##
## Pred.model5.factor
## Actual 0 1
## 0 1674 321
## 1 109 229
## Accuracy : 0.8157
## 95% CI : (0.7993, 0.8312)
##
## Mcnemar's Test P-Value : < 0.00000000000000022
##
## Sensitivity : 0.9389
## Specificity : 0.4164
## Pos Pred Value : 0.8391
## Neg Pred Value : 0.6775
## Prevalence : 0.7643
## Detection Rate : 0.7175
## Detection Prevalence : 0.8551
## Balanced Accuracy : 0.6776

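#Pred.model.test is not defined in the code shown; assumed here to be model5's predicted probabilities on test
Pred.model.test=predict(model5,type = "response",newdata=test)
Pred.model.test.factor=ifelse(Pred.model.test<0.20,0,1)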
confusionMatrix(table(Actual=test$Churn,Pred.model.test.factor))
## Confusion Matrix and Statistics
##
## Pred.model.test.factor
## Actual 0 1
## 0 720 135
## 1 54 91
##
## Accuracy : 0.811
## 95% CI : (0.7853, 0.8348)
## No Information Rate : 0.774
##
## Sensitivity : 0.9302
## Specificity : 0.4027
## Pos Pred Value : 0.8421
## Neg Pred Value : 0.6276
## Prevalence : 0.7740
## Detection Rate : 0.7200
## Detection Prevalence : 0.8550
## Balanced Accuracy : 0.6664

Note: after lowering the prediction threshold to 0.20, the number of users
falsely predicted as not churned (who had actually churned) drops from 115 to 49.
This lowers the overall accuracy from 85% to 82%, but it also improves
the specificity by 7-8%, which is acceptable.

Initially we had a threshold value of 0.50, but after creating the ROC curve and
taking the business into account we realized that with 0.20 as the threshold we
can try to retain the largest number of users who would otherwise churn. The cost
is that about 4% of users are falsely flagged as churners and receive retention
measures they do not need, but this is smaller than the impact
of losing users who are falsely considered non-churners even when they
are about to churn.

From a business perspective, retaining a user is also an important factor, which is
taken into consideration.

6 ROC Curve on train and test dataset respectively

NOTE: The ROC curve allowed us to identify a better threshold than the 0.50 we
initially assumed.
library(blorr) # to build and validate binary logistic models

blr_step_aic_both(model5, details = FALSE)

## Stepwise Selection Method

## -------------------------
##
## Candidate Terms:
##
## 1 . ContractRenewal
## 2 . DataPlan
## 3 . CustServCalls
## 4 . DayMins
## 5 . OverageFee
## 6 . RoamMins
##
##
## Variables Entered/Removed:
##
## - ContractRenewal added
## - CustServCalls added
## - DayMins added
## - DataPlan added
## - OverageFee added
## - RoamMins added

##
##
## Stepwise Summary
## ---------------------------------------------------------------
## Variable Method AIC BIC Deviance
## ---------------------------------------------------------------
## ContractRenewal addition 1804.930 1816.439 1800.930
## CustServCalls addition 1691.744 1709.009 1685.744
## DayMins addition 1599.200 1622.220 1591.200
## DataPlan addition 1571.798 1600.573 1561.798
## OverageFee addition 1559.186 1593.716 1547.186
## RoamMins addition 1545.863 1586.147 1531.863
## ---------------------------------------------------------------

7 Model Performance Parameters
Train

KS_train
## [1] 0.5285685

auc_train

## [1] 0.8190506

gini

## [1] 0.5241413

Test
KS_test

## [1] 0.7579812

auc_test

## [1] 0.8178907

gini

## [1] 0.514548

Note: KS, AUC and GINI values are calculated for the logistic model for both
train and test data.
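
The code that produced these values is not shown. For reference, here is a minimal base-R sketch of the usual definitions: AUC as the probability that a randomly chosen churner scores higher than a randomly chosen non-churner, KS as the largest gap between the true-positive and false-positive rates over all thresholds, and the Gini index commonly taken as 2 * AUC - 1. The helper `rank_metrics` and the toy inputs are hypothetical. Note that the `gini` values reported above do not equal 2 * AUC - 1 for the reported AUCs, so the report presumably used a different definition (it loads the `ineq` package, which computes a Gini coefficient of a distribution).

```r
# Hypothetical helper: y is 0/1 (1 = churn), score is the predicted probability
rank_metrics <- function(y, score) {
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  # AUC via the Mann-Whitney rank-sum identity (ties get average ranks)
  auc <- (sum(rank(score)[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  # KS: maximum of TPR - FPR over every observed threshold
  thresholds <- sort(unique(score))
  tpr <- sapply(thresholds, function(t) mean(score[y == 1] >= t))
  fpr <- sapply(thresholds, function(t) mean(score[y == 0] >= t))
  c(auc = auc, ks = max(tpr - fpr), gini = 2 * auc - 1)
}

# Tiny toy example (not the telecom data)
y <- c(0, 0, 1, 1)
score <- c(0.1, 0.4, 0.35, 0.8)
toy_metrics <- rank_metrics(y, score)
print(toy_metrics)
```

On the real data the same function would be called with `train$Churn` (converted to 0/1) and the predicted probabilities from the fitted model.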
8 KNN Classifier

#normalize the test & train data

#split the normalized dataset into 7:3 ratio of train and test respectively

## knn.pred
## 0 1
## 0 850 5
## 1 90 55

sum(diag(table.knn)/sum(table.knn))

## [1] 0.905

confusionMatrix(table.knn)

## Confusion Matrix and Statistics

##
## knn.pred
## 0 1
## 0 850 5
## 1 90 55
##
## Accuracy : 0.905
## 95% CI : (0.8851, 0.9225)
## No Information Rate : 0.94
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.4936
##
## Mcnemar's Test P-Value : <0.0000000000000002
##
## Sensitivity : 0.9043
## Specificity : 0.9167
## Pos Pred Value : 0.9942
## Neg Pred Value : 0.3793
## Prevalence : 0.9400
## Detection Rate : 0.8500
## Detection Prevalence : 0.8550
## Balanced Accuracy : 0.9105
##
## 'Positive' Class : 0
##

The accuracy obtained is 90.5% at k = 11, where the majority rule is
applied to predict the churn value.
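
The KNN code itself is not included in the report. The following base-R sketch illustrates the two steps the comments describe, min-max normalisation followed by a majority vote among the k nearest neighbours, on simulated data. The helpers `min_max` and `knn_majority` are hypothetical stand-ins; the report loads the `class` package, whose `knn` function is the usual tool for this, but that it was the function actually used is an assumption.

```r
# Rescale a numeric vector to the range [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical helper: label each test row by majority vote of its k nearest training rows
knn_majority <- function(train_x, train_y, test_x, k) {
  apply(test_x, 1, function(row) {
    distances <- sqrt(colSums((t(train_x) - row)^2))  # Euclidean distance to every training row
    votes <- table(train_y[order(distances)[seq_len(k)]])
    names(votes)[which.max(votes)]
  })
}

set.seed(42)
# Simulated two-class data on very different scales (not the telecom data)
x <- rbind(
  cbind(rnorm(50, 100, 10), rnorm(50, 1, 0.2)),
  cbind(rnorm(50, 200, 10), rnorm(50, 3, 0.2))
)
y <- rep(c("0", "1"), each = 50)
x_scaled <- apply(x, 2, min_max)  # normalise every column before splitting

test_rows <- c(1:10, 51:60)
pred <- knn_majority(x_scaled[-test_rows, ], y[-test_rows], x_scaled[test_rows, ], k = 11)
print(table(Actual = y[test_rows], Predicted = pred))
```

Normalising first matters because KNN relies on distances: without it, a variable measured in hundreds (such as DayMins) would dominate one measured in single digits (such as CustServCalls).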
9 Naïve Bayes

confusionMatrix(table(train$Churn,Nb.prediction.train))

## Confusion Matrix and Statistics

##
## Nb.prediction.train
## 0 1
## 0 1771 224
## 1 121 217
##
## Accuracy : 0.8521
## 95% CI : (0.8371, 0.8663)
## No Information Rate : 0.811
## P-Value [Acc > NIR] : 0.00000009962
##
## Kappa : 0.4702
##
## Mcnemar's Test P-Value : 0.00000003985
##
## Sensitivity : 0.9360
## Specificity : 0.4921
## Pos Pred Value : 0.8877
## Neg Pred Value : 0.6420
## Prevalence : 0.8110
## Detection Rate : 0.7591
## Detection Prevalence : 0.8551
## Balanced Accuracy : 0.7141
##
## 'Positive' Class : 0
##

confusionMatrix(table(test$Churn,Nb.prediction.test))

## Confusion Matrix and Statistics

##
## Nb.prediction.test
## 0 1
## 0 749 106
## 1 60 85
##
## Accuracy : 0.834
## 95% CI : (0.8095, 0.8566)
## No Information Rate : 0.809
## P-Value [Acc > NIR] : 0.0229328
##
## Kappa : 0.4084
##
## Mcnemar's Test P-Value : 0.0004782
##
## Sensitivity : 0.9258
## Specificity : 0.4450
## Pos Pred Value : 0.8760
## Neg Pred Value : 0.5862
## Prevalence : 0.8090
## Detection Rate : 0.7490
## Detection Prevalence : 0.8550
## Balanced Accuracy : 0.6854
##
## 'Positive' Class : 0

10 Model Comparison Table

Metric (test data)   Logistic Regression   KNN      Naïve Bayes

Accuracy             81.10%                90.50%   83.40%

Sensitivity          93.02%                90.43%   92.58%

Specificity          40.27%                91.67%   44.50%

Note: Comparing the models above, KNN has the highest accuracy and specificity, so we
consider it the best method in our case for predicting the customers who will
discontinue the service. Predictions made with KNN are 90.5% accurate.
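
One caveat when reading the table: the `confusionMatrix` outputs above take class 0 as the positive class, so the sensitivity figures describe the non-churn class. In addition, caret's `confusionMatrix` treats the rows of a table as predictions and the columns as the reference, whereas the tables here were built with the actual class in the rows. As a complementary view, the sketch below recomputes, from the test-set counts printed above, the share of actual churners each model flags (recall for class 1) and the share of churn predictions that are correct (precision). It assumes that the rows are the actual classes in all three tables, which the `table(Actual = ...)` calls and the 855/145 row totals suggest; the helper `churn_metrics` is hypothetical.

```r
# Hypothetical helper: rows = actual class, columns = predicted class, churn = "1"
churn_metrics <- function(tn, fp, fn, tp) {
  c(
    accuracy = (tn + tp) / (tn + fp + fn + tp),
    churn_recall = tp / (tp + fn),     # actual churners that were flagged
    churn_precision = tp / (tp + fp)   # flagged customers that actually churned
  )
}

# Test-set counts copied from the confusion matrices in the report
test_results <- rbind(
  LogisticRegression = churn_metrics(tn = 720, fp = 135, fn = 54, tp = 91),
  KNN = churn_metrics(tn = 850, fp = 5, fn = 90, tp = 55),
  NaiveBayes = churn_metrics(tn = 749, fp = 106, fn = 60, tp = 85)
)
print(round(100 * test_results, 1))
```

Under that assumption, KNN has the highest accuracy and precision but flags only about 38% of the actual churners, against about 63% for logistic regression at the 0.20 threshold and about 59% for Naïve Bayes. Which model is preferable therefore depends on whether missed churners or false alarms cost the business more.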
11 Interpretation and conclusion.
Our model uses the ContractRenewal, DataPlan, CustServCalls, DayMins, OverageFee and
RoamMins features from past data to decide whether a customer will churn or
not. Since these features are the most important, the company should focus on them.

Recommendation for telecom company based on my model:-

• The company should focus on getting existing customers to renew their
contracts by giving them better offers than competing telecom
companies.
• The company should also focus on getting customers to opt for a
data plan by providing them with the best offers on data plans.
• The company should focus on providing the best customer support through
customer service calls.
• The company should also focus on customers whose DayMins are decreasing over
time and try to interact with them for feedback on call quality.

Arts BPM
20 pages
Practical On Sar Image Processing and Classification: Part A: Speckle Suppression
No ratings yet
Practical On Sar Image Processing and Classification: Part A: Speckle Suppression
6 pages
OOSE Syllabus
No ratings yet
OOSE Syllabus
2 pages
Sap Powerdesigner: Object-Oriented Model Report
No ratings yet
Sap Powerdesigner: Object-Oriented Model Report
13 pages

3. Environment setup and library installation

celldata <- read_excel("~/Downloads/cellphoneData.xlsx")

#analysing and treating data

## Classes 'tbl_df', 'tbl' and 'data.frame': 3333 obs. of 11 variables:

## factorising the categorical variables
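The factorising code itself is not visible in this extract. A minimal sketch of the step, assuming Churn, ContractRenewal and DataPlan are the 0/1 columns to convert (the later coefficient names ContractRenewal1 and DataPlan1 suggest this); the `toy` data frame and its values are hypothetical stand-ins for celldata:

```r
# Toy stand-in for celldata (hypothetical values, not the real dataset)
toy <- data.frame(
  Churn           = c(0, 1, 0, 0),
  ContractRenewal = c(1, 0, 1, 1),
  DataPlan        = c(0, 0, 1, 1),
  DayMins         = c(265.1, 161.6, 243.4, 299.4)
)

# Columns assumed to be categorical (assumption, see lead-in)
cat_cols <- c("Churn", "ContractRenewal", "DataPlan")

# Convert each 0/1 column to a factor; numeric columns are left untouched
toy[cat_cols] <- lapply(toy[cat_cols], factor)

str(toy)
```

After this step glm() treats the converted columns as categories and reports one coefficient per non-reference level.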

## Churn AccountWeeks ContractRenewal DataPlan DataUsage

## Classes 'tbl_df', 'tbl' and 'data.frame': 3333 obs. of 11 variables:

## variable q_zeros p_zeros q_na p_na q_inf p_inf type unique

A similar majority of ones can be seen here.

5. Plots for each variable:-

Clearly, there is a good probability (approx 42%) of an account churning if

The probability of an account churning is higher if the account has not

#no clear pattern can be observed here

Note: no clear observation can be made from

#check missing values

No values are missing, so missing-value treatment is not required.
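The check can be sketched in base R as follows. The `toy` data frame is hypothetical and deliberately contains one NA so that the output shows what a missing value looks like; on the churn data every count would be zero.

```r
# Hypothetical data frame with one missing value in column a
toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", "z"))

# Number of missing values per column
na_per_column <- colSums(is.na(toy))

# TRUE if any value anywhere in the data frame is missing
any_missing <- anyNA(toy)

print(na_per_column)
print(any_missing)
```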

## AccountWeeks ContractRenewal DataPlan DataUsage

## AccountWeeks ContractRenewal DataPlan DataUsage

#now removing DataUsage

## AccountWeeks ContractRenewal DataPlan CustServCalls

model4 <- glm(Churn ~.-MonthlyCharge-DataUsage-AccountWeeks, data= train,

model5 <- glm(Churn ~.-MonthlyCharge-DataUsage-AccountWeeks-DayCalls, data=

## ContractRenewal DataPlan CustServCalls DayMins

## ContractRenewal DataPlan CustServCalls DayMins

Note: Model 5 is built with 6 significant variables with no

model5 has the lowest AIC: 1545.9
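Candidate models can be compared on AIC in one call. This is a sketch on the built-in mtcars data, used only as a stand-in because the churn data is not available here; the model names `m_full` and `m_reduced` are hypothetical.

```r
# Two nested logistic regressions on a built-in dataset (stand-in for the churn models)
m_full    <- glm(am ~ wt + hp, data = mtcars, family = binomial)
m_reduced <- glm(am ~ wt,      data = mtcars, family = binomial)

# AIC() accepts several fitted models and returns one row per model
aic_table <- AIC(m_full, m_reduced)
print(aic_table)

# Row name of the model with the lowest AIC
best_model <- rownames(aic_table)[which.min(aic_table$AIC)]
print(best_model)
```

A lower AIC indicates a better trade-off between fit and number of parameters, but it only ranks models fitted to the same data.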

Explanatory Power of odds

## (Intercept) ContractRenewal1 DataPlan1 CustServCalls

Note: Rounded off values
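Odds ratios are obtained by exponentiating the logistic regression coefficients. The sketch below uses the built-in mtcars data as a stand-in for the churn model, so the numbers it prints are not the report's values.

```r
# Logistic regression on a built-in dataset (stand-in for model5)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Coefficients are on the log-odds scale; exp() turns them into odds ratios
odds <- exp(coef(fit))
print(round(odds, 3))

# An odds ratio below 1 means the odds of the outcome fall as the predictor rises;
# above 1 means they rise
```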

## Min. 1st Qu. Median Mean 3rd Qu. Max.

From a business perspective, retaining a user is also an important factor, which is

6. ROC curves on the train and test datasets, respectively
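The report loads ROCR for these curves; the plotting code is not visible in this extract. As an illustration of what the area under the ROC curve measures, here is a base R sketch on hypothetical scores and labels. It uses the rank (Mann-Whitney) formulation rather than ROCR.

```r
# Hypothetical predicted probabilities and true classes (1 = churn)
scores <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,   0)

n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)

# AUC = share of (positive, negative) pairs in which the positive case scores higher
auc <- (sum(rank(scores)[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(auc)
```

Computing the same quantity on train and test predictions separately shows whether the model generalises or overfits.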

blr_step_aic_both(model5, details = FALSE)

## Stepwise Selection Method

#normalize the test & train data
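# The normalisation code is not visible in this extract. A minimal sketch,
# assuming min-max scaling (an assumption, the report may use another method),
# with hypothetical toy data. The test set is scaled with the train set's
# minimum and maximum so that no test information leaks into the scaling.

```r
# Min-max scaling with explicit bounds
min_max <- function(x, lo, hi) (x - lo) / (hi - lo)

# Hypothetical numeric predictors
train <- data.frame(DayMins = c(100, 200, 300), CustServCalls = c(0, 2, 4))
test  <- data.frame(DayMins = c(150, 250),      CustServCalls = c(1, 3))

# Bounds are learned from the training data only
lo <- sapply(train, min)
hi <- sapply(train, max)

# Apply the same bounds column by column to both sets
train_n <- as.data.frame(Map(min_max, train, lo, hi))
test_n  <- as.data.frame(Map(min_max, test,  lo, hi))

print(train_n)
print(test_n)
```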

## Confusion Matrix and Statistics

The accuracy obtained is 90.5% at k = 11, where majority rule is
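A sketch of a k = 11 classification with knn() from the class package, which the report loads. The two simulated clusters are hypothetical and stand in for the normalised churn predictors.

```r
library(class)

set.seed(42)
# Two well-separated hypothetical clusters, 15 points each, 2 predictors
train_x <- rbind(
  matrix(rnorm(30, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(30, mean = 3, sd = 0.3), ncol = 2)
)
train_y <- factor(rep(c(0, 1), each = 15))

# Two new points, one near each cluster centre
test_x <- rbind(c(0, 0), c(3, 3))

# Each test point takes the majority class among its 11 nearest training points
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 11)
print(pred)
```

An odd k avoids ties in a two-class problem, which is one common reason for choosing values such as 11.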

## Confusion Matrix and Statistics

## Confusion Matrix and Statistics

10. Model Comparison Table

              Logistic Regression   KNN      Naïve Bayes
Accuracy      81.1%                 90.5%    85.1%
Sensitivity   93.02%                90.63%   93%
Specificity   40.27%                91.67%   49.50%
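The three metrics in the table all derive from a confusion matrix. A base R sketch on hypothetical counts follows; which class the report treats as "positive" is not visible in this extract, so the labels "pos" and "neg" are assumptions.

```r
# Hypothetical actual and predicted classes: 50 positives, 50 negatives
actual <- factor(c(rep("pos", 50), rep("neg", 50)), levels = c("pos", "neg"))
predicted <- factor(
  c(rep("pos", 40), rep("neg", 10),   # 40 true positives, 10 false negatives
    rep("pos", 5),  rep("neg", 45)),  # 5 false positives, 45 true negatives
  levels = c("pos", "neg")
)

# Rows are predictions, columns are the true classes
cm <- table(Predicted = predicted, Actual = actual)

accuracy    <- sum(diag(cm)) / sum(cm)              # correct / all
sensitivity <- cm["pos", "pos"] / sum(cm[, "pos"])  # true positives / actual positives
specificity <- cm["neg", "neg"] / sum(cm[, "neg"])  # true negatives / actual negatives

print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

With unbalanced classes such as churn, accuracy alone can look good while the minority class is poorly detected, so sensitivity and specificity should be read together.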

Recommendations for the telecom company based on my model:-

• The company should focus on getting existing customers to renew their
