
Seminar Report

This document is a seminar report submitted by Gajanan Patil to the Department of Computer Engineering at S.S.V.P.S.’s B.S. Deore College of Engineering in Dhule, India in partial fulfillment of the requirements for a Bachelor of Technology degree in Computer Engineering. The report proposes developing a dynamic churn prediction model using soft computing techniques and random forest supervised machine learning. It presents the problem statement, objectives, methodology including the system architecture, implementation details, results and evaluation of the model for predicting customer churn in the telecom industry.


A Dynamic Churn Prediction Model Using Soft Computing and
Random Forest Based Supervised Learning

Seminar I report submitted in partial fulfillment of the
requirements for the Degree of

Bachelor of Technology
In
Computer Engineering
Submitted by
Gajanan Patil

DEPARTMENT OF COMPUTER ENGINEERING


S.S.V.P.S.’s B.S. DEORE COLLEGE OF ENGINEERING, DHULE
2023-2024

Guided by
Prof. R. V. Patil

S.S.V.P.S.’s B.S. DEORE COLLEGE OF ENGINEERING, DHULE

DEPARTMENT OF COMPUTER ENGINEERING

CERTIFICATE

This is to certify that the Seminar I entitled “A Dynamic Churn Prediction Model
Using Soft Computing and Random Forest Based Supervised Learning”
has been carried out by
Gajanan Patil

under my guidance in partial fulfillment of the degree of Bachelor of
Technology in Computer Engineering of Dr. Babasaheb Ambedkar
Technological University, Lonere (M.S.) during the academic year 2023-24.
To the best of my knowledge and belief, this work has not been
submitted elsewhere for the award of any other degree.

Date:
Place: Dhule

Guide: Prof. R. V. Patil
Head: Prof. Dr. B. R. Mandre
Principal: Dr. Hitendra D. Patil

ACKNOWLEDGEMENT
This Seminar I report has taken its current shape after a lot of hard work and
perseverance, and not by me alone. I would like to express my sincere gratitude for
the assistance and support of the many people who helped make it a success.
Immeasurable appreciation and deepest gratitude are extended to Prof. R. V. Patil,
my guide, for his guidance and enlightening comments throughout the seminar work.
It has been an altogether different experience to work with him, and I would like
to thank him for his helpful suggestions and numerous discussions. I gladly take
this opportunity to thank Prof. B. R. Mandre (Head of Department, Computer
Engineering) and Dr. Hitendra D. Patil (Principal, SSVPS BSD College of
Engineering, Dhule) for providing facilities during the progress of the thesis.
I wish to express my sincere thanks to Prof. R. V. Patil for his expert, sincere
and valuable guidance and encouragement extended to me.
I am also thankful to all those who helped me directly or indirectly to develop
this thesis and complete it successfully. I would also like to thank all the staff
for their encouragement. They were always prompt in extending a helping hand and
sharing valuable technical knowledge. Special thanks to my family and friends.

Gajanan Patil

ABBREVIATIONS

Abbreviation    Details
CRM             Customer Relationship Management
ML              Machine Learning
DL              Deep Learning
NN              Neural Network
ANN             Artificial Neural Network
NLP             Natural Language Processing
RNN             Recurrent Neural Network
DFD             Data Flow Diagram

Table of Contents
List of Tables
List of Figures
1 INTRODUCTION
1.1 Introduction
1.2 Background
1.3 Churn in Telecom Industry
1.4 Machine Learning Approach
1.5 Motivation
1.6 Scope
1.7 Goals
1.8 Organization of Thesis
1.9 Summary
2 LITERATURE REVIEW
2.1 Literature Survey
2.2 Summary of the Literature
2.3 Problem Statement and Objective
2.4 Summary
3 METHODOLOGY
3.1 System Analysis
3.1.1 Existing Algorithms
3.1.2 Limitations of previous algorithm
3.1.3 Analysis of the problem
3.1.4 Proposed System
3.2 Software Requirement Specification
3.2.1 Introduction
3.2.2 Overall Description
3.2.3 External interface requirements
3.2.4 Software requirements
3.2.5 Hardware requirements
3.3 Applications
3.4 Future Scope
3.5 System Architecture
3.5.1 Proposed architecture
3.5.2 Modules
3.6 UML Diagrams
3.6.1 Use case Diagram
3.6.2 Activity Diagram
3.6.3 Class Diagram
3.6.4 Data Flow Diagram
3.6.5 Component Diagram
3.6.6 Sequence Diagram
3.7 Summary
4 IMPLEMENTATION
4.1 System model
4.2 Training and Testing
4.3 Algorithms
4.4 Flow chart of System
4.5 Summary
5 RESULTS AND DISCUSSION
5.1 Experimental Setup
5.2 Simulation Scenario
5.3 Evaluation Metrics
5.4 Methods of Comparison
5.5 Result Analysis
5.6 Summary
6 ADVANTAGES
7 CONCLUSION
BIBLIOGRAPHY
PUBLICATION
APPENDIX

FIGURE INDEX
Figure 1.1: Machine learning approach
Figure 1.2: Supervised Learning approach
Figure 1.3: Semi-Supervised Learning approach
Figure 1.4: Un-Supervised Learning approach
Figure 1.5: Reinforcement Learning approach
Figure 3.1: Block diagram of Architecture
Figure 3.2: Use case Diagram
Figure 3.3: Activity Diagram for User
Figure 3.4: Class Diagram
Figure 3.5: DFD Level 0 Diagram
Figure 3.6: DFD Level 1 Diagram
Figure 3.7: Component Diagram
Figure 3.8: Sequence Diagram
Figure 4.1: Flow chart of System
Figure 5.1: Comparative analysis of various classification algorithms
Figure 5.2: Classification Report of Random Forest
Figure 5.3: Performance evaluation of Random Forest classification
Figure 5.4: Discrimination threshold evaluation of Random Forest classification
Figure 5.5: Classification Report of DT classification
Figure 5.6: Performance evaluation of DT classification
Figure 5.7: Discrimination threshold evaluation of DT classification
Figure 5.8: Classification Report of Bagging classification
Figure 5.9: Performance evaluation of Bagging classification
Figure 5.10: Discrimination threshold evaluation of Bagging classification
Figure 5.11: Classification Report of K-Neighbors
Figure 5.12: Performance evaluation of K-Neighbors classification
Figure 5.13: Discrimination threshold evaluation of K-Neighbors classification

TABLE INDEX
Table 1: Summary of Literature survey
Table 2: Testing Parameter for Algorithm
Table 3: Confusion Matrix Analysis
Table 4: Comparative analysis of various classification algorithms

ABSTRACT
Customer churn is a major problem and one of the most important concerns for large
companies. Because churn directly affects revenue, especially in the telecom field,
companies are seeking ways to predict which customers are likely to churn. Finding
the factors that increase customer churn is therefore important for taking the
actions needed to reduce it. The main contribution of our work is a churn
prediction model that helps telecom operators predict which customers are most
likely to churn. The model developed in this work uses machine learning techniques
on a big data platform and builds a new approach to feature engineering and
selection. Alongside measuring the performance of the model, this work also
identifies the churn factors that are essential in determining the root causes of
churn. By knowing the significant churn factors from customers' data, CRM can
improve productivity, recommend relevant promotions to groups of likely churners
based on similar behavior patterns, and substantially improve the company's
marketing campaigns. The proposed churn prediction model is evaluated using metrics
such as accuracy, precision, recall, F-measure, and the area under the receiver
operating characteristic (ROC) curve. Furthermore, it also provides the factors
behind customer churn through rules generated by the attribute-selected classifier
algorithm.
Key terms: Receiver Operating Characteristic, Deep learning, Convolutional Neural
Network, churn prediction, Feature selection.

Chapter - 1
INTRODUCTION
This chapter introduces the churn prediction model and the different approaches to
building it. It explains churn in the telecom industry and its causes, introduces
machine learning techniques such as supervised, unsupervised, and reinforcement
learning, and analyzes which approach is best suited to churn prediction.

1.1 Introduction
Consumers today go through a complex decision-making process before subscribing to
any one of the numerous telecom service options. The services provided by telecom
vendors are not highly differentiated, and number portability is commonplace. The
mobile telephone industry faces a similar churn problem [2] [9] [12], so customer
loyalty becomes an issue. Hence, it is becoming increasingly important for
telecommunications companies to proactively identify the factors that lead
customers to unsubscribe and to take preventive measures to retain them. To
calculate the probable monthly churn rate, start with the number of users who churn
that month. Divide by the total number of user days in that month to get the number
of churns per user day, then multiply by the number of days in the month to get the
monthly churn rate. Research conducted over the past few years has found data
mining techniques to be effective in predicting consumer churn [17]. Creating an
efficient churn prediction model is an essential activity that requires a lot of
work, from determining appropriate predictor variables (features) in the large
volume of available customer data to choosing an effective predictive data mining
technique suited to the feature set.
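The monthly churn calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `monthly_churn_rate`, its parameters, and the sample figures are assumptions made here, not values from the report.

```python
def monthly_churn_rate(churned_users, active_users_per_day):
    """Estimate monthly churn from a list of daily active-user counts."""
    days_in_month = len(active_users_per_day)
    # One "user day" is one user being active for one day.
    total_user_days = sum(active_users_per_day)
    if total_user_days == 0:
        raise ValueError("no user days recorded for this month")
    churn_per_user_day = churned_users / total_user_days
    return churn_per_user_day * days_in_month


# Hypothetical month: 30 days, 1,000 active users each day, 50 churned users.
rate = monthly_churn_rate(50, [1000] * 30)
print(f"{rate:.1%}")
```

Working with user days rather than a single head count keeps the rate meaningful when the subscriber base grows or shrinks during the month.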
A multi-layer perceptron approach to customer churn prediction was used in [14] on
customer-related data such as customer profiles, calling patterns, and demographic
data, in addition to the network data that customers generate. Based on a
customer's history of calling behavior, it is possible to classify whether the
customer is likely to leave or not. Research over the past decade has found data
mining techniques to be effective in predicting churn, and predictive modeling
techniques are also considered to be more accurate. Churn prediction systems and
sentiment analysis use classification as well as clustering techniques to identify
churn customers and the reasons behind the churning of telecom customers [18].
The telecom industry generates a large amount of data on a daily basis. Mining such
vast data with specific data mining techniques is a tedious task, and the
predictions of classical techniques are hard to interpret. Such telecommunication
data may sometimes contain churn, and it is necessary to identify such problems.
Big companies implement churn prediction models to be able to detect possible
churners before they actually leave the company [16].

1.2 Background
Research conducted over the past few years has found data mining techniques to be
effective in predicting consumer churn [3]. Creating an efficient churn prediction
model is an essential activity that requires a lot of work, from determining
appropriate predictor variables (features) in the large volume of available
customer data to choosing an effective predictive data mining technique suited to
the feature set. Telecom industries collect a large amount of customer-related data
such as customer profiles, calling patterns, and demographic data, in addition to
the network data that customers generate [4]. Based on a customer's history of
calling behavior, it is possible to classify whether the customer is likely to
leave or not.

Research over the past decade has found data mining techniques to be effective in
predicting churn, and predictive modeling techniques are also considered to be more
accurate. Churn prediction systems and sentiment analysis use classification as
well as clustering techniques to identify churn customers and the reasons behind
the churning of telecom customers. The telecom industry [7] generates a large
amount of data on a daily basis. Mining such vast data with specific data mining
techniques is a tedious task, and the predictions of classical techniques are hard
to interpret. Various researchers have already described work on identifying churn
in large data sets by combining static and dynamic approaches, but such systems
still face many problems in the actual identification of churn. Such
telecommunication data may sometimes contain churn, and it is necessary to identify
such problems. Successful identification of churn from large data makes customer
relationship management (CRM) more effective, using various soft computing
techniques, e.g. genetic algorithms, AdaBoost, etc. [9].

In this research we propose churn identification and prediction from a large-scale
telecommunication data set using Natural Language Processing (NLP) and machine
learning techniques for dealing with big data [10]. The system first applies a
strategic NLP process consisting of data pre-processing, data normalization, feature
extraction, and feature selection, in that order. The proposed feature extraction
techniques include TF-IDF, Stanford NLP, and occurrence correlation techniques.
Machine learning classification algorithms are then used to train and test the
entire module. Neuro-fuzzy algorithms are used to divide the subscribers into
discrete classes based on their input attributes [11].

Using these classes, the Adaptive Neuro Fuzzy Inference System (ANFIS) is used to
develop a sensitive prediction model for churn management [1].
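
To make the TF-IDF feature extraction step mentioned above concrete, the following is a minimal sketch in plain Python. It assumes whitespace tokenization and the unsmoothed weight tf × log(N / df); the function name `tf_idf` and the sample comments are invented for illustration and are not taken from the report's implementation.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (plain TF-IDF, no smoothing)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical customer comments, invented for illustration.
reviews = [
    "network is slow and billing is confusing",
    "billing is fine and network is fast",
    "switching operator because network is slow",
]
for vector in tf_idf(reviews):
    # Show the two most distinctive terms of each comment.
    print(sorted(vector, key=vector.get, reverse=True)[:2])
```

Terms that occur in every comment (here "network" and "is") receive a weight of zero, so the resulting feature vectors emphasize the words that distinguish one customer's text from another's.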

1.3 Churn in Telecom Industry


Telecommunications companies [7] are usually not the most popular companies with
consumers, yet customer loyalty is the key to profitability in the telecom industry.
People often express frustration with the performance of service providers, whether
over complicated billing, spam marketing emails, difficult customer service,
internet speed, connectivity, or expensive plans. As a result, it is not surprising
that telecommunications companies have a high customer churn rate. Because telecom
providers manage large fixed infrastructures whose cost must be offset by revenue,
customer churn (attrition) is particularly problematic in this industry [12].
Companies usually focus more on customer acquisition and treat retention as a
secondary priority. However, it can cost five times more to attract a new customer
than to retain an existing one. Increasing customer retention rates by 5% can
increase profits by 25% to 95%, according to research done by Bain & Company [2].
Churn, also known as customer attrition, is a metric that counts the customers who
stop doing business with a company or stop using a particular service. By following
this metric, most businesses could at best try to understand the reasons behind the
churn numbers and tackle those factors with reactive action plans [10].
The reasons that lead customers to cancel can be numerous: poor service quality,
delays in customer support, prices, new competitors entering the market, and so on.
Usually there is no single reason, but a combination of events that culminates in
customer displeasure. If your company is not able to identify these signals and
take action before the customer clicks the cancel button, there is no turning back:
the customer is already gone. But you still have something valuable: the data. The
customer has left very good clues about where your service fell short. This data
can be a valuable source of meaningful insights and can be used to train customer
churn models. Learning from the past and having strategic information at hand to
improve future experiences is what machine learning is all about. In the
telecommunications segment there is great room for opportunity. The wealth and
amount of customer data that carriers collect can contribute a lot to a shift from
a reactive to a proactive position. The emergence of sophisticated artificial
intelligence and data analytics techniques further helps leverage this rich data to
address churn much more effectively.

What is Churn Rate? What is the cause for customer churn in the Telecom
Industry?

The churn rate, also known as the rate of attrition or customer churn, is the rate at which
customers stop doing business with an entity [4]. It is most commonly expressed as the
percentage of service subscribers who discontinue their subscriptions within a given time
period. The churn rate in developing markets ranges from 20% to 70%. In some of these
markets, more than 90% of all mobile subscribers are on prepaid service. Some operators in
developing markets lose in aggregate their entire subscriber base to churn in a year [5].

How can Business Intelligence reduce churn in the telecom industry?

Business Intelligence (BI) helps businesses and organizations ask and answer
questions about their data. It helps companies make better decisions by showing
present and historical data within their business context. Self-service BI tools
are now available that help companies understand performance from various angles
so that they can act to drive better business outcomes on big data [10]. These
tools can mine massive datasets for performance insights relevant to customer
churn, and then bring them to the attention of marketers, customer service
managers, and executives so they can factor these findings into subsequent
decisions.

Rich customer data availability in Telecom Sector

Telecom providers, both Communication Service Providers (CSPs) and content
providers, have a unique opportunity to access rich customer data that isn't
available to many other industries. This is due to the nature of their products and
services and the visibility they have into the end-to-end supply chain of
communication services. They can see content and service usage through web services
and centralized systems [9]. By accessing data from cell towers and deployed
infrastructure, companies can add a location dimension to the data. By reaching
into individual consumer devices, these companies gain visibility into the last
mile of the supply chain and can access data about the types of users and viewers
of their services, as well as telemetry on end-user service performance [12].

Usage of Predictive Analytics to derive actionable insights


Real-time data can be gathered from multiple sources such as call logs, call
records, network performance, and live network data. We can correlate this data
with the customer preferences, usage history, complaints or calls received by the
call center, and customer segments from the back-end systems. Using predictive
analytics, we can predict the possible impact of network events on these customers.
This actionable insight will help telcos avoid potential risks to the customer
experience. Applying predictive statistical models such as Logistic Regression,
Market Basket Analysis, and Exploratory Data Analysis can help identify the best
possible option [10].
Analyzing the data from the customer can provide a comprehensive 360-degree view of
the available information. This can help in offering personalized service to retain
the customer for a longer time. The data about a single customer is interesting but
may not be very actionable [4]. By analyzing the data of all customers, telecom
companies can identify trends and patterns and conduct correlation analysis to
understand which factors drive service usage behavior and influence customer
satisfaction.

1.4 Machine Learning Approach


The more general framework of text classification or prediction assigns an output
value to a given input text for categorization and aggregation. Examples include
sequence labeling, which assigns a category to each member of a sequence (for
example PoS tagging, which assigns a part of speech to each word in an input text),
and parsing, which assigns a parse tree to an input sentence, describing the
syntactic structure of the sentence [17]. Other examples include regression, which
assigns a real-valued output to each input. Probabilistic classification is a
common subclass of classification. Algorithms of this type use statistical
inference to find the best class for a given instance. Unlike other algorithms,
which simply output a 'best' class, probabilistic algorithms output the probability
of the instance being a member of each of the possible classes. The best class is
then usually chosen as the one with the highest probability. Such an algorithm has
substantial advantages over non-probabilistic classifiers.

Figure 1.1: Machine learning approach
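
The selection rule for probabilistic classifiers described above (take the class with the highest probability) can be illustrated with a short Python sketch. The class labels, probabilities, and the 0.4 threshold below are hypothetical values chosen for the example.

```python
def predict_class(class_probabilities):
    """Return (label, probability) for the most likely class."""
    label = max(class_probabilities, key=class_probabilities.get)
    return label, class_probabilities[label]


def flag_for_retention(class_probabilities, threshold=0.4):
    """Flag a subscriber when the churn probability reaches a chosen threshold."""
    return class_probabilities.get("churn", 0.0) >= threshold


# Hypothetical classifier output for one subscriber.
probabilities = {"churn": 0.72, "stay": 0.28}
label, confidence = predict_class(probabilities)
print(label, confidence)
```

One advantage of having probabilities rather than a bare label is visible in `flag_for_retention`: a subscriber with a churn probability of 0.45 would be labeled "stay" by the highest-probability rule, yet can still be flagged for a retention offer by lowering the decision threshold.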

Machine learning is commonly divided into the following four models:

Supervised learning
Supervised learning is the machine learning methodology that operates on labeled
data and learns a mapping from inputs to outputs using training and test instances.
Because the training data is labeled and properly categorized, it is a controlled
process conducted under supervision. The supervised technique (also known as the
probabilistic activation method) uses co-occurrence association rule mining to find
categories, similar to the first method [17].

Figure 1.2: Supervised Learning approach
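
As a minimal sketch of the supervised workflow (fit on labeled training instances, then check predictions on held-out test instances), the example below uses a nearest-centroid classifier in plain Python. This is a deliberately simple stand-in, not the Random Forest model the report proposes, and the feature names and values are invented for illustration.

```python
def train_centroids(samples, labels):
    """Compute the mean feature vector (centroid) of each labeled class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=distance)


# Hypothetical features: (complaints per month, dropped calls per week).
train_x = [(0, 1), (1, 0), (1, 2), (5, 7), (6, 5), (4, 6)]
train_y = ["stay", "stay", "stay", "churn", "churn", "churn"]
test_x = [(0, 0), (5, 6)]
test_y = ["stay", "churn"]

model = train_centroids(train_x, train_y)
predictions = [predict(model, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

The labels in `train_y` are what make this supervised: the model never has to discover the classes, only how to separate them.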

Semi-Supervised Learning

Semi-supervised learning is a machine learning approach in which a small quantity
of labeled data is used together with unlabeled data. Combining different
classifiers is also a variation. The objective of semi-supervised learning is to
learn from the unlabeled data with the help of a labeled data set [17].

Figure 1.3: Semi-Supervised Learning approach

Unsupervised Learning

Unsupervised learning is a frequently used strategy in which correlations are
discovered and grouping (clustering) techniques are used for classification. It
operates on unlabeled data: the machine is given only the input vectors, with no
labeled output variables, and must form the clusters itself. The suggested
unsupervised technique (dubbed the spreading activation method) learns relevant
rules between notional words (defined as the words in the sentence after deleting
stop words and low-frequency words) and the considered categories using
co-occurrence association rule mining in a similar fashion [17].
In unsupervised classification, the system operates directly on the given data or
repository, which is neither marked nor labeled, without being told what a correct
result looks like. It is not supervised. Since the output variable is unknown,
unsupervised learning can manage more complex tasks than reinforcement methods.

Figure 1.4: Un-Supervised Learning approach
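
The grouping idea can be illustrated with a small k-means sketch in plain Python. K-means is used here only as one familiar clustering technique; the report does not name a specific algorithm at this point, and the data points and starting centroids are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move centroids to cluster means."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


# Hypothetical unlabeled usage data: (calls per day, data used in GB per week).
usage = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
groups, centers = k_means(usage, centroids=[usage[0], usage[-1]])
print(groups)
```

No labels are supplied anywhere: the two groups emerge purely from the distances between the points, which is what distinguishes this from the supervised case.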

Reinforcement Learning

Reinforcement learning operates on the basis of rewards and penalties. It can be
seen as learning from the consequences of one's actions: in a given context, an
action may earn a reward for the desired behavior, or it may incur a penalty for an
error. The agent learns how to adjust its behavior in a given context. In each
situation, the agent must analyze the state properly and avoid penalties by taking
the right actions.

Figure 1.5: Reinforcement learning approach
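
The reward-and-penalty loop can be sketched as a tiny epsilon-greedy agent in Python. The two actions, their fixed rewards, and the parameter values are assumptions chosen for illustration; real environments would return noisy, state-dependent rewards.

```python
import random


def run_bandit(rewards, episodes=200, epsilon=0.1, learning_rate=0.5, seed=0):
    """Epsilon-greedy agent: learn the value of each action from rewards and penalties."""
    rng = random.Random(seed)
    values = {action: 0.0 for action in rewards}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(list(values))      # explore a random action
        else:
            action = max(values, key=values.get)   # exploit the best estimate so far
        reward = rewards[action]
        # Move the estimate a step toward the observed reward.
        values[action] += learning_rate * (reward - values[action])
    return values


# Hypothetical retention actions with fixed outcomes: +1 is a reward, -1 a penalty.
outcomes = {"send_generic_spam": -1.0, "offer_relevant_discount": 1.0}
learned = run_bandit(outcomes)
best_action = max(learned, key=learned.get)
print(best_action)
```

The agent is never told which action is correct; it only observes the reward or penalty after acting and shifts its estimates accordingly.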

The well-being diagnostic system predicts disease using a neural classification
approach based on the suggested fuzzy theory. It has a sub-component called the
severity section, which is responsible for determining the degree of severity [8]
[11] [18]. The user information is eventually categorized as normal or affected by
the infection. Smart fuzzy rules are used in the expert system to decide on
choices about rehabilitative documents. The experimental findings indicate that
the approach outperforms the existing traditional classification mechanisms.

1.5 Motivation

• Having the ability to accurately predict future churn rates is necessary because it helps
your business gain a better understanding of future expected revenue [7].
• Predicting churn rates can also help your business identify and improve upon areas
where customer service is lacking [4].
• In this research we propose a churn prediction model for the telecom sector using
machine learning to reduce future revenue losses [12].

1.6 Scope
We intend to collect data from a popular online customer review website for churn
prediction [3] [4], and to predict future churn using machine learning algorithms.
The system can work in a stable, real-time environment and aims for the best
prediction accuracy [5].

1.7 Goals
• Categorize online customer reviews into different class labels.

• Identify changing customer behavior patterns during forecasting.

1.8 Organization of Thesis


The rest of the thesis is organized as follows:
Chapter 2 provides a complete overview of the background research carried out on
the various topics required to achieve the thesis goal. It presents an in-depth
literature review and covers all the related work. Chapter 3 provides a description
of the system analysis, with a detailed explanation of the architecture of the
developed system, including the architecture and block diagrams. It also explains
the modules and the development process of the system with UML diagrams.
Chapter 4 covers the implementation phase of the project. Chapter 5 presents the
results and discussion for the developed system, compares the performance of the
models, and provides the graphs and relevant output images. Chapter 6 gives the
advantages of the proposed system, and Chapter 7 concludes the thesis.

1.9 Summary
This chapter describes the basic idea of churn and the problem of churn in the
telecom industry. Reducing churn is more important than ever, particularly in light
of the telecom industry's growing competitive pressures. At the present stage, many
operators have not taken the steps required to build a strong analytical
foundation, to establish a truly aspirational mandate for data-based
decision-making, or to capitalize on analytical insights. The companies that move
quickly will be best positioned for success in the future.
Chapter - 2
LITERATURE SURVEY
This chapter gives the details of various churn prediction techniques. It also
gives the literature survey for churn prediction. A literature review helps to
summarize and synthesize the arguments and ideas of existing knowledge in a
particular field without adding any new contributions.

2.1 Literature Review


Many methods such as machine learning and data mining are used for churn prediction. The
decision-tree algorithm is a reliable method for churn prediction [6]. In addition, a neural
network method [7], data certainty [8], and particle swarm optimization [9] are used for
churn prediction.
The system in [2] proposes a new set of features to improve the detection rate of
possible churners. The features are extracted from call details and customer
profiles and are classified as contract-related, call pattern description, and call
pattern changes description features. The features are evaluated using two
probabilistic data mining algorithms, Naïve Bayes and Bayesian Network, and the
results are compared with those obtained using the C4.5 decision tree, an algorithm
widely used in many classification and prediction tasks.
The study in [3] first formalizes the time-window selection strategy, coupled with
a literature review. Second, it analyzes the increase in churn model accuracy
obtained by extending the length of customer event history from one to seventeen
years, using logistic regression, classification trees, and bagging in combination
with classification trees. The practical implication is that analysts may
substantially reduce data-related burdens such as data collection, preparation, and
analysis.
According to [4], effective customer engagement strategies can be used to raise the
customer satisfaction level. The study proposes a Multilayer Perceptron (MLP)
neural network approach to predict customer churn in one of Malaysia's leading
telecommunications companies. The results were compared with the most common churn
prediction techniques, such as Multiple Regression Analysis and Logistic Regression
Analysis. The best neural network architecture consists of 14 input nodes, 1 hidden
node, and 1 output node, trained with the Levenberg-Marquardt (LM) learning
algorithm.
The work in [5] focuses on building an efficient and descriptive statistical churn model using
a Partial Least Squares (PLS) approach based on strongly correlated intervals in the data sets.
A preliminary analysis shows that the proposed model gives more reliable results than
conventional prediction models and identifies key variables that better explain
churning behavior. In addition, simple marketing strategies for network management, overage
management, and complaint handling are presented and discussed.
Burez and Van den Poel [6] studied unbalanced data sets in churn prediction models and
compared the performance of random sampling, advanced under-sampling, a gradient boosting
method, and weighted random forests. The models were evaluated using the AUC and lift
metrics. The study shows that under-sampling is preferable to the other
techniques evaluated.
Brandusoiu [7] describes an advanced data mining method for customer churn prediction.
Details of about 3,500 customers are analyzed, based on the numbers of incoming and
outgoing calls and texts. Several machine learning algorithms were
used for training and testing the classifiers. The reported average
accuracy of the system is about 90 percent over the entire dataset.
He et al. [8] developed a predictive model based on the neural network method to address
customer churn at a major Chinese telecommunications company with approximately
5.23 million subscribers. Prediction quality was measured by overall accuracy, which
reached 91.1%.
Idris [9] proposed a genetic programming approach with AdaBoost to model the churn problem
in telecommunications. The model was verified on two standard data sets, one from Orange
Telecom and the other from cell2cell, with an accuracy of 89% on the cell2cell dataset and
63% on the other one.
Huang et al. [10] studied customer churn on a big data platform. The
researchers' aim was to show that big data significantly improves the process of churn
prediction, depending on the volume, variety, and velocity of the data. A big data platform
was needed for feature engineering to accommodate data from the Operation Support and
Business Support departments at China's biggest telecommunications firm. The random
forest algorithm was used and evaluated with AUC.
According to [11], input features are clustered with the k-means and fuzzy c-means
clustering algorithms to place subscribers in separate, discrete groups. Using these groups, the
Adaptive Neuro Fuzzy Inference System (ANFIS) is applied to construct a predictive
model for active churn management. The first prediction step begins with parallel
neuro-fuzzy classification. A fuzzy inference system (FIS) then takes the outputs of the
neuro-fuzzy classifiers as input to decide on churners' activities. Success metrics can be
used to recognize inefficiency problems. Churn management metrics relate to the services,
operations, and efficiency of the customer service network. GSM number portability is a
vital criterion for determining churners.
The system in [12] proposes a new set of features to improve the identification of potential
churners. The features are derived from call details and customer profiles and are categorized
as contract-related, call pattern description, and call pattern change description features. The
features are tested using two probabilistic data mining algorithms, Naïve Bayes and Bayesian
Network, and the results are compared with those obtained using the C4.5 decision tree,
an algorithm commonly used in many classification and prediction tasks. These have
contributed, among other factors, to the risk that customers can easily switch to competitors.
One technique that can be used to counter this is to improve churn prediction by extracting
information from large amounts of data.
The study in [13] first formalizes the time window selection method and analyzes the
literature. Second, it analyzes the increase in churn model accuracy obtained by
extending the history of customer events from one to seventeen years, using logistic
regression, classification trees, and bagging combined with classification trees. The practical
consequence is that researchers can significantly reduce data-related burdens, such as data
storage, preparation, and analysis. In the newspaper subscription setting studied, the amount
that customers have to pay depends on the subscription's duration and the promotional
context. The newspaper company sends a letter to remind customers that their subscription
is coming to an end and asks whether they want to renew it, along with guidance on how to
do so. Customers are unable to cancel their subscription, and they have a grace period of
four weeks once their subscription has lapsed.
According to [14], the most effective customer retention techniques should be used to
reduce customer turnover rates. The research proposes a Multilayer Perceptron (MLP)
neural network to predict customer churn in one of Malaysia's leading
telecommunications firms. The findings were compared with the most common techniques
of churn prediction, such as Multiple Regression Analysis and Logistic Regression Analysis.
The optimal configuration of the neural network contains 14 input nodes, 1 hidden node, and
1 output node, trained with the Levenberg-Marquardt (LM) learning algorithm.
The work in [15] builds a predictive churn model that is accurate and concise
using a Partial Least Squares (PLS) methodology based on highly correlated data sets
between variables. A preliminary experiment shows that the presented model provides more
accurate performance than traditional prediction models and identifies key variables to
better understand churning behaviors. Additionally, a range of basic churn marketing
strategies (system management, overage management, and complaint management)
is presented and discussed.
Burez and Van den Poel [16] studied the problem of unbalanced datasets in churn
prediction models and compared the performance of Random Sampling, Advanced
Under-Sampling, a Gradient Boosting Model, and Weighted Random Forests. They used the
AUC and lift metrics to evaluate the models. The results showed that the under-sampling
technique outperformed the other tested techniques.
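The two evaluation metrics named above can be illustrated with a small sketch. It is not taken from the cited study: the function names, the toy labels and scores, and the choice of the top 10% for lift are assumptions made for illustration. AUC is computed as the fraction of (churner, non-churner) pairs in which the churner receives the higher score, and lift as the churn rate in the top-scored fraction divided by the overall churn rate.

```python
def auc_score(labels, scores):
    """AUC: probability that a random churner is scored above a random non-churner."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one churner and one non-churner")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def top_fraction_lift(labels, scores, fraction=0.1):
    """Churn rate among the top-scored fraction divided by the overall churn rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Toy data (hypothetical): 1 = churner, 0 = non-churner
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.1, 0.3, 0.8, 0.05, 0.15, 0.25]
print(auc_score(labels, scores))          # 15 of 16 pairs are ordered correctly
print(top_fraction_lift(labels, scores))  # top 10% is one customer, a churner
```

The pairwise AUC loop is quadratic and is only meant to make the definition visible; a rank-based computation would be preferable on large data sets.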
Schouten et al. [17] presented an advanced methodology of data mining to predict
churn for prepaid customers using dataset for call details of 3333 customers with 21 features,
and a dependent churn parameter with two values: Yes/No. Some features include
information about the number of incoming and outgoing messages and voicemail for each
customer. The author applied the principal component analysis (PCA) algorithm to reduce data
dimensions. Three machine learning algorithms were used: Neural Networks, Support
Vector Machine, and Bayes Networks to predict churn factor. The author used AUC to
measure the performance of the algorithms. The AUC values were 99.10%, 99.55% and
99.70% for Bayes Networks, Neural networks and support vector machine, respectively. The
dataset used in this study is small and no missing values existed.
Karahoca [18] proposed a model for prediction based on the Neural Network
algorithm in order to solve the problem of customer churn in a large Chinese telecom
company which contains about 5.23 million customers. The prediction accuracy standard
was the overall accuracy rate, and reached 91.1%.

Kamalraj [19] proposed an approach based on genetic programming with
Adaboosting to model the churn problem in telecommunications. The model was tested on
two standard data sets. One by Orange Telecom and the other by cell2cell, with 89%
accuracy for the cell2cell dataset and 63% for the other one.

2.2 Summary of the Literature


Based on the entire literature review, we identify the following research gaps:
 Various researchers have already built churn prediction models, but most of the
systems have accuracy issues due to imbalanced data.
 Sometimes the data contain many misclassified instances that are hard for a supervised
learning algorithm to classify.
 Most of the systems have used structured data, so there is no scope for NLP feature
extraction.
 Churn is hard to detect per user on the basis of sentiment analysis.
 Low churn prediction accuracy.
 Time complexity is very high because RF generates and extracts redundant features.
 High error rate of the system.

Table 1: Summary of Literature Survey

| No | Technique | Dataset | Extracted Features | Research Gap |
|----|-----------|---------|--------------------|--------------|
| 1 | x-Means clustering algorithms and Neuro Fuzzy algorithm [1] | GSM operation data, Turkey dataset; 24,900 customers, 22 attributes | Some value-added services | System reflects good accuracy on structured datasets only. |
| 2 | Naïve Bayes, Decision Tree [2] | European operator; 106,405 customers, 112 attributes | Contract, usage pattern, and calls pattern | High error rate in detecting actual churn due to redundant patterns and features. |
| 3 | Neural network, Regression [3] | Unknown; 129,892 customers, 113 attributes | Demographic, value added, usage pattern | Heterogeneous datasets are tedious to handle in environments with similar patterns. |
| 4 | Neural network, Regression [4] | Unknown; 169 customers, 10 attributes | Demographic, billing data, usage pattern, customer relationship | High space complexity generated in each layer. |
| 5 | Stepwise Variable Selection, partial least squares [5] | Cell2Cell dataset; 100,000 customers, 171 attributes | Behavioral information, customer care, and demographics | Redundant features generate a high error rate. |
| 6 | Artificial Neural Network [6] | UCI ML dataset; 2,427 users' information with 20 attributes | Demographics, usage pattern, value added services | Works only with statically defined parameters. |
| 7 | Binomial Logistic Regression model [7] | Iranian telco operator; 3,150 customers, 15 attributes | Demographic, call usage pattern, customer care service | Language influence can generate irrelevant feature vectors. |
| 8 | Generalized Additive Models (GAM) [8] | Belgian; 134,120 customers, 27 attributes | Demographic, usage pattern, bill and payment | High error rate during unknown text prediction. |
| 9 | Logistic Regression, Decision tree [9] | Polish mobile operator; 122,098 customers, 1,381 attributes | Demographic, call data records, customer care services | Works on synthetic data only and has a high data reduction rate. |
| 10 | Decision tree as well as other machine learning algorithms [10] | Cell2Cell dataset; 100,000 customers, 171 attributes | Behavioral data, customer care, and feature information | Behavioral information is used to generate the churn possibility, and it sometimes produces a false ratio. |

2.3 Problem Statement and Objective

Problem Statement

The proposed research work aims to measure and identify churn through text analysis using
NLP and a machine learning classifier; to identify changes in customer behavior patterns
during prediction; to identify the factors that most strongly reduce the accuracy of churn
prediction; and to evaluate and calculate the churn rate month-wise as well as day-wise, which
is useful for enhancing the service quality of the system.

Objectives

 To design and develop an approach for churn prediction with sentiment analysis on a
large dataset of customer reviews.
 To implement the proposed system with various feature extraction and selection
techniques and evaluate the performance of the system.
 To validate the proposed system with the respective machine learning algorithms and deploy
it in a real-time environment.
 To explore and validate the proposed system through comparative analysis on various
datasets in terms of classification accuracy.

2.4. Summary
A lot of research has been done on churn prediction from customer reviews.
Various techniques have been studied and analyzed. The work of various researchers has been
tabulated on the basis of their techniques, framework, dataset, evaluation metrics,
performance, and limitations. From this survey we can conclude that a lot of work has
been done and that work is still ongoing, as there are still many areas that need to be
studied and applied.

Chapter - 3
METHODOLOGY

This chapter describes the system in detail. It gives detailed information about the
proposed system, including the benefits and architecture of the model. It also explains the
existing churn prediction systems, their limitations, and the problems related to them.

3.1 System Analysis

Research conducted over the past few years has found that data mining techniques [8] [12]
are more effective in predicting customer churn. Creating an efficient churn
prediction model is an essential activity that requires a lot of work, from determining
appropriate predictor variables (features) in the large volume of available customer data to
choosing an effective predictive data mining technique suitable for the feature set. Telecom
companies collect a large amount of customer-related data, such as customer profiles, calling
patterns, and demographic data, in addition to the network data they generate. Based on a
customer's history of calling behavior, it is possible to classify whether the customer is
likely to leave or not.
Data mining techniques are found to be more effective in predicting churn according to the
research done over the past decade [15]. The predictive modeling techniques in churn
prediction are also considered to be more accurate. Churn prediction systems and sentiment
analysis use classification as well as clustering techniques to identify churn customers and
the reasons behind the churning of telecom customers [11]. The telecom industry generates
a large amount of data on a daily basis, and it is a very tedious task to mine such vast
data using specific data mining techniques, while predictions are hard to interpret with
classical techniques. Various researchers have already described such work to detect churn
in large data sets using static as well as dynamic approaches, but such systems still
face many problems in the actual identification of churn. Such telecommunication data
may contain some churn, and it is necessary to identify such cases.
Successful identification of churn from large data makes customer
relationship management (CRM) more effective [10].

In today's computing environment, customers who write comments tend to churn more
frequently, while voice mail plan customers tend to churn less frequently. Customers with
four or more customer service calls churn more than four times as often as other customers.
We calculate the average churn rate during model training using different machine learning
approaches and evaluate them during testing [5].
To maximize the organization's sales, as we suggested in our study, predicting
churn accurately is very critical. The cost of making an excessive retention effort (false
positives) and the cost of losing a customer because the model does not accurately anticipate
churn (false negatives) can be reduced by combining the customer lifetime value with the
churn prediction [19].
3.1.1 Existing Algorithms

According to [1], input features are clustered with the k-means and fuzzy
c-means clustering algorithms to place subscribers in independent, distinct classes. Using
these groups, the Adaptive Neuro Fuzzy Inference System (ANFIS) is applied to construct a
predictive model for successful churn management. The first step towards prediction starts
with parallel neuro-fuzzy classification [18]. A fuzzy inference system (FIS) then uses the
outputs of the neuro-fuzzy classifiers as input to decide on the behaviors of the churners.
Performance metrics can be used to identify issues of inefficiency. Churn management
indicators are concerned with the facilities, processes, and performance of the customer
support network. GSM number portability is a critical criterion for determining churners.
3.1.2 Limitations of previous algorithm

 System reflects good accuracy on structured dataset only.


 One disadvantage of some soft computing methods is that the complexity of the
algorithm is high when a large number of inputs is fed into the system.

3.1.3 Analysis of the problem

The main goal is to create a system that produces highly reliable results with
high precision, and the machine learning algorithms in use aim to accomplish this.
The input to the system can be of any size, and the system does not depend on the
operating system. The dataset is used for both training and testing. The proposed research
work designs and develops an approach for churn prediction using NLP and machine learning
approaches to enhance the system accuracy [8] [17]. Preprocessing is very important for
making the data useful because noisy data can lead to poor results. In the telecom dataset,
there are many missing values, incorrect values such as "Null", and imbalanced attributes.
In our dataset, the number of features is 29. We analyzed the dataset for filtering and
reduced the number of features so that it contains only useful features.
3.1.4 Proposed System

The proposed research work designs and develops an approach for churn prediction using
NLP and machine learning approaches to enhance the system accuracy. We then identify
changes in customer behavior patterns during prediction [8]. We also evaluate the factors that
most strongly reduce the accuracy of churn prediction, and finally calculate the
churn rate month-wise as well as day-wise, which is useful for enhancing the service quality
of the system. In this research we propose churn prediction from large-scale data. The system
initially deals with a synthetic telecommunication data set that contains some imbalanced
metadata [10]. Data preprocessing, data normalization, feature extraction, and
feature selection are applied in that order. During execution, optimization strategies are
used to eliminate redundant features, which sometimes cause a high error rate.
The proposed system is executed for training and testing. After both
phases are complete, the system reports the classification accuracy for the entire data set [17].
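The month-wise and day-wise churn rate mentioned above can be sketched as follows. This is only an illustration under stated assumptions: each record is assumed to be an (observation date, churned flag) pair, the churn rate of a period is taken as churned customers divided by all customers observed in that period, and the function name `churn_rate_by_period` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date


def churn_rate_by_period(records, period="month"):
    """records: iterable of (observation_date, churned) pairs.

    Returns {period_key: churned / observed} with keys 'YYYY-MM' or 'YYYY-MM-DD'.
    """
    totals = defaultdict(int)
    churned = defaultdict(int)
    for day, is_churn in records:
        key = day.strftime("%Y-%m") if period == "month" else day.isoformat()
        totals[key] += 1
        churned[key] += int(is_churn)
    return {key: churned[key] / totals[key] for key in sorted(totals)}


# Hypothetical observations
records = [
    (date(2021, 1, 5), False),
    (date(2021, 1, 5), True),
    (date(2021, 1, 20), False),
    (date(2021, 1, 20), False),
    (date(2021, 2, 3), True),
    (date(2021, 2, 3), False),
]
monthly = churn_rate_by_period(records, "month")
daily = churn_rate_by_period(records, "day")
print(monthly)
print(daily)
```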

3.2 Software Requirement Specification

The Software Requirement Specification document is explained as follows. The overall
description and overview of all components which are included in the SRS document are
noted in brief.
3.2.1 Introduction

A Software Requirements Specification (SRS) document describes the intended purpose,
requirements, and nature of the software to be developed. It also covers the yield and cost
of the software. In simple words, an SRS is a manual of a project, provided it is
prepared before the project or application is started. Such a document is primarily
prepared for a project, software, or any kind of application. The goal here is to create two
infrastructures, one for real-time customer churn prediction analysis and one for long-term
analysis of the telecom sector, using an engine capable of optimizing both infrastructures,
and to implement machine learning algorithms to analyze churn prediction for the telecom
sector. This document is written in the style given below.
1. Font style : Times new roman
2. Heading : 16 size, Bold
3. Sub-headings: 14 size, Bold
4. Description : 12 size

3.2.2 Overall Description:

The overall description of the system includes the following points.


Product features

1. User friendly: As the architecture is simple, it allows users to navigate through
the project easily.
2. Maintainability: All the modules are clearly separated to allow future development of the
user interface and the system in a thoughtful and effective software engineering manner.
3. User interface: Users can easily load the dataset and obtain the required results.

User Documentation

A user manual will be included with the program to assist and guide users on how to
interact with the system and perform various tasks. This document is for any individual user
who needs to know about the system's basic architecture and standards. In the next sections,
we go through the main components and how they are used.

Input and Output of the System

 Input to the System: The input to the system is a CSV file of telecom customer data from
which customer churn can be estimated.
 Output of the System: The system provides the Customer Churn Prediction in Telecom
Sector.

3.2.3 External interface requirements:

The external interfaces include the user interface, hardware interface, and software
interface. These are explained in detail as follows.

 User Interface.

The interface will be in the form of an application. It is designed to be functional and
minimal in its styling. A Python app will be used to set up the page layout and add minimal
styling to make the interface user friendly.

 Hardware Interface.

The hardware interface required to run the system is a graphics card, or GPU (Graphics
Processing Unit). A standard personal computer has enough storage and processing
speed for developing a typical project. Apart from a standard
personal computer, a graphics card is needed to perform large computations and parallel
tasks with good efficiency.

 Software Interface.

The language used for developing the application is Python. The GitHub IDE is used as an
interface to deal with various Python packages. TensorFlow is also required to develop the
project. The operating system used can be either Windows 7 or Ubuntu.

3.2.4 Software requirements:

Following is the list of software specifications for the system:

 Operating system: Windows 7 or Ubuntu


 Internet Explorer 6.0/above
 Tool : PyCharm on words
 Programming Language: Python 2.7 onwards

3.2.5 Hardware requirements:

Following is the list of hardware specification for the system:

 Processor : Intel i5 processor


 GPU: 8 GB
 Memory : 16 GB RAM
 Hard Disk Drive : 1TB

3.3. Applications

 BPO centers churn prediction systems.


 Service application churn prediction systems.
 Customer behaviors mapping system using churn prediction.

3.4. Future Scope

In future, the proposed system can be implemented with a deep learning algorithm to achieve
better accuracy. Because the input data are large in size and volume, running the proposed
system on the HDFS framework with parallel machine learning algorithms should provide
better results at low computation cost.

3.5 System Architecture

The proposed research work designs and develops an approach for churn prediction using
NLP and machine learning approaches to enhance the system accuracy. We then identify
changes in customer behavior patterns during prediction [4]. We also evaluate the factors that
most strongly reduce the accuracy of churn prediction, and finally calculate the
churn rate month-wise as well as day-wise, which is useful for enhancing the service quality
of the system. In this research we propose churn prediction from large-scale data. The system
initially deals with a synthetic telecommunication data set that contains some imbalanced
metadata. Data preprocessing, data normalization, feature extraction, and
feature selection are applied in that order [17]. During execution, optimization strategies
are used to eliminate redundant features, which sometimes cause a high error rate.
The proposed system is executed for training and testing. After both
phases are complete, the system reports the classification accuracy for the entire data set.

3.5.1 Proposed architecture

Figure 3.1 Block diagram of Architecture

System overview

The aim of this kind of research in the telecommunications industry is to help businesses
make more profit. Churn prediction has become known as one of the
most important sources of income for telecom companies. Therefore, this research aims at
building a system that predicts customer churn for a telecom company. Such prediction
models should achieve high AUC values. The sample data was divided into 70% for training
and 30% for testing to develop and evaluate the model [9]. We chose 10-fold
cross-validation for evaluating the model and optimizing hyperparameters. We used feature
engineering together with effective feature transformation and selection approaches to make
the features fit for machine learning algorithms. Another
concern was also found: the data was not balanced. Only about 5% of the entries represent
customers who churn. This problem was solved by under-sampling or by using tree-based
algorithms that are not affected by this issue. Our different classifiers can therefore be
more accurate in detecting churn in large data and providing accurate predictions.
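The data handling steps described above (70/30 split, under-sampling of the majority class, and 10-fold cross-validation) can be sketched in plain Python. This is a minimal sketch, not the report's implementation: the helper names, the synthetic rows with roughly 5% churners, and the decision to under-sample only the training portion are assumptions made for illustration.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def undersample(rows, label_of, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    churn = [r for r in rows if label_of(r) == 1]
    stay = [r for r in rows if label_of(r) == 0]
    minority, majority = sorted((churn, stay), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def k_fold_indices(n, k=10):
    """Yield (train_indices, validation_indices) for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        training = list(range(0, start)) + list(range(start + size, n))
        yield training, validation
        start += size


# Synthetic data (hypothetical): 1000 customers, 5% churners
rows = [{"id": i, "churn": 1 if i % 20 == 0 else 0} for i in range(1000)]
train, test = train_test_split(rows)
balanced_train = undersample(train, lambda r: r["churn"])
folds = list(k_fold_indices(len(balanced_train), k=10))
print(len(train), len(test), len(balanced_train), len(folds))
```

Under-sampling is applied after the split here so that the test portion keeps the original class distribution; this is a design choice of the sketch, not something the report specifies.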
This work contributes a supervised approach to the extraction of aspect
categories, selecting suitable features and avoiding redundancy by measuring the
correlation between features. The results obtained show that weighted term frequency
combined with correlation gives a comparatively higher F-score. In this
regard, selecting features using weighted term frequency is more important [16]. The
overlap between features in an aspect category is avoided by measuring the correlation
between them.
3.5.2 Modules

 Data Acquisition: First, the data for different telecom sector customers
is extracted based on certain parameters.
 Preprocessing: We then apply various preprocessing steps such as lexical analysis,
stop word removal, stemming (Porter's algorithm), index term selection, and data cleaning
in order to make our dataset suitable for analysis.
 Lexical analysis: Lexical analysis separates the input alphabet into
1. Word characters (e.g. the letters a-z) and
2. Word separators (e.g. space, newline, and tab).
 Stop word removal: Stop word removal refers to the removal of words that occur most
frequently in documents.
 Stemming: Stemming replaces all the variants of a word with a single stem word.
Variants include plurals, gerund forms, third person suffixes, past tense suffixes, etc.
 Data Training: We compile synthetic as well as real-time data using online news data and
train any machine learning classifier on it.
 Testing with machine learning: We predict online news using any machine learning
classifier and weight calculator, for real-time or synthetic input data accordingly.
 Analysis: We demonstrate the accuracy of the proposed system and compare it with other
existing systems.
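The preprocessing modules listed above (lexical analysis, stop word removal, stemming) can be sketched as a small pipeline. This is a simplified illustration under stated assumptions: the stop word list is a tiny hypothetical sample, and `crude_stem` is a naive suffix stripper standing in for Porter's algorithm, which has many more rules.

```python
import re

# Tiny illustrative stop list (assumed); a real system would load a full dictionary.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "are", "my", "i", "was"}
# Crude suffix stripping only; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lexical analysis: keep runs of letters, treat everything else as a separator."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop very frequent words that carry little information."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The network is dropping my calls and billing charges are rising"))
# ['network', 'dropp', 'call', 'bill', 'charg', 'ris']
```

The rough stems such as 'dropp' and 'ris' show why a full stemmer or a lemmatizer is normally used in place of plain suffix stripping.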

3.6 UML Diagrams

The Unified Modeling Language (UML) gives a standard way to write a system model
covering the conceptual ideas. It can be used for modeling a system independent of a
platform language. It is a graphical language for visualizing, specifying, constructing and
documenting information about software intensive system.
3.6.1 Use case Diagram:

Figure 3.2 Use case Diagram

3.6.2 Activity Diagram

Activity diagram is a flow chart to represent the flow from one activity to another activity.
The activity can be described as an operation of the system. The flow can be sequential,
branched or concurrent. Here we have two activity diagrams, one for the user and the other
for the system. The purpose of activity diagrams is to capture the dynamic behavior of the
system. These are drawn as follows.

3.6.2.1 Activity Diagram for User

An activity diagram visually presents a series of actions or flow of control in a
system, similar to a flowchart or a data flow diagram. Activity diagrams are frequently used
in business process modeling. They can also describe the steps in a use
case diagram.
Activities modeled can be sequential and concurrent. In both cases an
activity diagram will have a beginning and an end.

Figure 3.3 Activity Diagram for User

3.6.3 Class Diagram

A class diagram in the Unified Modeling Language (UML) is a type of static structure
diagram that describes the structure of a system by showing the system's classes,
their attributes, operations (or methods), and the relationships among objects.
The class diagram is the main building block of object-oriented modeling. It is
used for general conceptual modeling of the structure of the application, and for
detailed modeling, translating the models into programming code. Class
diagrams can also be used for data modeling. The classes in a class diagram
represent both the main elements, interactions in the application, and the classes to be
programmed.

Figure 3.4 Class Diagram

3.6.4 Data Flow Diagram

A data flow diagram (DFD) maps out the flow of data for any process or system. It
uses defined symbols such as rectangles, circles, and arrows, plus short text
labels, to show data inputs, outputs, storage points, and the routes between each
destination. Data flow diagrams can range from simple, even hand-drawn process overviews
to in-depth, multi-level DFDs that dig progressively deeper into how the data is
handled. They can be used to analyze an existing system or model a new one.
Like all the best diagrams and charts, a DFD can often visually "say" things that
would be hard to explain in words, and they work for both technical and nontechnical
audiences, from developer to CEO. That is why DFDs remain so popular
after all these years.

3.6.4.1 DFD Level 0

Figure 3.5 DFD Level 0 Diagram

3.6.4.2 DFD Level 1

Figure 3.6 DFD Level 1 Diagram

3.6.5 Component Diagram

Figure 3.7 Component Diagram

3.6.6 Sequence Diagram

Figure 3.8 Sequence Diagram

3.7 Summary
The methodology chapter covers system analysis. The architecture of the proposed system
is outlined. Modeling describes a software system at a high level of abstraction. The models
are built for a better understanding of the system that is being developed. Graphical
representations of the static, logical, and dynamic views of the system and the flow of
execution are shown. Different views of the system are represented using UML diagrams in
this chapter. Each UML diagram is designed to view a software system from a different
perspective and in varying degrees of abstraction.

Chapter - 4
IMPLEMENTATION
This chapter describes the implementation of the proposed system. It also describes the
necessary steps for churn prediction and gives details of each algorithm (Random Forest,
Decision Tree, Bagging Classifier, and K-nearest neighbor). It also explains the data set used
by the algorithms for training and testing and gives a detailed flow chart of the system.

4.1 System model


Data Preprocessing and Normalization
Preprocessing is very important for making the data useful because noisy data can lead to
poor results. In the telecom dataset, there are many missing values, incorrect values such as
"Null", and imbalanced attributes. In our dataset, the number of features is 29. We analyzed
the dataset for filtering and reduced the number of features.
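A minimal sketch of this cleaning and normalization step is given below. It is an illustration under stated assumptions, not the report's code: the column names and sample CSV are hypothetical, values such as "Null" are treated as missing, columns with more than half of their values missing are filtered out, remaining gaps are filled with the column median, and min-max scaling is used as the normalization method.

```python
import csv
import io
from statistics import median

MISSING_TOKENS = {"", "null", "na", "nan"}  # spellings treated as missing (assumed)

# Hypothetical sample of a telecom CSV input
RAW_CSV = """customer_id,day_minutes,service_calls,fax_usage,churn
1,120.5,1,Null,0
2,Null,4,Null,1
3,200.0,Null,Null,0
4,80.5,2,3,0
"""


def is_missing(value):
    return value.strip().lower() in MISSING_TOKENS


def clean(rows, numeric_columns, max_missing_ratio=0.5):
    """Drop mostly-missing columns and impute the rest with the column median."""
    kept = []
    for col in numeric_columns:
        present = [float(r[col]) for r in rows if not is_missing(r[col])]
        missing_ratio = 1 - len(present) / len(rows)
        if missing_ratio > max_missing_ratio:
            for r in rows:
                del r[col]  # filter out an uninformative feature
            continue
        fill = median(present)
        for r in rows:
            r[col] = fill if is_missing(r[col]) else float(r[col])
        kept.append(col)
    return kept


def min_max_normalize(rows, col):
    """Rescale a numeric column to the range [0, 1] in place."""
    values = [r[col] for r in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for constant columns
    for r in rows:
        r[col] = (r[col] - low) / span


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
kept = clean(rows, ["day_minutes", "service_calls", "fax_usage"])
for col in kept:
    min_max_normalize(rows, col)
print(kept)
print(rows[1])
```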
Feature extraction

Each sentence in the training dataset undergoes preprocessing: tokenization, case
transformation (uppercase to lowercase), stop-word filtering, and stemming. The standard
stop-word dictionary available at https://gist.github.com/larsyencken/1440509 is used.
Stemming and lemmatization are two crucial feature normalization methods used in the
preprocessing stage. Stemming reduces all inflected words in the text to a root form called
the stem; e.g., 'studying' and 'studies' are each converted into the stem 'study'.
Lemmatization converts all forms of a word to its base lemma; for example, the terms
'studying' and 'studies' are converted to the lemma 'study'. So lemma features are
considered more accurate than stemmed features. In this experiment, lemmas are extracted as
features and then passed to the feature selection methods.
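
The preprocessing chain above (tokenization, lowercasing, stop-word filtering, lemma lookup) can be illustrated with a short sketch. The stop-word set and the lemma table here are tiny hypothetical stand-ins: the report uses a standard stop-word dictionary and does not name its lemmatizer.

```python
import re

# Tiny illustrative stop-word set and lemma table (assumptions for this sketch only).
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "for", "was", "are"}
LEMMAS = {"studying": "study", "studies": "study", "studied": "study"}


def preprocess(sentence):
    """Tokenize, lowercase, drop stop words, and map each token to its lemma."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in kept]


features = preprocess("She is Studying for the exam and studies daily")
# features == ["she", "study", "exam", "study", "daily"]
```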

Feature selection:
In the system, various feature selection approaches are analyzed and a hybrid approach for
feature selection is proposed. The feature selection strategies analyzed are:
Term frequency (TF)
In this approach, features are selected based on their term frequency count. The term
frequency of each feature with respect to each aspect category is calculated. A threshold is
set for feature selection: features with a term frequency greater than 2 are selected in each
aspect category.
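
A minimal sketch of this term frequency selection is given below. The sample sentences and category names are hypothetical; only the "greater than 2" threshold comes from the text.

```python
from collections import Counter, defaultdict


def select_by_term_frequency(samples, threshold=2):
    """samples: iterable of (tokens, category).
    Keep the terms whose count inside a category is strictly greater than `threshold`."""
    counts = defaultdict(Counter)
    for tokens, category in samples:
        counts[category].update(tokens)
    return {
        category: sorted(t for t, n in counter.items() if n > threshold)
        for category, counter in counts.items()
    }


# Hypothetical toy data: preprocessed tokens with an aspect category label.
samples = [
    (["network", "slow", "network"], "service"),
    (["network", "drop"], "service"),
    (["bill", "high", "bill"], "price"),
    (["bill", "refund"], "price"),
]
selected = select_by_term_frequency(samples)
# selected == {"service": ["network"], "price": ["bill"]}
```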

Weighted Term Frequency (WTF)

In this approach, the weight of each term is calculated using Eq. (4.1). The weight of a term
t is the conditional probability $X_{t,k} / X_t$, where $X_{t,k}$ is the occurrence count of
term t in aspect category k and $X_t$ is the total occurrence count of term t in all aspect
categories. If the proportion of occurrences of a term in one aspect category is higher than
in the other aspect categories, the weight of the term increases. A threshold is set on the
weight for each aspect category. Terms (features) with a weight greater than the threshold are
selected to generate a binary train matrix. Weight calculation of terms is also done in Kirui
et al. [2]. This work follows a similar approach to [2] and proposes a hybrid approach for
feature selection that uses correlation to avoid redundancy in features. In [2], weights are
used to determine the aspect category of a test sentence, whereas in this approach the weight
is used for feature selection and then to generate a binary train matrix.
$$\mathrm{weight}(t) = \frac{X_{t,k}}{X_t} \qquad (4.1)$$
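
Eq. (4.1) can be illustrated with a few lines of code. The counts and the threshold of 0.5 below are hypothetical, since the report does not state the threshold value it used.

```python
def term_weights(counts):
    """counts: {category: {term: X_tk}}. Returns {category: {term: X_tk / X_t}}."""
    totals = {}
    for terms in counts.values():
        for term, n in terms.items():
            totals[term] = totals.get(term, 0) + n  # X_t: count over all categories
    return {
        category: {term: n / totals[term] for term, n in terms.items()}
        for category, terms in counts.items()
    }


def select_by_weight(weights, threshold):
    """Keep the terms whose weight in a category is greater than the threshold."""
    return {
        category: sorted(t for t, w in terms.items() if w > threshold)
        for category, terms in weights.items()
    }


# Hypothetical occurrence counts per aspect category.
counts = {
    "service": {"network": 6, "bill": 1},
    "price": {"network": 2, "bill": 3},
}
weights = term_weights(counts)
chosen = select_by_weight(weights, threshold=0.5)  # threshold value is an assumption
```

Here "network" has weight 6/8 = 0.75 in the service category and 0.25 in the price category, so it is selected only for service.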

Term Frequency with Correlation Coefficient (TF+CC)

In classification, features must be relevant but should not be redundant if the accuracy of
the classifier is to increase. In this strategy, the term frequency matrix obtained above is
used. Features obtained using this matrix are relevant but may be redundant. To avoid
redundancy, the correlation of each feature with the other features in that aspect category is
calculated. The Pearson correlation coefficient is used to calculate the correlation.

$$\mathrm{Co\_weight}[t_i] = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}} \qquad (4.2)$$

Eq. (4.2) is used to calculate the correlation of each term with the other terms, where x and
y are the vectors of the two terms being compared, containing the term frequency with respect
to each aspect category, and n is the length of these vectors. The correlation of a term t with
the other terms in that aspect category is averaged. Terms with a correlation value less than
or equal to 0.85 are selected to generate a binary train matrix.
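
The correlation filter can be sketched as below, using the usual computational form of the Pearson coefficient. The term vectors are hypothetical; only the averaging step and the 0.85 limit are taken from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    denom = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return 0.0 if denom == 0 else (n * sxy - sx * sy) / denom


def select_low_correlation(vectors, limit=0.85):
    """vectors: {term: values per aspect category}.
    Keep terms whose average correlation with the other terms is <= limit."""
    kept = []
    for term, x in vectors.items():
        others = [pearson(x, y) for other, y in vectors.items() if other != term]
        average = sum(others) / len(others) if others else 0.0
        if average <= limit:
            kept.append(term)
    return kept


# Hypothetical term vectors: "a" and "b" are perfectly correlated, "c" is anti-correlated.
three_terms = select_low_correlation({"a": [1, 2, 3], "b": [2, 4, 6], "c": [3, 2, 1]})
two_terms = select_low_correlation({"a": [1, 2, 3], "b": [2, 4, 6]})
```

Because the correlations are averaged, a redundant pair is only filtered out when the average stays above the limit: in the two-term case both terms are dropped, while in the three-term case the anti-correlated term pulls the averages down and all terms are kept.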

Weighted Term Frequency with Correlation Coefficient (WTF+CC)

In this approach, the weighted matrix obtained with the weighted term frequency approach (ii)
is used to generate a new matrix that contains the weight of a term with respect to each
aspect category. Eq. (4.2) is used to calculate the correlation of each term with the other
terms, where x and y are the vectors of the two terms being compared, containing the weight of
each term in each aspect category. Finally, a binary train matrix is generated as described for
the TF+CC approach (iii).
The contribution of this work is a supervised approach for aspect category extraction that
selects relevant features and avoids redundancy by calculating the correlation among
features.
The results obtained show that the weighted term frequency with correlation approach achieves
a comparatively higher F-score. Here, the features selected using weighted term frequency are
more relevant, and redundancy among features in an aspect category is avoided by calculating
the correlation.

4.2 Training and Testing


We used a telecom-sector dataset available on Kaggle.com for predicting churn customers, as
it contains data for both kinds of customers, i.e. churn as well as no churn. It contains
around 21 attributes and 2767 rows, with a class label indicating churn as yes or no. The
class label is the last attribute and is encoded as a numeric value (1 or 2).

4.3 Algorithms
4.3.1 Bagging Classifier

Input: inputs 1…n (all input parameters), desired threshold Th.

Output: Predicted output label.

Step 1: Read all records from database (R from DB)

Step 2: Parts[] ← Split(R)

Step 3:

Step 4: check (Cval against respective threshold)

Step 5: T ← get current state with timestamp

Step 6: if (T.time > Defined Time)

Read all measures for TP and FN

Else continue. Tot++

Step 7: calculate score = (TP * 100 / Tot)

Step 8: if (score >= Th)

Generate event

end for
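
For comparison with the steps above, the following is a minimal sketch of standard bagging (bootstrap aggregating with a majority vote), not a transcription of the pseudocode. The base learner, which splits a single feature at the midpoint of the class means, and the toy data are assumptions chosen to keep the example short.

```python
import random
from collections import Counter


def fit_threshold(xs, ys):
    """Base learner for one numeric feature: split at the midpoint of the class means."""
    pos = [x for x, label in zip(xs, ys) if label == 1]
    neg = [x for x, label in zip(xs, ys) if label == 0]
    if not pos or not neg:  # the bootstrap sample contained a single class
        only = 1 if pos else 0
        return lambda x: only
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    cut = (mean_pos + mean_neg) / 2
    above = 1 if mean_pos > mean_neg else 0
    return lambda x: above if x > cut else 1 - above


def fit_bagging(xs, ys, n_estimators=15, seed=0):
    """Train each base learner on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_threshold([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Majority vote over the base learners."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one feature, label 1 = churn.
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_bagging(xs, ys)
```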

4.3.2 Decision Tree Classifier

Input: Selected features of all test instances D[i…n], training database policies {T1, …, Tn}

Output: Number of probable classified trees with weight and label.

Step 1: Read (D into D[i])

V ← Extract_Features(D)

Step 2: N ← Count_Features(D)

Step 3: for each (c in TrainDB)

Step 4: Nc[i] ← Ext_Features(c)

Step 5: select relevant features of w = {Nc[i], N}

Step 6: Statement (w > t)

Step 9: Return Tree Instance {Nc[i], N, w, label}
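
The core operation of a decision tree learner is choosing the split for a node. The sketch below shows a CART-style search that minimizes the weighted Gini impurity. The Gini criterion and the toy data are assumptions, since the report does not name the split criterion it used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature index, threshold, weighted Gini) of the best binary split."""
    best = (None, None, gini(y))
    for f in range(len(X[0])):
        # The largest value is skipped because it would leave the right side empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical rows: [tenure in months, monthly charges], label 1 = churn.
X = [[2, 50], [3, 20], [30, 55], [40, 25]]
y = [1, 1, 0, 0]
split = best_split(X, y)
# split == (0, 3, 0.0): tenure <= 3 separates the two classes perfectly
```

A full tree applies the same search recursively to the left and right subsets until a node is pure or a depth limit is reached.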

4.3.3 K-nearest Neighbor Classifier
Input: Train_DatasetF TrF[], Test_DatasetF TsF[], Threshold T.

Output: Classified label

Step 1: Read R {all attributes} from current parameters.

Step 2: Map train features to each sample.

Step 3: Calculate distance of train DB with same evidences

$$\mathrm{distance} = \sum_{k=0}^{n} (TrF, TsF)$$

Step 4: Evaluate distance > threshold

Step 5: Return the predicted label
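
A minimal sketch of the textbook k-nearest neighbor rule is shown below. Note that it differs from the steps above in two respects, both of which are assumptions: it uses the Euclidean distance, and it takes a majority vote over the k closest training rows instead of comparing a distance with a threshold.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label the query with the majority class of its k nearest training rows."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already normalized feature vectors; label 1 = churn.
train_X = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
train_y = [0, 0, 0, 1, 1, 1]
near_low = knn_predict(train_X, train_y, [1.5, 1.5])
near_high = knn_predict(train_X, train_y, [8.5, 8.5])
```

Because the vote depends on raw distances, features should be scaled to comparable ranges before this classifier is applied.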

4.3.4 Random Forest Classifier


Input: Training Rules Tr[], Test Instances Ts[], Threshold T.
Output: Weight w {0-1}
Step 1: Read each test instance (TsInstance from Ts)

Step 2: $TsIns = \sum_{k=0}^{n} \{A_k \ldots A_n\}$

Step 3: Read each train instance (TrInstance from Tr)

Step 4: $TrIns = \sum_{j=0}^{n} \{A_j \ldots A_m\}$

Step 5: w = WeightCalc(TsIns, TrIns)

Step 6: if (w >= T)
Step 7: Forward feed layer to input layer for output OutLayer[] ← {Tsf, w}
Step 8: optimized feed layer weight, Cweight ← OutLayer[0]
Step 9: Return Cweight
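
For orientation, a textbook random forest combines bootstrap sampling of rows with a random subset of features for each tree, and the fraction of trees voting for churn gives a weight between 0 and 1 that can be compared with a threshold T. The sketch below illustrates that idea only; it is not the WeightCalc procedure above, and it uses one-level trees (stumps) and hypothetical data to stay short.

```python
import random


def fit_stump(X, y, features):
    """One-level tree: best threshold rule using only the given feature indices."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                correct = sum(
                    (left if row[f] <= t else right) == label
                    for row, label in zip(X, y)
                )
                if best is None or correct > best[0]:
                    best = (correct, f, t, left, right)
    _, f, t, left, right = best
    return lambda row: left if row[f] <= t else right


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Each tree sees a bootstrap sample of rows and a random subset of features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(d), max_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def churn_score(forest, row):
    """Fraction of trees voting for class 1: a weight in [0, 1]."""
    return sum(tree(row) for tree in forest) / len(forest)


# Hypothetical toy data with two informative features; label 1 = churn.
X = [[1, 5], [2, 6], [3, 5], [4, 7], [10, 20], [11, 22], [12, 21], [13, 23]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
T = 0.5  # decision threshold, an assumption
is_churn = churn_score(forest, [14, 25]) >= T
```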

4.4. Flow chart of System

Figure 4.1 Flow chart of System

• To apply NLP steps (stop-word removal, lemmatization, feature extraction and selection)
and machine learning algorithms to large-scale data and identify churn.
• To develop an algorithm that can work on structured, semi-structured, and unstructured
large datasets.
• To improve the accuracy of the proposed system over classical machine learning algorithms.
• To identify churn through text analysis using NLP and soft computing based machine
learning classifiers.
• To identify changes in customer behavior patterns during prediction.
• To identify the factors that most reduce the accuracy of churn prediction.
• To evaluate and calculate the churn rate month-wise as well as day-wise, which is useful
for enhancing the service quality of the system.

4.5 Summary
This chapter covers the implementation process of the project. The system model, the desired
goals of the system, and the proposed system are described in this chapter.

Chapter - 5

RESULTS AND DISCUSSION

This chapter provides the results and a complete analysis of each algorithm. The confusion
matrix describes the performance of each algorithm, and with its help it is easy to analyze
which algorithm is best for customer churn prediction.

5.1 Experimental Setup


This work is divided into two parts. The first part centers on building a classifier and on
the experimental analysis behind the technologies used to do so. The second part is about
using the learned model in a way that optimizes storage and uses resources efficiently, as on
a mobile platform. This chapter also covers the various guidelines followed in the course of
realizing this project. Of the several APIs that come with TensorFlow, one in particular fits
the use case of this task. This API helps new learners who are not acquainted with machine
learning to explore and test some of the conceptual methods offered by the system and to
familiarize themselves with the field, and it inspires more people to tackle such a daunting
topic. A description of how this API was used to train and deploy a prototype in this project
follows in the next section.
For the system performance evaluation, the metrics for accuracy are calculated. The
system is implemented in Python on an Intel 2.8 GHz i7 processor with 16 GB RAM in an
open-source environment. After the implementation, a comparison between several existing
systems and the proposed system was carried out. The figures below describe the GUI testing
in detail, with data validation.

5.2 Simulation Scenario


The suggested soft computing algorithm builds on the current classification methods that use
machine learning algorithms. The parameters used to determine the optimal computational
efficiency are listed in Table 2:

Table 2: Testing Parameters for the Algorithm

Serial No   Properties/Parameters
1           Data set
2           Size of dataset
3           Number of records for training and testing
4           Framework selection
5           Name of classifier
6           Epoch size
7           Activation function
8           Time required for execution
9           Accuracy of each algorithm
10          Error rate of system
11          Elapsed time of specific algorithm

• The system should provide n outcomes with minimum time complexity.
• The algorithms use all the parameters described in Table 2.
• The system collects data from the local file system, and the proposed data placement
mechanism can also provide fast detection of fake accounts from the application GUI.

5.3 Evaluation Metrics


The final phase of the system is a summary of the findings, which includes the tests, the
results obtained, and the analysis and conclusions drawn. The research is carried out by
conducting multiple analyses to investigate how the suggested algorithm's performance
depends on parameters such as dataset size, dataset form, and algorithm inputs.
A confusion matrix, also known as an error matrix, is a table structure that allows
the visualization of the output of an algorithm, typically a supervised learning one, in the
machine learning field and particularly in statistical classification. Each row of the matrix
represents the instances in an actual class, while each column represents the instances in a
predicted class, as in Table 3. The name comes from the fact that the matrix makes it easy to
see whether two classes are being confused. In supervised learning, a confusion matrix is a
simple tool for evaluating outcomes; it is used to describe the test results of a prediction
model. Four independent experiments were performed to test the discriminant function for
various dataset formats.

Table 3: Confusion Matrix Analysis

Confusion matrix        Predicted
                        Negative    Positive
Actual    Negative      TN          FP
          Positive      FN          TP

The entries of the confusion matrix are computed after the experimental analysis and are
defined as follows:

• TN: a negative prediction for an actual negative class label.
• FP: a positive prediction for an actual negative class label.
• FN: a negative prediction for an actual positive class label.
• TP: a positive prediction for an actual positive class label.

The accuracy (Eq. 5.1) is the proportion of correct predictions out of the total number of
predictions. It is measured using the equation:

$$\mathrm{acc} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (5.1)$$

The accuracy of the proposed system has been estimated using this equation; it achieves
about 97.23% correct predictions, which is better than all other methods.
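
As a small worked illustration of the confusion matrix entries and Eq. (5.1), the sketch below counts TN, FP, FN, and TP from two label lists. The labels are hypothetical and are not results from this report.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TN, FP, FN, TP for a binary problem."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Eq. (5.1): correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical labels, 1 = churn.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_counts(actual, predicted)
acc = accuracy(tn, fp, fn, tp)
# (tn, fp, fn, tp) == (3, 1, 1, 3) and acc == 0.75
```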
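To compare the outcomes of different experiments, the F1-score was used as an
assessment measure in this analysis. Precision and recall are used to compute the F1-score
(Eq. 5.2). TP stands for true positive, FP for false positive, and FN for false negative in
Eqs. (5.3) and (5.4).

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (5.2)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (5.3)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (5.4)$$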
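
The standard definitions of precision, recall, and the F1-score can be computed directly from the confusion matrix counts, as in the sketch below. The counts are hypothetical and are not taken from the experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score from confusion matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
# precision = 0.9, recall = 0.75, f1 is about 0.818
```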
The proposed system was implemented in a Windows environment; the Python platform was
used because it is open source. The file system dataset was used to extract the data from
the file system application. We create various data chunks to evaluate the classification
accuracy of the system with different machine learning algorithms.

5.4 Methods of Comparison


The analysis below presents the system classification graphs. The graphs display how the
system classifies the overall inputs into the various classes. The proposed system is
implemented with an RNN combination, which gives satisfactory results. For performance
evaluation, 2500 instances are used for training and 267 instances for testing, with
different cross-validation settings. Here the proposed results are compared with two
different existing systems.
Table 4: Comparative analysis of various classification algorithms

No.   Method                Accuracy   Precision   Recall   F-1 score
1     Random Forest         0.95       0.95        1        0.97
2     DT                    0.89       0.92        0.96     0.94
3     Bagging Classifier    0.94       0.96        0.94     0.97
4     K-nearest neighbors   0.81       0.86        0.92     0.89

Table 4 above presents the comparative analysis of the various classification algorithms
evaluated for the proposed churn prediction module. K-nearest neighbors gives the lowest
accuracy, while Random Forest gives the highest accuracy, 95%, across the various
cross-validation settings. The same results are shown in Figure 5.1 below.

Figure 5.1: Comparative analysis of various classification algorithms

5.5 Result Analysis

Random Forest

Figure 5.2: Classification Report of Random Forest

Figure 5.3: Performance evaluation of Random forest classification

The confusion matrix shows the actual performance of the model, and the ROC curve helps to
analyze the true positive rate against the false positive rate.
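
To make the ROC curve concrete, the following sketch computes its points (false positive rate, true positive rate) by using each distinct score as a threshold, and the area under the curve with the trapezoid rule. The labels and scores are hypothetical and are not outputs of the classifiers in this report.

```python
def roc_points(actual, scores):
    """Return (FPR, TPR) pairs, starting at (0, 0), one per distinct score threshold."""
    pos = sum(actual)
    neg = len(actual) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# Hypothetical labels (1 = churn) and predicted churn scores.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(actual, scores)
area = auc(points)  # 8/9 for this toy data
```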

Figure 5.4: Discrimination threshold evaluation of Random Forest classification

The threshold plot shows the precision, recall, and F1 scores across discrimination
thresholds; with its help we can analyze which algorithm is best for predicting churn customers.
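
A discrimination threshold plot is produced by sweeping the decision threshold and recomputing the metrics at each value. The sketch below shows that computation on hypothetical scores; the threshold grid is an assumption.

```python
def threshold_sweep(actual, scores, thresholds):
    """Return (threshold, precision, recall, f1) for each candidate threshold."""
    rows = []
    for t in thresholds:
        predicted = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
        fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
        fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        rows.append((t, precision, recall, f1))
    return rows


# Hypothetical labels (1 = churn) and predicted churn scores.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
rows = threshold_sweep(actual, scores, [0.3, 0.5, 0.75])
best_threshold = max(rows, key=lambda r: r[3])[0]  # threshold with the highest F1
```

Raising the threshold trades recall for precision: at 0.3 the toy data gives precision 0.6 and recall 1.0, while at 0.75 it gives precision 1.0 and recall 2/3.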

Decision Tree

Figure 5.5: Classification Report of DT

Figure 5.6: Performance evaluation of DT classification

The confusion matrix is a table that is used to describe the performance of a classification
algorithm, and the ROC curve helps to analyze the true positive rate against the false
positive rate.

Figure 5.7: Discrimination threshold evaluation of DT classification

The threshold plot shows the precision, recall, and F1 scores across discrimination
thresholds; with its help we can analyze which algorithm is best for predicting churn data.

Bagging Classifier

Figure 5.8: Classification Report of Bagging

Figure 5.9: Performance evaluation of bagging classification

The confusion matrix is the best way to see the actual performance of the model, and the ROC
curve helps to analyze the true positive rate against the false positive rate.

Figure 5.10: Discrimination threshold evaluation of Bagging classification

The threshold plot shows the precision, recall, and F1 scores; these measures help to
analyze the performance of the classifier.

K-nearest Neighbor

Figure 5.11: Classification Report of K- Neighbor

Figure 5.12: Performance evaluation of K-Neighbors classification

The confusion matrix shows the actual performance of the model, and the ROC curve helps to
analyze the true positive rate against the false positive rate.

Figure 5.13: Discrimination threshold evaluation of K-Neighbors classification

The threshold plot shows the precision, recall, and F1 scores across discrimination
thresholds; with its help we can analyze which algorithm is best for predicting churn and
non-churn data.

5.6 Summary
This chapter describes the experimental data, the results, and the performance of the
proposed model. It also provides details of the bigml dataset used as the input dataset for
the system. The results show that the proposed model performs better than the existing
methods; the results are more precise and accurate.

Chapter - 6
ADVANTAGES

Accuracy: The proposed system gives the highest accuracy on real-time data with multiple
classification algorithms.

Identify at-risk customers: For any business that wants to enjoy the benefits of
customer churn prediction, machine learning opens dozens of opportunities. Machine
learning is able to analyze client behavior and measure their probability of churning. In
particular, to precisely identify churn rate, machine learning algorithms can be trained to
learn the behavior patterns of clients/partners who have already canceled their contracts or
any other relationships with a particular company and compare them with the existing ones.
Then correlations between the actions of active and inactive clients are done. As a result, the
algorithm recognizes the customers that are more likely to leave.

Identify pain points: Different companies lose their clients for different reasons. In most
cases, there are numerous "pain points," which remain unknown for product owners. From
the bad quality and absent features to unpleasant design and poor customer service — there
are a lot of details which you do not take into account that your clients do. Even if your
product is almost perfect, you can still reward your new customers with some attractive
discounts and offers and ignore your loyal ones. When a business applies churn prediction,
machine learning can do analysis and forecasts based not only on customer behavior but also
on the brands.

Identify methods to implement: After the root cause of client churn has been
identified, companies can reconsider and rebuild their products and change their business
strategy accordingly. Transformed data and automated flow can be used in CRM and
marketing automation systems. However, this doesn't mean that using machine learning for
churn prediction is about building a certain model for a certain task. It is more about domain
knowledge and an ability to deliver the best possible solution based on learning data,
processes, and behavior.

Chapter - 7
CONCLUSION

This work mainly focuses on identifying and detecting churn customers from a massive
telecommunications dataset and discusses churn prediction systems produced by different
algorithms. Some systems still face problems in the conversion of linguistic data, which can
cause a high error rate during execution. Many researchers have put forward Natural Language
Processing (NLP) techniques together with various machine learning algorithms; such a
combination is likely to perform well when structuring data.
Customer churn is a major problem and one of the most important concerns for large
companies. Because of its direct effect on revenues, especially in the telecom field,
companies are seeking to develop machine learning algorithms to predict potential customer
churn. In this work, Random Forest, Decision Tree, Bagging Classifier, and K-nearest neighbor
classifiers are employed to predict churn. Among these algorithms, the Random Forest
classifier gives the highest accuracy, 95%, compared with the Decision Tree, Bagging, and
K-nearest neighbor classifiers, whereas the KNN classifier has the lowest accuracy, 81%.

BIBLIOGRAPHY
[1] Karahoca, Adem, and Dilek Karahoca. "GSM churn management by using fuzzy c-
means clustering and adaptive neuro fuzzy inference system." Expert Systems with
Applications 38.3 (2011): 1814-1822.
[2] Kirui, Clement, et al. "Predicting customer churn in mobile telephony industry using
probabilistic classifiers in data mining." International Journal of Computer Science
Issues (IJCSI) 10.2 Part 1 (2013): 165.
[3] Ballings, Michel, and Dirk Van den Poel. "Customer event history for churn
prediction: How long is long enough?" Expert Systems with Applications 39.18
(2012): 13517-13522.
[4] Ismail, Mohammad Ridwan, et al. "A multi-layer perceptron approach for customer
churn prediction." International Journal of Multimedia and Ubiquitous Engineering
10.7 (2015): 213-222.
[5] Lee, Hyeseon, et al. "Mining churning behaviors and developing retention strategies
based on a partial least squares (PLS) model." Decision Support Systems 52.1
(2011): 207-216.
[6] Burez J, Van den Poel D. Handling class imbalance in customer churn prediction. Expert
Syst Appl. 2009; 36(3):4626–36.
[7] Brandusoiu I, Toderean Gavril, Ha B. Methods for churn prediction in the prepaid
mobile telecommunications industry. In: International conference on
communications. 2016. p. 97–100.
[8] He Y, He Z, Zhang D. A study on prediction of customer churns in fixed
communication network based on data mining. In: Sixth international conference on
fuzzy systems and knowledge discovery, vol. 1. 2009. p. 92–4.
[9] Idris A, Khan A, Lee YS. Genetic programming and adaboosting based churn prediction
for telecom. In: IEEE international conference on systems, man, and cybernetics (SMC).
2012. p. 1328–32.
[10] Huang F, Zhu M, Yuan K, Deng EO. Telco churn prediction with big data. In: ACM
SIGMOD international conference on management of data. 2015. p. 607–18.
[11] Karahoca, Adem, and Dilek Karahoca. "GSM churn management by using fuzzy c-
means clustering and adaptive neuro fuzzy inference system." Expert Systems with
Applications 38.3 (2011): 1814-1822.

[12] Kirui, Clement, et al. "Predicting customer churn in mobile telephony industry using
probabilistic classifiers in data mining." International Journal of Computer Science
Issues (IJCSI) 10.2 Part 1 (2013): 165.
[13] Ballings, Michel, and Dirk Van den Poel. "Customer event history for churn
prediction: How long is long enough?" Expert Systems with Applications 39.18
(2012): 13517-13522.
[14] Ismail, Mohammad Ridwan, et al. "A multi-layer perceptron approach for customer
churn prediction." International Journal of Multimedia and Ubiquitous Engineering 10.7
(2015): 213-222.
[15] Lee, Hyeseon, et al. "Mining churning behaviors and developing retention strategies
based on a partial least squares (PLS) model." Decision Support Systems 52.1
(2011): 207-216.
[16] Burez, Jonathan, and Dirk Van den Poel. "Handling class imbalance in customer
churn prediction." Expert Systems with Applications 36.3 (2009): 4626-4636.
[17] Schouten, Kim, et al. "Supervised and unsupervised aspect category detection for
sentiment analysis with co-occurrence data." IEEE transactions on cybernetics 48.4
(2017): 1263-1275.
[18] Karahoca, Adem, and Dilek Karahoca. "GSM churn management by using fuzzy c-
means clustering and adaptive neuro fuzzy inference system." Expert Systems with
Applications 38.3 (2011): 1814-1822.
[19] Kamalraj, N., and A. Malathi. "A survey on churn prediction techniques in
communication sector." International Journal of Computer Applications 64.5 (2013):
39-42.
