Attrition Project Mangal

This project report outlines the development of an Employee Attrition Prediction Model using machine learning techniques to identify factors contributing to employee turnover. The model aims to provide actionable insights for HR teams to improve retention strategies by analyzing historical employee data. Key steps include data collection, preprocessing, model selection, and evaluation, ultimately demonstrating the model's effectiveness in predicting attrition and informing HR decision-making.

Uploaded by

mangleshwarpratap.gts

Institute of Technology and

Management, Meerut
( Affiliated to Dr. A.P.J. Abdul Kalam Technical University,
Lucknow ) College Code 285
Session 2024-25

Department of Computer Science

Project Report
On
Employee Attrition Prediction Model
Using Machine Learning

Submitted By:
Mangleshwar Pratap
B.Tech (CSE) 4th Year
Roll no. 2102850100008

Submitted To:
Dr. P K Vashistha
Acknowledgement

I would like to express my sincere gratitude to everyone
who contributed to the successful completion of this
project, "Employee Attrition Prediction using Machine
Learning."
I am particularly thankful for the support of my friends,
whose encouragement and motivation kept me focused
throughout this journey.
Additionally, I am grateful to the open-source community
and online platforms for providing invaluable resources,
datasets, and documentation that served as the
foundation for this work.
This project is the result of dedication and effort, and I
truly appreciate all the assistance and resources that
made it possible.
Project Description

Overview:
This project aims to leverage machine learning techniques
to predict employee attrition and identify the factors
contributing to it. Employee attrition is a significant
challenge for organizations, leading to increased costs for
recruitment, onboarding, and training, as well as
disruptions to workflow and morale. By analyzing historical
employee data, the project seeks to provide actionable
insights to reduce turnover rates and improve employee
retention strategies.
Objectives:
1. To develop a machine learning model capable of
accurately predicting whether an employee is likely to
leave the organization.
2. To identify key factors influencing attrition, such as job
satisfaction, work-life balance, compensation, and
career growth opportunities.
3. To provide insights that can guide human resource
teams in making data-driven decisions to improve
employee engagement and retention.
Problem Statement
Employee attrition is a critical challenge faced by
organizations across industries. High attrition rates lead to
increased costs associated with recruitment, onboarding,
and training, as well as disruptions in workflow, team
dynamics, and overall productivity. Identifying the
underlying reasons for employee turnover and predicting
potential attrition are essential for designing effective
retention strategies.
Despite the availability of HR data, many organizations
struggle to leverage it effectively for proactive decision-
making. Traditional methods of analyzing attrition are often
time-consuming, lack accuracy, and fail to identify subtle
patterns that contribute to employee dissatisfaction and
eventual resignation.
The goal of this project is to develop a machine learning-
based solution that can:
1. Accurately predict whether an employee is likely to
leave the organization.
2. Identify and rank the factors that contribute to
attrition.
3. Provide actionable insights for human
resource teams to implement targeted interventions
and reduce turnover rates.
Dataset Info
Possible Sources:
• Kaggle: the "IBM HR Analytics Employee Attrition & Performance" dataset
(1,470 employee records, 35 columns)
Key Features of the Dataset
1. Age: Employee's age.
2. Attrition: Target variable indicating whether the employee left the
organization.
3. BusinessTravel: Frequency of business travel (e.g., Rarely,
Frequently).
4. DailyRate / MonthlyIncome: Financial metrics related to employee
salary.
5. Department: Department of the employee (e.g., Sales, HR, R&D).
6. DistanceFromHome: Commute distance from home to work.
7. Education & EducationField: Employee’s education level and field of
study.
8. EnvironmentSatisfaction: Satisfaction with the work environment (1–
4 scale).
9. Gender: Gender of the employee.
10. JobRole & JobLevel: Role and position level in the organization.
11. JobSatisfaction: Satisfaction with the job itself (1–4 scale).
12. MaritalStatus: Employee’s marital status.
13. OverTime: Whether the employee works overtime.
14. WorkLifeBalance: Perception of work-life balance (1–4 scale).
15. YearsAtCompany / TotalWorkingYears: Employee’s tenure and
total work experience.
16. YearsSinceLastPromotion: Time since the last promotion.
17. TrainingTimesLastYear: Number of training sessions attended in
the past year.

Steps to Build the Model

1. Problem Understanding
• Objective: Predict whether an employee will leave the company or not based
on historical data.
• Business Goal: Minimize attrition and optimize retention strategies by
understanding key predictors.
2. Data Collection
• Gather data that can include:
o Employee demographics (age, gender, marital status)
o Job role and department
o Work environment factors (satisfaction, work-life balance)
o Compensation and benefits
o Performance data
o Historical data on employees who left vs. stayed
• Typically, datasets such as the "IBM HR Analytics Employee Attrition &
Performance" dataset can be useful.
3. Data Exploration and Preprocessing
• Exploratory Data Analysis (EDA): Visualize distributions, correlations, and
basic statistics to understand the data.
• Handle Missing Values: Fill or drop missing data depending on the situation
(e.g., using mean imputation or dropping rows with too many missing
values).
• Feature Engineering:
o Convert categorical variables into numerical values using encoding
methods like one-hot encoding or label encoding.
o Create new features if necessary (e.g., creating a "tenure category"
based on years of service).
• Scaling and Normalization: Scale numerical features using methods like
MinMaxScaler or StandardScaler if necessary, especially for algorithms
sensitive to feature scaling like SVM or k-NN.
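The encoding and scaling steps described above can be sketched as follows. This is a minimal illustration, not the project's own preprocessing code: the four-row sample is hypothetical (only the column names come from the dataset description), and in a real pipeline the scaler would be fitted on the training split only.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny hypothetical sample; only the column names are taken from the dataset.
df = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "DailyRate": [1102, 279, 1373, 1392],
    "BusinessTravel": ["Travel_Rarely", "Travel_Frequently",
                       "Travel_Rarely", "Travel_Frequently"],
    "OverTime": ["Yes", "No", "Yes", "Yes"],
    "Attrition": ["Yes", "No", "Yes", "No"],
})

# Binary target: map Yes/No to 1/0.
y = df["Attrition"].map({"Yes": 1, "No": 0})
X = df.drop(columns="Attrition")

# One-hot encode categorical columns; drop_first removes one redundant
# dummy column per categorical variable.
categorical_cols = ["BusinessTravel", "OverTime"]
X = pd.get_dummies(X, columns=categorical_cols, drop_first=True, dtype=int)

# Standardize numeric columns to zero mean and unit variance.
numeric_cols = ["Age", "DailyRate"]
X[numeric_cols] = StandardScaler().fit_transform(X[numeric_cols])

print(X.columns.tolist())
```

One-hot encoding is used here rather than label encoding because the categories have no natural order; label encoding would suit ordinal fields such as the 1–4 satisfaction scales, which are already numeric in this dataset.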
4. Feature Selection
• Use correlation matrices, feature importance (via tree-based models), or
recursive feature elimination to identify the most relevant features.
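• Remove irrelevant features, which add noise and can lead to overfitting, and
keep only one feature from each highly correlated pair, since the other is
largely redundant (e.g., MonthlyIncome and JobLevel, ρ = 0.9503 in this dataset).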
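A correlation-based filter of the kind mentioned in this step might look like the sketch below. The helper name `drop_highly_correlated`, the 0.9 threshold, and the sample values are all assumptions made for illustration; they are not taken from the project notebook.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, threshold=0.9):
    """Drop the later column of each pair whose |Pearson r| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Hypothetical numeric sample: MonthlyIncome is built to track JobLevel closely.
sample = pd.DataFrame({
    "JobLevel": [1, 2, 3, 4, 5, 1, 2, 3],
    "MonthlyIncome": [2000, 4100, 6050, 8000, 9900, 2100, 3900, 6000],
    "DistanceFromHome": [1, 8, 2, 3, 2, 24, 9, 5],
})

reduced, dropped = drop_highly_correlated(sample)
print("dropped:", dropped)
print("kept:", reduced.columns.tolist())
```

Which member of a correlated pair to keep is a judgment call; this sketch simply keeps whichever column appears first.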
5. Model Selection
• Train-test Split: Split your dataset into training and test sets (usually 70%-80%
for training, 20%-30% for testing).
• Model Selection: Start with a few machine learning algorithms:
o Logistic Regression (for binary classification)
o Decision Trees / Random Forests (to capture non-linear relationships)
o Support Vector Machine (SVM)
o Gradient Boosting Methods (XGBoost, LightGBM)
o Neural Networks (if you have large data)
• For classification problems, consider using cross-validation for robust
performance estimation.
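The split and cross-validation described in this step can be sketched as below. The data is synthetic (generated with `make_classification` to have roughly the 16% positive rate seen in the attrition column), so the printed score says nothing about the real dataset; the 80/20 split, five folds, and ROC AUC scoring are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the HR data: about 16% positives.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84, 0.16], random_state=42)

# stratify=y keeps the class ratio (almost) identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 5-fold stratified cross-validation on the training portion only.
model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")

print(f"mean ROC AUC over {len(scores)} folds: {scores.mean():.3f}")
```

Stratification matters here because the target is imbalanced: an unstratified split could leave the test set with noticeably more or fewer leavers than the training set.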
6. Model Training
• Train your selected models on the training dataset.
• For models like Random Forests or XGBoost, tune hyperparameters using
GridSearchCV or RandomizedSearchCV to optimize model performance.
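A small grid search, as suggested above, could be set up like this. The parameter grid, the F1 scoring choice, and the synthetic data are assumptions for the sake of a short runnable example; a real search would use the prepared HR features and likely a wider grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data so the sketch runs on its own.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Deliberately tiny grid: 2 x 2 = 4 candidate settings, each scored with 3-fold CV.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="f1")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best mean CV F1:", round(search.best_score_, 3))
```

`RandomizedSearchCV` takes the same estimator and scoring arguments but samples a fixed number of settings, which tends to scale better when the grid is large.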
7. Model Evaluation
• Confusion Matrix: To evaluate accuracy, precision, recall, and F1-score.
• ROC Curve & AUC: To measure the performance in terms of true positive rate
vs. false positive rate.
• Cross-Validation: Validate the model using K-fold cross-validation to ensure
robustness and minimize overfitting.
• Feature Importance: Analyze which features are driving the model’s
predictions.
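To make the link between the confusion matrix and the listed metrics concrete, the sketch below derives accuracy, precision, recall, and F1 from the four cell counts. The function name and the counts are hypothetical; they are not results from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive the standard metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a 294-employee test set ("Yes" = positive class).
m = classification_metrics(tp=20, fp=10, fn=27, tn=237)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, accuracy is high while recall is below one half, which is why accuracy alone is a weak guide when only a minority of employees leave.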
8. Model Interpretation
• Use model interpretation techniques like SHAP (SHapley Additive
exPlanations) to understand how different features influence model
predictions.
• This can be valuable for business stakeholders to explain model decisions.
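As a lightweight illustration of model interpretation, the sketch below uses scikit-learn's permutation importance. Note that this is a different, simpler technique than SHAP, swapped in here only because it needs no extra library; the synthetic data (two informative features placed first, four noise features) is an assumption of the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# shuffle=False keeps the two informative features in columns 0 and 1.
X, y = make_classification(n_samples=500, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and measure the score drop.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
ranking = result.importances_mean.argsort()[::-1]
for idx in ranking:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f}")
```

Permutation importance gives one global score per feature, whereas SHAP additionally explains individual predictions, which is usually what an HR stakeholder asks about for a specific employee.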
9. Model Deployment
• Once the model performs well on the test set, you can deploy it into a
production environment where it can predict attrition for new employees.
• Integrate the model into an HR system or dashboard where the business can
monitor predictions and take action.
10. Monitoring and Maintenance
• Continuously monitor the model’s performance. As new data is added, the
model’s accuracy might change, and it may require retraining.
• Set up a retraining pipeline if the data distribution changes over time.
Conclusion
In this project, we developed a machine learning model to
predict employee attrition, offering valuable insights into
the factors that influence an employee's decision to leave
the organization. By analyzing historical employee data, we
identified key predictors such as job satisfaction,
compensation, tenure, and work-life balance, which
significantly impact attrition rates.
The process involved data collection, preprocessing, feature
engineering, and model selection. Several machine learning
algorithms, including Logistic Regression, Decision Trees,
and Random Forests, were tested to determine the most
effective model. After fine-tuning hyperparameters and
evaluating the models using metrics like accuracy, precision,
recall, and AUC, the best-performing model was identified.
The model's performance showed promising results,
accurately predicting which employees are at risk of leaving.
Feature importance analysis highlighted the role of job
satisfaction, salary, and tenure in the predictions, providing
actionable insights for HR teams. These insights can help
companies identify at-risk employees early and implement
targeted retention strategies.
Key takeaways from the project include:
• Accurate Prediction: The machine learning model
successfully predicted employee attrition with a high
degree of accuracy, providing HR with a useful tool to
forecast turnover.
• Business Application: The model’s insights can help HR
departments develop strategies to improve employee
retention, such as addressing dissatisfaction or offering
career advancement opportunities.
• Model Interpretability: Using techniques like SHAP, we
ensured the model’s predictions could be explained,
making it easier for HR professionals to understand and
trust the results.
Ultimately, this project demonstrates how machine learning
can be leveraged to predict employee attrition and improve
HR decision-making. With further refinements and regular
updates, the model can continue to provide valuable
support to organizations in reducing turnover and
enhancing employee engagement.
Employee Attrition Prediction Using
Machine Learning
In [1]: import math, time, random, datetime

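# data analysis and wrangling
import pandas as pd
import numpy as np
# Note: pandas_profiling has since been renamed ydata_profiling; on recent
# installs, import ProfileReport from ydata_profiling instead.
from pandas_profiling import ProfileReport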

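In [2]: # visualization
import seaborn as sns
import matplotlib.pyplot as plt
# Matplotlib 3.6 renamed 'seaborn-whitegrid' to 'seaborn-v0_8-whitegrid';
# pick whichever name the installed version provides.
style = 'seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid'
plt.style.use(style)

# imports for interactive plotting
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
%matplotlib inline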

In [3]: # Preprocessing
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, label_binarize, StandardScaler

In [4]: # machine learning

from sklearn import model_selection, tree, preprocessing, metrics, linear_model
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import Perceptron, SGDClassifier, LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV, learning_curve, cross_val_score
from catboost import CatBoostClassifier, Pool, cv

In [5]: # ignore Warnings

import warnings
warnings.filterwarnings('ignore')

Import and Inspect Data

In [6]: df = pd.read_csv("Data/WA_Fn-UseC_-HR-Employee-Attrition.csv")

In [7]: df.head()

Out[7]:
   Age  Attrition  BusinessTravel     DailyRate  Department              DistanceFromHome  Education  EducationField  EmployeeCount  Employe…
0  41   Yes        Travel_Rarely      1102       Sales                   1                 2          Life Sciences   1
1  49   No         Travel_Frequently  279        Research & Development  8                 1          Life Sciences   1
2  37   Yes        Travel_Rarely      1373       Research & Development  2                 2          Other           1
3  33   No         Travel_Frequently  1392       Research & Development  3                 4          Life Sciences   1
4  27   No         Travel_Rarely      591        Research & Development  2                 1          Medical         1

5 rows × 35 columns

In [8]: df.shape

Out[8]: (1470, 35)

Exploratory Data Analysis
Job level is strongly correlated with total working years
Monthly income is strongly correlated with job level (ρ = 0.9503)
Monthly income is strongly correlated with total working years
Age is strongly correlated with monthly income

In [9]: ProfileReport(df)
Out[9]:

Overview

Dataset info

Number of variables 35
Number of observations 1470
Total Missing (%) 0.0%
Total size in memory 402.1 KiB
Average record size in memory 280.1 B
Variables types

Numeric 22
Categorical 8
Boolean 1
Date 0
Text (Unique) 0
Rejected 4
Unsupported 0

Warnings

EmployeeCount has constant value 1 Rejected

MonthlyIncome is highly correlated with JobLevel (ρ = 0.9503) Rejected
NumCompaniesWorked has 197 / 13.4% zeros Zeros
Over18 has constant value Y Rejected
StandardHours has constant value 80 Rejected
StockOptionLevel has 631 / 42.9% zeros Zeros
TrainingTimesLastYear has 54 / 3.7% zeros Zeros
YearsAtCompany has 44 / 3.0% zeros Zeros
YearsInCurrentRole has 244 / 16.6% zeros Zeros
YearsSinceLastPromotion has 581 / 39.5% zeros Zeros
YearsWithCurrManager has 263 / 17.9% zeros Zeros
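The constant-valued columns flagged in the warnings above (EmployeeCount, Over18, StandardHours) carry no information for a classifier. One way to detect and drop such columns is sketched below; the three-row sample is hypothetical and only mirrors the constant values reported by the profile.

```python
import pandas as pd

# Hypothetical sample mirroring the constant values reported above.
sample = pd.DataFrame({
    "Age": [41, 49, 37],
    "EmployeeCount": [1, 1, 1],
    "Over18": ["Y", "Y", "Y"],
    "StandardHours": [80, 80, 80],
})

# A column with at most one distinct value cannot help separate the classes.
constant_cols = [col for col in sample.columns
                 if sample[col].nunique(dropna=False) <= 1]
cleaned = sample.drop(columns=constant_cols)

print("constant columns:", constant_cols)
print("remaining columns:", cleaned.columns.tolist())
```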

Variables

Age
Numeric

Distinct count 43
Unique (%) 2.9%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 36.924
Minimum 18
Maximum 60
Zeros (%) 0.0%

Quantile statistics

Minimum 18
5-th percentile 24
Q1 30
Median 36
Q3 43
95-th percentile 54
Maximum 60
Range 42
Interquartile range 13
Descriptive statistics

Standard deviation 9.1354

Coef of variation 0.24741
Kurtosis -0.40415
Mean 36.924
MAD 7.4098
Skewness 0.41329
Sum 54278
Variance 83.455
Memory size 11.6 KiB
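Several of the derived statistics in these tables follow directly from the others. As a quick consistency check using the Age figures reported above (the variable names are just for this sketch):

```python
# Figures copied from the Age summary above.
mean, std = 36.924, 9.1354
minimum, q1, q3, maximum = 18, 30, 43, 60

coef_of_variation = std / mean      # relative spread
variance = std ** 2                 # square of the standard deviation
value_range = maximum - minimum
iqr = q3 - q1                       # interquartile range

print(f"coef of variation: {coef_of_variation:.5f}")
print(f"variance: {variance:.3f}")
print(f"range: {value_range}, IQR: {iqr}")
```

Small differences in the last digit against the report are expected, since the inputs here are already rounded.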

Value Count Frequency (%)
35 78 5.3%
34 77 5.2%
31 69 4.7%
36 69 4.7%
29 68 4.6%
32 61 4.1%
30 60 4.1%
33 58 3.9%
38 58 3.9%
40 57 3.9%
Other values (33) 815 55.4%

Minimum 5 values

Value Count Frequency (%)
18 8 0.5%
19 9 0.6%
20 11 0.7%
21 13 0.9%
22 16 1.1%

Maximum 5 values

Value Count Frequency (%)
56 14 1.0%
57 4 0.3%
58 14 1.0%
59 10 0.7%
60 5 0.3%

Attrition
Categorical
Distinct count 2
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0

No 1233
Yes 237

Value Count Frequency (%)
No 1233 83.9%
Yes 237 16.1%
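The class counts above imply a strongly imbalanced target. The short calculation below, using only the counts reported in this table, shows the accuracy a trivial "always predict No" rule would reach, which is a useful floor when judging any model's accuracy.

```python
# Class counts taken from the Attrition table above.
counts = {"No": 1233, "Yes": 237}
total = sum(counts.values())

attrition_rate = counts["Yes"] / total
# Accuracy of a classifier that always predicts the majority class.
baseline_accuracy = max(counts.values()) / total

print(f"attrition rate: {attrition_rate:.1%}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.1%}")
```

Because this baseline is already close to 84%, metrics such as recall, F1, and ROC AUC on the "Yes" class are more informative than raw accuracy.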

BusinessTravel
Categorical

Distinct count 3
Unique (%) 0.2%
Missing (%) 0.0%
Missing (n) 0

Travel_Rarely 1043
Travel_Frequently 277
Non-Travel 150

Value Count Frequency (%)
Travel_Rarely 1043 71.0%
Travel_Frequently 277 18.8%
Non-Travel 150 10.2%

DailyRate
Numeric

Distinct count 886

Unique (%) 60.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 802.49
Minimum 102
Maximum 1499
Zeros (%) 0.0%

Quantile statistics

Minimum 102
5-th percentile 165.35
Q1 465
Median 802
Q3 1157
95-th percentile 1424.1
Maximum 1499
Range 1397
Interquartile range 692
Descriptive statistics

Standard deviation 403.51

Coef of variation 0.50282
Kurtosis -1.2038
Mean 802.49
MAD 350.25
Skewness -0.0035186
Sum 1179654
Variance 162820
Memory size 11.6 KiB

Value Count Frequency (%)
691 6 0.4%
1082 5 0.3%
329 5 0.3%
1329 5 0.3%
530 5 0.3%
408 5 0.3%
715 4 0.3%
589 4 0.3%
906 4 0.3%
350 4 0.3%
Other values (876) 1423 96.8%

Minimum 5 values

Value Count Frequency (%)
102 1 0.1%
103 1 0.1%
104 1 0.1%
105 1 0.1%
106 1 0.1%

Maximum 5 values

Value Count Frequency (%)
1492 1 0.1%
1495 3 0.2%
1496 2 0.1%
1498 1 0.1%
1499 1 0.1%

Department
Categorical

Distinct count 3
Unique (%) 0.2%
Missing (%) 0.0%
Missing (n) 0
Research & Development 961
Sales 446
Human Resources 63

Value Count Frequency (%)
Research & Development 961 65.4%
Sales 446 30.3%
Human Resources 63 4.3%

DistanceFromHome
Numeric

Distinct count 29
Unique (%) 2.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 9.1925
Minimum 1
Maximum 29
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 7
Q3 14
95-th percentile 26
Maximum 29
Range 28
Interquartile range 12

Descriptive statistics

Standard deviation 8.1069

Coef of variation 0.8819
Kurtosis -0.22483
Mean 9.1925
MAD 6.5727
Skewness 0.95812
Sum 13513
Variance 65.721
Memory size 11.6 KiB
Value Count Frequency (%)
2 211 14.4%
1 208 14.1%
10 86 5.9%
9 85 5.8%
3 84 5.7%
7 84 5.7%
8 80 5.4%
5 65 4.4%
4 64 4.4%
6 59 4.0%
Other values (19) 444 30.2%

Minimum 5 values

Value Count Frequency (%)
1 208 14.1%
2 211 14.4%
3 84 5.7%
4 64 4.4%
5 65 4.4%

Maximum 5 values

Value Count Frequency (%)
25 25 1.7%
26 25 1.7%
27 12 0.8%
28 23 1.6%
29 27 1.8%

Education
Numeric

Distinct count 5
Unique (%) 0.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 2.9129
Minimum 1
Maximum 5
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 4
95-th percentile 4
Maximum 5
Range 4
Interquartile range 2

Descriptive statistics

Standard deviation 1.0242

Coef of variation 0.35159
Kurtosis -0.55911
Mean 2.9129
MAD 0.79271
Skewness -0.28968
Sum 4282
Variance 1.0489
Memory size 11.6 KiB

Value Count Frequency (%)
3 572 38.9%
4 398 27.1%
2 282 19.2%
1 170 11.6%
5 48 3.3%

Minimum 5 values

Value Count Frequency (%)
1 170 11.6%
2 282 19.2%
3 572 38.9%
4 398 27.1%
5 48 3.3%

Maximum 5 values

Value Count Frequency (%)
1 170 11.6%
2 282 19.2%
3 572 38.9%
4 398 27.1%
5 48 3.3%
EducationField
Categorical

Distinct count 6
Unique (%) 0.4%
Missing (%) 0.0%
Missing (n) 0

Life Sciences 606

Medical 464
Marketing 159
Other values (3) 241

Value Count Frequency (%)
Life Sciences 606 41.2%
Medical 464 31.6%
Marketing 159 10.8%
Technical Degree 132 9.0%
Other 82 5.6%
Human Resources 27 1.8%

EmployeeCount
Constant

This variable is constant and should be ignored for analysis

Constant value 1

EmployeeNumber
Numeric

Distinct count 1470

Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 1024.9
Minimum 1
Maximum 2068
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 96.45
Q1 491.25
Median 1020.5
Q3 1555.8
95-th percentile 1967.5
Maximum 2068
Range 2067
Interquartile range 1064.5
Descriptive statistics

Standard deviation 602.02

Coef of variation 0.58742
Kurtosis -1.2232
Mean 1024.9
MAD 522.41
Skewness 0.016574
Sum 1506552
Variance 362430
Memory size 11.6 KiB

Value Count Frequency (%)
2046 1 0.1%
641 1 0.1%
644 1 0.1%
645 1 0.1%
647 1 0.1%
648 1 0.1%
649 1 0.1%
650 1 0.1%
652 1 0.1%
653 1 0.1%
Other values (1460) 1460 99.3%

Minimum 5 values

Value Count Frequency (%)
1 1 0.1%
2 1 0.1%
4 1 0.1%
5 1 0.1%
7 1 0.1%

Maximum 5 values

Value Count Frequency (%)
2061 1 0.1%
2062 1 0.1%
2064 1 0.1%
2065 1 0.1%
2068 1 0.1%

EnvironmentSatisfaction
Numeric

Distinct count 4
Unique (%) 0.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 2.7218
Minimum 1
Maximum 4
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 4
95-th percentile 4
Maximum 4
Range 3
Interquartile range 2

Descriptive statistics

Standard deviation 1.0931

Coef of variation 0.40161
Kurtosis -1.2025
Mean 2.7218
MAD 0.94712
Skewness -0.32165
Sum 4001
Variance 1.1948
Memory size 11.6 KiB

Value Count Frequency (%)
3 453 30.8%
4 446 30.3%
2 287 19.5%
1 284 19.3%

Minimum 5 values

Value Count Frequency (%)
1 284 19.3%
2 287 19.5%
3 453 30.8%
4 446 30.3%

Maximum 5 values
Value Count Frequency (%)
1 284 19.3%
2 287 19.5%
3 453 30.8%
4 446 30.3%

Gender
Categorical

Distinct count 2
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0

Male 882
Female 588

Value Count Frequency (%)
Male 882 60.0%
Female 588 40.0%

HourlyRate
Numeric

Distinct count 71
Unique (%) 4.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 65.891
Minimum 30
Maximum 100
Zeros (%) 0.0%

Quantile statistics

Minimum 30
5-th percentile 33
Q1 48
Median 66
Q3 83.75
95-th percentile 97
Maximum 100
Range 70
Interquartile range 35.75

Descriptive statistics

Standard deviation 20.329

Coef of variation 0.30853
Kurtosis -1.1964
Mean 65.891
MAD 17.649
Skewness -0.032311
Sum 96860
Variance 413.29
Memory size 11.6 KiB

Value Count Frequency (%)
66 29 2.0%
42 28 1.9%
98 28 1.9%
48 28 1.9%
84 28 1.9%
79 27 1.8%
96 27 1.8%
57 27 1.8%
52 26 1.8%
87 26 1.8%
Other values (61) 1196 81.4%

Minimum 5 values

Value Count Frequency (%)
30 19 1.3%
31 15 1.0%
32 24 1.6%
33 19 1.3%
34 12 0.8%

Maximum 5 values

Value Count Frequency (%)
96 27 1.8%
97 21 1.4%
98 28 1.9%
99 20 1.4%
100 19 1.3%

JobInvolvement
Numeric

Distinct count 4
Unique (%) 0.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 2.7299
Minimum 1
Maximum 4
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 3
95-th percentile 4
Maximum 4
Range 3
Interquartile range 1

Descriptive statistics

Standard deviation 0.71156

Coef of variation 0.26065
Kurtosis 0.271
Mean 2.7299
MAD 0.56777
Skewness -0.49842
Sum 4013
Variance 0.50632
Memory size 11.6 KiB

Value Count Frequency (%)
3 868 59.0%
2 375 25.5%
4 144 9.8%
1 83 5.6%

Minimum 5 values

Value Count Frequency (%)
1 83 5.6%
2 375 25.5%
3 868 59.0%
4 144 9.8%

Maximum 5 values

Value Count Frequency (%)
1 83 5.6%
2 375 25.5%
3 868 59.0%
4 144 9.8%
JobLevel
Numeric

Distinct count 5
Unique (%) 0.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 2.0639
Minimum 1
Maximum 5
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 1
Median 2
Q3 3
95-th percentile 4
Maximum 5
Range 4
Interquartile range 2

Descriptive statistics

Standard deviation 1.1069

Coef of variation 0.53632
Kurtosis 0.39915
Mean 2.0639
MAD 0.83248
Skewness 1.0254
Sum 3034
Variance 1.2253
Memory size 11.6 KiB

Value Count Frequency (%)
1 543 36.9%
2 534 36.3%
3 218 14.8%
4 106 7.2%
5 69 4.7%
Minimum 5 values

Value Count Frequency (%)
1 543 36.9%
2 534 36.3%
3 218 14.8%
4 106 7.2%
5 69 4.7%

Maximum 5 values

Value Count Frequency (%)
1 543 36.9%
2 534 36.3%
3 218 14.8%
4 106 7.2%
5 69 4.7%

JobRole
Categorical

Distinct count 9
Unique (%) 0.6%
Missing (%) 0.0%
Missing (n) 0

Sales Executive 326

Research Scientist 292
Laboratory Technician 259
Other values (6) 593

Value Count Frequency (%)
Sales Executive 326 22.2%
Research Scientist 292 19.9%
Laboratory Technician 259 17.6%
Manufacturing Director 145 9.9%
Healthcare Representative 131 8.9%
Manager 102 6.9%
Sales Representative 83 5.6%
Research Director 80 5.4%
Human Resources 52 3.5%

JobSatisfaction
Numeric

Distinct count 4
Unique (%) 0.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 2.7286
Minimum 1
Maximum 4
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 4
95-th percentile 4
Maximum 4
Range 3
Interquartile range 2

Descriptive statistics

Standard deviation 1.1028

Coef of variation 0.40418
Kurtosis -1.2222
Mean 2.7286
MAD 0.95722
Skewness -0.32967
Sum 4011
Variance 1.2163
Memory size 11.6 KiB

Value Count Frequency (%)
4 459 31.2%
3 442 30.1%
1 289 19.7%
2 280 19.0%

Minimum 5 values

Value Count Frequency (%)
1 289 19.7%
2 280 19.0%
3 442 30.1%
4 459 31.2%

Maximum 5 values

Value Count Frequency (%)
1 289 19.7%
2 280 19.0%
3 442 30.1%
4 459 31.2%

MaritalStatus
Categorical

Distinct count 3
Unique (%) 0.2%
Missing (%) 0.0%
Missing (n) 0
Married 673
Single 470
Divorced 327

Value Count Frequency (%)
Married 673 45.8%
Single 470 32.0%
Divorced 327 22.2%

MonthlyIncome
Highly correlated

This variable is highly correlated with JobLevel and should be ignored for analysis

Correlation 0.9503

MonthlyRate
Numeric

Distinct count 1427

Unique (%) 97.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 14313
Minimum 2094
Maximum 26999
Zeros (%) 0.0%

Quantile statistics

Minimum 2094
5-th percentile 3384.6
Q1 8047
Median 14236
Q3 20462
95-th percentile 25432
Maximum 26999
Range 24905
Interquartile range 12414

Descriptive statistics

Standard deviation 7117.8

Coef of variation 0.49729
Kurtosis -1.215
Mean 14313
MAD 6188.1
Skewness 0.018578
Sum 21040262
Variance 50663000
Memory size 11.6 KiB
Value Count Frequency (%)
4223 3 0.2%
9150 3 0.2%
6670 2 0.1%
7324 2 0.1%
4658 2 0.1%
21534 2 0.1%
16154 2 0.1%
13008 2 0.1%
12355 2 0.1%
6069 2 0.1%
Other values (1417) 1448 98.5%

Minimum 5 values

Value Count Frequency (%)
2094 1 0.1%
2097 1 0.1%
2104 1 0.1%
2112 1 0.1%
2122 1 0.1%

Maximum 5 values

Value Count Frequency (%)
26956 1 0.1%
26959 1 0.1%
26968 1 0.1%
26997 1 0.1%
26999 1 0.1%

NumCompaniesWorked
Numeric

Distinct count 10
Unique (%) 0.7%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 2.6932
Minimum 0
Maximum 9
Zeros (%) 13.4%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 1
Median 2
Q3 4
95-th percentile 8
Maximum 9
Range 9
Interquartile range 3

Descriptive statistics

Standard deviation 2.498

Coef of variation 0.92753
Kurtosis 0.010214
Mean 2.6932
MAD 2.0598
Skewness 1.0265
Sum 3959
Variance 6.24
Memory size 11.6 KiB

Value Count Frequency (%)
1 521 35.4%
0 197 13.4%
3 159 10.8%
2 146 9.9%
4 139 9.5%
7 74 5.0%
6 70 4.8%
5 63 4.3%
9 52 3.5%
8 49 3.3%

Minimum 5 values

Value Count Frequency (%)
0 197 13.4%
1 521 35.4%
2 146 9.9%
3 159 10.8%
4 139 9.5%

Maximum 5 values

Value Count Frequency (%)
5 63 4.3%
6 70 4.8%
7 74 5.0%
8 49 3.3%
9 52 3.5%

Over18
Constant

This variable is constant and should be ignored for analysis

Constant value Y

OverTime
Categorical

Distinct count 2
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0

No 1054
Yes 416

Value Count Frequency (%)
No 1054 71.7%
Yes 416 28.3%

PercentSalaryHike
Numeric

Distinct count 15
Unique (%) 1.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 15.21
Minimum 11
Maximum 25
Zeros (%) 0.0%

Quantile statistics

Minimum 11
5-th percentile 11
Q1 12
Median 14
Q3 18
95-th percentile 22
Maximum 25
Range 14
Interquartile range 6

Descriptive statistics

Standard deviation 3.6599

Coef of variation 0.24063
Kurtosis -0.3006
Mean 15.21
MAD 3.0552
Skewness 0.82113
Sum 22358
Variance 13.395
Memory size 11.6 KiB

Value Count Frequency (%)
11 210 14.3%
13 209 14.2%
14 201 13.7%
12 198 13.5%
15 101 6.9%
18 89 6.1%
17 82 5.6%
16 78 5.3%
19 76 5.2%
22 56 3.8%
Other values (5) 170 11.6%

Minimum 5 values

Value Count Frequency (%)
11 210 14.3%
12 198 13.5%
13 209 14.2%
14 201 13.7%
15 101 6.9%

Maximum 5 values

Value Count Frequency (%)
21 48 3.3%
22 56 3.8%
23 28 1.9%
24 21 1.4%
25 18 1.2%

PerformanceRating
Boolean

Distinct count 2
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0

Mean 3.1537

3 1244
4 226

Value Count Frequency (%)
3 1244 84.6%
4 226 15.4%

RelationshipSatisfaction
Numeric

Distinct count 4
Unique (%) 0.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 2.7122
Minimum 1
Maximum 4
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 4
95-th percentile 4
Maximum 4
Range 3
Interquartile range 2

Descriptive statistics

Standard deviation 1.0812

Coef of variation 0.39864
Kurtosis -1.1848
Mean 2.7122
MAD 0.93658
Skewness -0.30283
Sum 3987
Variance 1.169
Memory size 11.6 KiB
Value Count Frequency (%)
3 459 31.2%
4 432 29.4%
2 303 20.6%
1 276 18.8%

Minimum 5 values

Value Count Frequency (%)
1 276 18.8%
2 303 20.6%
3 459 31.2%
4 432 29.4%

Maximum 5 values

Value Count Frequency (%)
1 276 18.8%
2 303 20.6%
3 459 31.2%
4 432 29.4%

StandardHours
Constant

This variable is constant and should be ignored for analysis

Constant value 80

StockOptionLevel
Numeric

Distinct count 4
Unique (%) 0.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 0.79388
Minimum 0
Maximum 3
Zeros (%) 42.9%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 1
Q3 1
95-th percentile 3
Maximum 3
Range 3
Interquartile range 1

Descriptive statistics

Standard deviation 0.85208

Coef of variation 1.0733
Kurtosis 0.36463
Mean 0.79388
MAD 0.68155
Skewness 0.96898
Sum 1167
Variance 0.72603
Memory size 11.6 KiB

Value Count Frequency (%)
0 631 42.9%
1 596 40.5%
2 158 10.7%
3 85 5.8%

Minimum 5 values

Value Count Frequency (%)
0 631 42.9%
1 596 40.5%
2 158 10.7%
3 85 5.8%

Maximum 5 values

Value Count Frequency (%)
0 631 42.9%
1 596 40.5%
2 158 10.7%
3 85 5.8%

TotalWorkingYears
Numeric
Distinct count 40
Unique (%) 2.7%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 11.28
Minimum 0
Maximum 40
Zeros (%) 0.7%

Quantile statistics

Minimum 0
5-th percentile 1
Q1 6
Median 10
Q3 15
95-th percentile 28
Maximum 40
Range 40
Interquartile range 9

Descriptive statistics

Standard deviation 7.7808

Coef of variation 0.68981
Kurtosis 0.91827
Mean 11.28
MAD 6.0342
Skewness 1.1172
Sum 16581
Variance 60.541
Memory size 11.6 KiB

Value Count Frequency (%)
10 202 13.7%
6 125 8.5%
8 103 7.0%
9 96 6.5%
5 88 6.0%
1 81 5.5%
7 81 5.5%
4 63 4.3%
12 48 3.3%
3 42 2.9%
Other values (30) 541 36.8%
Minimum 5 values

Value Count Frequency (%)
0 11 0.7%
1 81 5.5%
2 31 2.1%
3 42 2.9%
4 63 4.3%

Maximum 5 values

Value Count Frequency (%)
35 3 0.2%
36 6 0.4%
37 4 0.3%
38 1 0.1%
40 2 0.1%

TrainingTimesLastYear
Numeric

Distinct count 7
Unique (%) 0.5%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 2.7993
Minimum 0
Maximum 6
Zeros (%) 3.7%

Quantile statistics

Minimum 0
5-th percentile 1
Q1 2
Median 3
Q3 3
95-th percentile 5
Maximum 6
Range 6
Interquartile range 1

Descriptive statistics

Standard deviation 1.2893

Coef of variation 0.46057
Kurtosis 0.49499
Mean 2.7993
MAD 0.97434
Skewness 0.55312
Sum 4115
Variance 1.6622
Memory size 11.6 KiB
Value Count Frequency (%)
2 547 37.2%
3 491 33.4%
4 123 8.4%
5 119 8.1%
1 71 4.8%
6 65 4.4%
0 54 3.7%

Minimum 5 values

Value Count Frequency (%)
0 54 3.7%
1 71 4.8%
2 547 37.2%
3 491 33.4%
4 123 8.4%

Maximum 5 values

Value Count Frequency (%)
2 547 37.2%
3 491 33.4%
4 123 8.4%
5 119 8.1%
6 65 4.4%

WorkLifeBalance
Numeric

Distinct count 4
Unique (%) 0.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 2.7612
Minimum 1
Maximum 4
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 3
95-th percentile 4
Maximum 4
Range 3
Interquartile range 1

Descriptive statistics

Standard deviation 0.70648

Coef of variation 0.25586
Kurtosis 0.41946
Mean 2.7612
MAD 0.54797
Skewness -0.55248
Sum 4059
Variance 0.49911
Memory size 11.6 KiB

Value Count Frequency (%)
3 893 60.7%
2 344 23.4%
4 153 10.4%
1 80 5.4%

Minimum 5 values

Value Count Frequency (%)
1 80 5.4%
2 344 23.4%
3 893 60.7%
4 153 10.4%

Maximum 5 values

Value Count Frequency (%)
1 80 5.4%
2 344 23.4%
3 893 60.7%
4 153 10.4%

YearsAtCompany
Numeric

Distinct count 37
Unique (%) 2.5%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 7.0082
Minimum 0
Maximum 40
Zeros (%) 3.0%


Quantile statistics

Minimum 0
5-th percentile 1
Q1 3
Median 5
Q3 9
95-th percentile 20
Maximum 40
Range 40
Interquartile range 6

Descriptive statistics

Standard deviation 6.1265

Coef of variation 0.8742
Kurtosis 3.9355
Mean 7.0082
MAD 4.4717
Skewness 1.7645
Sum 10302
Variance 37.534
Memory size 11.6 KiB

Value Count Frequency (%)
5 196 13.3%
1 171 11.6%
3 128 8.7%
2 127 8.6%
10 120 8.2%
4 110 7.5%
7 90 6.1%
9 82 5.6%
8 80 5.4%
6 76 5.2%
Other values (27) 290 19.7%

Minimum 5 values
Value Count Frequency (%)
0 44 3.0%
1 171 11.6%
2 127 8.6%
3 128 8.7%
4 110 7.5%

Maximum 5 values

Value Count Frequency (%)
33 5 0.3%
34 1 0.1%
36 2 0.1%
37 1 0.1%
40 1 0.1%

YearsInCurrentRole
Numeric

Distinct count 19
Unique (%) 1.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 4.2293
Minimum 0
Maximum 18
Zeros (%) 16.6%


Quantile statistics

Minimum 0
5-th percentile 0
Q1 2
Median 3
Q3 7
95-th percentile 11
Maximum 18
Range 18
Interquartile range 5

Descriptive statistics

Standard deviation 3.6231

Coef of variation 0.85669
Kurtosis 0.47742
Mean 4.2293
MAD 3.0409
Skewness 0.91736
Sum 6217
Variance 13.127
Memory size 11.6 KiB
Value Count Frequency (%)
2 372 25.3%
0 244 16.6%
7 222 15.1%
3 135 9.2%
4 104 7.1%
8 89 6.1%
9 67 4.6%
1 57 3.9%
6 37 2.5%
5 36 2.4%
Other values (9) 107 7.3%

Minimum 5 values

Value Count Frequency (%)
0 244 16.6%
1 57 3.9%
2 372 25.3%
3 135 9.2%
4 104 7.1%

Maximum 5 values

Value Count Frequency (%)
14 11 0.7%
15 8 0.5%
16 7 0.5%
17 4 0.3%
18 2 0.1%

YearsSinceLastPromotion
Numeric

Distinct count 16
Unique (%) 1.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 2.1878
Minimum 0
Maximum 15
Zeros (%) 39.5%


Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 1
Q3 3
95-th percentile 9
Maximum 15
Range 15
Interquartile range 3

Descriptive statistics

Standard deviation 3.2224

Coef of variation 1.4729
Kurtosis 3.6127
Mean 2.1878
MAD 2.3469
Skewness 1.9843
Sum 3216
Variance 10.384
Memory size 11.6 KiB

Value Count Frequency (%)
0 581 39.5%
1 357 24.3%
2 159 10.8%
7 76 5.2%
4 61 4.1%
3 52 3.5%
5 45 3.1%
6 32 2.2%
11 24 1.6%
8 18 1.2%
Other values (6) 65 4.4%

Minimum 5 values

Value Count Frequency (%)
0 581 39.5%
1 357 24.3%
2 159 10.8%
3 52 3.5%
4 61 4.1%

Maximum 5 values

Value Count Frequency (%)
11 24 1.6%
12 10 0.7%
13 10 0.7%
14 9 0.6%
15 13 0.9%

YearsWithCurrManager
Numeric

Distinct count 18
Unique (%) 1.2%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0

Mean 4.1231
Minimum 0
Maximum 17
Zeros (%) 17.9%


Quantile statistics

Minimum 0
5-th percentile 0
Q1 2
Median 3
Q3 7
95-th percentile 10
Maximum 17
Range 17
Interquartile range 5

Descriptive statistics

Standard deviation 3.5681

Coef of variation 0.8654
Kurtosis 0.17106
Mean 4.1231
MAD 3.0254
Skewness 0.83345
Sum 6061
Variance 12.732
Memory size 11.6 KiB
Value Count Frequency (%)
2 344 23.4%
0 263 17.9%
7 216 14.7%
3 142 9.7%
8 107 7.3%
4 98 6.7%
1 76 5.2%
9 64 4.4%
5 31 2.1%
6 29 2.0%
Other values (8) 100 6.8%

Minimum 5 values

Value Count Frequency (%)
0 263 17.9%
1 76 5.2%
2 344 23.4%
3 142 9.7%
4 98 6.7%

Maximum 5 values

Value Count Frequency (%)
13 14 1.0%
14 5 0.3%
15 5 0.3%
16 2 0.1%
17 7 0.5%

Correlations
Sample

Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField Em

0 41 Yes Travel_Rarely 1102 Sales 1 2 Life Sciences

1 49 No Travel_Frequently 279 Research & Development 8 1 Life Sciences

2 37 Yes Travel_Rarely 1373 Research & Development 2 2 Other

3 33 No Travel_Frequently 1392 Research & Development 3 4 Life Sciences

4 27 No Travel_Rarely 591 Research & Development 2 1 Medical

In [10]: # drop the unnecessary columns

df.drop(['EmployeeNumber','Over18','StandardHours','EmployeeCount'],axis=1,inplace=True)

In [11]: df['Attrition'] = df['Attrition'].apply(lambda x:1 if x == "Yes" else 0 )

df['OverTime'] = df['OverTime'].apply(lambda x:1 if x =="Yes" else 0 )

In [12]: attrition = df[df['Attrition'] == 1]

no_attrition = df[df['Attrition']==0]
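The two cells above encode the Yes/No columns as 1/0 and then split the frame by outcome. The sketch below shows the same steps on a small toy frame (the rows are made up for illustration) using `Series.map`, and adds the attrition rate, which is simply the mean of a 0/1 target. Note one behavioural difference: `map` yields NaN for an unexpected label, whereas the lambda in the notebook silently maps it to 0.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are illustrative only).
toy = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes", "No"],
    "OverTime": ["Yes", "No", "Yes", "Yes", "No"],
})

# Same encoding as the cells above, written with an explicit mapping.
yes_no = {"Yes": 1, "No": 0}
for col in ["Attrition", "OverTime"]:
    toy[col] = toy[col].map(yes_no)

left = toy[toy["Attrition"] == 1]
stayed = toy[toy["Attrition"] == 0]

# With a 0/1 target, the mean is the attrition rate.
attrition_rate = toy["Attrition"].mean()
print(len(left), len(stayed), attrition_rate)
```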

Visualization of Categorical Features

In [13]: def categorical_column_viz(col_name):
    f, ax = plt.subplots(1, 2, figsize=(10, 6))

    # Left panel: number of employees per category
    df[col_name].value_counts().plot.bar(cmap='Set2', ax=ax[0])
    ax[0].set_title(f'Number of Employees by {col_name}')
    ax[0].set_ylabel('Count')
    ax[0].set_xlabel(f'{col_name}')

    # Right panel: attrition count per category
    sns.countplot(x=col_name, hue='Attrition', data=df, ax=ax[1], palette='Set2')
    ax[1].set_title(f'Attrition by {col_name}')
    ax[1].set_xlabel(f'{col_name}')
    ax[1].set_ylabel('Count')

In [14]: categorical_column_viz('BusinessTravel')

In [15]: categorical_column_viz('Department')
In [16]: categorical_column_viz('EducationField')

In [17]: categorical_column_viz('Education')
In [18]: categorical_column_viz('EnvironmentSatisfaction')

In [19]: categorical_column_viz('Gender')
In [20]: categorical_column_viz('JobRole')

In [21]: categorical_column_viz('JobInvolvement')
In [22]: categorical_column_viz('MaritalStatus')

In [23]: categorical_column_viz('NumCompaniesWorked')
In [24]: categorical_column_viz('OverTime')

In [25]: categorical_column_viz('StockOptionLevel')
In [26]: categorical_column_viz('TrainingTimesLastYear')

In [27]: categorical_column_viz('YearsWithCurrManager')
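Count plots make large categories look important even when their attrition rate is low. A numeric companion to the plots above is the attrition rate per category. The sketch below is one way to compute it; `attrition_rate_by` is a hypothetical helper, not part of the notebook, and the toy rows are made up for illustration.

```python
import pandas as pd

def attrition_rate_by(frame, col_name, target="Attrition"):
    """Return headcount and attrition rate per category of `col_name`."""
    grouped = frame.groupby(col_name)[target].agg(["count", "mean"])
    return grouped.rename(columns={"count": "employees", "mean": "attrition_rate"})

# Toy data; the target is already encoded as 0/1 as in the notebook.
toy = pd.DataFrame({
    "Department": ["Sales", "Sales", "R&D", "R&D", "R&D", "HR"],
    "Attrition": [1, 0, 0, 0, 1, 0],
})

rates = attrition_rate_by(toy, "Department")
print(rates)
```

On the real frame, a call such as `attrition_rate_by(df, 'Department')` would show whether a department has many leavers only because it has many employees.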
Visualization of Numerical Features
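In [28]: def numerical_column_viz(col_name):
    f, ax = plt.subplots(1, 2, figsize=(18, 6))

    # Left panel: density of the feature for employees who left vs those who stayed
    # (recent seaborn releases use fill=True in place of shade=True)
    sns.kdeplot(attrition[col_name], label='Employee who left', ax=ax[0], shade=True, color='palegreen')
    sns.kdeplot(no_attrition[col_name], label='Employee who stayed', ax=ax[0], shade=True, color='salmon')

    # Right panel: distribution of the feature by attrition class
    sns.boxplot(y=col_name, x='Attrition', data=df, palette='Set3', ax=ax[1])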

In [29]: numerical_column_viz("Age")

In [30]: numerical_column_viz("Age")
In [31]: numerical_column_viz("DailyRate")

In [32]: numerical_column_viz("DistanceFromHome")

In [33]: numerical_column_viz("MonthlyIncome")

In [34]: numerical_column_viz("HourlyRate")
In [35]: numerical_column_viz("JobInvolvement")

In [36]: numerical_column_viz("PercentSalaryHike")

In [37]: numerical_column_viz("Age")
In [38]: numerical_column_viz("DailyRate")

In [39]: numerical_column_viz("TotalWorkingYears")

In [40]: numerical_column_viz("YearsAtCompany")

In [41]: numerical_column_viz("YearsInCurrentRole")
In [42]: numerical_column_viz("YearsSinceLastPromotion")

In [43]: numerical_column_viz("YearsWithCurrManager")

Visualization of Categorical vs Numericals Features

In [44]: def categorical_numerical(numerical_col, categorical_col1, categorical_col2):
    f, ax = plt.subplots(1, 2, figsize=(20, 8))

    # Left panel: numerical feature against the first categorical feature
    g1 = sns.swarmplot(x=categorical_col1, y=numerical_col, hue='Attrition', data=df,
                       dodge=True, ax=ax[0], palette='Set2')
    ax[0].set_title(f'{numerical_col} vs {categorical_col1} separated by Attrition')
    g1.set_xticklabels(g1.get_xticklabels(), rotation=90)

    # Right panel: numerical feature against the second categorical feature
    g2 = sns.swarmplot(x=categorical_col2, y=numerical_col, hue='Attrition', data=df,
                       dodge=True, ax=ax[1], palette='Set2')
    ax[1].set_title(f'{numerical_col} vs {categorical_col2} separated by Attrition')
    g2.set_xticklabels(g2.get_xticklabels(), rotation=90)

In [45]: categorical_numerical('Age','Gender','MaritalStatus')
In [46]: categorical_numerical('Age','JobRole','EducationField')

In [47]: categorical_numerical('MonthlyIncome','Gender','MaritalStatus')

Feature Engineering
In [48]: # 'EnvironmentSatisfaction', 'JobInvolvement', 'JobSatisfaction', 'RelationshipSatisfaction', 'WorkLifeBalance' can be cl

df['Total_Satisfaction'] = (df['EnvironmentSatisfaction'] +
                            df['JobInvolvement'] +
                            df['JobSatisfaction'] +
                            df['RelationshipSatisfaction'] +
                            df['WorkLifeBalance']) / 5

# Drop Columns
df.drop(['EnvironmentSatisfaction','JobInvolvement','JobSatisfaction','RelationshipSatisfaction','WorkLifeBalance'], axis=

In [49]: categorical_column_viz('Total_Satisfaction')

In [50]: df.Total_Satisfaction.describe()

Out[50]: count 1470.000000

mean 2.730748
std 0.428551
min 1.200000
25% 2.400000
50% 2.800000
75% 3.000000
max 4.000000
Name: Total_Satisfaction, dtype: float64

In [51]: # Convert Total satisfaction into boolean

# median = 2.8
# x = 1 if x >= 2.8

df['Total_Satisfaction_bool'] = df['Total_Satisfaction'].apply(lambda x:1 if x>=2.8 else 0 )

df.drop('Total_Satisfaction', axis=1, inplace=True)
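The composite score above is the mean of five ordinal ratings, cut at its median (2.8 in `Out[50]`). The sketch below reproduces that logic on three made-up rows and derives the cut-off from the data instead of hard-coding it. This is an illustration under those assumptions, not the notebook's code.

```python
import pandas as pd

satisfaction_cols = ["EnvironmentSatisfaction", "JobInvolvement", "JobSatisfaction",
                     "RelationshipSatisfaction", "WorkLifeBalance"]

# Three illustrative employees, one row each.
toy = pd.DataFrame(
    [[2, 3, 4, 1, 1],
     [3, 2, 2, 4, 3],
     [4, 3, 3, 3, 3]],
    columns=satisfaction_cols,
)

# Row-wise mean is equivalent to summing the five columns and dividing by 5.
total_satisfaction = toy[satisfaction_cols].mean(axis=1)

# Derive the cut-off from the data rather than typing the median by hand.
threshold = total_satisfaction.median()
toy["Total_Satisfaction_bool"] = (total_satisfaction >= threshold).astype(int)

print(total_satisfaction.tolist(), threshold, toy["Total_Satisfaction_bool"].tolist())
```

Computing the threshold this way keeps the flag consistent if the underlying data changes. In a stricter setup the median would be computed on the training split only, so that no information from the test rows enters the feature.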

In [52]: # It can be observed that the rate of attrition of employees below age of 35 is high

df['Age_bool'] = df['Age'].apply(lambda x:1 if x<35 else 0)

df.drop('Age', axis=1, inplace=True)

In [53]: # It can be observed that employees are more likely to leave if DailyRate is less than 800

df['DailyRate_bool'] = df['DailyRate'].apply(lambda x:1 if x<800 else 0)

df.drop('DailyRate', axis=1, inplace=True)

In [54]: # Employees working at R&D Department have higher attrition rate

df['Department_bool'] = df['Department'].apply(lambda x:1 if x=='Research & Development' else 0)

df.drop('Department', axis=1, inplace=True)

In [55]: # Rate of attrition of employees is high if DistanceFromHome > 10

df['DistanceFromHome_bool'] = df['DistanceFromHome'].apply(lambda x:1 if x>10 else 0)
df.drop('DistanceFromHome', axis=1, inplace=True)

In [56]: # Employees are more likely to leave if they work as a Laboratory Technician

df['JobRole_bool'] = df['JobRole'].apply(lambda x:1 if x=='Laboratory Technician' else 0)

df.drop('JobRole', axis=1, inplace=True)

In [57]: # Employees are more likely to leave if their HourlyRate < 65

df['HourlyRate_bool'] = df['HourlyRate'].apply(lambda x:1 if x<65 else 0)

df.drop('HourlyRate', axis=1, inplace=True)

In [58]: # Employees are more likely to leave if their MonthlyIncome < 4000

df['MonthlyIncome_bool'] = df['MonthlyIncome'].apply(lambda x:1 if x<4000 else 0)

df.drop('MonthlyIncome', axis=1, inplace=True)

In [59]: # Rate of attrition of employees is high if NumCompaniesWorked > 3

df['NumCompaniesWorked_bool'] = df['NumCompaniesWorked'].apply(lambda x:1 if x>3 else 0)

df.drop('NumCompaniesWorked', axis=1, inplace=True)

In [60]: # Employees are more likely to leave if their TotalWorkingYears < 8

df['TotalWorkingYears_bool'] = df['TotalWorkingYears'].apply(lambda x:1 if x<8 else 0)

df.drop('TotalWorkingYears', axis=1, inplace=True)

In [61]: # Employees are more likely to leave if their YearsAtCompany < 3

df['YearsAtCompany_bool'] = df['YearsAtCompany'].apply(lambda x:1 if x<3 else 0)

df.drop('YearsAtCompany', axis=1, inplace=True)

In [62]: # Employees are more likely to leave if their YearsInCurrentRole < 3

df['YearsInCurrentRole_bool'] = df['YearsInCurrentRole'].apply(lambda x:1 if x<3 else 0)

df.drop('YearsInCurrentRole', axis=1, inplace=True)

In [63]: # Employees are more likely to leave if their YearsSinceLastPromotion < 1

df['YearsSinceLastPromotion_bool'] = df['YearsSinceLastPromotion'].apply(lambda x:1 if x<1 else 0)

df.drop('YearsSinceLastPromotion', axis=1, inplace=True)

In [64]: # Employees are more likely to leave if their YearsWithCurrManager < 1

df['YearsWithCurrManager_bool'] = df['YearsWithCurrManager'].apply(lambda x:1 if x<1 else 0)

df.drop('YearsWithCurrManager', axis=1, inplace=True)
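Cells 52 to 64 repeat one pattern: derive a 0/1 flag from a threshold, then drop the source column. A minimal sketch of how that pattern could be factored into one helper is shown below. `binarize_and_drop` and the `rules` dictionary are hypothetical names, only three of the notebook's thresholds are included, and the toy rows are made up.

```python
import pandas as pd

def binarize_and_drop(frame, column, predicate):
    """Add `<column>_bool` (1 where predicate holds, else 0) and drop `column`."""
    frame = frame.copy()
    frame[f"{column}_bool"] = frame[column].apply(lambda x: 1 if predicate(x) else 0)
    return frame.drop(columns=column)

# Thresholds copied from the cells above; only a few are shown.
rules = {
    "Age": lambda x: x < 35,
    "MonthlyIncome": lambda x: x < 4000,
    "DistanceFromHome": lambda x: x > 10,
}

# Two illustrative employees.
toy = pd.DataFrame({
    "Age": [41, 27],
    "MonthlyIncome": [5993, 2090],
    "DistanceFromHome": [1, 24],
})

for column, predicate in rules.items():
    toy = binarize_and_drop(toy, column, predicate)

print(toy)
```

Keeping the thresholds in one dictionary also makes mismatches between a comment and its condition easier to spot.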

In [65]: df['Gender'] = df['Gender'].apply(lambda x:1 if x=='Female' else 0)

In [66]: df.drop('MonthlyRate', axis=1, inplace=True)

df.drop('PercentSalaryHike', axis=1, inplace=True)

In [67]: convert_category = ['BusinessTravel','Education','EducationField','MaritalStatus','StockOptionLevel','OverTime','Gender','

for col in convert_category:
df[col] = df[col].astype('category')

In [68]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1470 entries, 0 to 1469
Data columns (total 25 columns):
Attrition 1470 non-null int64
BusinessTravel 1470 non-null category
Education 1470 non-null category
EducationField 1470 non-null category
Gender 1470 non-null category
JobLevel 1470 non-null int64
MaritalStatus 1470 non-null category
OverTime 1470 non-null category
PerformanceRating 1470 non-null int64
StockOptionLevel 1470 non-null category
TrainingTimesLastYear 1470 non-null category
Total_Satisfaction_bool 1470 non-null int64
Age_bool 1470 non-null int64
DailyRate_bool 1470 non-null int64
Department_bool 1470 non-null int64
DistanceFromHome_bool 1470 non-null int64
JobRole_bool 1470 non-null int64
HourlyRate_bool 1470 non-null int64
MonthlyIncome_bool 1470 non-null int64
NumCompaniesWorked_bool 1470 non-null int64
TotalWorkingYears_bool 1470 non-null int64
YearsAtCompany_bool 1470 non-null int64
YearsInCurrentRole_bool 1470 non-null int64
YearsSinceLastPromotion_bool 1470 non-null int64
YearsWithCurrManager_bool 1470 non-null int64
dtypes: category(8), int64(17)
memory usage: 208.2 KB

In [69]: # Separate the categorical and numerical data

X_categorical = df.select_dtypes(include=['category'])
# Drop the target so it is not used as a feature
X_numerical = df.select_dtypes(include=['int64']).drop(columns='Attrition')

In [70]: y = df['Attrition']

In [72]: #concat the categorical and numerical values

X_all = pd.concat([X_categorical, X_numerical], axis=1)

X_all.head()

Out[72]: 0 1 2 3 4 5 6 7 8 9 ... DistanceFromHome_bool JobRole_bool HourlyRate_bool MonthlyIncome_bool N

0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 ... 0 0 0 0

1 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 0 0 1 0

2 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0 1 0 1

3 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 0 0 1 1

4 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0 1 1 1

5 rows × 48 columns

In [71]: # One-hot encode the categorical features

onehotencoder = OneHotEncoder()

X_categorical = onehotencoder.fit_transform(X_categorical).toarray()
X_categorical = pd.DataFrame(X_categorical)
X_categorical
Out[71]: 0 1 2 3 4 5 6 7 8 9 ... 22 23 24 25 26 27 28 29 30 31

0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0

1 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

2 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

3 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

4 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

5 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

6 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

7 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

8 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

9 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

10 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0

11 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

12 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0

13 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

14 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0

15 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0

16 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0

17 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

18 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

19 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

20 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0

21 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0

22 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0

23 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0

24 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

25 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

26 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0

27 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

28 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0

29 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

1440 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

1441 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

1442 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

1443 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

1444 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0

1445 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

1446 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

1447 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0

1448 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0

1449 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0

1450 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

1451 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 ... 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0

1452 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

1453 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

1454 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

1455 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

1456 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

1457 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

1458 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0

1459 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

1460 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

1461 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

1462 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

1463 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

1464 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

1465 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

1466 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0

1467 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 ... 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0

1468 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

1469 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

1470 rows × 32 columns
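The encoded frame above has integer column labels 0 to 31, so the link between a column and the category it represents is lost, which makes the model coefficients harder to read later. One alternative, sketched below with `pandas.get_dummies`, keeps readable names. This is a swapped-in technique for illustration, not what the notebook uses, and the toy rows are made up.

```python
import pandas as pd

# Toy categorical frame (values are illustrative only).
toy = pd.DataFrame({
    "BusinessTravel": ["Travel_Rarely", "Travel_Frequently", "Non-Travel"],
    "MaritalStatus": ["Single", "Married", "Single"],
}).astype("category")

# One indicator column per category level, named "<feature>_<level>".
encoded = pd.get_dummies(toy, dtype=float)

print(encoded.columns.tolist())
print(encoded)
```

Unlike a fitted `OneHotEncoder`, `get_dummies` does not remember the category levels, so it is best applied before the train/test split or on columns with a fixed `category` dtype.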

In [73]: X_all.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1470 entries, 0 to 1469
Data columns (total 48 columns):
0 1470 non-null float64
1 1470 non-null float64
2 1470 non-null float64
3 1470 non-null float64
4 1470 non-null float64
5 1470 non-null float64
6 1470 non-null float64
7 1470 non-null float64
8 1470 non-null float64
9 1470 non-null float64
10 1470 non-null float64
11 1470 non-null float64
12 1470 non-null float64
13 1470 non-null float64
14 1470 non-null float64
15 1470 non-null float64
16 1470 non-null float64
17 1470 non-null float64
18 1470 non-null float64
19 1470 non-null float64
20 1470 non-null float64
21 1470 non-null float64
22 1470 non-null float64
23 1470 non-null float64
24 1470 non-null float64
25 1470 non-null float64
26 1470 non-null float64
27 1470 non-null float64
28 1470 non-null float64
29 1470 non-null float64
30 1470 non-null float64
31 1470 non-null float64
JobLevel 1470 non-null int64
PerformanceRating 1470 non-null int64
Total_Satisfaction_bool 1470 non-null int64
Age_bool 1470 non-null int64
DailyRate_bool 1470 non-null int64
Department_bool 1470 non-null int64
DistanceFromHome_bool 1470 non-null int64
JobRole_bool 1470 non-null int64
HourlyRate_bool 1470 non-null int64
MonthlyIncome_bool 1470 non-null int64
NumCompaniesWorked_bool 1470 non-null int64
TotalWorkingYears_bool 1470 non-null int64
YearsAtCompany_bool 1470 non-null int64
YearsInCurrentRole_bool 1470 non-null int64
YearsSinceLastPromotion_bool 1470 non-null int64
YearsWithCurrManager_bool 1470 non-null int64
dtypes: float64(32), int64(16)
memory usage: 551.4 KB
Split Data
In [74]: X_train,X_test, y_train, y_test = train_test_split(X_all,y, test_size=0.30)

In [75]: print(f"Train data shape: {X_train.shape}, Test Data Shape {X_test.shape}")

Train data shape: (1029, 48), Test Data Shape (441, 48)
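The split above is unseeded, so the rows in each partition (and every score that follows) change from run to run, and with an imbalanced target the share of leavers can differ between the two partitions. The sketch below shows a seeded, stratified variant on synthetic data. The seed value 42, the variable names, and the 15% positive rate are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for (X_all, y): 30 positives out of 200.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = np.array([1] * 30 + [0] * 170)

# random_state fixes the shuffle; stratify keeps the class ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape, y_tr.mean(), y_te.mean())
```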

In [76]: X_train.head()

Out[76]: 0 1 2 3 4 5 6 7 8 9 ... DistanceFromHome_bool JobRole_bool HourlyRate_bool MonthlyIncome_boo

772 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0 0 1 1

1403 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 1 0 0 0

9 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 1 0 0 0

662 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0 0 1 1

1387 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 ... 0 0 0 0

5 rows × 48 columns

Train Data
In [77]: # Function that runs the requested algorithm and returns the accuracy metrics
def fit_ml_algo(algo, X_train, y_train, cv):
    # One pass: fit on the full training set and score on the same data
    model = algo.fit(X_train, y_train)
    acc = round(model.score(X_train, y_train) * 100, 2)

    # Cross-validation: out-of-fold predictions for every training row
    train_pred = model_selection.cross_val_predict(algo, X_train, y_train, cv=cv, n_jobs=-1)

    # Cross-validation accuracy metric
    acc_cv = round(metrics.accuracy_score(y_train, train_pred) * 100, 2)

    return train_pred, acc, acc_cv
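The helper returns two different numbers: `acc` is measured on the same rows the model was fitted on, while `acc_cv` comes from out-of-fold predictions, where each row is predicted by a model that never saw it. The self-contained sketch below makes the required imports explicit and runs the helper on synthetic data; the dataset, `max_iter=1000`, and `cv=5` are assumptions chosen for the demo, and `n_jobs` is left at its default.

```python
from sklearn import metrics, model_selection
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def fit_ml_algo(algo, X_train, y_train, cv):
    # Training accuracy: optimistic, because the model has seen these rows.
    model = algo.fit(X_train, y_train)
    acc = round(model.score(X_train, y_train) * 100, 2)

    # Out-of-fold predictions: each row is predicted by a model fitted without it.
    train_pred = model_selection.cross_val_predict(algo, X_train, y_train, cv=cv)
    acc_cv = round(metrics.accuracy_score(y_train, train_pred) * 100, 2)
    return train_pred, acc, acc_cv

# Synthetic binary classification problem for demonstration only.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
pred, acc, acc_cv = fit_ml_algo(LogisticRegression(max_iter=1000), X_demo, y_demo, cv=5)

print(acc, acc_cv)
```

When comparing models, `acc_cv` is the more trustworthy of the two, since a flexible model can reach a very high `acc` by memorising the training rows.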

Logistic Regression
In [78]: # Logistic Regression
start_time = time.time()
train_pred_log, acc_log, acc_cv_log = fit_ml_algo(LogisticRegression(), X_train,y_train, 10)
log_time = (time.time() - start_time)
print("Accuracy: %s" % acc_log)
print("Accuracy CV 10-Fold: %s" % acc_cv_log)
print("Running Time: %s" % datetime.timedelta(seconds=log_time))

Accuracy: 89.89
Accuracy CV 10-Fold: 87.66
Running Time: 0:00:02.534987

Support Vector Machine

In [79]: # SVC
start_time = time.time()
train_pred_svc, acc_svc, acc_cv_svc = fit_ml_algo(SVC(),X_train,y_train,10)
svc_time = (time.time() - start_time)
print("Accuracy: %s" % acc_svc)
print("Accuracy CV 10-Fold: %s" % acc_cv_svc)
print("Running Time: %s" % datetime.timedelta(seconds=svc_time))

Accuracy: 87.76
Accuracy CV 10-Fold: 86.1
Running Time: 0:00:00.207994

Linear Support Vector Machines

In [80]: # Linear SVC
start_time = time.time()
train_pred_linear_svc, acc_linear_svc, acc_cv_linear_svc = fit_ml_algo(LinearSVC(),X_train, y_train,10)
linear_svc_time = (time.time() - start_time)
print("Accuracy: %s" % acc_linear_svc)
print("Accuracy CV 10-Fold: %s" % acc_cv_linear_svc)
print("Running Time: %s" % datetime.timedelta(seconds=linear_svc_time))
Accuracy: 89.5
Accuracy CV 10-Fold: 87.27
Running Time: 0:00:00.269995

K Nearest Neighbour
In [81]: # K Nearest Neighbour
start_time = time.time()
train_pred_knn, acc_knn, acc_cv_knn = fit_ml_algo(KNeighborsClassifier(n_neighbors = 3),X_train,y_train,10)
knn_time = (time.time() - start_time)
print("Accuracy: %s" % acc_knn)
print("Accuracy CV 10-Fold: %s" % acc_cv_knn)
print("Running Time: %s" % datetime.timedelta(seconds=knn_time))

Accuracy: 89.21
Accuracy CV 10-Fold: 83.28
Running Time: 0:00:00.239998

Gaussian Naive Bayes

In [82]: # Gaussian Naive Bayes
start_time = time.time()
train_pred_gaussian, acc_gaussian, acc_cv_gaussian = fit_ml_algo(GaussianNB(),X_train,y_train,10)
gaussian_time = (time.time() - start_time)
print("Accuracy: %s" % acc_gaussian)
print("Accuracy CV 10-Fold: %s" % acc_cv_gaussian)
print("Running Time: %s" % datetime.timedelta(seconds=gaussian_time))

Accuracy: 80.17
Accuracy CV 10-Fold: 77.45
Running Time: 0:00:00.064000

Perceptron
In [83]: # Perceptron
start_time = time.time()
train_pred_perceptron, acc_perceptron, acc_cv_perceptron = fit_ml_algo(Perceptron(),X_train,y_train,10)
perceptron_time = (time.time() - start_time)
print("Accuracy: %s" % acc_perceptron)
print("Accuracy CV 10-Fold: %s" % acc_cv_perceptron)
print("Running Time: %s" % datetime.timedelta(seconds=perceptron_time))

Accuracy: 87.27
Accuracy CV 10-Fold: 82.12
Running Time: 0:00:00.073985

Stochastic Gradient Descent

In [84]: # Stochastic Gradient Descent
start_time = time.time()
train_pred_sgd, acc_sgd, acc_cv_sgd = fit_ml_algo(SGDClassifier(),X_train, y_train,10)
sgd_time = (time.time() - start_time)
print("Accuracy: %s" % acc_sgd)
print("Accuracy CV 10-Fold: %s" % acc_cv_sgd)
print("Running Time: %s" % datetime.timedelta(seconds=sgd_time))

Accuracy: 86.78
Accuracy CV 10-Fold: 83.97
Running Time: 0:00:00.096004

Decision Tree
In [85]: # Decision Tree
start_time = time.time()
train_pred_dt, acc_dt, acc_cv_dt = fit_ml_algo(DecisionTreeClassifier(),X_train, y_train,10)
dt_time = (time.time() - start_time)
print("Accuracy: %s" % acc_dt)
print("Accuracy CV 10-Fold: %s" % acc_cv_dt)
print("Running Time: %s" % datetime.timedelta(seconds=dt_time))

Accuracy: 100.0
Accuracy CV 10-Fold: 78.62
Running Time: 0:00:00.098000

Gradient Boosting Trees

In [86]: # Gradient Boosting Trees
start_time = time.time()
train_pred_gbt, acc_gbt, acc_cv_gbt = fit_ml_algo(GradientBoostingClassifier(),X_train, y_train,10)
gbt_time = (time.time() - start_time)
print("Accuracy: %s" % acc_gbt)
print("Accuracy CV 10-Fold: %s" % acc_cv_gbt)
print("Running Time: %s" % datetime.timedelta(seconds=gbt_time))

Accuracy: 93.0
Accuracy CV 10-Fold: 87.17
Running Time: 0:00:00.702999

Random Forest
In [87]: # Random Forest
start_time = time.time()
train_pred_rf, acc_rf, acc_cv_rf = fit_ml_algo(RandomForestClassifier(n_estimators=100),X_train, y_train,10)
rf_time = (time.time() - start_time)
print("Accuracy: %s" % acc_rf)
print("Accuracy CV 10-Fold: %s" % acc_cv_rf)
print("Running Time: %s" % datetime.timedelta(seconds=rf_time))

Accuracy: 100.0
Accuracy CV 10-Fold: 85.33
Running Time: 0:00:00.789033

CatBoost Classifier
In [88]: # Define the categorical features for the CatBoost model
cat_features = np.where(X_train.dtypes != np.float64)[0]
cat_features

Out[88]: array([32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47],
dtype=int64)

In [89]: # pool training data and categorical feature labels together

train_pool = Pool(X_train, y_train,cat_features)

In [90]: # CatBoost
catboost_model = CatBoostClassifier(iterations=1000,custom_loss=['Accuracy'],loss_function='Logloss')

# Fit CatBoost model

catboost_model.fit(train_pool,plot=True)

# CatBoost accuracy
acc_catboost = round(catboost_model.score(X_train, y_train) * 100, 2)

MetricVisualizer(layout=Layout(align_self='stretch', height='500px'))
990: learn: 0.0269810 test: 0.4227576 best: 0.3451640 (189) total: 8m 16s remaining: 4.51s
991: learn: 0.0269395 test: 0.4228075 best: 0.3451640 (189) total: 8m 16s remaining: 4.01s
992: learn: 0.0269058 test: 0.4228815 best: 0.3451640 (189) total: 8m 17s remaining: 3.51s
993: learn: 0.0268638 test: 0.4230957 best: 0.3451640 (189) total: 8m 18s remaining: 3.01s
994: learn: 0.0268283 test: 0.4231735 best: 0.3451640 (189) total: 8m 18s remaining: 2.5s
995: learn: 0.0267771 test: 0.4232267 best: 0.3451640 (189) total: 8m 19s remaining: 2s
996: learn: 0.0267233 test: 0.4233080 best: 0.3451640 (189) total: 8m 19s remaining: 1.5s
997: learn: 0.0266916 test: 0.4234219 best: 0.3451640 (189) total: 8m 20s remaining: 1s
998: learn: 0.0266528 test: 0.4236120 best: 0.3451640 (189) total: 8m 20s remaining: 501ms
999: learn: 0.0266080 test: 0.4235866 best: 0.3451640 (189) total: 8m 21s remaining: 0us
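In the log above, the training loss (`learn`) keeps falling while the evaluation loss (`test`) reached its minimum of 0.3451640 at iteration 189 and has since risen to about 0.42, a pattern that usually indicates overfitting in the later iterations. The cell that produced the evaluation set is not shown in this export. The sketch below only parses a log line into numbers so the gap can be inspected; `parse_log_line` is a hypothetical helper, not a CatBoost function.

```python
import re

# Last line of the training log shown above.
LINE = "999: learn: 0.0266080 test: 0.4235866 best: 0.3451640 (189) total: 8m 21s remaining: 0us"

PATTERN = re.compile(
    r"(?P<iteration>\d+):\s+learn:\s+(?P<learn>[\d.]+)\s+test:\s+(?P<test>[\d.]+)"
    r"\s+best:\s+(?P<best>[\d.]+)\s+\((?P<best_iteration>\d+)\)"
)

def parse_log_line(line):
    """Extract iteration, learn/test loss, and best loss/iteration from one log line."""
    match = PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "iteration": int(fields["iteration"]),
        "learn": float(fields["learn"]),
        "test": float(fields["test"]),
        "best": float(fields["best"]),
        "best_iteration": int(fields["best_iteration"]),
    }

row = parse_log_line(LINE)
gap = row["test"] - row["learn"]   # distance between evaluation and training loss
print(row, round(gap, 4))
```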

Training Model Results

In [93]: models = pd.DataFrame({
    'Model': ['Logistic Regression', 'SVM', 'Linear SVC', 'KNN', 'Naive Bayes', 'Perceptron',
              'Stochastic Gradient Descent', 'Decision Tree', 'Gradient Boosting Trees',
              'Random Forest', 'CatBoost'],
    'Score': [
        acc_log,
        acc_svc,
        acc_linear_svc,
        acc_knn,
        acc_gaussian,
        acc_perceptron,
        acc_sgd,
        acc_dt,
        acc_gbt,
        acc_rf,
        acc_catboost
    ]})
models.sort_values(by='Score', ascending=False)

Out[93]:     Model                         Score
         7   Decision Tree                 100.00
         9   Random Forest                 100.00
         10  CatBoost                       95.82
         8   Gradient Boosting Trees        93.00
         0   Logistic Regression            89.89
         2   Linear SVC                     89.50
         3   KNN                            89.21
         1   SVM                            87.76
         5   Perceptron                     87.27
         6   Stochastic Gradient Descent    86.78
         4   Naive Bayes                    80.17

In [94]: cv_models = pd.DataFrame({
    'Model': ['Logistic Regression', 'SVM', 'Linear SVC', 'KNN', 'Naive Bayes', 'Perceptron',
              'Stochastic Gradient Descent', 'Decision Tree', 'Gradient Boosting Trees',
              'Random Forest', 'CatBoost'],
    'Score': [
        acc_cv_log,
        acc_cv_svc,
        acc_cv_linear_svc,
        acc_cv_knn,
        acc_cv_gaussian,
        acc_cv_perceptron,
        acc_cv_sgd,
        acc_cv_dt,
        acc_cv_gbt,
        acc_cv_rf,
        acc_cv_catboost
    ]})
cv_models.sort_values(by='Score', ascending=False)

Out[94]:     Model                         Score
         0   Logistic Regression            87.66
         2   Linear SVC                     87.27
         8   Gradient Boosting Trees        87.17
         10  CatBoost                       86.49
         1   SVM                            86.10
         9   Random Forest                  85.33
         6   Stochastic Gradient Descent    83.97
         3   KNN                            83.28
         5   Perceptron                     82.12
         7   Decision Tree                  78.62
         4   Naive Bayes                    77.45
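The two rankings differ sharply: the models that top the training table fall back in the cross-validated one. The sketch below computes the gap between the two scores for each model, using only the numbers printed in `Out[93]` and `Out[94]`. A large gap is a common sign that a model has memorised the training rows.

```python
# Scores copied from Out[93] (training accuracy) and Out[94] (10-fold CV accuracy).
train_acc = {
    "Logistic Regression": 89.89, "SVM": 87.76, "Linear SVC": 89.50, "KNN": 89.21,
    "Naive Bayes": 80.17, "Perceptron": 87.27, "Stochastic Gradient Descent": 86.78,
    "Decision Tree": 100.00, "Gradient Boosting Trees": 93.00, "Random Forest": 100.00,
    "CatBoost": 95.82,
}
cv_acc = {
    "Logistic Regression": 87.66, "SVM": 86.10, "Linear SVC": 87.27, "KNN": 83.28,
    "Naive Bayes": 77.45, "Perceptron": 82.12, "Stochastic Gradient Descent": 83.97,
    "Decision Tree": 78.62, "Gradient Boosting Trees": 87.17, "Random Forest": 85.33,
    "CatBoost": 86.49,
}

# Gap in percentage points between training and cross-validated accuracy.
gaps = {name: round(train_acc[name] - cv_acc[name], 2) for name in train_acc}

for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<28}{gap:>6.2f}")
```

The decision tree and random forest show the largest gaps (21.38 and 14.67 points), while logistic regression combines the highest cross-validated score with a gap of about 2 points, which is consistent with it being chosen for the prediction step that follows.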

Predict Data using Logistic Regression

In [95]: model = LogisticRegression().fit(X_train, y_train)

In [96]: predictions = model.predict(X_test)

In [97]: pred_df = pd.DataFrame(index=X_test.index)

In [98]: pred_df['Attrition'] = predictions

pred_df.head()

Out[98]: Attrition

71 0

464 0

294 0

1230 0

1181 0

In [99]: # Test-set accuracy

score = round(metrics.accuracy_score(y_test, predictions) * 100, 2)

In [100]: print("Accuracy: %s" % score)

Accuracy: 87.76

In [101]: print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       0.90      0.96      0.93       375
           1       0.64      0.42      0.51        66

    accuracy                           0.88       441
   macro avg       0.77      0.69      0.72       441
weighted avg       0.86      0.88      0.87       441
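The headline accuracy hides how the model does on the class of interest. The sketch below recomputes the class-1 metrics from confusion-matrix counts. The four counts are not printed in the notebook: they are inferred here as the integer values consistent with the reported supports (375 and 66), the class-1 recall of 0.42, and the accuracy of 87.76%, so treat them as an assumption.

```python
# Counts inferred from the rounded report above (an assumption, not notebook output).
tn, fp, fn, tp = 359, 16, 38, 28

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision_1 = tp / (tp + fp)      # of predicted leavers, how many actually left
recall_1 = tp / (tp + fn)         # of actual leavers, how many were caught
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Accuracy of always predicting the majority class (no attrition).
baseline = (tn + fp) / total

print(round(accuracy * 100, 2), round(precision_1, 2), round(recall_1, 2),
      round(f1_1, 2), round(baseline * 100, 2))
```

Under these counts the model identifies 28 of the 66 employees who left, and its accuracy is under three points above the 85.03% obtained by predicting "no attrition" for everyone. For a retention use case, recall on class 1 is therefore a more informative figure than overall accuracy.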

In [102]: # Get importance: logistic regression coefficients, one per feature column
importance = model.coef_[0]
# Summarize feature importance
for i, v in enumerate(importance):
    print('Feature: %0d, Score: %.5f' % (i, v))
# Plot feature importance
plt.bar(range(len(importance)), importance)
plt.show()
Feature: 0, Score: -0.83977
Feature: 1, Score: 0.78917
Feature: 2, Score: 0.05069
Feature: 3, Score: -0.02039
Feature: 4, Score: 0.11335
Feature: 5, Score: 0.02325
Feature: 6, Score: 0.01159
Feature: 7, Score: -0.12771
Feature: 8, Score: 0.20678
Feature: 9, Score: -0.28549
Feature: 10, Score: 0.12482
Feature: 11, Score: -0.28421
Feature: 12, Score: -0.39745
Feature: 13, Score: 0.63564
Feature: 14, Score: 0.13060
Feature: 15, Score: -0.13052
Feature: 16, Score: -0.29474
Feature: 17, Score: 0.11663
Feature: 18, Score: 0.17819
Feature: 19, Score: -0.92836
Feature: 20, Score: 0.92844
Feature: 21, Score: 0.70871
Feature: 22, Score: -0.35084
Feature: 23, Score: -0.43281
Feature: 24, Score: 0.07503
Feature: 25, Score: 0.89239
Feature: 26, Score: -0.12934
Feature: 27, Score: 0.04080
Feature: 28, Score: -0.24705
Feature: 29, Score: 0.27531
Feature: 30, Score: -0.46624
Feature: 31, Score: -0.36578
Feature: 32, Score: 0.05549
Feature: 33, Score: -0.27119
Feature: 34, Score: -1.16726
Feature: 35, Score: 0.83722
Feature: 36, Score: 0.26956
Feature: 37, Score: -0.91975
Feature: 38, Score: 0.81994
Feature: 39, Score: 0.84985
Feature: 40, Score: 0.27492
Feature: 41, Score: 0.83162
Feature: 42, Score: 0.80548
Feature: 43, Score: 0.36834
Feature: 44, Score: 0.42694
Feature: 45, Score: -0.01816
Feature: 46, Score: -0.43036
Feature: 47, Score: 0.92481
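The list above identifies features only by position, so it is hard to tell which column each score belongs to. A minimal sketch of one way to attach names and rank by magnitude follows. It assumes the training data is a DataFrame whose columns match the order of `model.coef_[0]`; the data and the `feature_0` ... `feature_5` names here are synthetic placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data; the column names are placeholders (assumption)
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train = pd.DataFrame(X, columns=["feature_%d" % i for i in range(6)])

model = LogisticRegression(max_iter=1000).fit(X_train, y)

# Pair each coefficient with its column name, then rank by absolute value
coef = pd.Series(model.coef_[0], index=X_train.columns)
ranked = coef.reindex(coef.abs().sort_values(ascending=False).index)
print(ranked)
```

Coefficient magnitudes are only comparable across features when the inputs are on similar scales. A positive sign pushes the prediction toward class 1 (attrition) and a negative sign toward class 0.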

Employee Attrition Prediction

Age : 18-80
BusinessTravel : Rarely, Frequently, No Travel
Daily Rate : 100-1600
Department : Research & Development, Human Resources, Sales
Distance From Home : 1-29
Education : 1-5
Education Field : Life Sciences, Medical, Marketing, Technical Degree, Human Resources, Other
Environment Satisfaction : 1-4
Gender : Male, Female
Hourly Rate : 30-100
Job Involvement : 1-4
Job Level : 1-5
Job Role : Sales Executive, Research Scientist, Laboratory Technician, Manufacturing Director, Healthcare Representative, Manager, Sales Representative, Research Director, Human Resources
Job Satisfaction : 1-4
Marital Status : Married, Single, Divorced
Monthly Income : 1000-2000 (1000-20000)
Number of Companies Worked in : 0-9
Over Time : Yes, No
Performance Rating : 1-4
Relationship Satisfaction : 1-4
Stock Option Level : 0-3
Total Working Years : 0-40
Training Times Last Year : 0-6
Work Life Balance : 1-4
Years At Company : 0-40
Years In Current Role : 0-18
Years Since Last Promotion : 0-15
Years With Curr Manager : 0-17

Predict
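Before a submitted form can be passed to the model, its values have to be arranged into the same columns the model was trained on. The report does not show how the application does this, so the following is only a sketch under assumptions: it supposes the training data was one-hot encoded with `pd.get_dummies`, and the `train_columns` list and the `encode_form` helper are hypothetical.

```python
import pandas as pd

# Hypothetical subset of the training columns after one-hot encoding
train_columns = ["Age", "DistanceFromHome", "OverTime_Yes",
                 "BusinessTravel_Travel_Frequently",
                 "BusinessTravel_Travel_Rarely"]

def encode_form(form, columns):
    """Turn one submitted form (a dict) into a single row matching `columns`."""
    row = pd.get_dummies(pd.DataFrame([form]))
    # Dummy columns absent from this single row become 0; extra ones are dropped
    return row.reindex(columns=columns, fill_value=0).astype(float)

form = {"Age": 41, "DistanceFromHome": 1, "OverTime": "Yes",
        "BusinessTravel": "Travel_Rarely"}
row = encode_form(form, train_columns)
print(row)
```

The `reindex` step matters because a single row only produces dummy columns for the categories it actually contains. Without it, the row would not line up with the 48 coefficients the model expects.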
Age  Attrition  BusinessTravel     DailyRate  Department              DistanceFromHome  Education  EducationField    EmployeeCount
41   Yes        Travel_Rarely      1102       Sales                   1                 2          Life Sciences     1
49   No         Travel_Frequently  279        Research & Development  8                 1          Life Sciences     1
37   Yes        Travel_Rarely      1373       Research & Development  2                 2          Other             1
33   No         Travel_Frequently  1392       Research & Development  3                 4          Life Sciences     1
27   No         Travel_Rarely      591        Research & Development  2                 1          Medical           1
32   No         Travel_Frequently  1005       Research & Development  2                 2          Life Sciences     1
59   No         Travel_Rarely      1324       Research & Development  3                 3          Medical           1
30   No         Travel_Rarely      1358       Research & Development  24                1          Life Sciences     1
38   No         Travel_Frequently  216        Research & Development  23                3          Life Sciences     1
36   No         Travel_Rarely      1299       Research & Development  27                3          Medical           1
35   No         Travel_Rarely      809        Research & Development  16                3          Medical           1
29   No         Travel_Rarely      153        Research & Development  15                2          Life Sciences     1
31   No         Travel_Rarely      670        Research & Development  26                1          Life Sciences     1
34   No         Travel_Rarely      1346       Research & Development  19                2          Medical           1
28   Yes        Travel_Rarely      103        Research & Development  24                3          Life Sciences     1
29   No         Travel_Rarely      1389       Research & Development  21                4          Life Sciences     1
32   No         Travel_Rarely      334        Research & Development  5                 2          Life Sciences     1
22   No         Non-Travel         1123       Research & Development  16                2          Medical           1
53   No         Travel_Rarely      1219       Sales                   2                 4          Life Sciences     1
38   No         Travel_Rarely      371        Research & Development  2                 3          Life Sciences     1
24   No         Non-Travel         673        Research & Development  11                2          Other             1
36   Yes        Travel_Rarely      1218       Sales                   9                 4          Life Sciences     1
34   No         Travel_Rarely      419        Research & Development  7                 4          Life Sciences     1
21   No         Travel_Rarely      391        Research & Development  15                2          Life Sciences     1
34   Yes        Travel_Rarely      699        Research & Development  6                 1          Medical           1
53   No         Travel_Rarely      1282       Research & Development  5                 3          Other             1
32   Yes        Travel_Frequently  1125       Research & Development  16                1          Life Sciences     1
42   No         Travel_Rarely      691        Sales                   8                 4          Marketing         1
44   No         Travel_Rarely      477        Research & Development  7                 4          Medical           1
46   No         Travel_Rarely      705        Sales                   2                 4          Marketing         1
33   No         Travel_Rarely      924        Research & Development  2                 3          Medical           1
44   No         Travel_Rarely      1459       Research & Development  10                4          Other             1
30   No         Travel_Rarely      125        Research & Development  9                 2          Medical           1
39   Yes        Travel_Rarely      895        Sales                   5                 3          Technical Degree  1
24   Yes        Travel_Rarely      813        Research & Development  1                 3          Medical           1
43   No         Travel_Rarely      1273       Research & Development  2                 2          Medical           1
50   Yes        Travel_Rarely      869        Sales                   3                 2          Marketing         1
35   No         Travel_Rarely      890        Sales                   2                 3          Marketing         1
36   No         Travel_Rarely      852        Research & Development  5                 4          Life Sciences     1
33   No         Travel_Frequently  1141       Sales                   1                 3          Life Sciences     1
35   No         Travel_Rarely      464        Research & Development  4                 2          Other             1
27   No         Travel_Rarely      1240       Research & Development  2                 4          Life Sciences     1
26   Yes        Travel_Rarely      1357       Research & Development  25                3          Life Sciences     1
27   No         Travel_Frequently  994        Sales                   8                 3          Life Sciences     1
30   No         Travel_Frequently  721        Research & Development  1                 2          Medical           1
41   Yes        Travel_Rarely      1360       Research & Development  12                3          Technical Degree  1
34   No         Non-Travel         1065       Sales                   23                4          Marketing         1
37   No         Travel_Rarely      408        Research & Development  19                2          Life Sciences     1
46   No         Travel_Frequently  1211       Sales                   5                 4          Marketing         1
35   No         Travel_Rarely      1229       Research & Development  8                 1          Life Sciences     1
48   Yes        Travel_Rarely      626        Research & Development  1                 2          Life Sciences     1
28   Yes        Travel_Rarely      1434       Research & Development  5                 4          Technical Degree  1
44   No         Travel_Rarely      1488       Sales                   1                 5          Marketing         1
35   No         Non-Travel         1097       Research & Development  11                2          Medical           1
26   No         Travel_Rarely      1443       Sales                   23                3          Marketing         1
33   No         Travel_Frequently  515        Research & Development  1                 2          Life Sciences     1
35   No         Travel_Frequently  853        Sales                   18                5          Life Sciences     1
35   No         Travel_Rarely      1142       Research & Development  23                4          Medical           1
31   No         Travel_Rarely      655        Research & Development  7                 4          Life Sciences     1
37   No         Travel_Rarely      1115       Research & Development  1                 4          Life Sciences     1
32   No         Travel_Rarely      427        Research & Development  1                 3          Medical           1
38   No         Travel_Frequently  653        Research & Development  29                5          Life Sciences     1
50   No         Travel_Rarely      989        Research & Development  7                 2          Medical           1
59   No         Travel_Rarely      1435       Sales                   25                3          Life Sciences     1
36   No         Travel_Rarely      1223       Research & Development  8                 3          Technical Degree  1
55   No         Travel_Rarely      836        Research & Development  8                 3          Medical           1
36   No         Travel_Frequently  1195       Research & Development  11                3          Life Sciences     1
