100% found this document useful (1 vote)

412 views25 pages

Project Notes 1: Purpose of Document

This document provides an overview and initial analysis of a dataset containing information on 14,178 auto insurance policies. It describes the purpose of collecting the data and performing exploratory analysis. It then analyzes various policy attributes like claim status, premium amounts, coverage details, and driver characteristics through univariate and bivariate analyses. The goal is to build a risk analytics model to understand customer renewal potential and claim propensity.

Uploaded by

Aadesh Srivastav

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

100% found this document useful (1 vote)

412 views25 pages

Project Notes 1: Purpose of Document

Uploaded by

Aadesh Srivastav

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 25

Project Notes 1

Purpose of document:

 Data Report
o Understand how data was collected in terms of time, frequency, methodology
o Visual inspection of data (rows, columns, descriptive details)
o Understanding of attributes (variable info, renaming if required)
 Initial Exploratory Data Analysis
o Univariate analysis (distribution and spread for every continuous attribute, distribution of
the data in categories for categorical ones)
o Bivariate analysis (relationship between different variables, correlations)

Problem Statement:

The objective of this project is to build a “Risk Analytics model” to understand the renewal potential and
claim propensity of Existing Customers under Personal Auto Insurance Lines.
Will the policy holder initiate a claim (Yes/No) for this policy in the next policy year?

Need of this Study:

Auto insurance is important because it not only covers any physical damage that may occur in an
accident, but also any damage or injury that might be caused because of a vehicular accident or which
may be done upon oneself or one’s vehicle by another vehicle or accident – a falling tree for example.
By analyzing the data set we can understand about the various factors that result in claim renewal by
existing customer base and hence can provide important insights as to how to target existing customer
base to get auto insurance thereby improving safety and security for the existing auto-owners.

The dataset is a primary source of data collected during the year 2015-2016 for an auto insurance
company. Cars manufactured through the year 1957-2000 are recorded in this data set.
The data consists of 127 columns, 14178 rows and several missing and incorrect values

Univariate Analysis:

ClaimStatus: Indicates whether the policy holder has made a claim or not. 1 indicates a claim
and 0 indicates no claim
Initial DataType: Numeric
Type converted To: Factor

0 1
13399 778

Out of 14177 records/policies only 778 users~ 5% have claimed the insurance.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`

ClaimFrequency: Gives the number of claims claimed.

Initial DataType: Numeric

Type converted To: Factor

0 1 2 3 4 5
13399 580 144 36 16 2
From claim frequency we can say user mostly claim once or twice in a year and the no. of users claiming
for more than 2 time in is very less 2 users.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`

Premium: Premium in $1000

Type: Numeric
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0 40.0 71.0 191.2 239.0 2869.0
Contains Outliers.

Most of the premiums are bought for the under 50 thousand.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
Billing Term: How often the premium is paid, i.e. once a year = 1, Three times in a year = 3, or 6 times in
a year = 6.

Initial DataType: Numeric

Type converted To: Factor

1 3 6
6724 482 6971

Looking at the data we can say people prefer premium paid 6 times and once in a year.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`

Renewed: Indicates whether the policy has been renewed or not. 1= Renewed, 0 = not renewed

Initial DataType: Numeric

Type converted To: Factor

0 1
7207 6970

Policy renewal has almost equal ratio of zeros and ones.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DOB1 to DOB5: Date of Birth of the main driver(policy holder), second driver till fifth driver.

DOB1 to DOB5 need to be changed in one single format.

Analysis can be done after removing impurities and getting all DOB’s in single format.

*************************************************************************************

Number of Drivers: Count of the number of drivers in the policy

Initial DataType: Numeric

Type converted To: Factor

1 2 3 4 5
8624 4636 733 150 34

No. of drivers registered into the policy ranges from 1-5 and it seems user prefer to insure only first 2
drivers in most of the case.
AgeUSdriving_1 to AgeUSdriving_5: How long the driver has been driving

Type: Numeric

AgeUSDriving_1 experience ranges from 17 to 79 years and

the distribution of data is left skewed with most of the drivers having experience of 25-35 year.

AgeUSDriving_2 to AgeUSDriving_5 contain impurities and missing values.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Amendment: Number of changes made to the policy during the year (may it is: No. of changes made to
the policy till date from the date of buying)

Initial DataType: Numeric

Type converted To: Factor

0 1 2 3 4 5 6 7 8 10
13622 346 135 42 14 11 3 1 1 2

Amendment in a year ranges from 1 to 10 amendments with mostly no amendments per year and 1-2
amendments/year

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
Coverage Liability: Coverage liability represents the state's financial responsibility law i.e. the minimum
requirement. the first figure is the amount to be paid for injuries per head, the second figure is the
amount for all injuries and the third is the amount per vehicle damaged.

25/50/25 and 20/40/15 is the preferred package by the policy holders.

table(car_insu$CoverageLiability)

20/40/15 25/50/25 30/60/25 None

7061 7077 38 1
table(car_insu$CoveragePD_1)

1000/1000 500/500 None

1 3094 11030

Coverage PD(physical damage):most of the policy holders have opted out for it however there are user
opting for 500/500 and only 1 user have opted for 1000/1000

CoverageMP(medical payment): is none though there is one user with coverage MP of 535 thousand
table(car_insu$CoverageMP)

535 None
1 14119
sum(is.na(car_insu$CoverageMP))
[1] 57

CoverageUMBI and CoverageUMPD(for uninsured or underinsured motorist) is only accepted by 316

policy holder

table(car_insu$CoverageUMBI)

Accepted None
316 13855
> table(car_insu$CoverageUMPD)

Accepted None
316 13855
sum(is.na(car_insu$CoverageUMBI))
[1] 6
sum(is.na(car_insu$CoverageUMPD))
[1] 6

Those who have opted for CoveragePIP_CDW(personal inury protection) have package of around 2569
and 2535 thousand.

table(car_insu$CoveragePIP_CDW)

2535 2569 None

1 90 14036
sum(is.na(car_insu$CoveragePIP_CDW))
[1] 50
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
Distance to work:

For driver 1 distance to work for most of the driver is within 100 miles and for 9 of the drivers distance
to work exceeds the 100 miles and ranges upto 500 miles

Same is the case with driver 2

For driver 3 most of the driver have distance to work as in range of 0-5 miles with maximum miles upto
176
Driver 4 has miles in range 0-20 miles while for driver 5 miles are in range 0-5
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

No. of driver assigned to first vehicle is max upto 2 driver followed by 3 and then 4,5
table(car_insu$DriverAssigned_1)

1 2 3 4 5
12495 1400 248 30 4

>
DOBs of driver1-3

Summary for DOB for driver (1-3)

Moreover there are large NA values for DOBs of driver 4 and 5

sum(is.na(car_insu$DOB4))
[1] 13992
sum(is.na(car_insu$DOB5))
[1] 14135

Data for Engine variable required to be clean as some of data is also provided into CC which needs to be
converted into liter
Variable : ExcludedDriverName_01 - ExcludedDriverName_20

Meaning: An excluded driver is someone you intentionally remove from your auto insurance policy. If
that person uses your car, either with or without your permission, and has an accident, your car
insurance will not provide coverage.

First Person to twentied person declared as an excluded driver.

Type: Character

Number of records in each Variable(ExcludedDriverName)

**************************************************************************

Variable : GaragedZIP_1

Meaning: Zipcode of the place where the first vehicle is parked.

Type: Numeric

**************************************************************************

Variable: Marital Status1 to Marital Status5

Meaning: Marital status of the first driver to fifth driver M or S

Type: Character

New DataType: Factor

**************************************************************************

Variable: Occupation_1 to Occupation_5

Meaning: Occupation of the first driver to fifth driver

Type: Character
New DataType: Factor

**************************************************************************

Variable: Relation_1 to Relation_5

Meaning: Relationship of the driver with the main policy holder. (Relation_1 is onlyself)

Type: Character

NewDataType: Factor

**************************************************************************

Variable: Rental_1

Meaning: First vehicle (if Rental is allowed)

Type: Numeric

NewDataType: Factor (It should be yes or no.. 0 or 1)

**************************************************************************
Variable: Sex_1 to Sex_5

Meaning: Gender of the driver one to driver five

Type: Char

NewDataType: Factor

**************************************************************************

Variable: Surcharge1Unit_1 to Surcharge2Unit_1

Meaning: A surcharge is an amount that insurance companies typically add to a person’s insurance
premium, which is the amount he or she pays for insurance, due to some event or occurrence(accident
in which that driver was at fault)

Insurance companies apply a surcharge because they assume that a ticket for a moving violation, such as
reckless driving or speeding, makes it more likely that a person is going to be involved in an accident and
injure someone

First Surcharge of the first vehicle (Y/N) or Second Surcharge of the first vehicle etc.

Type: Character

NewDataType: Factor
MaritalStatus_1 to MaritalStatus_5

Levels are only "M" and "S"

Occupation_1 to Occupation_5

levels in occupation are around 2000.

Relation_1 to Relation_5
Levels are around 200 for relations.

Sex_1 to Sex_5

Levels are only "F" and "M"

Units is a continuous variable hence value is Numerical.
Vehicle Inspected is a categorical variable hence has to be converted to Factor.
The remaining ViolPointsDriver variables has to be converted to a Factor

Units: Number of vehicles covered for that Policy

VehicleInspected_1: First Vehicle inspected or not(Categorical) 1-Inspected 0-Not Inspected
ViolPointsDriver: Nth time the mth driver is scoring a violation point categorical variable.
On analyzing variables from Units to ViolPoints8Driver_5 it is observed that majority of the
ViolPointsDriver columns are all 0’s and hence can be eliminated from the dataset.

Units: Analyzing Units we see that the maximum number of Units occur for 1
Majority of the vehicles have not been inspected

Majority of the vehicles have not been inspected and the maximum number of vehicles are inspected for
units 1 and 2.
From the above Violation Points by Drivers it is apparent that for majority of the policies violation has not
been done and maximum Violation has been done for ViolationPoints1 Driver1.

Year_1:

The variable Year_1 denotes the year in which the first unit (vehicle) was manufactured. Further looking
at the data we find that there are some impurities in the data and the range of valid values is from 1957
to 2011:
Looking at the NULLs/blanks in the data, we see :

All records have some Year_1 values and there is no Null/blanks value.

Make_1:

Make_1 in the dataset denotes the marque/brand of the first unit (vehicle) registered in the policy. This
variable is essential for the study since overall quality of the brand line is set. Some brand vehicles could
be more prone to accidents than others.

Analysing the data in the dataset, we find that:

There are many impurities in the data, the name of the brand are spelled in correctly, moreover there
are some data points with no owner as make_1. We look the data availability for the attribute to find
that 99.4% of the records have values. On a side note, after data cleansing we believe there are
approximately 48 brand in total in the dataset.

Model_1 :
Model_1 has the information pertaining to the model name of the first vehicle insured in the policy.
Looking at the data we find:

Same as the make_1 we see there are many impurities in the data. The Model_1 is particularly
important for determining the claim status since the overall quality of the design and the models being
produced have an impact on the safety factor on field. Looking at how much data we have in the column
we find that about 99 percentage of columns have data points.

Zip:
The zip variable presumably has the zip code information of the zip code where the policy is registered.
This could be important feature in the dataset to indicate how is the tendency of people in each area in
terms of claims being made. To extract meaning out of the zip codes, one reference dataset containing
the US pin code information was merged with car_insurance dataset. We see that:

All records have values for zip column.

All policies were registered in the state of Texas.

Total_Distance_To_Work:

Total_distance_to_work signifies the total distance to work of all the drivers combined in the dataset.
Looking at the dataset, we see that the distance ranges from 1 to 500 units.
All records have total distance to work values.

NoLossSigned:
No loss signed variable signifies if the no loss statement has been signed in the policy. The statement of
no loss basically states that there has been no loss in the period of lapse. Or the time the car was not
insured. This may prove to be an important variable in the model as the indicator could tell us about the
tendency of the customer.

All the records have an indicator value, and 653 of all the customers in the dataset have opted for this
declaration.

Type:
Type variable signifies the type of insurance the record signifies in the dataset. Looking at the dataset,
we observe that:
The nine type of insurance policies have been spread across all the records and there are no NULL values
in the columns. The meaning of each level is being explored for US market.

CancellationType:

The column cancellation type signifies the type of cancellation of a policy if any. Upon exploring the data
we find:

About 2.2% of policy have been cancelled in the dataset and majority have been of the type “NP”. The
significance and meaning of each level is being explored through secondary research.

Insurance Data
No ratings yet
Insurance Data
12 pages
Swedish Insurance Data Analysis
33% (3)
Swedish Insurance Data Analysis
9 pages
Datascience R Project Insurance
No ratings yet
Datascience R Project Insurance
9 pages
B5W3 - End To End Insurance Risk Analytics and Predictive Modeling
No ratings yet
B5W3 - End To End Insurance Risk Analytics and Predictive Modeling
8 pages
File T NG H P N I Dung - Team 3 - VJPE205 1.1 1
No ratings yet
File T NG H P N I Dung - Team 3 - VJPE205 1.1 1
40 pages
French Motor Insurance Claim Analysis
No ratings yet
French Motor Insurance Claim Analysis
41 pages
Car Crash - ML1 - Jishma Kuruchy
No ratings yet
Car Crash - ML1 - Jishma Kuruchy
27 pages
DVT Project For PGP-BABI (2019-20) Done by Saleesh Satheeshcahandran (G6)
No ratings yet
DVT Project For PGP-BABI (2019-20) Done by Saleesh Satheeshcahandran (G6)
3 pages
Data Visualisation - Car Claim Insurance Project
100% (5)
Data Visualisation - Car Claim Insurance Project
6 pages
Vehicle Insurance Claim
No ratings yet
Vehicle Insurance Claim
18 pages
DVT (Car Cliams Insurance) Project
No ratings yet
DVT (Car Cliams Insurance) Project
20 pages
Telematics AXA ULB Revision
No ratings yet
Telematics AXA ULB Revision
23 pages
Specsem f2006 Handouts Francis2
No ratings yet
Specsem f2006 Handouts Francis2
49 pages
Problem Statement-Auto Insurance Project-1
No ratings yet
Problem Statement-Auto Insurance Project-1
3 pages
Insurance Premium Renewal Analysis
100% (3)
Insurance Premium Renewal Analysis
13 pages
Rajendra Ladda DVT Car Insurance Tableau Project
No ratings yet
Rajendra Ladda DVT Car Insurance Tableau Project
8 pages
Consumer Transportation Survey Final
No ratings yet
Consumer Transportation Survey Final
18 pages
Improving Automobile Insurance Ratemaking Using Telematics: Incorporating Mileage and Driver Behaviour Data
No ratings yet
Improving Automobile Insurance Ratemaking Using Telematics: Incorporating Mileage and Driver Behaviour Data
23 pages
CaseStudy Insurance
No ratings yet
CaseStudy Insurance
43 pages
The Use of Telematics Devices To Improve Automobile Insurrance Rates
No ratings yet
The Use of Telematics Devices To Improve Automobile Insurrance Rates
11 pages
Data Visualization Project Nilanjan Das PDF
100% (1)
Data Visualization Project Nilanjan Das PDF
26 pages
Machine Learning Approaches For Auto Insurance Big Data
No ratings yet
Machine Learning Approaches For Auto Insurance Big Data
23 pages
Car Insurance Claims Analysis
50% (2)
Car Insurance Claims Analysis
4 pages
R in Actuarial Pricing Analysis
0% (1)
R in Actuarial Pricing Analysis
40 pages
Risks: Machine Learning Approaches For Auto Insurance Big Data
No ratings yet
Risks: Machine Learning Approaches For Auto Insurance Big Data
23 pages
Car Insurance Case Study - Claims Analysis and Findings
No ratings yet
Car Insurance Case Study - Claims Analysis and Findings
15 pages
Capstone Project Vivek
100% (4)
Capstone Project Vivek
145 pages
Auto Insurance Fraud Detection
No ratings yet
Auto Insurance Fraud Detection
28 pages
Cars Insurance Project Samuel
No ratings yet
Cars Insurance Project Samuel
21 pages
Midterm Project Group 6
No ratings yet
Midterm Project Group 6
41 pages
Exposure To Risk Increases The Excess of Zero Accident Claims Frequency in Automobile Insurance
No ratings yet
Exposure To Risk Increases The Excess of Zero Accident Claims Frequency in Automobile Insurance
29 pages
DS Jan2025
No ratings yet
DS Jan2025
10 pages
FinalReport Life Insurance
80% (5)
FinalReport Life Insurance
34 pages
DBMS (Micro Project)
No ratings yet
DBMS (Micro Project)
11 pages
Probabilistic Estimation Based Data Mining For Discovering Insurance Risks
No ratings yet
Probabilistic Estimation Based Data Mining For Discovering Insurance Risks
19 pages
4 Doc
No ratings yet
4 Doc
23 pages
Solutions: Part 5 Practice Exam 2
No ratings yet
Solutions: Part 5 Practice Exam 2
23 pages
Swedish Motor Insurance Analysis
No ratings yet
Swedish Motor Insurance Analysis
14 pages
Health Insurance Cross Sell Prediction
No ratings yet
Health Insurance Cross Sell Prediction
32 pages
Dataset of An Actual Motor Vehicle Insurance Portfolio
No ratings yet
Dataset of An Actual Motor Vehicle Insurance Portfolio
13 pages
Car Insurance Data Insights
100% (2)
Car Insurance Data Insights
27 pages
Mca - Data Analysis Using R - Manuel
No ratings yet
Mca - Data Analysis Using R - Manuel
3 pages
Data Analytics On Vechicle Insurance Data
No ratings yet
Data Analytics On Vechicle Insurance Data
22 pages
Auto Insurance Output
No ratings yet
Auto Insurance Output
22 pages
Project DVT CarInsurance
No ratings yet
Project DVT CarInsurance
10 pages
Arpita Saha SMDM Coded Project Module 2 10 01 2024 G2 Business Report
No ratings yet
Arpita Saha SMDM Coded Project Module 2 10 01 2024 G2 Business Report
21 pages
Untitled Design
No ratings yet
Untitled Design
9 pages
10 1109@BigData 2018 8622491
No ratings yet
10 1109@BigData 2018 8622491
10 pages
Mini Project DVT
No ratings yet
Mini Project DVT
3 pages
Car Insurance Claim Insights
100% (1)
Car Insurance Claim Insights
3 pages
Problem Statement
No ratings yet
Problem Statement
6 pages
Business Report Suchita Bhovar Coded Project
No ratings yet
Business Report Suchita Bhovar Coded Project
18 pages
Disruptive Innovation For Auto Insurance Entrepreneurs: New Paradigm Using Telematics and Machine Learning
No ratings yet
Disruptive Innovation For Auto Insurance Entrepreneurs: New Paradigm Using Telematics and Machine Learning
14 pages
Revolut Car Insurance Analytics
No ratings yet
Revolut Car Insurance Analytics
9 pages
Insurance Insights with Tableau
No ratings yet
Insurance Insights with Tableau
4 pages
Integration of Traditional and Telematics Data For Efficient Insurance Claims Prediction
No ratings yet
Integration of Traditional and Telematics Data For Efficient Insurance Claims Prediction
17 pages
Car Crash Survival Prediction Business Report
No ratings yet
Car Crash Survival Prediction Business Report
23 pages
Ratemaking - 2
No ratings yet
Ratemaking - 2
57 pages
Exercise 4
No ratings yet
Exercise 4
2 pages
Cycle Time Reduction of Queue Hour Calculation - Final
100% (1)
Cycle Time Reduction of Queue Hour Calculation - Final
37 pages
DVT Dashboard-1
No ratings yet
DVT Dashboard-1
1 page
As 2
No ratings yet
As 2
10 pages
As 2
No ratings yet
As 2
10 pages
Using A Windows 7 Installation Disk: Backup Your Files
No ratings yet
Using A Windows 7 Installation Disk: Backup Your Files
15 pages
Paramore That S What You Get PDF Free
No ratings yet
Paramore That S What You Get PDF Free
7 pages
Case Presentation Grading Rubric
No ratings yet
Case Presentation Grading Rubric
1 page
The Importance of Preserving Cultural Heritage
No ratings yet
The Importance of Preserving Cultural Heritage
3 pages
Gender Equality in Education: Teachers' Role
No ratings yet
Gender Equality in Education: Teachers' Role
5 pages
BBA Bachelor of Business Administration Guru Gobind Singh Indraprastha University BBA-206 Marketing Research
No ratings yet
BBA Bachelor of Business Administration Guru Gobind Singh Indraprastha University BBA-206 Marketing Research
4 pages
Minecraft Mod Loading Error
No ratings yet
Minecraft Mod Loading Error
7 pages
Std09 III Maths em
No ratings yet
Std09 III Maths em
120 pages
Approved Manufacturers List
100% (1)
Approved Manufacturers List
41 pages
Ch-1 Introduction To Artificial Intelligence
No ratings yet
Ch-1 Introduction To Artificial Intelligence
20 pages
Data Analysis and Interpretation Guide
No ratings yet
Data Analysis and Interpretation Guide
4 pages
BΩSS - B1000 NEW
No ratings yet
BΩSS - B1000 NEW
8 pages
ABRITES Commander For VAG Manual
No ratings yet
ABRITES Commander For VAG Manual
108 pages
Lesson Plan: Cambridge Primary Science
No ratings yet
Lesson Plan: Cambridge Primary Science
2 pages
Field Study 2 Module Amurao Micalyne R.
No ratings yet
Field Study 2 Module Amurao Micalyne R.
88 pages
Pixel CO2 Omnifit Brochure
No ratings yet
Pixel CO2 Omnifit Brochure
6 pages
Learning Mastercam X7 Lathe Step by Step James Valentino Joseph Goldenberg Instant Download
100% (1)
Learning Mastercam X7 Lathe Step by Step James Valentino Joseph Goldenberg Instant Download
32 pages
Heritage Institute of Technology, Kolkata: Information For Admission For The Session 2016 - 2017
No ratings yet
Heritage Institute of Technology, Kolkata: Information For Admission For The Session 2016 - 2017
2 pages
Financial Analysis for Accountancy Students
No ratings yet
Financial Analysis for Accountancy Students
18 pages
Testbank and Solutions For Elementary Statistics A Step by Step Approach 11th Edition Bluman
0% (1)
Testbank and Solutions For Elementary Statistics A Step by Step Approach 11th Edition Bluman
17 pages
Biotechnology and Gene Tech Answers F215
No ratings yet
Biotechnology and Gene Tech Answers F215
31 pages
How All Systems Can Collapse Overnight 709 - Martin Armstrong
100% (3)
How All Systems Can Collapse Overnight 709 - Martin Armstrong
26 pages
Market Equilibrium
No ratings yet
Market Equilibrium
5 pages
Online Math Lesson Plan 2
No ratings yet
Online Math Lesson Plan 2
4 pages
Electrical Circuits
No ratings yet
Electrical Circuits
30 pages
TAX LG Taxation Preferential Taxation Sample Problems
No ratings yet
TAX LG Taxation Preferential Taxation Sample Problems
3 pages
Predicting Employees Performance Using Data Mining Techniques
No ratings yet
Predicting Employees Performance Using Data Mining Techniques
12 pages
Chirality & Stereochemistry Guide
No ratings yet
Chirality & Stereochemistry Guide
108 pages
Necromancer List
No ratings yet
Necromancer List
5 pages
813 Magnetic Core Dimensioning Limits in Hydro Generators
No ratings yet
813 Magnetic Core Dimensioning Limits in Hydro Generators
77 pages
News Angling & In-Depth Story Writing
No ratings yet
News Angling & In-Depth Story Writing
27 pages

Project Notes 1: Purpose of Document

Uploaded by

Project Notes 1: Purpose of Document

Uploaded by

Project Notes 1

Need of this Study:

ClaimFrequency: Gives the number of claims claimed.

Initial DataType: Numeric

Premium: Premium in $1000

Most of the premiums are bought for the under 50 thousand.

Initial DataType: Numeric

Initial DataType: Numeric

Policy renewal has almost equal ratio of zeros and ones.

DOB1 to DOB5 need to be changed in one single format.

Number of Drivers: Count of the number of drivers in the policy

Initial DataType: Numeric

AgeUSDriving_1 experience ranges from 17 to 79 years and

AgeUSDriving_2 to AgeUSDriving_5 contain impurities and missing values.

Initial DataType: Numeric

25/50/25 and 20/40/15 is the preferred package by the policy holders.

20/40/15 25/50/25 30/60/25 None

1000/1000 500/500 None

CoverageUMBI and CoverageUMPD(for uninsured or underinsured motorist) is only accepted by 316

2535 2569 None

Same is the case with driver 2

Summary for DOB for driver (1-3)

Moreover there are large NA values for DOBs of driver 4 and 5

First Person to twentied person declared as an excluded driver.

Number of records in each Variable(ExcludedDriverName)

Meaning: Zipcode of the place where the first vehicle is parked.

Variable: Marital Status1 to Marital Status5

Meaning: Marital status of the first driver to fifth driver M or S

New DataType: Factor

Variable: Occupation_1 to Occupation_5

Meaning: Occupation of the first driver to fifth driver

Variable: Relation_1 to Relation_5

Meaning: First vehicle (if Rental is allowed)

NewDataType: Factor (It should be yes or no.. 0 or 1)

Meaning: Gender of the driver one to driver five

Variable: Surcharge1Unit_1 to Surcharge2Unit_1

Levels are only "M" and "S"

levels in occupation are around 2000.

Levels are only "F" and "M"

Units: Number of vehicles covered for that Policy

Analysing the data in the dataset, we find that:

All records have values for zip column.

All policies were registered in the state of Texas.

You might also like