
Data Analytics & R: Regression and Anova

This document contains two data analysis summaries: 1) A credit card repayment data set of 10,000 customers, used to predict defaults. Income, credit rating, and number of cards are analyzed against age, education, gender, student status, marital status, and ethnicity. Income and rating are related to age. 2) An infant mortality data set, in which mortality rates are analyzed against per capita income and region. Both income and region significantly affect infant mortality rates.

Uploaded by

Rabi Kant

MUMBAI

Data Analytics & R


Assignment-1

Topic
Regression and Anova

SUBMITTED BY-

Ravikant

SEMESTER VI

DEPARTMENT OF FASHION TECHNOLOGY

NATIONAL INSTITUTE OF FASHION TECHNOLOGY, MUMBAI


Data no. 1 for task 1

Repayment of the credit card.


Description of the data set.
The data set is a simulated data set containing information on ten thousand customers. The
aim is to predict which customers will default on their credit card debt.

The codes used for the variables are as follows:

1. Id – A numerical code that identifies each customer.
2. Income – Income in multiples of $10,000.
3. Rating – The credit card rating of the person.
4. Cards – Number of cards.
5. Age – Age of the person.
6. Education – Number of years of education the person has completed.
7. Gender – Male or female.
8. Student – Whether the person is a student.
9. Married – Whether the card user is married.
10. Ethnicity – The person's ethnic background, i.e. Asian, American, African or Caucasian.

The variables used for the prediction are divided into two groups, dependent variables and
independent variables, listed below:

Dependent Variables            Independent Variables
Income (numerical)             Age (numerical)
Rating (numerical)             Education (numerical)
Number of cards (numerical)    Gender (categorical)
                               Student (categorical)
                               Married (categorical)
                               Ethnicity (categorical)
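
The code listing for this step is not reproduced in the text. As a rough sketch only, a regression of one dependent variable on the independent variables above might be set up in R as follows. The data frame `credit` and every value in it are simulated here purely for illustration; only the variable names come from the description above.

```r
# Simulated stand-in for the credit data; all values are invented for illustration.
set.seed(1)
n <- 200
credit <- data.frame(
  Age       = round(runif(n, 20, 80)),
  Education = sample(8:20, n, replace = TRUE),
  Gender    = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  Student   = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  Married   = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  Ethnicity = factor(sample(c("African", "American", "Asian", "Caucasian"),
                            n, replace = TRUE))
)
# Hypothetical response: income rises with age, plus random noise.
credit$Income <- 30 + 0.35 * credit$Age + rnorm(n, sd = 10)

# Regress the numerical response on numerical and categorical predictors;
# lm() expands each factor into dummy variables automatically.
fit_income <- lm(Income ~ Age + Education + Gender + Student + Married + Ethnicity,
                 data = credit)
print(summary(fit_income))

# Residuals-versus-fitted diagnostic plot (the first of the standard lm plots).
plot(fit_income, which = 1)
```

The same call with `Rating` or the number of cards on the left-hand side of the formula would give the other two models.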
Code for income

Output:
Graph of residuals versus fitted values.

Code for rating

Observations for rating


Graph of residuals versus actual values

Code for number of cards:

Observation
Graph of residuals versus fitted values

Result: Income and rating both depend on age, as the p-value (Pr) of age is less than 0.05.
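
A minimal sketch of how such a p-value can be read off a fitted model in R, assuming made-up data (the vectors `age` and `income` below are simulated, not the assignment's data):

```r
# Made-up data for illustration only; not the assignment's data.
set.seed(42)
age    <- round(runif(300, 20, 80))
income <- 29 + 0.36 * age + rnorm(300, sd = 8)

fit <- lm(income ~ age)

# The coefficient table has columns Estimate, Std. Error, t value, Pr(>|t|).
coefs <- summary(fit)$coefficients
p_age <- coefs["age", "Pr(>|t|)"]

# Compare against the 0.05 significance level used in the text.
age_is_significant <- p_age < 0.05
print(p_age)
print(age_is_significant)
```

The `Pr(>|t|)` column is the "Pr" value that the result refers to: the smaller it is, the stronger the evidence that the coefficient differs from zero.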

Income = 0.3646x + 28.8900

Rating = 0.933x + 325.9047

where x = age in both equations.
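
As a worked example of reading the two fitted lines, take a hypothetical age of x = 40 (chosen only for illustration; the results are in the units in which income and rating are recorded in the data set):

```latex
\begin{aligned}
\text{Income} &= 0.3646 \times 40 + 28.8900 = 14.584 + 28.8900 = 43.474 \\
\text{Rating} &= 0.933 \times 40 + 325.9047 = 37.32 + 325.9047 = 363.2247
\end{aligned}
```

Each additional year of age therefore raises the predicted income by 0.3646 units and the predicted rating by 0.933 points.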


B. Data for infant mortality

Description of the data set.


The data set consists of infant mortality data for different countries. The variables are
coded as follows:

1. Income – The per capita income in US dollars.
2. Infant – The infant mortality rate per 1,000 live births.
3. Region – A factor with levels Africa, Asia, America, Oceania.

The variables used for the prediction are divided into two groups, dependent variables and
independent variables, listed below:

Dependent Variables    Independent Variables
Infant                 Income
                       Region
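
The code listing for this analysis is not reproduced in the text. As a hedged sketch, a model with one numerical and one categorical predictor and its ANOVA table might be obtained in R as follows. The data frame `mortality` and all of its values are simulated for illustration; only the variable names and region levels come from the description above.

```r
# Simulated stand-in for the infant mortality data; values are invented.
set.seed(7)
n <- 100
region <- factor(sample(c("Africa", "Asia", "America", "Oceania"), n, replace = TRUE))
income <- round(runif(n, 100, 5000))

# Hypothetical effects: mortality falls with income and differs by region.
region_shift <- c(Africa = 60, Asia = 30, America = 10, Oceania = 0)
infant <- 100 - 0.01 * income +
  unname(region_shift[as.character(region)]) + rnorm(n, sd = 15)
mortality <- data.frame(infant, income, region)

# Linear model with one numerical and one categorical predictor.
fit_infant <- lm(infant ~ income + region, data = mortality)

# Sequential ANOVA table: one row per model term plus a Residuals row.
aov_table <- anova(fit_infant)
print(aov_table)
```

Because `anova()` on a single `lm` fit tests terms sequentially, the row for `region` measures its contribution after `income` has already been accounted for.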

Code
Observation

Graph:

Result: The infant mortality rate depends on both income and region, as both of their
p-values (Pr) are less than 0.005, which makes both predictors statistically significant.
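
A small sketch of this check in R, again on simulated data (the effect sizes below are assumptions chosen so that both terms come out significant; they are not the assignment's results):

```r
# Invented data for illustration; effect sizes are assumptions.
set.seed(11)
n <- 120
region <- factor(rep(c("Africa", "Asia", "America", "Oceania"), each = n / 4))
income <- runif(n, 100, 5000)
infant <- 120 - 0.015 * income + 20 * (region == "Africa") + rnorm(n, sd = 10)

tab <- anova(lm(infant ~ income + region))

# "Pr(>F)" holds the p-value for each term; the Residuals row is NA.
p_values <- tab[c("income", "region"), "Pr(>F)"]
names(p_values) <- c("income", "region")
print(p_values)

# A term is significant at the stated level when its p-value falls below it.
print(p_values < 0.005)
```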
