MUMBAI
Data Analytics & R
Assignment-1
Topic
Regression and ANOVA
SUBMITTED BY-
Ravikant
SEMESTER VI
DEPARTMENT OF FASHION TECHNOLOGY
NATIONAL INSTITUTE OF FASHION TECHNOLOGY, MUMBAI
Data no. 1 for task 1
Repayment of the credit card.
Description of the data set.
The data set is a simulated data set containing information on ten thousand customers. The
aim of the data set is to predict which customers will default on their credit card debt.
The codes used for the variables are as follows:
1. Id – A numerical code used only to identify each person.
2. Income – Income in multiples of $10,000.
3. Rating – The credit card rating of the person.
4. Cards – Number of cards.
5. Age – Age of the person.
6. Education – The number of years of education the person has completed.
7. Gender – Male or female.
8. Student – Whether the person is a student.
9. Married – Whether the card user is married.
10. Ethnicity – Where the person is from, i.e. Asian, American, African or Caucasian.
The variables used for the prediction are divided into two groups, dependent variables and
independent variables. They are listed below:
Dependent Variables              Independent Variables
Income (numerical)               Age (numerical)
Rating (numerical)               Education (numerical)
Number of cards (numerical)      Gender (categorical)
                                 Student (categorical)
                                 Married (categorical)
                                 Ethnicity (categorical)
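Each dependent variable listed above is regressed on the independent variables. The sketch below is not the code used in this assignment (which was written in R); it is only a minimal Python illustration of how a least-squares line, its fitted values and its residuals are obtained for one numerical predictor. The function names fit_line and fitted_and_residuals are hypothetical, and the toy numbers are made up, not taken from the credit card data set.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and sum of cross-products of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fitted_and_residuals(xs, ys, slope, intercept):
    """Fitted values and residuals (observed minus fitted)."""
    fitted = [slope * x + intercept for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    return fitted, residuals


# Made-up toy values, NOT taken from the credit card data set.
toy_age = [25, 35, 45, 55]
toy_income = [38.0, 41.5, 45.5, 49.0]

slope, intercept = fit_line(toy_age, toy_income)
fitted, residuals = fitted_and_residuals(toy_age, toy_income, slope, intercept)
print("slope:", round(slope, 4), "intercept:", round(intercept, 4))
print("residuals:", [round(r, 4) for r in residuals])
```

Plotting the residuals against the fitted values gives the kind of residuals-versus-fitted graph referred to in this report. A pattern-free scatter around zero suggests that a straight line is a reasonable fit.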
Codes for income
Outcome:
Graph of residuals versus fitted.
Codes for ratings
Observations for ratings
Graph of residual versus actual
Codes for number of cards:
Observation
Graph of residual versus fitted
Result: Income and rating depend on age, as the p-value (Pr) of age is less than 0.05.
Income = 0.3646x + 28.8900
Rating = 0.933x + 325.9047
where x = age
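As a worked illustration, the two fitted equations can be evaluated for a given age. The coefficients below are copied from the result above. The function names predict_income and predict_rating are hypothetical, and Python is used only for illustration.

```python
# Coefficients copied from the fitted equations reported in the result.
INCOME_SLOPE, INCOME_INTERCEPT = 0.3646, 28.8900
RATING_SLOPE, RATING_INTERCEPT = 0.933, 325.9047


def predict_income(age):
    """Predicted income (in the units of the Income variable) for a given age."""
    return INCOME_SLOPE * age + INCOME_INTERCEPT


def predict_rating(age):
    """Predicted rating for a given age."""
    return RATING_SLOPE * age + RATING_INTERCEPT


# Example ages chosen arbitrarily for illustration.
for age in (30, 50):
    print(age, round(predict_income(age), 4), round(predict_rating(age), 4))
```

For an age of 30, for example, the income equation gives 0.3646 × 30 + 28.8900 = 39.828 and the rating equation gives 0.933 × 30 + 325.9047 = 353.8947.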
B. Data for infant mortality
Description of the data set.
The data set consists of infant mortality data for different countries. It uses the
following codes:
1. Income – Per capita income in US dollars.
2. Infant – Infant mortality rate per 1,000 live births.
3. Region – A factor with levels Africa, Asia, America, Oceania.
The variables used for the prediction are divided into two groups, dependent variables and
independent variables. They are listed below:
Dependent Variables      Independent Variables
Infant                   Income
                         Region
Codes
Observation
Graph:
Result: The infant mortality rate depends on both income and region, as both of their
p-values (Pr) are less than 0.005, which makes them statistically significant.
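The p-value for a factor such as region comes from an F statistic that compares the variation between groups with the variation within groups. The sketch below is not the analysis used in this assignment; it is a simplified one-way ANOVA in Python that ignores income, and the function name one_way_anova_f, the group labels and the numbers are all made up for illustration.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group name -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of the observations around their group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up toy mortality rates for three hypothetical regions,
# NOT taken from the infant mortality data set.
toy_groups = {
    "region_a": [100, 120, 110],
    "region_b": [60, 80, 70],
    "region_c": [40, 50, 60],
}

f_stat, df_between, df_within = one_way_anova_f(toy_groups)
print("F =", round(f_stat, 4), "on", df_between, "and", df_within, "degrees of freedom")
```

A large F value means the group means differ by more than the within-group scatter would explain. Statistical software converts F and its degrees of freedom into the Pr value that is compared with the chosen significance level.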