7.19 Problem Set

The document outlines a problem set focused on analyzing the 'tips' dataset using R. It covers saving the dataset, describing its variables, running bivariate and multivariate regression analyses, and interpreting the results. Additionally, it includes instructions for creating visualizations and predicting tips based on new total bill amounts.

Uploaded by

Trinh Lê

---
title: '7.19 Problem Set'
author: 'Trinh Le'
---

1. Save the “tips” dataset as a tibble to a variable. Read the documentation for “tips.”
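```{r}
library(tidyverse)  # attaches tibble, dplyr, and ggplot2, among others

# Load the data frame shipped with reshape2, then convert it to a tibble.
data("tips", package = "reshape2")
tips <- as_tibble(tips)
head(tips)

# The documentation can be read with: ?reshape2::tips
```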

a) In your own words, describe the dataset.

The "tips" dataset records the tips one waiter received over a period of a few months at a single restaurant. Each of its 244 rows is one bill, with the bill total, the tip, and characteristics of the party that might influence tipping, such as the day of the week and the number of people dining.

b) In your own words, describe each variable, including whether it is numeric or
categorical and whether you believe there is a plausible association with “tip.”

total_bill (Numeric): The total bill in dollars. A positive association is plausible, since a higher bill generally leads to a higher tip.

tip (Numeric): The tip in dollars. This is the outcome we are trying to understand and predict.

sex (Categorical): The sex of the person who paid the bill (Female or Male). It might influence tipping habits.

smoker (Categorical): Whether there were smokers in the party (Yes or No). It may be associated with tipping behavior.

day (Categorical): The day of the week of the visit (Thur, Fri, Sat, or Sun). Tipping might vary by day.

time (Categorical): Whether the meal was Lunch or Dinner. Tips may differ by meal.

size (Numeric): The number of people in the party. Larger parties tend to run up larger bills, so a positive association with tip is plausible.
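
The numeric/categorical split described above can be checked against the data itself. The following is a minimal sketch, assuming the reshape2 package is installed; `tips_check` and `var_types` are names chosen here for illustration and are not part of the original answer.

```{r}
# Illustrative copy of the data (hypothetical name), so the `tips` object
# used elsewhere in this document is left untouched.
tips_check <- reshape2::tips

# Class of each column: factors are categorical, numeric/integer are numeric.
var_types <- vapply(tips_check, function(col) class(col)[1], character(1))
var_types

# Levels of each categorical variable. The first level listed is the one
# lm() uses as the reference category by default.
lapply(Filter(is.factor, tips_check), levels)
```

Checking the factor levels at this stage is useful later, because the level order decides which dummy variables appear in a regression.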

2. Run a bivariate regression using “tip” as the outcome variable and “total bill” as the
predictor variable.
a) Write out the regression equation (you can use “b” instead of “beta,” for example:
“salary = b0 + b1*yrs.since.phd…”).
```{r}
tips_regression <- lm(tip ~ total_bill, data = tips)
summary(tips_regression)
```

tip = b0 + b1 * total_bill

Here b0 is the intercept and b1 is the coefficient (slope) for total_bill.

b) Write an interpretation of the results.

Intercept (b0): The expected tip when the total bill is $0, about $0.92 in this fit. The data contain no $0 bills, so the intercept anchors the line rather than describing a realistic situation.

Coefficient (b1): The expected change in tip for each additional dollar on the total bill. The estimate is about 0.105, so each extra dollar on the bill is associated with roughly 10.5 cents more in tip.
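
To make the roles of b0 and b1 concrete, the sketch below computes the expected tip for a hypothetical $20 bill by hand and compares it with predict(). It assumes reshape2 is installed; `tips_demo`, `fit_demo`, and the $20 figure are illustrative choices, not part of the original answer.

```{r}
# Re-fit the bivariate model under illustrative names.
tips_demo <- reshape2::tips
fit_demo <- lm(tip ~ total_bill, data = tips_demo)
b <- coef(fit_demo)

# Expected tip for a hypothetical $20 bill: b0 + b1 * 20.
by_hand <- b[["(Intercept)"]] + b[["total_bill"]] * 20
by_predict <- predict(fit_demo, newdata = data.frame(total_bill = 20))
c(by_hand = by_hand, by_predict = unname(by_predict))

# The slope is the difference between predictions one dollar apart.
diff(predict(fit_demo, newdata = data.frame(total_bill = c(20, 21))))
```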

c) Is the effect of “total bill” statistically significant? What percentage of the
variation in “tip” is explained by this model?

P-value: The p-value for total_bill is far below 0.05 (summary() reports it as < 2e-16), so the effect is statistically significant: a slope this large would be very unlikely if the true slope were zero. Significance alone does not measure how strong the relationship is.

R-squared: The share of the variation in tip that the model explains. Here the multiple R-squared is about 0.46, so total_bill accounts for roughly 46% of the variation in tip.

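
The p-value and R-squared can be pulled out of the model summary directly rather than read off the printed table. A minimal sketch, assuming reshape2 is installed; `fit_demo`, `p_total_bill`, and `r_squared` are illustrative names.

```{r}
tips_demo <- reshape2::tips
fit_demo <- lm(tip ~ total_bill, data = tips_demo)
fit_summary <- summary(fit_demo)

# p-value for the total_bill coefficient, from the coefficient table.
p_total_bill <- fit_summary$coefficients["total_bill", "Pr(>|t|)"]
p_total_bill < 0.05

# Share of the variation in tip explained by the model, as a percentage.
r_squared <- fit_summary$r.squared
round(100 * r_squared, 1)

# With a single predictor, R-squared equals the squared correlation.
all.equal(r_squared, cor(tips_demo$tip, tips_demo$total_bill)^2)
```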
d) Create a scatterplot of the data with a regression line added. Do not use
geom_smooth(); instead, use the actual results of your regression. (Hint: You will need to
use predict() to add a new column to your dataset.)
```{r}
# predict() with no newdata returns the fitted value for each row of tips.
tips <- tips %>%
  mutate(predicted_tip = predict(tips_regression))

ggplot(tips, aes(x = total_bill, y = tip)) +
  geom_point() +
  geom_line(aes(y = predicted_tip), color = "blue") +
  labs(title = "Scatterplot of Total Bill vs Tip with Regression Line",
       x = "Total Bill ($)",
       y = "Tip ($)")
```
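
As a cross-check on the plot above, the same line can be drawn from the fitted coefficients with geom_abline(), which also avoids geom_smooth(). This is only a sketch of an alternative, assuming reshape2 and ggplot2 are installed; `tips_demo`, `fit_demo`, and `p_demo` are illustrative names.

```{r}
library(ggplot2)

tips_demo <- reshape2::tips
fit_demo <- lm(tip ~ total_bill, data = tips_demo)
b <- coef(fit_demo)

# Draw the line tip = b0 + b1 * total_bill from the estimated coefficients.
p_demo <- ggplot(tips_demo, aes(x = total_bill, y = tip)) +
  geom_point() +
  geom_abline(intercept = b[["(Intercept)"]],
              slope = b[["total_bill"]],
              color = "blue") +
  labs(x = "Total Bill ($)", y = "Tip ($)")
p_demo
```

One visible difference is that geom_abline() extends the line across the whole panel, while the geom_line() version stops at the smallest and largest observed bills.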

e) Create a new dataset with ten random amounts for “total bill.” Predict the tip amount
for each row.
```{r}
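set.seed(719)  # arbitrary seed so the random bills are reproducible

# Draw ten bills uniformly within the observed range of total_bill.
new_data <- tibble(
  total_bill = runif(10, min = min(tips$total_bill), max = max(tips$total_bill))
)
new_data <- new_data %>%
  mutate(predicted_tip = predict(tips_regression, newdata = new_data))
new_data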
```
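
A point prediction says nothing about how far an individual tip may fall from the line. The sketch below asks predict() for 95% prediction intervals as well; it assumes reshape2 is installed, and `fit_demo`, `new_bills`, and the three bill amounts are hypothetical values chosen for illustration.

```{r}
tips_demo <- reshape2::tips
fit_demo <- lm(tip ~ total_bill, data = tips_demo)

# Three hypothetical bills inside the observed range of total_bill.
new_bills <- data.frame(total_bill = c(10, 25, 40))

# interval = "prediction" returns the fitted value plus lower and upper
# bounds for a single new observation at each bill amount.
pred_int <- predict(fit_demo, newdata = new_bills,
                    interval = "prediction", level = 0.95)
cbind(new_bills, pred_int)
```

The `fit` column matches the plain predict() output, while `lwr` and `upr` bound where a single new tip would be expected to fall under the model's assumptions.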

3. Run a multivariate regression using “tip” as the outcome variable. Use at least three
predictor variables. At least one predictor variable must be continuous and at least one
must be categorical.
a) Write out the regression equation.
```{r}
tips_multivariate <- lm(tip ~ total_bill + size + sex, data = tips)
summary(tips_multivariate)
```

tip = b0 + b1 * total_bill + b2 * size + b3 * sexMale

sexMale is a dummy variable (1 if the bill payer is male, 0 if female). R names it sexMale because the levels of sex are ordered Female, Male, and lm() treats the first level, Female, as the reference category.

b) Write an interpretation of the results for each predictor variable (including each
dummy variable for categorical variables). If the variable is categorical, the
interpretation should state what the reference category is.

total_bill: The expected change in tip for each additional dollar on the total bill, holding size and sex constant.

size: The expected change in tip for each additional person in the dining party, holding total_bill and sex constant.

sexMale: The expected difference in tip when the bill payer is male rather than female (the reference category), holding total_bill and size constant.
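
Which dummy column R creates depends on the factor's level order, so it is worth checking before writing the equation. A minimal sketch, assuming reshape2 is installed; `tips_demo`, `design`, and `tips_releveled` are illustrative names.

```{r}
tips_demo <- reshape2::tips

# The first level is the reference category under R's default coding.
levels(tips_demo$sex)

# The design matrix shows the dummy column lm() will actually use.
design <- model.matrix(tip ~ total_bill + size + sex, data = tips_demo)
colnames(design)
head(design)

# To use the other category as the reference, relevel the factor first.
tips_releveled <- transform(tips_demo, sex = relevel(sex, ref = "Male"))
names(coef(lm(tip ~ total_bill + size + sex, data = tips_releveled)))
```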
c) Which of your predictor variables have effects that are statistically significant? What
percentage of the variation in “tip” is explained by this model? What changes do you
notice from the previous model?
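
Significance of predictors: Check the p-value for each coefficient in the summary() output. A p-value below 0.05 means the coefficient is statistically distinguishable from zero at the 5% level, holding the other predictors constant. It says nothing by itself about how large the effect is.

R-squared: The share of the variation in tip explained by all the predictors combined.

Comparing R-squared: R-squared can never decrease when predictors are added, so compare the adjusted R-squared of this model with that of the bivariate model. A clear increase means that size and sex add explanatory power beyond total_bill; little or no increase means they add little. The coefficient on total_bill can also shift once size enters the model, because party size and bill size are correlated.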
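
The comparison between the two models can be laid out side by side. The sketch below is one way to do it, assuming reshape2 is installed; `fit_one`, `fit_three`, and `r2_table` are illustrative names, and the F-test via anova() is an addition beyond what the question asks for.

```{r}
tips_demo <- reshape2::tips
fit_one <- lm(tip ~ total_bill, data = tips_demo)
fit_three <- lm(tip ~ total_bill + size + sex, data = tips_demo)

# R-squared and adjusted R-squared for both models.
r2_table <- data.frame(
  model = c("total_bill only", "total_bill + size + sex"),
  r_squared = c(summary(fit_one)$r.squared, summary(fit_three)$r.squared),
  adj_r_squared = c(summary(fit_one)$adj.r.squared,
                    summary(fit_three)$adj.r.squared)
)
r2_table

# F-test of whether size and sex jointly improve on the bivariate model.
anova(fit_one, fit_three)
```

Adjusted R-squared penalizes extra predictors, so it is the fairer column to compare across models with different numbers of terms.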
