
Practice Final

This document contains a practice final exam for a business statistics course. It includes 21 multiple choice questions covering topics like simple linear regression, interpreting regression outputs, assessing model fit, and transforming variables. The questions test understanding of concepts like least squares regression lines, residuals, correlation, and using regression to make predictions.

Uploaded by

Jason Chan

Business Statistics, ISOM2500

Practice Final

1. In a simple linear regression, the least squares regression line is


(a) the line which makes the sample correlation as close to +1 or −1 as possible.
(b) the line which best splits the data in half, with half of the data points lying above the regression
line and half of the data points lying below the regression line.
(c) the line which minimizes the sum of squared residuals.
(d) the line which minimizes the number of points that do not pass through the line.
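The sum-of-squared-residuals criterion named in the options can be sketched numerically. The snippet below is only an illustration: the data are made up (not from the exam), and the slope and intercept come from the standard closed-form least squares formulas. It then checks that nearby lines all have a larger sum of squared residuals.

```python
from statistics import mean

# Hypothetical data, for illustration only (not from the exam).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def sum_sq_residuals(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Closed-form least squares estimates of slope and intercept.
x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

best = sum_sq_residuals(b0, b1)

# Shift the intercept and/or slope slightly and recompute the criterion.
perturbed = [
    sum_sq_residuals(b0 + d0, b1 + d1)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.1, 0.0, 0.1)
    if (d0, d1) != (0.0, 0.0)
]
print(b0, b1, best, min(perturbed))
```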
2. A least squares regression line is determined from a sample of values for variables x and y, where
x is the size of a listed home (in square feet), and y is the selling price of the home. Which of the
following statements is true concerning the fitted line ŷ = b0 + b1 x?
(a) If there is a positive correlation r between x and y, then the slope b1 must also be positive
(b) The units on the intercept b0 and the slope b1 will be the same as the units on the variable y
(c) If r² = 0.85, then it is appropriate to conclude that a change in x will cause a change in y
(d) None of the above is true
3. The residual plot below consists of 104 observations.

Based on the plot one can conclude that RMSE is around


(a) 0 (b) 25 (c) 40 (d) 80
4-5. An insurance agent has selected a sample of drivers that she insures whose ages are in the range
from 16 to 42 years. For each driver, she records the age of the driver (x) and the dollar amount of
claims (y) that the driver filed in the previous 12 months. A scatterplot showing the dollar amount of
claims as the response and the age as the predictor shows a linear trend. The least squares regression
line is determined to be: ŷ = 3715 − 75.4x. A plot of the residuals versus age of the drivers showed no
pattern, and the following were reported: r² = 0.822, standard deviation of the residuals Se = 312.1.
4. Which of the following is correct?
(a) If the age of a driver increases from 20 to 21, the dollar amount of claims is predicted to decrease
by $75.4
(b) If the age of a driver increases by one year, the dollar amount of claims is predicted to increase
by $3,715
(c) One can use the least squares regression line to obtain a reliable prediction of the dollar amount
of claims for a driver whose age is 55 years
(d) The dollar amount of claims for a driver of 10 years old is expected to be $2,961.
5. Which of the following is false?
(a) 82.2% of the variation in the dollar amounts of claims is explained by the age of the driver.
(b) The correlation r between the response and the predictor is 0.907
(c) If the histogram of the residuals is symmetric around zero and bell-shaped, then about 68% of
the dollar amounts of claims are within 312.1 dollars of the regression line.
(d) A driver in the data set whose age is 25 years had a residual of −$150 using the fitted line above;
this means his dollar amount of claims is $1,680.
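The arithmetic behind the quantities in questions 4 and 5 can be sketched as follows. This is a rough illustration using only the numbers in the problem statement. The helper name `predict` is hypothetical, and the sign of r is taken from the sign of the slope, which holds for simple linear regression.

```python
import math

b0, b1 = 3715.0, -75.4   # fitted line from the problem statement
r_squared = 0.822        # reported r^2

def predict(age):
    """Predicted dollar amount of claims from the fitted line."""
    return b0 + b1 * age

# The slope is the predicted change in claims per extra year of age.
change_20_to_21 = predict(21) - predict(20)

# Observed value = fitted value + residual (residual of -150 at age 25).
observed_at_25 = predict(25) + (-150.0)

# r has the same sign as the slope; its magnitude is sqrt(r^2).
r = math.copysign(math.sqrt(r_squared), b1)

print(round(change_20_to_21, 1), round(observed_at_25, 1), round(r, 3))
```

Ages outside the sampled range of 16 to 42 years (such as 10 or 55) are extrapolations, so the line's output there should not be treated as reliable.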

6-7. An LS linear regression is fitted to a data set, and the residual plot is shown below.

6. Which of the following is correct?


(a) A linear model is okay because the association between the two variables is fairly strong.
(b) The linear model is not good because the correlation between the response and the predictor is
near 0.
(c) The linear model is not good because some residuals are large.
(d) The linear model is not good because of the curve in the residuals.

7. If one uses the LS linear regression to make predictions, which of the following statements is true?
(a) The predictions tend to be too high for large x’s.
(b) The predictions tend to be too high for intermediate x’s.
(c) The predictions tend to be too high for small x’s.
(d) None of the above is correct.

8. In a study of the association between car mileage (miles per gallon, mpg) and car weight, it
is found that the association is curved. To make the association linear, one decides to change
the response to 100 times the reciprocal of the mileage (that is, gallons per 100 miles). The
scatterplot of the new response vs the car weight (in thousands of pounds) is shown below.

An LS linear regression is fitted to the transformed variables, and yields the following equation

Estimated new response = 0.95 + 1.25 × Weight (thousands of lbs)

Based on the equation, what’s the predicted mileage (measured in mpg) for a car of weight 5,000
pounds?
(a) 6251 (b) 0.016 (c) 7.2 (d) 13.89
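Undoing the reciprocal transformation can be sketched as below. This is a minimal illustration assuming the new response is exactly 100/mpg, as described in the question. The function name `predicted_mpg` is hypothetical.

```python
def predicted_mpg(weight_thousand_lbs):
    """Back-transform the fitted value from 100/mpg to mpg."""
    # Fitted value on the transformed scale (100 / mpg).
    new_response = 0.95 + 1.25 * weight_thousand_lbs
    # Invert the transformation: mpg = 100 / (100 / mpg).
    return 100.0 / new_response

# A 5,000-pound car has weight 5 in thousands of pounds.
mpg_5000 = predicted_mpg(5.0)
print(round(mpg_5000, 2))
```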

9-10. Each worker at an assembly plant that produces clock radios is responsible for the entire assembly of
each unit they work on. The plant manager has collected data from a sample of workers: the number
of years (YRS) of experience at the plant, and the number of hours per unit (TIME) required for
assembly. The scatterplot of TIME versus YRS is shown below.

9. Which of the following is an appropriate reason why a regression line should not be used to make
predictions based on this data?
(a) The magnitude of the slope of the line is too large
(b) The intercept of the fitted line has no practical interpretation in this context
(c) The linear condition for simple regression does not appear to be met
(d) The association between TIME and YRS is negative

10. The manager has decided to transform the response variable from TIME (hours/unit) to 1/TIME
(units/hour). The scatterplot of 1/TIME versus YRS is shown below.

Which of the following is an appropriate interpretation of these results?


(a) The unit on Se is hours per unit
(b) More experienced workers are predicted to produce more units per hour on average than less
experienced workers
(c) Because the transformed model has a higher r², it is better.
(d) The slope b1 measures the elasticity between 1/TIME and YRS

11-15. The scatterplot of sales in thousands of cartons (y) of half-gallon orange juice versus the price (x)
is given below. We apply a log transformation to both y and x to capture the nonlinear pattern. Assume
the transformed x and y satisfy the conditions of the simple regression model (SRM).

Transformed Fit Log to Log


Log(Sales) = 4.811646 - 1.7523832*Log(Price)

Summary of Fit
RSquare                   0.755335
Root Mean Square Error    0.385788
Mean of Response          3.136468

Parameter Estimates
Term          Estimate     Std Error   t Ratio   Prob>|t|
Intercept     4.811646     0.148033    32.50     <.0001*
Log(Price)    -1.752383    0.143954    -12.17    <.0001*

11. Which of the following interpretations of the fitted equation is true?


(a) As the price increases by 1%, the sales decrease by 1.75% on average.
(b) As the price increases by $1, the sales decrease by 1.75 units on average.
(c) As the price increases by 1%, the sales decrease by 1.75 units.
(d) As the price increases by $1, the sales decrease by 1.75% on average.

12. Based on the fitted equation, what are the predicted sales (in thousands of cartons) for a price of $2.30?
(a) 3.35 (b) 28.56 (c) 4.18 (d) 65.22
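Predicting from a log-log fit means evaluating the equation on the log scale and then exponentiating. The sketch below assumes that Log in the output is the natural logarithm (an assumption, since the output does not say so) and ignores any retransformation bias correction. The function name `predicted_sales` is hypothetical.

```python
import math

# Coefficients from the fitted log-log equation.
b0, b1 = 4.811646, -1.7523832

def predicted_sales(price):
    """Predicted sales (thousands of cartons), assuming natural logs."""
    log_sales = b0 + b1 * math.log(price)   # prediction on the log scale
    return math.exp(log_sales)              # back to the original scale

sales_230 = predicted_sales(2.3)
print(round(math.log(sales_230), 2), round(sales_230, 2))
```

The first printed value is the prediction on the log scale and the second is the prediction in thousands of cartons, so it matters which scale an answer is reported on.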

13. Suppose the cost of a half-gallon of juice is $1.50. Then the optimal price is about
(a) $1.9 (b) $3.0 (c) $3.5 (d) $4.1
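One way to approach an optimal-price question is sketched below, under assumptions that the exam does not state explicitly: demand follows the fitted constant-elasticity form Q = A × price^γ, unit cost is constant, and "optimal" means profit-maximizing. Setting the derivative of (price − cost) × Q to zero gives price = cost × γ / (γ + 1), valid when γ < −1. The function and variable names are hypothetical, and the grid search is only a sanity check of the formula.

```python
elasticity = -1.7523832   # estimated slope of the log-log fit
cost = 1.5                # cost per half-gallon, from the question

def optimal_price(cost, elasticity):
    """Profit-maximizing price under constant-elasticity demand."""
    if elasticity >= -1.0:
        raise ValueError("formula requires elasticity below -1")
    return cost * elasticity / (elasticity + 1.0)

p_star = optimal_price(cost, elasticity)

# Sanity check: maximize (p - cost) * p**elasticity over a fine price grid.
# The demand constant A scales profit but does not move the maximizer.
grid = [cost + 0.001 * k for k in range(1, 5000)]
p_grid = max(grid, key=lambda p: (p - cost) * p ** elasticity)

print(round(p_star, 2), round(p_grid, 2))
```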

14. The statistics of the slope show that


(a) The elasticity is positive with at least 95% confidence
(b) The elasticity is bigger than −1 with at least 95% confidence
(c) The elasticity is smaller than −1 with at least 95% confidence
(d) None of the above is correct.
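A rough 95% confidence interval for the elasticity can be sketched from the parameter estimates table. The multiplier of 2 standard errors is an assumption here, since the sample size (and hence the exact t quantile) is not given in the output.

```python
# Slope estimate and standard error from the parameter estimates table.
slope = -1.752383
std_error = 0.143954

# Assumed multiplier: roughly 2 standard errors for about 95% coverage.
multiplier = 2.0

lower = slope - multiplier * std_error
upper = slope + multiplier * std_error

# Compare the whole interval against the reference values 0 and -1.
print(round(lower, 3), round(upper, 3), upper < 0, upper < -1)
```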

15. About the estimated intercept 4.811646, which of the following is the appropriate interpretation?
(a) It estimates the sales in thousands of cartons when the price equals $0.
(b) It estimates the sales in thousands of cartons when the price equals $1.
(c) It estimates the logarithm of the sales in thousands of cartons when the price equals $1.
(d) None of the above is correct.

16. The normal quantile plot of residuals from a regression equation in the plot below suggests that

(a) The fitted equation is linear.


(b) The R-squared statistic is about 0.9 or more.
(c) The errors are normally distributed.
(d) The data in the sample are dependent.

17-21. An LS linear regression is fitted to the daily returns on HSBC (HSBC return) vs those on Hang Seng
index (HSI return). The following are some plots and summaries one gets in the fitting procedure.

17. Based on the plots above, which of the following assumptions about the SRM seems to be violated?
(a) Linear association (b) Normality of errors
(c) Equal variance of errors (d) Independence of errors

18. The Alpha and Beta in CAPM are estimated as (accurate to 2 decimal places)
(a) (0.59, 0.01) (b) (0, 1.02) (c) (1.02, 0) (d) (0, 0.05)

19. If the return on Hang Seng index increases by 1%, at 95% confidence level, which of the following
statements about the return on HSBC is true?
(a) It will increase by at least 0.92%.
(b) It will increase by less than 1.13%, on average.
(c) It will be at least 0.92%, on average.
(d) It will increase by 1.02%.

20. Which of the following statements is false?


(a) We do not reject the hypothesis that β0 = 0 at the 5% significance level.
(b) At the 5% significance level, we do not reject the hypothesis that returns on HSBC move on average
by the same amount as returns on the Hang Seng index.
(c) We do not reject the hypothesis that β1 = 0 at the 5% significance level.
(d) Returns on HSBC are correlated with the returns on the market

21. Assume for now that the conditions about the SRM are satisfied. What is the approximate 95%
prediction interval for HSBC return if HSI return = 2%?
(a) (0%, 4%) (b) (−2%, 2%) (c) (−9%, 13%) (d) (1%, 3%)

22-26. A large national bank charges local companies for using their services. A bank official reported the
results of a regression analysis designed to predict the bank’s charges (Y ) – measured in dollars per
month – for services rendered to local companies. One explanatory variable used to predict service
charge to a company is the company’s sales revenue (X) – measured in millions of dollars. Data
for 21 companies who use the bank’s services were used to fit the model. The results of the simple
linear regression are provided below. Assume the conditions of the SRM are satisfied.

ŷ = −2,700 + 20x, RMSE = 65, p-value for testing β1 = 0 is 0.034.

22. Interpret the estimate of β0 , the intercept of the line.


(a) All companies will be charged at least $2,700 by the bank.
(b) There is no practical interpretation since a sales revenue of $0 is a nonsensical value.
(c) About 95% of the observed service charges fall within $2,700 of the least squares line.
(d) For every $1 million increase in sales revenue, we expect a service charge to decrease $2,700.

23. Interpret the estimate of σε , the standard deviation of the error term in the model.
(a) About 95% of the observed service charges fall within $65 of the least squares line.
(b) About 95% of the observed service charges equal their corresponding predicted values.
(c) About 95% of the observed service charges fall within $130 of the least squares line.
(d) For every $1 million increase in sales revenue, we expect a service charge to increase $65.

24. Interpret the p-value for testing the hypothesis that β1 = 0.


(a) There is sufficient evidence (at α = 0.05) to conclude that sales revenue (X) is a useful linear
predictor of service charge (Y ).
(b) There is insufficient evidence (at α = 0.05) to conclude that sales revenue (X) is a useful linear
predictor of service charge (Y ).
(c) Sales revenue (X) is a poor predictor of service charge (Y ).
(d) For every $1 million increase in sales revenue, we expect a service charge to increase $0.034.

25. A 95% confidence interval for β1 is [15, 30]. Interpret the interval.
(a) We are 95% confident that the mean service charge will fall between $15 and $30 per month.
(b) We are 95% confident that the sales revenue (X) will increase between $15 and $30 million for
every $1 increase in service charge (Y ).
(c) We are 95% confident that on average the service charge (Y ) will increase between $15 and $30
for every $1 million increase in sales revenue (X).
(d) At the α = 0.05 level, there is not enough evidence of a linear relationship between service charge
(Y ) and sales revenue (X).

26. To obtain a narrower confidence interval for the estimated slope in this model, we should advise the
bank official to
(a) concentrate on companies which spent less on using the bank’s services.
(b) concentrate on companies which spent more on using the bank’s services.
(c) concentrate on companies whose sales revenues are either relatively low or relatively high.
(d) obtain additional data for companies of widely varying sales revenues.
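The effect of the explanatory variable's spread and of the sample size on the slope's precision can be sketched with the usual standard error formula for simple regression, SE(b1) = Se / sqrt(Σ(x − x̄)²). The revenue values below are hypothetical and chosen only to contrast a narrow design with a wide one. Only the RMSE of 65 comes from the problem.

```python
import math

def slope_se(s_e, xs):
    """Standard error of the slope: s_e / sqrt(sum of squared x deviations)."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return s_e / math.sqrt(sxx)

rmse = 65.0  # from the problem statement

# Hypothetical sales revenues (millions of dollars), illustration only.
narrow = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]
wide_more_data = wide * 2   # same spread, twice as many companies

print(slope_se(rmse, narrow), slope_se(rmse, wide), slope_se(rmse, wide_more_data))
```

A smaller standard error gives a narrower confidence interval for the slope at the same confidence level.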

27-28. It is believed that the average number of hours spent studying per day (HOURS) during
undergraduate education has a positive linear relationship with the starting salary (SALARY,
measured in thousands of dollars per month) after graduation. Given below is the output from
regressing SALARY on HOURS for a sample of 51 students.

R Square          0.7845
Standard Error    1.3704
Observations      51

             Coefficients   Standard Error   t Stat     P-value
Intercept    -1.8940        0.4018           -4.7134    2.051E-05
Hours        0.9795         0.0733           13.3561    5.944E-18

27. What’s the value of the t-test statistic to test whether HOURS is a useful linear predictor of
SALARY?
(a) −4.7134 (b) −1.8940 (c) 0.9795 (d) 13.3561

28. The 90% confidence interval for the average change in SALARY (in thousands of dollars) associated
with one extra hour of studying per day is
(a) wider than [−2.70, −1.09] (b) narrower than [−2.70, −1.09]
(c) wider than [0.83, 1.13] (d) narrower than [0.83, 1.13]
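How the confidence level changes the interval width can be sketched from the regression output. The two t multipliers below are hardcoded approximations for 49 degrees of freedom (51 observations minus 2 estimated coefficients) and are labeled as assumptions, since the output does not list them.

```python
# Slope estimate and standard error from the regression output.
slope, std_error = 0.9795, 0.0733

# Approximate t quantiles for 49 degrees of freedom (assumed values).
T_95 = 2.01   # two-sided 95% interval
T_90 = 1.68   # two-sided 90% interval

def interval(t_multiplier):
    """Confidence interval for the slope: estimate +/- t * standard error."""
    half_width = t_multiplier * std_error
    return (slope - half_width, slope + half_width)

ci95 = interval(T_95)
ci90 = interval(T_90)

print([round(v, 2) for v in ci95], [round(v, 2) for v in ci90])
```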

29-33. A construction contractor is involved in a wide variety of construction projects. The operations
manager wants to investigate how the Total Hours of labor (design, engineering, modeling,
simulation, construction, software support, etc.) required for a project is related to the Total Cost of
completing the project. Data collected over many projects were used to determine
a predicting equation for the simple regression model:

Total Cost = F + M × Total Hours + ε,

where F and M are the fixed and marginal costs respectively. After determining the predicting
equation, a scatterplot of residuals vs. Total Hours was determined as given below:

29. Which of the following statements is an appropriate interpretation of these results?


(a) The similar variances condition for simple regression does not appear to be satisfied by the data
(b) Prediction intervals for small values of the Total Hours would tend to be too narrow
(c) Confidence intervals for the slope of the line should still be considered reliable
(d) None of the above

30-33. In an attempt to improve the model, the manager decides to use 1/Total Hours as the explanatory
variable, and Cost/Hour ($/Hour) as the response. The model becomes:

Cost/Hour = M + F × (1/Hours) + ε′

The regression output and the scatterplot of the data are given below:

Summary of Fit
RSquare 0.12
Root Mean Square Error 27.2

30. Which of the following statements is correct?


(a) The total cost of a project is predicted to decrease as the number of hours required increases.
(b) The total cost of a project is expected to increase by $118.41 per additional hour of labor required
for the project.
(c) The fixed cost of a project is predicted to be approximately $118.41.
(d) None of the above.

31. Using the revised model, what is the average cost per hour for a project that will require 300 total
hours of labor to complete?
(a) −466,401 (b) 113.2 (c) 0.0088 (d) 118.4

32. Using the revised model, what is the approximate 95% prediction interval for the total cost of a
project that will require 600 total hours of labor to complete?
(a) (61, 170) (b) (53,200, 85,800) (c) (36,900, 102,100) (d) (69,400, 69,500)

33. The information given in the parameter estimates table about the intercept implies that
(a) Fixed costs are significantly different from zero.
(b) Marginal costs are significantly different from zero.
(c) Marginal costs decrease as the total hours increase.
(d) Marginal costs cannot be estimated from the model.

34-35. Weekly commodity prices for heating oils (in cents) were obtained and regressed against time. The
residual plot is shown below.

34. Which assumption of the SRM appears to be violated?


(a) Linear association (b) Normality of errors
(c) Equal variance of errors (d) Independence of errors

35. If one uses the obtained regression equation to predict the commodity price for
heating oils in the next week, then compared with the actual price, the prediction is likely to be
(a) higher (b) lower (c) on target (d) cannot tell based on the information given.
