Big Grocery Multiple Regression Analysis
BAN 602 – Dr. Curtis Price
Fall 2021
Part 1: Introduction
The purpose of this analysis is to identify how far from the population center we should build a
new Big Grocery store. Using data from a survey of an exploratory new market, we have chosen the
frequency of shoppers as the dependent variable of this analysis. More specifically, the dependent
variable is the question: “Will you shop at Big Grocery again within the next week?” We have labeled
this as Q1. The independent variables are listed as Q2, Q3, Q4, and Q5, respectively. Those variables
are the questions: “How far do you live from this Big Grocery store?”; “Did you find everything you
were looking for today?”; “How satisfied are you with your shopping experience today?”; and “Family
Income?” Collectively, these variables will have an impact on where our new store will be located.
The table below describes how each independent variable relates to Q1:
Independent Variable Name | Description | Relation to the question “Will you shop at Big Grocery again within the next week?”
Distance (Q2) | The distance (in miles) the respondent reported in answer to the question: “How far do you live from this Big Grocery store?” | This is our main independent variable of interest since we are trying to determine how far from the population center to build the new store.
Options (Q3) | An indicator variable, 1 if the answer is yes and 0 if the answer is no to the question: “Did you find everything you are looking for today?” | If the choices and options offered at the store meet customer needs, customers are more likely to return.
Satisfaction (Q4) | A number on a Likert scale (1 very unsatisfied – 5 very satisfied) that the respondent gave in answer to the question: “How satisfied are you with your shopping today?” | If the customer rates their satisfaction with the shopping visit highly (4 or 5 on the Likert scale), they are more likely to be a return customer.
Income (Q5) | The family income (in thousands of US dollars) the respondent reported in answer to the question: “Family Income.” | A customer’s family income would likely affect the frequency of their shopping and their purchasing power per visit.
Part 2: The Data
The data was compiled by members of my team at Big Grocery and consists of 1,000 observations for
each variable. The data and variable descriptors are as follows:
Variable Name | Description | Average | Max | Min
Revisit | An indicator variable, 1 if the answer is yes and 0 if the answer is no to the question: “Will you shop at Big Grocery again within the next week?” | 0.74 | 1 | 0
Distance | The distance (in miles) the respondent reported in answer to the question: “How far do you live from this Big Grocery store?” | 50.45 | 100 | 0
Options | An indicator variable, 1 if the answer is yes and 0 if the answer is no to the question: “Did you find everything you are looking for today?” | 0.49 | 1 | 0
Satisfaction | A number on a Likert scale (1 very unsatisfied – 5 very satisfied) that the respondent gave in answer to the question: “How satisfied are you with your shopping today?” | 2.92 | 5 | 1
Income | The family income (in thousands of US dollars) the respondent reported in answer to the question: “Family Income.” | 60.25 | 100 | 20
Part 3: Regression Results
We estimate the impact of our independent variables, Distance (Q2), Options (Q3), Satisfaction (Q4),
and Income (Q5), on the dependent variable, Revisit (Q1), via the model below:
Revisit = β0 + β1(Distance) + β2(Options) + β3(Satisfaction) + β4(Income) + u
The results of the regression are shown below:
The results in equation form are below. For this model, we put a hat (^) on the dependent variable to
remind us that this is an estimate from the data:
Revisit^ = 0.0975 – 0.005(Distance) + 0.201(Options) + 0.169(Satisfaction) + 0.005(Income)
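As a minimal sketch, the estimated equation can be evaluated for any customer profile. The function name `predict_revisit` and the choice to plug in the sample averages from Part 2 are illustrative assumptions, not part of the original analysis; the coefficients are the rounded values reported above.

```python
# Rounded coefficients from the estimated equation (Part 3).
COEFFICIENTS = {
    "intercept": 0.0975,
    "distance": -0.005,      # per mile
    "options": 0.201,        # 1 = found everything, 0 = did not
    "satisfaction": 0.169,   # per Likert point (1-5)
    "income": 0.005,         # per $1,000 of family income
}


def predict_revisit(distance, options, satisfaction, income):
    """Return the predicted probability of a revisit (hypothetical helper)."""
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["distance"] * distance
        + c["options"] * options
        + c["satisfaction"] * satisfaction
        + c["income"] * income
    )


# Evaluate at the sample averages reported in Part 2.
at_means = predict_revisit(distance=50.45, options=0.49, satisfaction=2.92, income=60.25)
print(round(at_means, 3))  # 0.738
```

Evaluated at the averages, the rounded equation gives about 0.738, close to the sample average of 0.74 for Revisit. That is what we would expect from a least-squares fit that includes an intercept.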
The literal interpretation of the intercept term is that, with all of our independent variables set
to zero, the predicted probability of a return to this Big Grocery store would be 9.75%. This
scenario cannot occur, since the Likert scale starts at 1 and a satisfaction score of zero is not
possible. It also doesn’t make sense that someone with zero income and zero satisfaction would ever
return to the store. Therefore, we will ignore this term.
Independent Variables:
1. Distance – The distance coefficient implies that, holding all other variables constant, each
one-mile increase in distance from the store lowers a customer’s predicted probability of
returning to this Big Grocery location by 0.5 percentage points. The negative sign makes sense,
as the distance a person lives from a store would likely affect their ability and willingness to
shop there. However, the per-mile effect is small, which leads us to believe that distance from
a customer’s home is not a major factor in where we should choose to open this new location.
This is great news if land prices farther away from the city center are markedly cheaper.
2. Options – The options coefficient implies that, holding all other variables constant, a customer
who found everything they were looking for (Options = 1) is 20.1 percentage points more likely
to return to this Big Grocery location than one who did not (Options = 0). This is a large
positive effect on revisit likelihood for shoppers in this area and is an important indicator to
continue to monitor.
3. Satisfaction – Similarly, the coefficient on the satisfaction variable is positive. For every
one-point increase on the Likert scale in response to the question “How satisfied are you with
your shopping today?”, a customer is 16.9 percentage points more likely to revisit the store
location, holding all other variables constant.
Customer satisfaction and options have the two largest coefficients in the model, which suggests
they are the two most important variables in whether a customer revisits this store location.
The positive sign makes sense, as a customer’s satisfaction with the shopping experience would
naturally lead to a higher likelihood of revisiting the store.
4. Income – Finally, this coefficient implies that, holding all else constant, every $1,000
increase in a family’s household income raises the probability that the customer will revisit
the store by 0.5 percentage points. Again, the positive sign makes sense, as more disposable
income would likely influence a customer’s decision to shop more regularly. However, much like
distance, this is not a large effect that would radically affect shopping preferences.
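The per-unit coefficients above are measured on different scales, so a short sketch can translate them into changes in predicted probability. The helper `effect` and the example step sizes (10 miles, $10,000) are assumptions chosen for illustration; the ranges come from the minimum and maximum values in Part 2.

```python
# Rounded slope coefficients from the estimated equation.
SLOPES = {"distance": -0.005, "options": 0.201, "satisfaction": 0.169, "income": 0.005}

# Observed minimum and maximum of each variable (Part 2 table).
RANGES = {"distance": (0, 100), "options": (0, 1), "satisfaction": (1, 5), "income": (20, 100)}


def effect(variable, change):
    """Change in predicted probability for a given change in one variable,
    holding the others constant (hypothetical helper)."""
    return SLOPES[variable] * change


# Illustrative step sizes (assumed, not from the original analysis).
print(round(effect("distance", 10), 3))   # 10 more miles
print(round(effect("income", 10), 3))     # $10,000 more income

# Change over each variable's full observed range.
for name, (low, high) in RANGES.items():
    print(name, round(effect(name, high - low), 3))
```

Multiplying each printed value by 100 converts it to percentage points.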
Part 4: The sufficient conditions for good estimates
Linear in parameters
We are fairly confident that this condition is not satisfied, as it does not appear that any of the
independent variables has a linear impact on the dependent variable. Linearity is difficult to
achieve when the dependent variable is an indicator, because a linear model can predict
probabilities below 0 or above 1.
It is also difficult to tell whether there is a linear relationship because several of the
independent variables are qualitative measures. Therefore, we have omitted scatterplots, since it is
reasonable to conclude that this condition is not satisfied in the Linear Probability Model.
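One concrete way to see the concern is to evaluate the estimated equation from Part 3 at the extreme values reported in Part 2. The sketch below is illustrative only; the function name `predict_revisit` is an assumption, and the corner profiles are combinations of the reported minimums and maximums, not observed respondents.

```python
from itertools import product


def predict_revisit(distance, options, satisfaction, income):
    """Predicted revisit probability from the rounded Part 3 coefficients."""
    return (0.0975 - 0.005 * distance + 0.201 * options
            + 0.169 * satisfaction + 0.005 * income)


# Minimum and maximum of each independent variable (Part 2 table):
# distance, options, satisfaction, income.
corners = product((0, 100), (0, 1), (1, 5), (20, 100))

predictions = [predict_revisit(*profile) for profile in corners]
lowest, highest = min(predictions), max(predictions)

print(round(lowest, 4), round(highest, 4))

# A valid probability must lie in [0, 1]; count the corner profiles that do not.
outside = sum(1 for p in predictions if p < 0 or p > 1)
print(outside, "of", len(predictions), "corner profiles fall outside [0, 1]")
```

With the rounded coefficients, the lowest corner prediction is about -0.13 (100 miles away, did not find everything, satisfaction of 1, income of $20,000) and the highest is about 1.64 (0 miles away, found everything, satisfaction of 5, income of $100,000). Neither is a valid probability.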
Random sampling
We would need to know more about how this survey was performed and how the data was generated. For
example, if the data is for a single Big Grocery store and the customers living near that location
(as is presumed, given we are exploring expanding into a new market location), then we would need to
generate a random sample and use that sample to extrapolate to the entire set of potential
customers. However, if the data was generated from all Big Grocery stores, we could be assured that
this condition is satisfied.
Variation in the independent variable(s)
The sample contains 1,000 observations, and the survey provides a wide range of both qualitative and
quantitative data. Each independent variable takes a range of values: Distance runs from 0 to 100
miles, Options takes both 0 and 1, Satisfaction covers the full Likert scale from 1 to 5, and Income
runs from $20,000 to $100,000. This leads us to believe there is sufficient variation for a good
estimate.
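For an indicator variable such as Options, the amount of variation can be read directly from the average in Part 2, because a 0/1 variable with mean p has variance p(1 - p). The sketch below applies that identity; the function name `indicator_variance` is an assumption for illustration.

```python
def indicator_variance(mean):
    """Population variance of a 0/1 variable whose average is `mean`."""
    return mean * (1 - mean)


# Options has an average of 0.49 in the Part 2 table.
options_variance = indicator_variance(0.49)
print(round(options_variance, 4))  # 0.2499

# The maximum possible variance for an indicator is 0.25 (at mean 0.5),
# so Options is close to an even split between 0 and 1.
print(round(indicator_variance(0.5), 4))  # 0.25
```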
Zero mean of the error term conditional on the independent variable(s)
Certainly, there are other factors that may determine whether a customer will return to a store in a
given week. The number of local competitors, family size, and traffic patterns all seem like
factors that would affect this dependent variable. For this condition to hold, such omitted factors
must not be systematically related to the included independent variables. However, it is reasonable
to assume that the selected independent variables are fairly strongly related to whether a customer
will revisit in a given week. This allows us to make a reasonable inference that the model is
sufficient to answer the question of how far the store should be located from the population center,
as we originally set out to do.