Clustering Stores of Retailers Via Consumer Behavior Thesis in Business Analytics and Quantitative Marketing
Author:
Gijs van Rooij (360497)
Supervisor:
Gertjan van den Burg
Co-reader:
Dr. Michel van de Velden
February 2, 2017
Abstract
These days most retailers define clusters of stores and set different prices per cluster,
since store-by-store pricing is not yet attractive due to optimization and operational
issues. However, retailers define these clusters solely on the basis of local competition.
Existing clusters of stores could be broken down further, and price management decisions
adjusted accordingly, by using consumer behavior. To this end we use the price elasticity,
which reflects consumer behavior. This enables the retailer
to set different prices in smaller clusters of stores in order to attain higher revenue and
profit. This research provides a clustering solution that defines clusters of stores based
on price elasticities. Using data of one of the major supermarket chains in the Netherlands,
the clustering solution shows potential with a projected increase in revenue of 0.36% and a
projected increase in profit of 0.76%.
Abstract
1 Introduction
1.1 Topic and data description
1.2 Workflow
1.2.1 Phase 1: Clusters of stores per product group
1.2.2 Phase 2: Final clusters of stores
1.2.3 Phase 3: Clustering impact
1.3 Method
1.4 Structure paper
2 Related work
2.1 Clustering
2.2 Covariates of price sensitivity
3 Data
3.1 Scanner data
3.2 Store-specific characteristics and trading area data
4 Method
4.1 Dissimilarity measure
4.2 Clustering
4.2.1 Restricted K-means++
4.2.2 Cluster consensus
4.2.3 Cluster validation
4.2.4 Cluster evaluation
4.3 Regression analysis
4.3.1 Regression model
4.3.2 Variable selection
5 Evaluation
5.1 Cluster definition
5.1.1 Cluster interpretation
5.1.2 Cluster evaluation: Jaccard
5.2 Sensitivity analysis
5.3 New price policy
5.4 Regression analysis
6 Conclusion
Bibliography
Appendices
A Variables
B Impact clustering results
C Panel data regression results
1 | Introduction
In this chapter we present the problem at hand and briefly introduce the methods
applied to answer the research question. First, Section 1.1 provides the topic of the research
alongside the research question and the context of the research. Subsequently,
Section 1.2 describes the workflow of the research. Then, the applied methods are
briefly discussed in Section 1.3, and Section 1.4 outlines the structure of the paper.
1.2 | Workflow
The workflow of this research consists of three phases, namely finding clusters of stores for
each specific product group (phase 1), obtaining the final clusters of stores (phase 2), and
analyzing the impact of the obtained clusters of stores (phase 3). Each of these phases
raises different questions that need to be answered in order to answer the research
question.
How can we cluster the stores per product group exclusively based on the
products available in the considered product group?
How can we obtain one final set of clusters of stores that combines the
information comprised in the clusters of stores found for each separate
product group?
1.2.3 Phase 3: Clustering impact
Price is the factor that provides immediate revenues [27]. Therefore, to demonstrate the positive
impact of our framework on revenue and profit we have to adjust the current
prices according to the newly formed clusters. We have to define a price policy under which
we set different prices for products in different clusters of stores. The price of a product
in a cluster is kept the same in all stores in that cluster. Note that the focus of this research
is not to find the optimal prices but to find the clusters of stores that best fit the underlying
data. This raises the following subquestions to answer the research question:
If different prices are set in the clusters, what is the impact of the clustering
of stores for each of the separate product groups on the revenue and profit of
these product groups?
If different prices are set in the clusters, what is the impact of the final
clustering of stores on the revenue and profit in total?
1.3 | Method
The main focus of this research is to define new clusters of stores that use the differences
in consumer behavior among stores. To use these differences, expressed as the variance
in price elasticity on store-item level, a two-stage framework is needed.
First, in phase 1, which is the first step of the two-stage framework, we obtain clusters of stores
based on the price elasticity on store-item level for each product group separately. The
K-means++ clustering algorithm by Arthur and Vassilvitskii [2] is used as a building block
for the Restricted K-means++ clustering algorithm which is proposed in this paper. The
K-means++ clustering algorithm is easy to implement and
able to handle large datasets. The proposed Restricted K-means++ clustering algorithm
has to satisfy restrictions that K-means++ does not accommodate and is therefore
an extension of the K-means++ clustering
algorithm. Stores should be assigned to clusters such that stores originating from the same
city/village are assigned to the same cluster. If stores from the same city/village end up
in different clusters, our reassignment process ensures that these stores are in the end assigned
to the same cluster. Furthermore, products that generate more revenue than others are
more important to the retailer. This is covered by the novel weighted dissimilarity measure
between stores proposed in this paper. The higher the weight, the more important the
product is to the retailer. Moreover, imagine we are interested in the dissimilarity between
two stores and only one of the stores sells a specific product. Although the price elasticity
of the product is not available in one of the stores, this indicates a difference between the
stores because the assortment differs. To cope with this issue we add a correction term to
the novel weighted dissimilarity measure proposed in this paper that scales up the dissimilarity in
case data is missing. Second, phase 2, which is the second step of the two-stage framework,
combines the information comprised in the clusterings of stores from the separate
product groups by using the hard least squares Euclidean consensus algorithm [14]. The
result is the so-called cluster consensus, which represents the final set of clusters of stores.
Finally, we conduct an extensive regression analysis, which provides insights into the
drivers of differences in consumer behavior in order to make better decisions.
2 | Related work
In order to use the variation in price elasticities we propose a two-stage clustering framework,
which consists of first finding new clusters of stores for each product group separately
and subsequently finding the cluster consensus. The cluster consensus summarizes the
information of all the clusters of stores found for each product group separately in one final
set of clusters. Additionally, we conduct an extensive regression analysis on the
formed clusters, which provides insights into the drivers of differences in consumer behavior
in order to make better decisions. Existing methods are re-used and new ones are introduced.
Section 2.1 presents an overview of the literature on clustering. In Section 2.2 we
elaborate upon the literature related to the drivers of price elasticity.
2.1 | Clustering
The K-means clustering algorithm [12] is the most commonly
used clustering algorithm, in particular for its simplicity and its applicability to large
data sets. K-means suffers from the limitation that it may converge to a local optimum,
so that the results depend on the choice of the initial centers. As such, the
K-means++ algorithm was proposed by Arthur and Vassilvitskii [2] to reduce the risk of
getting stuck in a poor local optimum by choosing the initial centers more carefully. Moreover,
the K-means++ algorithm outperforms the
K-means algorithm regarding speed as well. Additionally, in this research we are bound
to the domain-specific business rule that stores located in the same city/village should be
assigned to the same cluster. This business rule can be implemented in the clustering
algorithm via constraints. Relatedly, constrained clustering [3, 4, 5] has been widely
researched over the last decades.
Wagstaff et al. [40] implement constraints into the K-means clustering algorithm by
ensuring none of the constraints are violated while updating cluster assignments. In case
point s cannot be hosted by cluster C, they go through the list of clusters and search for
a cluster to which point s could be assigned. The must-link constraint, i.e., two instances
should be in the same cluster, corresponds to the business rule our research is bound to.
The disadvantage is that the algorithm is order-sensitive since the order of cluster centers
to which the stores are compared could influence the final clustering. Therefore the final
clusters could end up differently in separate runs if the order of cluster centers differs.
Basu et al. [3] need supervisory information, given a priori as cluster labels, to initialize
the clustering algorithm. They propose to incorporate this form of semi-supervision in
the K-means algorithm by seeding: labeled data and the constraints generated
from the labeled data are used to generate so-called seed clusters,
which initialize the clustering algorithm. The labels of the seeded data are kept
the same during subsequent steps of the algorithm, whereas the non-seeded data labels
are estimated at each step. Additionally, they propose to apply the same technique using
the Expectation Maximization algorithm. During each Maximization step the conditional
distributions are kept the same for the seeded data and at each Expectation step the
conditional distributions for the non-seeded data are re-estimated. Experimental results
on a newsgroups dataset show that the aforementioned algorithms compare favorably, in terms of
sensitivity and robustness, with random seeding and COP-Kmeans, proposed by Wagstaff
et al. [40].
Another method for constrained clustering is the Constrained Complete Link algorithm
by Klein et al. [24]. This hierarchical agglomerative clustering method using complete
linkage is altered during initialization to cope with must-link constraints and with cannot-link
constraints, i.e., constraints stating that two instances should not end up in the same cluster.
In case instances are related by a
must-link constraint the distance between the instances is set to zero, while in case of a
cannot-link constraint the distance is set to infinity. Subsequently, new distances
between pairs of instances are found using shortest paths.
In contrast to constructing a single clustering, multiple clusterings impose different
structures on the data and therefore provide a wide range of information. In order to
exploit the complementary nature of the data, different samples of input data can be
used. According to Vega-Pons and Ruiz-Shulcloper [39], roughly
two sorts of methods exist to combine clusterings, i.e., median partition based approaches [13, 39] and
object co-occurrence based approaches [14, 22, 38]. The basic idea of a median partition
based approach is to find a clustering P such that the similarity between P and all
the clusterings in the ensemble is maximized. One of the object co-occurrence based
approaches is the method by Strehl and Ghosh [36], for which first a similarity matrix
is constructed. The similarity matrix can be deduced from the adjacency matrix of a hypergraph,
in which rows represent objects and columns represent clusters: a 1 in a column indicates that
the object in that row is contained in that cluster. Next, any reasonable similarity-based clustering
algorithm can be used, such as K-medoids [23]. However, since the adjacency matrix of
a hypergraph is constructed, they also propose to use the METIS algorithm [21] to yield
the combined clustering, which then makes the algorithm a graph based method.
ticity using scanner data of 18 product groups in 83 stores in Chicago. They discovered
they could explain two thirds of the variation in price elasticity by using eleven consumer
characteristics consisting of demographic and competitive variables. The information com-
prised in seven demographic variables related to life stage, education, family size, income,
ethnicity, value of the house, and the percentage of working women are exploited. More-
over, four competitive variables such as the sales of competitors and distances to other
supermarkets and warehouses are incorporated in the model. The demographic variables
proved to be more important than the competitive variables. However, the research dates
from 20 years ago and the retailing market has changed drastically. Collecting data is
far less expensive these days and therefore the data could contain more valuable
information. Moreover, since the data was collected in Chicago, the competitive landscape and
consumer behavior are expected to differ from those
in the Netherlands.
As stated before, price elasticity is broadly researched and all sorts of
determinants of price elasticity have been examined. Kumar and Karande [26] study the effect of the
environment of a store on the retailer’s performance, measured by the sales or productivity
of stores located all over the United States. Most interestingly, they conducted the
analysis on store level and segmented stores, offering useful insights into the determinants of
store performance. Situational price sensitivity is examined by Wakefield et al. [41]. They
found that in social and hedonic consumption situations price sensitivity is weaker
than in functional consumption situations. Huang et al. [19] found that low-income shoppers
appear to be most price sensitive and that price sensitivity decreases as the market share
of brands increases. Last, Elrod and Winer [11] found a weakly significant influence of age,
education, and female-headed households on price sensitivity.
3 | Data
Retailers assign stores to clusters, and stores assigned to the same cluster charge
the same prices for the same products. As previously mentioned in Chapter 1, Jumbo
implements four unique clusters of stores based on local competition. Each of these
clusters of stores is called a priceline. The stores are assigned to a priceline based on the
nearest located competitor in the same domicile. Because of data availability, in this research
we focus on the priceline represented by the largest competitor of Jumbo, which is Albert Heijn.
Within this priceline we conduct a clustering of stores. This priceline
contains 305 of the more than 600 stores in total. These stores account for 53%
of the revenue of the Jumbo supermarket chain. Stores that opened at most 3 months ago are
discarded from the research due to the limited amount of historical data. Furthermore,
only products which are sold in the last quarter (2016Q1) of the considered time span
are taken into account, since products which are not sold anymore no longer influence
the policy of a retailer. Two sources of data are used: the Jumbo database and
Statistics Netherlands (CBS). We use CBS data to retrieve information on demographics
related to the environment of the considered stores. The price elasticities on store-item as
well as on store level are provided by the retailer. The data is roughly split into
two sets: the scanner data analyzed in Section 3.1 and the store-specific characteristics
and trading area data discussed in Section 3.2.
Table 3.1: The evaluated product groups with the number of products, number of
brands, average price elasticity per store and standard deviation of the price elasticity.
Product group Number of products Number of brands Average elasticity Standard deviation
This scanner data set comprises the retail prices, prices for substitutes and comple-
ments, as well as information on promotions such as price discounts. The extensive data
on promotional activity also incorporates indicators for the promotional channels such as
local media, flyers, and television commercials. Furthermore, the data contains informa-
tion on holidays, weather, and school vacations.
4 | Method
This research strives to use the variance in price elasticity to arrive at new clusters of stores
by means of a two-stage clustering framework. Figure 4.1 shows there is still a lot of
variation in price elasticity between stores within the current clusters, especially in some
specific product groups, which motivates the need for a new method.
[Figure 4.1: share of stores (vertical axis, 0% to 25%) by store-specific price elasticity (horizontal axis, −2.2 to −0.4) for the product groups Cola, Energy, Juice, Kids youth, Large soda, and Water.]
Differences in price elasticities on store-item level are studied [32] to find new clusters
of stores. We propose a novel dissimilarity measure in Section 4.1 that is used in the
clustering framework. Next, we discuss the clustering framework and the evaluation and
validation methods in Section 4.2. Finally, Section 4.3 shows an outline of the regression
model that is used for the analysis of the final set of clusters found by the clustering
framework. Note, we assume that each product only belongs to one product group. In
the remainder of this paper symbols i and j refer to products, whereas s, v, and y indicate
stores. The set of all stores is indicated by S and the set of all products by I. Furthermore,
G is the set of product groups. Then,
\[ I = \bigcup_{g=1}^{G} P_g \tag{4.1} \]
where $P_g$ is the set of products in the product group with index $g$. The input data for
the clustering algorithm, $\Gamma = \{e_{i,s} \; \forall i \in I, s \in S\}$, comprises all price elasticities $e_{i,s}$ per
product $i$ in store $s$. The clustering function for each product group indicated by $g$ can
be described as:
\[ \lambda_g : S \to \{1, \ldots, k_g\}, \quad \forall g \in \{1, \ldots, G\}, \tag{4.2} \]
where $k_g$ is the number of clusters for product group $g$. Subsequently, the set of clusterings
is defined as follows:
\[ \Lambda = \{\lambda_g : g = 1, \ldots, G\}. \tag{4.3} \]
This way we define a clustering of stores for each product group separately via Λ based on
the price elasticities on store-item level. The information comprised in Λ is used to find
the final optimal clustering λ∗ by means of the hard least squares Euclidean consensus
algorithm. In the appendix, Table A.1 shows an overview of all symbols used for the
dissimilarity measure and the clustering method.
where N (s, y) is the set of all products in a product group sold in both stores s and y,
ψi equals the average revenue per store for product i, and di represents the number of
stores in which product i is sold. Weight wi corresponds to the total revenue of product i
divided by the total revenue of all products sold in both stores s and y in its corresponding
product group such that 0 ≤ wi ≤ 1. Consequently, the result is the following preliminary
weighted dissimilarity measure between store s and store y:
\[ D_0(s, y) = \sum_{i \in N(s,y)} w_i \, |e_{i,s} - e_{i,y}|, \tag{4.5} \]
where $|e_{i,s} - e_{i,y}|$ is the absolute difference between the price elasticities of product i sold
in store s and store y. The sum is evaluated exclusively over the set N(s, y) since it is only
possible to subtract price elasticities in case both stores sell the product. If a product is
sold in only one of the two stores, its price elasticity is not available in the other
store and the difference cannot be computed.
Note that if neither store sells the considered product there is no
difference in assortment and therefore no indication of a difference between the
stores regarding the considered product.
Second, again imagine we are interested in the dissimilarity between two stores and
only one of the stores sells a specific product. Although the price elasticity of the product
is not available in one of the stores, this indicates a difference between the stores because
the assortment differs. Porro et al. [33] suggest to address this problem before
the classifier is built. They propose to adjust the dissimilarity measure to cope with
unavailable values by using the available data only. The Not Available Correction Term
(NACT) is proposed to correct for products which are sold in only one of the two considered
stores. The NACT is represented by the fraction ψ∪ /ψ∩ (s, y) and consists of the
following terms:
\[ \psi_{\cup} = \sum_{j \in P_g} \psi_j d_j \tag{4.6} \]
and
\[ \psi_{\cap}(s, y) = \sum_{i \in N(s,y)} \psi_i d_i, \tag{4.7} \]
where ψ∪ is the total revenue of all products sold in a product group, whereas ψ∩ (s, y)
is the total revenue of the products that are sold in both stores. Accordingly, the final
domain-specific dissimilarity measure is as follows:
\[ D(s, y) = \frac{\psi_{\cup}}{\psi_{\cap}(s, y)} \sum_{i \in N(s,y)} w_i \, |e_{i,s} - e_{i,y}|. \tag{4.8} \]
This way the NACT accounts for the relative importance of the set of products sold
in both stores in a product group versus the set of products sold in only one of the two
considered stores in a product group. The following hypothetical example provides insight
into the NACT. Imagine we compare two stores s and y. There are three products in
the union of products sold in stores s and y in the considered product group. Product 1
is sold only in store s and has a total revenue of 10. Products 2 and 3
are sold in both stores and have a total revenue of 20 and 30, respectively. Therefore, the numerator
of the NACT equals 10 + 20 + 30 = 60 and the denominator equals 20 + 30 = 50, which results in a
NACT of 60/50 = 1.2.
This means the dissimilarity measure is scaled up since one of the products is
sold in only one store. We want to take this product into account as well with regard to
the dissimilarity between two stores since the assortment differs. Finally, the weight wi
addresses the importance of the products sold in both stores relative to each other.
Note that the denominator of the NACT and the denominator of weight wi are the same, i.e., ψ∩ (s, y).
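The dissimilarity measure can be illustrated with a minimal Python sketch. It assumes that each store is represented by a mapping from product to price elasticity and that the total revenue per product (ψi di in the notation above) is given; the function name, the data layout, the elasticity values, and the handling of store pairs without any common product are assumptions made for illustration only.

```python
from typing import Dict


def dissimilarity(
    elast_s: Dict[str, float],
    elast_y: Dict[str, float],
    revenue: Dict[str, float],
) -> float:
    """Sketch of D(s, y) in Equation 4.8 for one product group.

    elast_s, elast_y: product -> price elasticity, only for products the store sells.
    revenue: product -> total revenue (psi_i * d_i) for every product in the group.
    """
    shared = elast_s.keys() & elast_y.keys()      # N(s, y)
    if not shared:
        # Assumption (not in the text): no common product means maximal dissimilarity.
        return float("inf")
    psi_union = sum(revenue.values())             # Equation 4.6
    psi_shared = sum(revenue[i] for i in shared)  # Equation 4.7
    nact = psi_union / psi_shared                 # Not Available Correction Term
    # Equation 4.5 with w_i = psi_i d_i / psi_shared
    weighted_diff = sum(
        (revenue[i] / psi_shared) * abs(elast_s[i] - elast_y[i]) for i in shared
    )
    return nact * weighted_diff


# Revenues follow the hypothetical example above; the elasticities are made up.
revenue = {"p1": 10.0, "p2": 20.0, "p3": 30.0}
store_s = {"p1": -1.0, "p2": -1.2, "p3": -0.8}
store_y = {"p2": -1.0, "p3": -1.3}
d_sy = dissimilarity(store_s, store_y, revenue)  # 1.2 * (0.4 * 0.2 + 0.6 * 0.5)
```

With these made-up elasticities the preliminary dissimilarity is 0.38, which the NACT of 1.2 scales up to roughly 0.456.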
4.2 | Clustering
This section provides the two-stage clustering framework along with the method for clus-
tering validation and evaluation. Subsection 4.2.1 introduces the novel Restricted K-
means++ clustering algorithm and Subsection 4.2.2 the process of finding the cluster
consensus. Subsection 4.2.3 enumerates the methods for determining k, the number of
clusters. The cluster evaluation method with regard to the robustness of the algorithm and
the price policies is shown in Subsection 4.2.4.
where S is the set of all stores, $C_l$ is the current set of cluster centers, and $\min_{c \in C_l} D(y, c)$
is the distance of store y to the closest center c that is already chosen. This way,
the larger the distance of a store to its closest already chosen center, the
higher the probability that this store is chosen as the next cluster center $c_{l+1}$.
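The seeding step can be sketched in Python as follows. The sketch assumes a generic dissimilarity function `dist` (for instance the measure of Section 4.1) and uses the squared distance to the closest chosen center as sampling weight, as in Algorithm 1; the function name and signature are hypothetical.

```python
import random
from typing import Callable, Hashable, List, Optional, Sequence

Store = Hashable


def seed_centers(
    stores: Sequence[Store],
    k: int,
    dist: Callable[[Store, Store], float],
    rng: Optional[random.Random] = None,
) -> List[Store]:
    """Pick k initial centers; assumes at least k mutually distinct stores."""
    rng = rng or random.Random()
    stores = list(stores)
    centers = [rng.choice(stores)]  # first center uniformly at random
    while len(centers) < k:
        # Squared distance of every store to its closest already chosen center.
        weights = [min(dist(y, c) for c in centers) ** 2 for y in stores]
        # Stores far from all chosen centers are more likely to be drawn;
        # already chosen centers have weight zero and cannot be drawn again.
        centers.append(rng.choices(stores, weights=weights, k=1)[0])
    return centers
```

Because `random.choices` normalizes the weights internally, the denominator of the selection probability does not have to be computed explicitly.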
Cluster centers
As stated before, we have to deal with products which are not sold in certain stores, resulting
in values which are not present in the data. Apart from the problem these missing values cause
with regard to the dissimilarity measure, they interfere with the calculation of the cluster
centers. Cluster centers are normally represented by the cluster means in the K-means++
clustering algorithm. However, since not all values of the price elasticities are available, we
cannot simply compute the cluster mean. Therefore, the original K-means++ clustering
algorithm does not suffice and the assignment of new cluster centers in the Restricted
K-means++ clustering algorithm proposed in this paper should be conducted differently.
The cluster center is found with regard to the following optimization criterion:
\[ c_z = \operatorname*{argmin}_{y \in S_z} \frac{1}{|S_z|} \sum_{s \in S_z} D(s, y), \tag{4.10} \]
where Sz is the set of stores assigned to the cluster with label z and D(s, y) represents the
dissimilarity between store s and y. This means an existing store serves as a cluster center.
Most importantly note that K-means++ can yield empty clusters due to the assignment
of cluster means. However, since our method does not simply compute the cluster means,
but according to Equation 4.10 returns the store which on average has minimum distance
to the other stores within a cluster, we prevent this from happening. The
K-means++ clustering algorithm with this modification to the process of finding cluster
centers can be found in Algorithm 1.
Algorithm 1 K-means++
  Initialize k_g, the number of clusters for product group (PG) g
  Initialize S, the set of stores
  Initialize iterMax, the number of iterations of K-means
  Pick first initial center c_1 uniformly at random from S
  for l = 1 to k_g − 1 do
    Pick new center c_{l+1} from S with probability
      P(c_{l+1} = y) = min_{c ∈ C_l} D(y, c)^2 / Σ_{s ∈ S} min_{c ∈ C_l} D(s, c)^2
  end for
  for n = 1 to iterMax do
    for each store s ∈ S do
      Assign s to the cluster with the center c_z that is closest to s
    end for
    for z = 1 to k_g do
      Set new cluster center c_z = argmin_{y ∈ S_z} (1 / |S_z|) Σ_{s ∈ S_z} D(s, y)
    end for
  end for
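The iteration phase of Algorithm 1 can be sketched in Python as below. The sketch takes the initial centers as given, uses an existing store as cluster center according to Equation 4.10, and stops early once the assignment no longer changes; the early stop, the guard for empty clusters, and all names are assumptions of this illustration rather than part of the algorithm as stated.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Store = Hashable


def cluster_center(members: Sequence[Store], dist: Callable) -> Store:
    """Equation 4.10: the member with minimum average distance to all members."""
    return min(members, key=lambda y: sum(dist(s, y) for s in members) / len(members))


def iterate(
    stores: Sequence[Store],
    centers: Sequence[Store],
    dist: Callable[[Store, Store], float],
    iter_max: int = 100,
) -> Tuple[Dict[Store, int], List[Store]]:
    centers = list(centers)
    assignment: Dict[Store, int] = {}
    for _ in range(iter_max):
        # Assignment step: every store goes to the cluster of its closest center.
        new_assignment = {
            s: min(range(len(centers)), key=lambda z: dist(s, centers[z]))
            for s in stores
        }
        if new_assignment == assignment:
            break  # converged (early stop is an addition of this sketch)
        assignment = new_assignment
        # Update step: replace each center by the store of Equation 4.10.
        for z in range(len(centers)):
            members = [s for s in stores if assignment[s] == z]
            if members:  # defensive guard, e.g. for duplicate stores
                centers[z] = cluster_center(members, dist)
    return assignment, centers
```

For example, with stores represented by the numbers 0, 1, 2, 10, 11, 12, the absolute difference as distance, and initial centers 0 and 12, the sketch returns the centers 1 and 11.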
Retailer specifics
To define the clusters retailers operate in compliance with domain-specific business rules.
One of these business rules that we should take into account during clustering is
that stores originating from the same domicile should be assigned to the same cluster.
This makes us propose the Restricted K-means++ clustering algorithm as found in Algorithm 2.
To adapt the original clustering algorithm to domain specifics, the constraint
should be handled with care during initialization or assignment [10]. Since we can easily
determine manually which stores should be assigned to the same cluster according to the
aforementioned geographical business rule, a simple adaptation of the assignment of stores should
suffice. Assume V is the set of stores located in the same domicile. Inspired by Wagstaff
et al. [40], assign store v, ∀v ∈ V , to the cluster with center cz according to the following
optimization criterion:
\[ c_z = \operatorname*{argmin}_{c \in C} \frac{1}{|V|} \sum_{v \in V} D(v, c). \tag{4.11} \]
Hence, Equation 4.11 selects the cluster center with the smallest average distance to the
stores located in the same domicile. Thus, all stores located in the same domicile are
assigned to the cluster whose center is on average closest to these stores.
This means at first all stores are separately clustered and subsequently cluster centers are
determined as proposed in Equation 4.10. Last, taking into account the business rule,
stores are re-assigned according to Equation 4.11.
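The reassignment according to Equation 4.11 can be sketched in Python as follows. The sketch assumes that a mapping from store to domicile is available and that an initial assignment and the cluster centers have already been obtained; the function and parameter names are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Store = Hashable


def reassign_domiciles(
    assignment: Dict[Store, int],
    domicile: Dict[Store, str],
    centers: Sequence[Store],
    dist: Callable[[Store, Store], float],
) -> Dict[Store, int]:
    """Move all stores of one domicile to the cluster that is closest on average."""
    groups: Dict[str, List[Store]] = {}
    for s in assignment:
        groups.setdefault(domicile[s], []).append(s)  # the sets V, one per domicile
    result = dict(assignment)
    for members in groups.values():
        # Equation 4.11: center with minimum average distance to the stores in V.
        best = min(
            range(len(centers)),
            key=lambda z: sum(dist(v, centers[z]) for v in members) / len(members),
        )
        for v in members:
            result[v] = best
    return result
```

A domicile with a single store keeps its original cluster as long as that cluster's center is still the closest one, so the rule only changes the assignment of domiciles whose stores were split over clusters.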
Robustness
Additionally, we focus on building a robust system by conducting the Restricted K-means++
clustering nStart times and taking the clustering r∗ that minimizes the total within
sum-of-squares (TWSS) of the clusters. The TWSS is a measure of internal cohesion:
it sums, over all clusters, the distances of each point in a cluster to the
center of that cluster. This way we reduce the influence of the randomness that enters the
system by picking the first cluster center uniformly at random during seeding.
Note that the Restricted K-means++ clustering algorithm is conducted separately for each
product group that is considered.
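The selection over nStart runs can be sketched in Python as below. The sketch assumes that the TWSS sums the squared dissimilarities of each store to the center of its cluster, and that `run_once` is a hypothetical callable that performs one full clustering run for a given random seed and returns the assignment together with the centers.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Store = Hashable
Result = Tuple[Dict[Store, int], List[Store]]  # (assignment, centers)


def twss(assignment: Dict[Store, int], centers: List[Store], dist: Callable) -> float:
    """Total within sum-of-squares (assumed: squared distance to own center)."""
    return sum(dist(s, centers[z]) ** 2 for s, z in assignment.items())


def best_of_restarts(run_once: Callable[[int], Result], n_start: int, dist: Callable) -> Result:
    """Run the clustering n_start times and keep the run with the lowest TWSS."""
    results = [run_once(seed) for seed in range(n_start)]
    return min(results, key=lambda r: twss(r[0], r[1], dist))
```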
Before the actual cluster consensus can be found, the definition of dissimilarities
between clusterings should be considered. The main issue in calculating dissimilarities
between clusterings is that the cluster labels may be mixed up in different clusterings,
whereas the underlying structure of the data does not change. Therefore we have to
find a permutation of the cluster labels such that the agreement between clusterings
is maximized. To address this label-matching problem, a permutation matrix $\Pi$
is used to replace $\lambda$ by $\lambda\Pi$. This relabeling procedure ensures the cluster labels of $\lambda$
are reassigned without changing the underlying structure of the data. Since searching
through all possible permutations is very time-consuming, Dimitriadou et al. [8] defined
the following Euclidean clustering dissimilarity between $\lambda$ and $\tilde{\lambda}$:
\[ d(\lambda, \tilde{\lambda}) = \min_{\Pi} \| \lambda - \tilde{\lambda}\Pi \|, \tag{4.12} \]
where $\|\cdot\|$ is the Frobenius norm. Minimizing this distance is equivalent to maximizing
the number of objects with the same class id in both clusterings. For a proof see
Hornik [17]. Hornik proposed to find the optimal $\Pi$ using the Hungarian method [25],
since this is a linear sum assignment problem (LSAP). Subsequently, all
information in the elements of the cluster ensemble Λ needs to be incorporated into the
so-called cluster consensus. The cluster consensus is the single clustering of stores we are
looking for. Consensus candidates λ are compared to the cluster ensemble. The optimal
cluster consensus λ∗ is defined as follows:
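\[ \lambda^* = \operatorname*{argmin}_{\lambda} \sum_{\lambda_g \in \Lambda} x_g \, d(\lambda, \lambda_g)^p, \tag{4.13} \]
which is equivalent to
\[ \lambda^* = \operatorname*{argmin}_{\lambda} \sum_{\lambda_g \in \Lambda} x_g \min_{\Pi_g} \| \lambda - \lambda_g \Pi_g \|^p, \tag{4.14} \]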
where $x_g$ is the PG-specific weight. Moreover, weights are based on the revenue of a
specific product group:
\[ x_g = \frac{\sum_{j \in P_g} \psi_j d_j}{\sum_{h=1}^{G} \sum_{j \in P_h} \psi_j d_j}. \tag{4.15} \]
Then, $x_g$ represents the total revenue of product group g divided by the total revenue
of all product groups such that $0 \le x_g \le 1$. If p = 2, we use least squares consensus
clusterings. Now define $\tilde{\lambda}$ as the weighted average of $\lambda_g \Pi_g$ and fix $\Pi_g$. The weighted
average of $\lambda_g \Pi_g$ is equivalent to:
\[ \tilde{\lambda} = \frac{1}{\sum_{g=1}^{G} x_g} \sum_{g=1}^{G} x_g \lambda_g \Pi_g. \tag{4.16} \]
Then, Hornik [17] proves that the optimal λ, which is λ∗, is given by $\tilde{\lambda}$. Moreover, since we
have to find multiple optimal permutations for the separate partitions, we no longer face
the LSAP but a multi-dimensional assignment problem (MAP), which is solved
using the DWH (Dimitriadou, Weingessel and Hornik) algorithm from the CLUE package
in R. This algorithm is an extension of the greedy algorithm described in Dimitriadou et
al. [8]. Note that we obtain a final clustering without going back to the information contained
in the original data or the Restricted K-means++ clustering algorithm.
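The idea of relabeling followed by a weighted combination can be illustrated with a simplified Python sketch. It is not the DWH algorithm: it aligns every clustering once to the first clustering in the ensemble, uses a brute-force search over label permutations in place of the Hungarian method (feasible only for a small number of clusters), and assumes that all clusterings use the same number of clusters k. For hard clusterings and fixed permutations, taking the label with the largest weighted vote per store is the hard clustering closest to the weighted average of Equation 4.16.

```python
from itertools import permutations
from typing import List, Sequence


def relabel(labels: Sequence[int], reference: Sequence[int], k: int) -> List[int]:
    """Permute the labels 0..k-1 so that agreement with the reference is maximal."""
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):  # brute force instead of the Hungarian method
        agree = sum(perm[a] == b for a, b in zip(labels, reference))
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[a] for a in labels]


def hard_consensus(
    clusterings: Sequence[Sequence[int]], weights: Sequence[float], k: int
) -> List[int]:
    """One-pass weighted consensus of label vectors over the same stores."""
    reference = clusterings[0]
    aligned = [relabel(lab, reference, k) for lab in clusterings]
    consensus = []
    for store in range(len(reference)):
        votes = [0.0] * k
        for lab, x in zip(aligned, weights):
            votes[lab[store]] += x  # weight x_g of the product group
        consensus.append(max(range(k), key=votes.__getitem__))
    return consensus
```

In the thesis setting the number of clusters kg may differ per product group, which this sketch does not cover.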
\[ k^* = \operatorname{round}\Big( \sum_{g=1}^{G} x_g k_g \Big), \tag{4.17} \]
where round(·) is the nearest integer function, xg is the weight for product group g and
kg the number of clusters indicated for product group g.
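Equations 4.15 and 4.17 can be illustrated with a short Python sketch. The product group names, revenues, and cluster counts are made up; rounding is implemented as round-half-up, since the built-in `round` of Python rounds halves to the nearest even integer.

```python
import math
from typing import Dict


def group_weights(group_revenue: Dict[str, float]) -> Dict[str, float]:
    """Equation 4.15: revenue share x_g of each product group."""
    total = sum(group_revenue.values())
    return {g: r / total for g, r in group_revenue.items()}


def consensus_k(group_revenue: Dict[str, float], k_per_group: Dict[str, int]) -> int:
    """Equation 4.17: revenue-weighted number of clusters, rounded to an integer."""
    x = group_weights(group_revenue)
    weighted = sum(x[g] * k_per_group[g] for g in x)
    return math.floor(weighted + 0.5)


# Hypothetical revenues and cluster counts per product group.
k_star = consensus_k(
    {"cola": 50.0, "water": 30.0, "juice": 20.0},
    {"cola": 4, "water": 3, "juice": 2},
)  # 0.5 * 4 + 0.3 * 3 + 0.2 * 2 = 3.3, which rounds to 3
```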
If the silhouette index and Hartigan index do not agree on the number of clusters kg found by the elbow method
for product group g, the number of clusters is reconsidered.
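[Figure: the product-specific price elasticity interval from µi − σi to µi + σi around the mean µi ; price elasticities below the lower bound fall in the ‘elastic’ region and price elasticities above the upper bound fall in the ‘inelastic’ region.]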
Next, labels are assigned to each product within a specific cluster. The average price
elasticity µi,c of product i in cluster c is computed and compared to the product-specific
interval. If µi,c falls below the lower bound of the interval and µi,c is smaller than −1,
the product is assigned the label ‘elastic’. The second condition is needed because a
product whose cluster-specific average price elasticity is not below −1 is in principle not
elastic at all. If µi,c exceeds the upper bound of the interval and µi,c ∈ (−1, 0), the product
is assigned the label ‘inelastic’. In all other cases the product is labeled ‘regular’. The
labeling procedure is summarized in Algorithm 3.
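The labeling rule described above can be sketched in a few lines. This is an illustration rather than a reproduction of Algorithm 3: the function name `label_product` is hypothetical, and the product-specific interval is assumed to run from µi − σi to µi + σi .

```python
def label_product(mu_ic, mu_i, sigma_i):
    """Label one product in one cluster.

    mu_ic   -- average price elasticity of the product in the cluster
    mu_i    -- average price elasticity of the product over all stores
    sigma_i -- standard deviation of the product's price elasticity
    """
    lower, upper = mu_i - sigma_i, mu_i + sigma_i
    if mu_ic < lower and mu_ic < -1:
        return "elastic"
    if mu_ic > upper and -1 < mu_ic < 0:
        return "inelastic"
    return "regular"
```

A product with µi = −1.2 and σi = 0.3 is labeled ‘elastic’ in a cluster where its average elasticity is −1.8, because −1.8 lies below both the lower bound −1.5 and −1.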
The following step is to set prices for a new price policy based on the labeling procedure.
Current prices are used as a starting point. The prices of products labeled inelastic
are raised and the prices of products labeled elastic are cut. Since we have the data
of Jumbo at our disposal, we have to take their business rules into account. Therefore,
prices should not exceed the lowest prices of the competition within the same domicile.
This leads to a price increase for inelastic products up to the prices of the competition,
with a maximum of 5%. The price decrease for elastic products is limited to a maximum
of 5%. Both percentage changes are set according to the policy of the retailer. Last,
price elasticities are used to find the projected sales after the price setting procedure is
conducted. Keep in mind that we force price differences between clusters in order to be
able to prove the value to the retailer to set different prices in different clusters of stores.
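To illustrate how a price elasticity translates a price change into projected sales, one common first-order approximation is written out below. It is a sketch under assumptions: the symbols q (current sales), p (current price), p^new (new price) and R̂ (projected revenue) are introduced here for illustration only, and the exact projection used in this research may differ.

```latex
\hat{q}_{i,c} = q_{i,c}\left(1 + \mu_{i,c}\,\frac{p^{\text{new}}_{i,c} - p_{i,c}}{p_{i,c}}\right),
\qquad
\hat{R}_{i,c} = p^{\text{new}}_{i,c}\,\hat{q}_{i,c}
```

For example, with µi,c = −2 a price cut of 2% gives projected sales of 1.04 times the current sales.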
To conduct a robustness analysis for the clustering we use the Jaccard (dis)similarity
[20]. The Jaccard similarity is developed to find the similarity between sets of categorical
variables, such as the similarity between the cluster consensus and the separate clusterings
of each product group. This measure is specifically chosen since the distances between
the cluster labels which represent cluster membership are irrelevant. In Equation 4.19 the
formula of the Jaccard similarity can be seen, where λg (S) and λ∗ (S) are the clustering
for product group g and the cluster consensus representing cluster membership:
Below we give an example of the (dis)similarity calculation between the cluster mem-
bership for a specific product group λg (S) and the cluster consensus λ∗ (S). Imagine
hypothetically we have 8 different stores assigned to two different clusters.
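$$\lambda_g(S) = \begin{bmatrix} 2 & 1 & 1 & 2 & 1 & 2 & 2 & 2 \end{bmatrix}^{T}, \qquad \lambda^{*}(S) = \begin{bmatrix} 1 & 1 & 1 & 2 & 2 & 2 & 2 & 1 \end{bmatrix}^{T} \tag{4.20}$$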
It can be easily seen that the Jaccard similarity equals 5/8 and the Jaccard dissimilarity
equals 3/8. This means that for the evaluated product group 5 out of the 8 stores end up
in the same cluster as in the cluster consensus.
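The calculation in this example can be reproduced with a short sketch. The function name `label_agreement` is hypothetical, and the cluster labels of both clusterings are assumed to be aligned already, so that equal labels refer to the same cluster.

```python
def label_agreement(labels_a, labels_b):
    """Share of stores that carry the same cluster label in both clusterings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same stores")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


lambda_g = [2, 1, 1, 2, 1, 2, 2, 2]      # clustering for product group g
lambda_star = [1, 1, 1, 2, 2, 2, 2, 1]   # cluster consensus
similarity = label_agreement(lambda_g, lambda_star)   # 5/8
dissimilarity = 1 - similarity                        # 3/8
```

The stores in positions 1, 5 and 8 carry different labels, which gives the 5 matching stores out of 8.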
the retailer can make better decisions. In order to discover the drivers of differences in
consumer behavior among clusters, panel data models [28] are estimated for the clusters,
inspired by Hoch et al. [16]. To this end, socio-demographic and socio-economic variables
as well as competition characteristics, among others, are used [26]. In Subsection 4.3.1 the
theoretical background on regression models is elaborated. Subsection 4.3.2 explains the
variables that are made available and discusses the selection procedure of these variables
for the regression models.
with $u_{s,t} = (\delta_s - \delta) + \eta_{s,t}$, $\delta_s \sim \text{i.i.d.}(\delta, \sigma_\delta^2)$, and $\eta_{s,t} \sim \text{i.i.d.}(0, \sigma_\eta^2)$. The price elasticities
per store per cluster are found by aggregating the sales of all products that belong to the six
considered product groups in a specific store. Note that the price elasticities considered in the
regression models are no longer per product per store, since we are especially interested
in the drivers that separate the stores and not so much in product specifics. Feasible
generalized least squares (FGLS) estimation is used to estimate the parameters of the
random effects model.
$$p_i \le \frac{\alpha}{m}, \tag{4.22}$$
where pi is the p-value of hypothesis i, which is compared to the corrected significance
level α/m, α is the desired significance level, and m is the number of hypotheses tested.
In general, the significance level α is set to 5%.
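The correction in Equation 4.22 amounts to comparing every p-value with α/m. A minimal sketch, with a hypothetical function name and hypothetical p-values:

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return, per hypothesis, whether p_i <= alpha / m holds."""
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Four hypothetical p-values; the corrected threshold is 0.05 / 4 = 0.0125.
decisions = bonferroni_reject([0.001, 0.02, 0.04, 0.0124])
```

Only the first and the last hypothesis are rejected here, although three of the four p-values lie below the uncorrected level of 5%.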
With regard to the time-dependent variables we have to deal with aggregation issues.
Since the data is made available at store-item level per week, we need to aggregate over
the products sold in a store to acquire store-level data per week. Moreover, for the prices
of substitutes and the prices of complements the average price during a week is used. For
the promotional variables the percentage of products on promotion is computed.
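The aggregation from store-item level to store level can be sketched as below. This is an illustration under assumptions: the record layout (keys `store`, `week`, `price`, `on_promotion`) and the function name are hypothetical, and the average price is taken over the item-level rows of a store in a week, which may differ from the exact averaging used for substitutes and complements.

```python
from collections import defaultdict


def aggregate_store_week(rows):
    """Aggregate item-level rows to one record per (store, week)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["store"], row["week"])].append(row)

    result = {}
    for key, items in groups.items():
        n = len(items)
        promoted = sum(1 for r in items if r["on_promotion"])
        result[key] = {
            "avg_price": sum(r["price"] for r in items) / n,
            "pct_on_promotion": 100.0 * promoted / n,
        }
    return result
```

Four items in one store and week, of which one is on promotion, yield a promotion percentage of 25 for that store-week.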
5 | Evaluation
In this chapter the results of the proposed methods are discussed. First, the cluster
definition is elaborated in Section 5.1. Next, in Section 5.2 the results of the sensitivity
analysis for different price policies are considered. In Section 5.3 we discuss the results of
the new price policy. Finally, in Section 5.4 we take a closer look at the results from the
regression analysis.
[Figure 5.1: line plot of the TWSS (vertical axis) against the number of clusters, 2 to 10 (horizontal axis), for the product groups Cola, Energy, Juice, Kids youth, Large soda and Water.]
Figure 5.1: The figure presents the number of clusters per product group indicated by
the elbow method by means of the total within sum of squares (TWSS).
The first cluster contains 102 stores located in an environment with a relatively
high percentage of consumers in the low social class; see Table A.3 in the Appendix for
more information on social classes. Moreover, the percentages of youth and non-western
immigrants are large. The second cluster consists of 105 stores mostly situated in the
South of the country. In this cluster, as in the fourth cluster, which consists of 41 stores,
the percentages of consumers in the high social class and of families with kids are relatively
high. These two clusters differ in particular in the share of youth, which is highest in cluster
4. Last, the remaining 57 stores belong to cluster 3 and are mostly situated in the rural
areas of the upper North of the country, which is reflected in the small share of immigrants
and the fact that supermarkets are relatively far apart. As for cluster 1, the
percentage of consumers in the low social class is relatively high. An overview of
the cluster means of some of the characteristics of the different clusters can be found in
Table 5.1.
1 The silhouette and Hartigan index confirm the number of clusters indicated by the elbow method.
Table 5.1: The table presents the cluster means of some of the characteristics of the
different clusters (in percentages).
Table 5.2: The Jaccard dissimilarities of the cluster memberships of the product
groups versus the cluster consensus.
In general, the Jaccard dissimilarity of the product groups with the cluster consensus
is strikingly low, with a maximum of 0.039 for Cola. This corresponds to
12 out of 305 stores being assigned to a different cluster. The deviation of each of
the clusterings of the product groups from the cluster consensus is thus small, and the stores
for the most part end up in the same cluster. Noteworthy is, for example, the fact that
for both Kids youth and Large soda the stores in the second cluster do not differ from
the cluster consensus. This could be due to the large number of products together with
the relatively small spread in price elasticity in these product groups, see Table 3.1, which
makes clustering relatively easy compared to the other product groups. Considering the
Jaccard dissimilarities, the proposed Restricted K-means++ clustering method
is very robust and shows only minor differences between the clusterings of the product
groups and the cluster consensus. Consumer behavior seems to be split in roughly
the same way for each of the product groups since we are using the most specific level of price
elasticities.
Product group   Average revenue   Average profit
Cola            328               93
Energy          199               63
Water           159               46
Large soda      133               38
Juice           126               32
Kids youth      118               25
Figure 5.2: The initial situation of the average revenue (l) and average profit (r) per
product of the products within a product group (in thousands of euros).
We work step by step towards a new price policy as discussed in Subsection 4.2.4.
After each product is labeled ‘elastic’, ‘inelastic’ or ‘regular’, new prices are set
for the different clusters. To recap, the prices of inelastic products are increased by at
most 5% and the price cut of elastic products is also limited to 5%. Moreover, price
elasticities are used to find the projected sales after new prices are set. If we change prices
we have to reckon with the competition, since prices are not supposed to exceed the prices of
the competition and Jumbo has the ‘lowest price guarantee’, which basically means that
the supermarket sets the price of a specific product for all stores in a cluster as low as
necessary, provided that sufficient profit is left.
First, we take a closer look at the effect of price changes of elastic products on the
revenue and profit. By increasing the percentage change of prices of both
the elastic and inelastic products the revenue increases. This is not
necessarily the case for the profit. Results show that cutting prices of elastic
products by a predefined percentage is in general not profitable in the end. The reason
is that prices are bound by the business rule of the lowest price guarantee. There is
no room left for price cuts without harming the profit, because the difference
between the cost price and the out-of-the-door price is in general negligible. Cutting prices
does not lead to a sufficient increase in sales for the profit to grow.
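The reasoning in this paragraph can be made concrete with a small numerical sketch. All numbers are hypothetical and chosen only to mimic a thin margin, and the projection uses a first-order elasticity approximation that is an assumption of this sketch and not necessarily the projection used in this research.

```python
def project(price, cost, units, elasticity, pct_change):
    """Project revenue and profit after a relative price change.

    Sales respond linearly: units * (1 + elasticity * pct_change).
    """
    new_price = price * (1 + pct_change)
    new_units = units * (1 + elasticity * pct_change)
    revenue = new_price * new_units
    profit = (new_price - cost) * new_units
    return revenue, profit


# Hypothetical elastic product: price 1.00, cost price 0.95, 1000 units sold,
# price elasticity -2, and a price cut of 2%.
base_revenue, base_profit = project(1.00, 0.95, 1000, -2.0, 0.0)
new_revenue, new_profit = project(1.00, 0.95, 1000, -2.0, -0.02)
```

In this example revenue rises from 1000 to about 1019 while profit falls from 50 to about 31: the extra 4% in sales does not compensate for the loss of two fifths of the margin per unit.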
Now, we take a closer look at the effect of price changes of inelastic products on
the revenue and profit. From now on, we consider a maximum price increase for inelastic
products, indicated by ‘max+5%’, as the price increase which is bounded by the price of
that product charged by the competition within the same domicile. For example, products
i and j are priced €1.00 in our store. The price of the competition for product i is €1.05
and the price of the competition for product j is €1.10. The price of product i can be
raised up to €1.04, since the new price cannot exceed the price of the competition. This
results in a price increase of 4%. For product j we could raise the price up to €1.09, which
represents a price increase of 9%. However, we are bound to a maximum increase of 5%
of the current price, resulting in a new price of €1.05 for product j. Results show that
changing prices of inelastic products results in more revenue and more profit in the end.
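The ‘max+5%’ rule of this example can be sketched as follows. The sketch works in whole euro cents to avoid rounding issues. It assumes, as the example suggests, that the new price has to stay one cent below the price of the competition, and that a price is never lowered by this rule; the function name is hypothetical.

```python
def raised_price_cents(current_cents, competitor_cents, max_increase_pct=5):
    """New price of an inelastic product under the 'max+5%' rule (in cents)."""
    cap_by_policy = current_cents * (100 + max_increase_pct) // 100
    cap_by_competition = competitor_cents - 1  # assumption: stay one cent below
    # Assumption: if the competition is already cheaper, keep the current price.
    return max(current_cents, min(cap_by_policy, cap_by_competition))


price_i = raised_price_cents(100, 105)  # competition at 1.05 euro
price_j = raised_price_cents(100, 110)  # competition at 1.10 euro
```

For product i the price of the competition is the binding restriction, whereas for product j the 5% limit is.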
In conclusion, a price change of elastic products in general does not lead to an increase
in profit, although it proves to be valuable for the revenue.
On the other hand, the price increase of inelastic products is valuable to the revenue and
is profitable. Thus, the optimal profit is found by not changing prices of elastic
products and raising prices of inelastic products up to the maximum of +5%. In the
end we are dealing with a trade-off between revenue and profit. The increase in revenue
can be related to more customers and therefore more relevance of the retailer in the eyes
of the customer. On the other hand, profit is important for the retailer itself. The impact
on the revenue and profit of the stepwise percentage change in price of inelastic
and elastic products can be found in Table B.1 and B.2 in the Appendix, respectively.
Table 5.3: The projected increase in revenue and profit for different price policies (in
percentages).
Elastic Inelastic Revenue Profit
Table 5.3 presents the sensitivity analysis for the projected increase in revenue and
profit for different price policies in percentages. As mentioned before,
changing prices of elastic products generally results in an increase in revenue and a
decrease in profit. Price changes of inelastic products, however, increase both revenue and
profit. We therefore face a trade-off between revenue and profit. To find
middle ground, from this moment onward we focus on the price policy where prices of
elastic products are cut by 2% and prices of inelastic products are raised by max+5%. In
this case, the projected increases in revenue and profit of the considered product groups
are 0.36% and 0.76%, respectively.
Table 5.4: The projected increase in revenue per product group by cluster with
competition taken into account and 2% price change of elastic products.
Product group   Cluster 1        Cluster 2         Cluster 3         Cluster 4        Total
Cola            18,528 (0.15)    76,274 (0.63)     23,222 (0.39)     10,505 (0.20)    128,529 (0.36)
Energy          531 (0.01)       3,605 (0.08)      13,626 (0.60)     6,506 (0.49)     24,268 (0.18)
Juice           7,960 (0.08)     47,689 (0.52)     38,157 (0.62)     18,141 (0.43)    111,947 (0.38)
Kids youth      32,077 (0.34)    36,784 (0.39)     40,109 (0.63)     9,693 (0.32)     118,663 (0.42)
Large soda      9,953 (0.12)     54,504 (0.53)     34,461 (0.61)     8,973 (0.32)     107,891 (0.40)
Water           6,692 (0.14)     17,170 (0.29)     16,120 (0.65)     8,312 (0.44)     48,294 (0.32)
Total           75,741 (0.15)    236,026 (0.46)    165,695 (0.58)    62,130 (0.34)    539,772 (0.36)
Table 5.4 gives the projected increase in revenue per product group by cluster. In line
with our expectations the relative increase in revenue of cluster 3 is highest compared
to the other clusters because of the substantial share of products labeled elastic
or inelastic, namely 19.2%. However, in absolute terms cluster 2 outperforms cluster 3
since it consists of almost double the number of stores. The average share of products
labeled inelastic or elastic per cluster is 17.5%. For more information on the share
of products labeled elastic or inelastic see Table B.4 in the Appendix. In addition,
the minor increase in revenue for cluster 1 is striking given the large number of stores
in this cluster. The share of products labeled inelastic (6%) is strikingly low. Further
notice that Kids youth experiences the most growth in relative terms; in particular
cluster 1 stands out compared to the other product groups.
Table 5.5: The projected increase in profit per product group by cluster with
competition taken into account and 2% price change of elastic products.
Product group   Cluster 1        Cluster 2         Cluster 3         Cluster 4         Total
Cola            27,256 (0.79)    9,807 (0.28)      23,112 (1.34)     2,512 (0.18)      62,687 (0.63)
Energy          -71 (-0.00)      -714 (-0.01)      16,238 (2.13)     -2,319 (-0.56)    13,134 (0.31)
Juice           4,869 (0.19)     21,703 (0.94)     29,692 (1.92)     69 (0.00)         56,333 (0.76)
Kids youth      18,063 (0.92)    41,332 (2.01)     2,176 (0.16)      4,320 (0.67)      65,890 (1.10)
Large soda      11,878 (0.50)    28,838 (1.01)     35,322 (2.15)     428 (0.01)        76,465 (0.99)
Water           -1,022 (-0.07)   12,973 (0.78)     10,984 (1.49)     2,747 (0.51)      25,683 (0.59)
Total           60,973 (0.46)    113,939 (0.83)    117,524 (1.52)    7,757 (0.16)      300,193 (0.76)
Then, the projected increase in profit per product group by cluster is depicted in
Table 5.5. In general the different clusters of each of the product groups are more profitable
than in the initial situation. Nonetheless, we can find some clusters, such as cluster 2 of the
product group Energy and cluster 1 of the product group Water, which show a decrease
in profit. Especially for cluster 2 of the product group Energy this is not surprising, since
the few labeled products are exclusively labeled elastic and, as we have seen before, price changes
of elastic products usually result in a decrease in profit. Cluster 3 outperforms cluster 2
although the projected increase in revenue for cluster 2 is higher. This indicates that the cost
of goods sold changes in favor of cluster 3. In absolute terms as well as in relative terms
cluster 4 is least profitable, among other things because of the small number of stores that
end up in cluster 4. The projected increase in profit in relative terms is highest for
the product group Kids youth. Likewise, the projected relative increase in revenue of the product
group Kids youth is highest, see Table 5.4.
Finally, results show a projected increase in revenue of 0.36% and a projected increase
in profit of 0.76% in total for the considered product groups. Cluster 3 is most profitable
as a result of the large share of products labeled inelastic (11%), despite the fact that
cluster 2 generates a higher projected increase in revenue. With regard to the product
groups, especially the profit increase for Kids youth attracts attention, supported by its
projected increase in revenue.
results of the panel data regression models with regard to the clusters can be found in
Table C.8 in the Appendix. Additionally, the panel data regression models per product
group per cluster are also elaborated in Appendix C to show differences within each
product group between the four found clusters. Furthermore, this Appendix contains
the results of regression models for the different product groups to show differences in
consumer behavior among product groups.
Table 5.6: The table presents some of the coefficients (and standard errors) of the
variables of the panel data regression models on store-item level for all clusters to
explain the dependent variable, the price elasticity. Note that the significance level is set
to 5% and that only variables indicated by ‘-’ are not significant.
Variable                  Cluster 1         Cluster 2         Cluster 3         Cluster 4
Father’s day              -0.025 (0.004)    -0.033 (0.005)    -0.027 (0.005)    -0.027 (0.008)
Mother’s day              0.050 (0.006)     0.054 (0.007)     0.029 (0.007)     0.069 (0.012)
Price ratio substitute 1  -1.083 (0.039)    0.316 (0.033)     -1.183 (0.049)    -0.387 (0.064)
Instore flyer             -0.446 (0.12)     -0.469 (0.139)    0.000 (0.000)     -0.786 (0.247)
National folder           -1.203 (0.140)    -0.640 (0.120)    0.000 (0.000)     -0.551 (0.216)
6 | Conclusion
The follow-up step is to consider if and how the analysis could be extended to the
complete store, which means taking all product groups present in the store into consideration.
Moreover, in this research we focus exclusively on the first cluster and are limited
to the results that belong to this cluster. Therefore, an opinion on operational feasibility
should be formed concerning the extension to complete stores and all existing pricelines.
In general, it seems sensible to reconsider the time span of pricing optimization as price
sensitivity changes over time. For example, the panel data analysis shows that price elasticity
around the holidays differs from that in ‘regular’ weeks. These days prices are optimized
mostly 2-3 times per year, whereas it could be interesting and profitable to update prices
more regularly. Although outside the scope of this research, one could think about applying
the method by Brombin et al. [7] to search for the covariates of price elasticity
if interested in the drivers of the differences between clusters. This simulation method
provides an elegant way to find the independent variables for models of price elasticity.
This way the overall process of regression could be fully automated like the clustering
algorithm, without the end-user having to select the variables by hand.
Bibliography
[2] Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. In:
Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms.
pp. 1027–1035. Society for Industrial and Applied Mathematics (2007)
[3] Basu, S., Banerjee, A., Mooney, R.: Semi-supervised clustering by seeding. In:
Proceedings of the 19th International Conference on Machine Learning (ICML-2002). Citeseer
(2002)
[4] Basu, S., Banerjee, A., Mooney, R.J.: Active semi-supervision for pairwise con-
strained clustering. In: SDM. vol. 4, pp. 333–344. SIAM (2004)
[5] Basu, S., Bilenko, M., Mooney, R.J.: A probabilistic framework for semi-supervised
clustering. In: Proceedings of the tenth ACM SIGKDD international conference on
Knowledge discovery and data mining. pp. 59–68. ACM (2004)
[6] Bonferroni, C.E.: Teoria statistica delle classi e calcolo delle probabilita. Libreria
internazionale Seeber (1936)
[7] Brombin, C., Finos, L., Salmaso, L.: Adjusting stepwise p-values in generalized linear
models. In: International Conference on Multiple Comparison Procedures. See
step.adj() in the R someMTP package (2007)
[8] Dimitriadou, E., Weingessel, A., Hornik, K.: A combination scheme for fuzzy cluster-
ing. International Journal of Pattern Recognition and Artificial Intelligence 16(07),
901–912 (2002)
[9] Dudoit, S., Fridlyand, J.: A prediction-based resampling method for estimating the
number of clusters in a dataset. Genome biology 3(7), 1–21 (2002)
[10] Eduardo, M., Brea, A., et al.: Constrained clustering algorithms: Practical issues
and applications (2013)
[11] Elrod, T., Winer, R.S.: An empirical evaluation of aggregation approaches for devel-
oping market segments. The Journal of Marketing pp. 65–74 (1982)
[12] Forgy, E.W.: Cluster analysis of multivariate data: efficiency versus interpretability
of classifications. Biometrics 21, 768–769 (1965)
[13] Ghaemi, R., Sulaiman, M.N., Ibrahim, H., Mustapha, N., et al.: A survey: clustering
ensembles techniques. World Academy of Science, Engineering and Technology 50,
636–645 (2009)
[14] Gordon, A.D.: Classification. Chapman & Hall/CRC Monographs on Statistics &
Applied Probability (1999)
[16] Hoch, S.J., Kim, B.D., Montgomery, A.L., Rossi, P.E.: Determinants of store-level
price elasticity. Journal of marketing Research pp. 17–29 (1995)
[17] Hornik, K.: A clue for cluster ensembles. Journal of Statistical Software 14(11) (2005)
[18] Hsiao, C.: Analysis of panel data. No. 54, Cambridge university press (2014)
[19] Huang, M.H., Hahn, D.E., Jones, E., et al.: Determinants of price elasticities for
store brands and national brands of cheese. In: American Agricultural Economics
Association Annual Meeting, Denver, Colorado. August. pp. 1–4 (2004)
[20] Jaccard, P.: The distribution of the flora in the alpine zone. New phytologist 11(2),
37–50 (1912)
[21] Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs.
Journal of Parallel and Distributed computing 48(1), 96–129 (1998)
[22] Karypis, G., Kumar, V.: Parallel multilevel series k-way partitioning scheme for
irregular graphs. Siam Review 41(2), 278–300 (1999)
[23] Kaufman, L., Rousseeuw, P.: Clustering by means of medoids. North-Holland (1987)
[24] Klein, D., Kamvar, S.D., Manning, C.D.: From instance-level constraints to space-
level constraints: Making the most of prior knowledge in data clustering (2002)
[25] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research
logistics quarterly 2(1-2), 83–97 (1955)
[26] Kumar, V., Karande, K.: The effect of retail store environment on retailer perfor-
mance. Journal of Business Research 49(2), 167–181 (2000)
[27] Lagin, M., Gebert-Persson, S.: Defining the links between retail price strategies and
price tactics (2015)
[28] Maddala, G.S., Lahiri, K.: Introduction to econometrics, vol. 2. Macmillan New York
(1992)
[29] Marn, M.V., Roegner, E.V., Zawada, C.C.: Pricing new products. McKinsey Quar-
terly (3), 40–49 (2003)
[30] Monroe, K.B.: Measuring price thresholds by psychophysics and latitudes of accep-
tance. Journal of Marketing Research pp. 460–464 (1971)
[31] Ng, A.: Clustering with the k-means algorithm. Machine Learning (2012)
[32] Petrick, J.F.: Segmenting cruise passengers with price sensitivity. Tourism Manage-
ment 26(5), 753–762 (2005)
[33] Porro-Muñoz, D., Duin, R.P., Talavera, I.: Missing values in dissimilarity-based clas-
sification of multi-way data. In: Progress in Pattern Recognition, Image Analysis,
Computer Vision, and Applications, pp. 214–221. Springer (2013)
[34] Rossi, P.E., Allenby, G.M.: A bayesian approach to estimating household parameters.
Journal of Marketing Research pp. 171–182 (1993)
[35] Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of
cluster analysis. Journal of computational and applied mathematics 20, 53–65 (1987)
[36] Strehl, A., Ghosh, J.: Cluster ensembles—a knowledge reuse framework for combining
multiple partitions. Journal of machine learning research 3(Dec), 583–617 (2002)
[37] Thalamuthu, A., Mukhopadhyay, I., Zheng, X., Tseng, G.C.: Evaluation and com-
parison of gene clustering methods in microarray analysis. Bioinformatics 22(19),
2405–2412 (2006)
[38] Tumer, K., Agogino, A.K.: Ensemble clustering with voting active clusters. Pattern
Recognition Letters 29(14), 1947–1953 (2008)
[39] Vega-Pons, S., Ruiz-Shulcloper, J.: A survey of clustering ensemble algorithms. In-
ternational Journal of Pattern Recognition and Artificial Intelligence 25(03), 337–372
(2011)
[40] Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S., et al.: Constrained k-means clus-
tering with background knowledge. In: ICML. vol. 1, pp. 577–584 (2001)
[41] Wakefield, K.L., Inman, J.J.: Situational price sensitivity: the role of consumption
occasion, social context and income. Journal of Retailing 79(4), 199–212 (2003)
[42] Wang, K., Wang, B., Peng, L.: Cvap: validation for cluster analyses. Data Science
Journal 8, 88–93 (2009)
Appendices
A | Variables
z Cluster indicator -
n Iteration indicator -
i, j Product indicator -
r Results indicator -
s, v, y Store indicator -
t Time indicator -
k Number of clusters -
kg Number of clusters in PG g -
k∗ Number of clusters in cluster consensus -
g Product group indicator -
iterMax Number of iterations K-means -
nStart Number of iterations Restricted K-means++ -
G Number of product groups -
Table A.2: The variables used in the regression analysis.
Variable Description
Ascension day holiday, when holiday date falls in week then 1 else 0
Back to school school, when holiday date falls in week then 1 else 0
Carnival holiday, when holiday date falls in week then 1 else 0
Easter holiday, when holiday date falls in week then 1 else 0
Father’s day holiday, when holiday date falls in week then 1 else 0
Autumn school, when holiday date falls in week then 1 else 0
Krokus school, when holiday date falls in week then 1 else 0
Liberation day holiday, when holiday date falls in week then 1 else 0
New year holiday, when holiday date falls in week then 1 else 0
Pre-Easter holiday, when holiday date falls in week then 1 else 0
Pre-Pentecost holiday, when holiday date falls in week then 1 else 0
Pre-Christmas holiday, when holiday date falls in week then 1 else 0
Queen’s day holiday, when holiday date falls in week then 1 else 0
Mother’s day holiday, when holiday date falls in week then 1 else 0
Olympus holiday, when holiday date falls in week then 1 else 0
Pentecost holiday, when holiday date falls in week then 1 else 0
Sinterklaas holiday, when holiday date falls in week then 1 else 0
Christmas holiday, when holiday date falls in week then 1 else 0
Table A.3: The social class; the division is based upon the breadwinner of a family.
Horizontally we can see the study degree according to the Dutch education system and
vertically the profession.
Business owner (leads 10 or more persons)                                     A (high)  A   A   A   B1  B2  C
Business owner (leads less than 10 persons)                                   A   A   A   A   B1  B2  C
Farmer / gardener                                                             A   A   B1  B1  B2  C   C
Free professions                                                              A   A   A   B1  B2  C   C
Scientist and higher; manager                                                 A   B1  B1  B1  B2  C   C
Scientist and higher; no manager                                              A   B1  B1  B1  B2  C   C
Secondary technical and vocational education (SBC 4); manager                 A   B1  B1  B1  B2  C   C
Secondary technical and vocational education (SBC 4); no manager              A   B1  B1  B2  B2  C   C
Elementary school and lower technical and vocational education (SBC 1 and 2)  B1  B2  B2  C   C   C   C
B | Impact clustering results
Table B.1: The stepwise revenue without taking competition into account. The
horizontal percentage change represents the price increase of inelastic products and the
vertical percentage change represents the price cut of elastic products (in thousands of
euros).
Table B.2: The stepwise profit without taking competition into account. The horizontal
percentage change represents the price increase of inelastic products and the vertical
percentage change represents the price cut of elastic products (in thousands of euros).
Table B.3: The projected increase in profit with competition vs. without competition
taken into account and 2% price change of elastic products, between brackets the
relative change can be found.
Table B.4: The share of products labeled as inelastic or elastic by the labeling
procedure (in percentages).
C | Panel data regression results
Table C.1: The table presents the coefficients (and standard errors) of the variables of
the panel data regression models for Cola on store-item level for all clusters to explain
the dependent variable, the price elasticity. Note that the significance level is set to 5%.
Variable Cluster 1 Cluster 2 Cluster 3 Cluster 4
Ascension day 0.022 (0.005) 0.044 (0.006) 0.009 (0.006) 0.026 (0.009)
Back to school 0.004 (0.004) 0.017 (0.005) 0.008 (0.005) -0.004 (0.007)
Carnival 0.025 (0.005) 0.000 (0.006) 0.005 (0.006) 0.023 (0.009)
Easter 0.007 (0.004) 0.024 (0.005) 0.000 (0.005) 0.011 (0.007)
Father’s day -0.025 (0.004) -0.033 (0.005) -0.027 (0.005) -0.027 (0.008)
Autumn 0.013 (0.004) 0.031 (0.005) 0.027 (0.005) 0.017 (0.007)
Krokus -0.001 (0.004) -0.024 (0.005) 0.004 (0.005) -0.012 (0.007)
Liberation day -0.005 (0.004) -0.020 (0.005) -0.012 (0.005) -0.001 (0.008)
New year 0.045 (0.004) 0.021 (0.005) 0.035 (0.005) 0.055 (0.007)
Pre-Easter -0.019 (0.004) 0.004 (0.005) -0.018 (0.005) -0.013 (0.007)
Pre-Pentecost -0.020 (0.004) -0.016 (0.005) -0.022 (0.005) -0.024 (0.008)
Pre-Christmas 0.033 (0.007) 0.053 (0.010) 0.034 (0.010) 0.052 (0.014)
Queen’s day -0.011 (0.004) -0.005 (0.005) -0.013 (0.005) 0.003 (0.008)
Mother’s day 0.050 (0.006) 0.054 (0.007) 0.029 (0.007) 0.069 (0.012)
Olympus 0.111 (0.007) 0.086 (0.008) 0.110 (0.008) 0.082 (0.012)
Pentecost -0.001 (0.004) 0.009 (0.005) 0.007 (0.005) 0.000 (0.008)
Sinterklaas 0.046 (0.004) 0.062 (0.005) 0.027 (0.005) 0.066 (0.007)
Christmas 0.048 (0.006) 0.070 (0.008) 0.057 (0.008) 0.083 (0.010)
Price ratio complement 1 0.010 (0.017) 0.031 (0.020) -0.053 (0.020) -0.091 (0.032)
Price ratio competition -0.636 (0.049) 0.348 (0.064) -0.224 (0.066) -0.179 (0.096)
Price ratio regular price 0.145 (0.062) 1.773 (0.078) 1.004 (0.082) 0.912 (0.121)
Price ratio substitute 1 -0.884 (0.015) -0.591 (0.018) -0.834 (0.019) -0.662 (0.024)
Price ratio substitute 2 -0.780 (0.032) -1.202 (0.038) -0.921 (0.044) -1.127 (0.069)
Discount price -0.005 (0.000) -0.006 (0.000) -0.005 (0.000) -0.004 (0.001)
Trips 0.006 (0.000) 0.003 (0.000) 0.005 (0.000) 0.012 (0.000)
Dynamic newsletter 0.000 (0.000) 0.643 (0.106) 0.463 (0.111) 1.366 (0.153)
Entrance poster -1.647 (0.502) -5.129 (0.593) -1.856 (0.586) 0.000 (0.000)
National folder 0.825 (0.114) 0.704 (0.145) 0.765 (0.162) 0.876 (0.229)
Instore flyer -0.348 (0.042) -0.806 (0.091) -0.602 (0.095) -1.428 (0.133)
Online radio 0.134 (0.029) -0.116 (0.035) 0.162 (0.039) 0.237 (0.057)
Openings flyer -1.979 (0.124) -2.005 (0.148) -0.875 (0.143) -1.356 (0.138)
Signing on shelf 1.341 (0.064) 0.362 (0.079) 1.042 (0.081) 0.759 (0.125)
Second placing -0.814 (0.071) 0.471 (0.089) -0.239 (0.095) 0.620 (0.138)
Store radio 3.76 (0.486) 6.577 (0.580) 2.586 (0.573) 0.000 (0.000)
Rain (x1000) 0.017 (0.003) 0.015 (0.004) 0.012 (0.005) 0.035 (0.007)
Temperature (x1000) 0.221 (0.019) 0.617 (0.016) 0.213 (0.016) 0.644 (0.023)
Seasonality 0.098 (0.006) 0.155 (0.009) 0.130 (0.007) 0.000 (0.000)
Percentage high education -0.306 (0.069) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Percentage high income 0.158 (0.05) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
City-rural dummy -0.035 (0.012) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
No. competitive stores in 15km 0.000 (0.000) -0.013 (0.003) -0.005 (0.002) 0.000 (0.000)
Table C.2: The table presents the coefficients (and standard errors) of the variables of
the panel data regression models for Energy on store-item level for all clusters to explain
the dependent variable, the price elasticity. Note that the significance level is set to 5%.
Variable Cluster 1 Cluster 2 Cluster 3 Cluster 4
Ascension day 0.076 (0.007) 0.064 (0.005) 0.060 (0.009) 0.049 (0.010)
Back to school -0.001 (0.006) -0.014 (0.004) 0.017 (0.009) 0.005 (0.009)
Carnival 0.013 (0.008) 0.015 (0.006) 0.022 (0.011) 0.007 (0.012)
Easter 0.045 (0.006) 0.018 (0.004) 0.031 (0.008) 0.022 (0.009)
Father’s day 0.017 (0.007) 0.012 (0.004) 0.019 (0.009) 0.001 (0.010)
Autumn -0.063 (0.007) -0.023 (0.005) -0.070 (0.009) -0.025 (0.009)
Krokus 0.023 (0.006) 0.015 (0.004) 0.024 (0.008) 0.017 (0.009)
Liberation day 0.017 (0.007) -0.006 (0.005) 0.016 (0.009) -0.004 (0.010)
New year -0.060 (0.006) -0.030 (0.004) -0.084 (0.008) -0.057 (0.009)
Pre-Easter 0.045 (0.006) 0.021 (0.004) 0.050 (0.008) 0.029 (0.009)
Pre-Pentecost 0.029 (0.007) 0.026 (0.005) 0.026 (0.010) 0.020 (0.010)
Pre-Christmas -0.010 (0.012) -0.013 (0.009) -0.008 (0.017) -0.029 (0.018)
Queen’s day 0.073 (0.007) 0.037 (0.005) 0.074 (0.009) 0.033 (0.010)
Mother’s day 0.105 (0.011) 0.038 (0.007) 0.092 (0.013) 0.066 (0.016)
Olympus 0.253 (0.013) 0.099 (0.008) 0.239 (0.015) 0.173 (0.024)
Pentecost 0.053 (0.007) 0.041 (0.004) 0.041 (0.009) 0.019 (0.010)
Sinterklaas -0.040 (0.006) -0.019 (0.004) -0.033 (0.008) -0.038 (0.009)
Christmas -0.012 (0.009) 0.005 (0.007) -0.009 (0.012) 0.016 (0.013)
Price ratio competition -1.464 (0.045) -0.541 (0.031) -1.263 (0.061) -0.531 (0.053)
Price ratio regular price -0.390 (0.071) -0.550 (0.051) -0.047 (0.100) -1.404 (0.106)
Price ratio substitute 1 -0.361 (0.029) -0.362 (0.020) -0.158 (0.039) -0.319 (0.040)
Price ratio substitute 2 -0.564 (0.025) -0.330 (0.016) -0.742 (0.032) -0.337 (0.035)
Discount price 0.016 (0.003) 0.015 (0.002) 0.016 (0.006) 0.033 (0.006)
Trips 0.001 (0.000) 0.004 (0.000) -0.008 (0.000) 0.001 (0.000)
Rain (x1000) -0.083 (0.006) -0.032 (0.004) -0.101 (0.008) -0.084 (0.009)
Temperature (x1000) 0.620 (0.020) 0.608 (0.013) 0.690 (0.030) 0.620 (0.028)
Seasonality 0.232 (0.010) 0.178 (0.004) 0.236 (0.014) 0.252 (0.006)
National folder -6.656 (0.391) -4.993 (0.255) -6.690 (0.502) -6.671 (0.502)
Instore flyer -0.795 (0.147) -0.757 (0.096) -1.231 (0.186) -1.203 (0.196)
Online radio 0.221 (0.035) 0.084 (0.022) 0.286 (0.049) 3.332 (0.173)
Signing on shelf 2.612 (0.138) 2.328 (0.089) 2.585 (0.172) 0.776 (0.076)
Second placing 0.480 (0.053) 0.639 (0.035) 0.401 (0.075) 0.000 (0.000)
Region East 0.000 (0.000) 0.115 (0.055) 0.000 (0.000) 0.000 (0.000)
Region South 0.000 (0.000) 0.139 (0.042) 0.000 (0.000) 0.000 (0.000)
City-rural dummy 0.000 (0.000) -0.061 (0.019) 0.000 (0.000) 0.000 (0.000)
Percentage high education 0.000 (0.000) 0.000 (0.000) 0.500 (0.124) 0.334 (0.018)
No. competitive stores in 15km 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.002 (0.001)
Table C.3: The table presents the coefficients (with standard errors in parentheses) of
the variables in the panel data regression models for Kids youth at store-item level, for
all clusters, to explain the dependent variable of price elasticities. Note that the
significance level is set to 5%.
Ascension day -0.004 (0.006) 0.026 (0.005) 0.002 (0.007) 0.004 (0.010)
Back to school 0.001 (0.006) 0.042 (0.005) -0.028 (0.007) -0.008 (0.010)
Carnival -0.021 (0.007) -0.027 (0.006) -0.023 (0.009) -0.017 (0.012)
Easter -0.046 (0.006) -0.056 (0.005) -0.062 (0.007) -0.050 (0.009)
Father’s day -0.086 (0.006) -0.039 (0.005) -0.098 (0.007) -0.079 (0.010)
Autumn -0.015 (0.006) -0.025 (0.005) -0.026 (0.007) -0.017 (0.009)
Krokus 0.028 (0.006) 0.022 (0.005) 0.045 (0.007) 0.036 (0.009)
Liberation day -0.001 (0.006) -0.013 (0.005) -0.021 (0.007) -0.024 (0.010)
New year -0.021 (0.006) -0.082 (0.006) -0.005 (0.008) -0.058 (0.010)
Pre-Easter -0.061 (0.005) -0.045 (0.005) -0.076 (0.007) -0.073 (0.009)
Pre-Pentecost -0.003 (0.006) 0.013 (0.005) -0.001 (0.008) -0.004 (0.010)
Pre-Christmas 0.087 (0.011) 0.053 (0.010) 0.061 (0.014) 0.071 (0.018)
Queen’s day -0.009 (0.006) -0.015 (0.006) -0.031 (0.008) -0.023 (0.011)
Mother’s day -0.010 (0.008) -0.028 (0.007) 0.006 (0.010) -0.032 (0.014)
Olympus 0.078 (0.008) 0.031 (0.007) 0.097 (0.009) 0.063 (0.014)
Pentecost 0.054 (0.006) 0.036 (0.005) 0.041 (0.007) 0.043 (0.010)
Sinterklaas -0.022 (0.006) -0.036 (0.005) -0.012 (0.007) -0.018 (0.009)
Christmas -0.029 (0.008) -0.019 (0.008) -0.015 (0.011) -0.035 (0.013)
Price ratio competition -0.379 (0.096) -0.554 (0.088) 0.176 (0.121) -0.566 (0.163)
Price ratio regular price 0.420 (0.089) 0.267 (0.080) -0.199 (0.113) 0.676 (0.152)
Price ratio substitute 1 0.601 (0.050) -0.404 (0.041) 0.251 (0.069) 0.413 (0.079)
Price ratio substitute 2 -1.024 (0.048) 0.115 (0.039) -0.715 (0.062) -0.769 (0.077)
Discount price 0.002 (0.004) 0.031 (0.003) 0.013 (0.005) 0.003 (0.009)
Trips 0.027 (0.001) 0.007 (0.000) 0.025 (0.001) 0.022 (0.001)
National folder -0.499 (0.207) -0.351 (0.181) -1.579 (0.260) -0.093 (0.330)
Instore flyer 1.598 (0.132) 0.681 (0.113) 1.494 (0.169) 1.288 (0.218)
Online radio 0.087 (0.038) 0.619 (0.032) 0.073 (0.049) 0.264 (0.062)
Signing on shelf -2.588 (0.139) -1.661 (0.121) -2.626 (0.175) -2.033 (0.222)
Second placing 0.087 (0.113) 0.769 (0.109) 0.724 (0.147) -0.025 (0.178)
Openings flyer 2.362 (0.078) 1.205 (0.072) 1.774 (0.102) 2.169 (0.122)
Rain (x1000) 0.014 (0.002) -0.037 (0.005) 0.000 (0.000) 0.021 (0.009)
Temperature (x1000) 0.325 (0.019) 0.260 (0.018) 0.351 (0.023) 0.500 (0.030)
Seasonality 0.034 (0.008) 0.110 (0.009) 0.000 (0.000) 0.082 (0.007)
No. competitive stores in 15km 0.007 (0.002) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Percentage unemployed -1.619 (0.333) -1.721 (0.373) 0.000 (0.000) 0.000 (0.000)
Percentage regular visitor competitor -0.137 (0.039) -0.143 (0.048) 0.156 (0.061) 0.086 (0.025)
Percentage high education 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.199 (0.033)
Percentage high income 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) -0.144 (0.029)
Table C.4: The table presents the coefficients (with standard errors in parentheses) of
the variables in the panel data regression models for Large soda at store-item level, for
all clusters, to explain the dependent variable of price elasticities. Note that the
significance level is set to 5%.
Ascension day 0.043 (0.006) 0.039 (0.005) 0.035 (0.008) 0.036 (0.011)
Back to school -0.020 (0.005) -0.019 (0.005) -0.023 (0.007) -0.033 (0.010)
Carnival 0.071 (0.007) 0.036 (0.006) 0.113 (0.009) 0.045 (0.013)
Easter 0.055 (0.005) 0.053 (0.005) 0.043 (0.007) 0.036 (0.009)
Father’s day -0.001 (0.006) -0.009 (0.005) 0.001 (0.008) -0.016 (0.011)
Autumn -0.043 (0.005) -0.043 (0.005) -0.031 (0.007) -0.053 (0.010)
Krokus 0.011 (0.005) 0.017 (0.005) -0.009 (0.007) 0.001 (0.010)
Liberation day 0.031 (0.006) 0.013 (0.005) 0.030 (0.008) 0.013 (0.011)
New year 0.057 (0.005) 0.066 (0.005) 0.016 (0.007) 0.075 (0.010)
Pre-Easter 0.092 (0.005) 0.162 (0.005) 0.027 (0.007) 0.101 (0.009)
Pre-Pentecost 0.000 (0.006) 0.026 (0.005) -0.015 (0.008) 0.006 (0.011)
Pre-Christmas 0.066 (0.011) 0.072 (0.010) 0.040 (0.014) 0.096 (0.019)
Queen’s day -0.004 (0.006) 0.020 (0.005) -0.011 (0.008) -0.016 (0.011)
Mother’s day 0.004 (0.009) -0.002 (0.007) 0.007 (0.011) 0.018 (0.017)
Olympus 0.141 (0.007) 0.086 (0.006) 0.128 (0.009) 0.147 (0.015)
Pentecost 0.069 (0.006) 0.055 (0.005) 0.060 (0.007) 0.056 (0.010)
Sinterklaas -0.070 (0.005) -0.046 (0.005) -0.096 (0.007) -0.059 (0.010)
Christmas 0.031 (0.008) 0.030 (0.008) 0.028 (0.011) 0.056 (0.014)
Price ratio competition -0.343 (0.091) -0.221 (0.080) -0.892 (0.130) -0.189 (0.175)
Price ratio regular price -3.586 (0.114) -2.459 (0.100) -3.226 (0.168) -3.386 (0.226)
Price ratio substitute 1 -0.282 (0.059) -0.266 (0.048) -0.413 (0.080) -0.267 (0.105)
Price ratio substitute 2 0.308 (0.039) 0.213 (0.031) 0.403 (0.053) 0.041 (0.065)
Price ratio complement 1 0.601 (0.043) 0.474 (0.035) 0.549 (0.057) 0.472 (0.078)
Discount price 0.071 (0.002) 0.042 (0.001) 0.038 (0.002) 0.102 (0.006)
Trips 0.014 (0.001) 0.019 (0.000) 0.013 (0.001) 0.009 (0.001)
National folder -0.820 (0.203) -0.870 (0.168) -0.639 (0.283) -0.621 (0.392)
Instore flyer 0.812 (0.184) 0.888 (0.153) 0.842 (0.259) 0.605 (0.353)
Online radio 0.011 (0.028) -0.151 (0.023) 0.269 (0.036) -0.116 (0.051)
Signing on shelf -1.036 (0.246) -1.206 (0.205) -1.412 (0.354) 0.084 (0.461)
Second placing 1.106 (0.143) 1.201 (0.121) 1.044 (0.208) 0.363 (0.264)
Openings flyer -0.154 (0.091) 0.279 (0.077) -0.135 (0.127) 0.000 (0.000)
Rain (x1000) -0.049 (0.005) -0.040 (0.005) -0.057 (0.007) -0.040 (0.010)
Temperature (x1000) -0.129 (0.020) -0.037 (0.018) -0.242 (0.028) 0.000 (0.000)
Seasonality 0.170 (0.007) 0.203 (0.007) 0.122 (0.010) 0.145 (0.010)
No. competitive stores in 15km -0.006 (0.002) -0.005 (0.002) 0.000 (0.000) 0.000 (0.000)
Region East -0.029 (0.013) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Region South -0.072 (0.017) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
City-rural dummy -0.051 (0.012) 0.000 (0.000) -0.074 (0.027) 0.000 (0.000)
Self-scan dummy -0.041 (0.012) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Percentage unemployed 0.000 (0.000) -1.345 (0.413) 0.000 (0.000) 0.000 (0.000)
Percentage non-western immigrants 0.000 (0.000) 0.567 (0.189) 0.000 (0.000) 0.000 (0.000)
Percentage western immigrants 0.000 (0.000) -0.304 (0.146) 0.000 (0.000) 0.000 (0.000)
Table C.5: The table presents the coefficients (with standard errors in parentheses) of
the variables in the panel data regression models for Juices at store-item level, for all
clusters, to explain the dependent variable of price elasticities. Note that the
significance level is set to 5%.
Ascension day 0.125 (0.010) 0.111 (0.010) 0.122 (0.012) 0.132 (0.015)
Back to school -0.047 (0.009) -0.042 (0.009) -0.074 (0.012) -0.046 (0.014)
Carnival -0.045 (0.011) -0.042 (0.011) -0.043 (0.014) -0.046 (0.017)
Easter 0.099 (0.009) 0.125 (0.009) 0.098 (0.011) 0.099 (0.012)
Father’s day 0.090 (0.010) 0.090 (0.009) 0.070 (0.012) 0.079 (0.014)
Autumn 0.056 (0.009) 0.078 (0.009) 0.050 (0.011) 0.032 (0.013)
Krokus 0.016 (0.009) -0.017 (0.009) 0.055 (0.011) 0.028 (0.013)
Liberation day 0.012 (0.010) 0.010 (0.009) 0.013 (0.012) 0.013 (0.014)
New year 0.006 (0.009) -0.012 (0.009) 0.015 (0.012) 0.016 (0.013)
Pre-Easter 0.111 (0.009) 0.182 (0.009) 0.045 (0.011) 0.136 (0.013)
Pre-Pentecost 0.107 (0.010) 0.136 (0.010) 0.059 (0.012) 0.105 (0.015)
Pre-Christmas 0.072 (0.018) 0.059 (0.018) 0.111 (0.023) 0.138 (0.027)
Queen’s day 0.022 (0.009) 0.029 (0.009) 0.025 (0.012) 0.017 (0.014)
Mother’s day -0.001 (0.014) -0.023 (0.012) 0.028 (0.016) -0.002 (0.021)
Olympus -0.159 (0.013) -0.177 (0.011) -0.069 (0.015) -0.170 (0.020)
Pentecost 0.028 (0.009) 0.018 (0.009) 0.032 (0.012) 0.039 (0.014)
Sinterklaas 0.019 (0.009) -0.006 (0.009) 0.081 (0.011) 0.074 (0.013)
Christmas 0.132 (0.013) 0.155 (0.014) 0.129 (0.018) 0.094 (0.019)
Price ratio competition 0.807 (0.104) 1.017 (0.098) 0.192 (0.132) 0.644 (0.164)
Price ratio regular price 3.308 (0.138) 3.184 (0.129) 4.186 (0.178) 2.991 (0.206)
Price ratio substitute 1 1.773 (0.045) 1.859 (0.040) 1.224 (0.054) 1.700 (0.072)
Price ratio substitute 2 -2.984 (0.045) -2.810 (0.040) -2.868 (0.057) -2.746 (0.066)
Discount price 0.129 (0.013) 0.045 (0.006) 0.021 (0.005) 0.122 (0.015)
Trips 0.024 (0.001) 0.017 (0.001) 0.030 (0.001) 0.016 (0.001)
Instore flyer 0.945 (0.193) 0.584 (0.169) 1.425 (0.240) 0.655 (0.291)
Online radio -0.587 (0.047) -0.319 (0.044) -0.842 (0.058) -0.537 (0.071)
Signing on shelf 2.286 (0.365) 2.027 (0.333) 2.502 (0.476) 2.463 (0.537)
Second placing -0.006 (0.317) 1.209 (0.291) -0.101 (0.418) 0.092 (0.473)
Openings flyer -0.612 (0.223) -1.829 (0.212) -0.535 (0.293) -1.136 (0.310)
Rain (x1000) 0.064 (0.008) 0.090 (0.009) 0.051 (0.011) 0.089 (0.013)
Temperature (x1000) -0.123 (0.028) 0.000 (0.000) -0.225 (0.037) 0.000 (0.000)
Seasonality 0.285 (0.015) 0.343 (0.015) 0.319 (0.019) 0.306 (0.014)
Franchise dummy -0.052 (0.022) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Percentage high education 0.000 (0.000) 0.195 (0.077) 0.000 (0.000) 0.000 (0.000)
Region East 0.000 (0.000) 0.188 (0.061) 0.000 (0.000) 0.000 (0.000)
Region South 0.000 (0.000) 0.285 (0.049) 0.000 (0.000) 0.000 (0.000)
Percentage high social class 0.000 (0.000) 0.000 (0.000) 0.312 (0.130) 0.000 (0.000)
Percentage regular visitor competitor 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.312 (0.096)
Table C.6: The table presents the coefficients (with standard errors in parentheses) of
the variables in the panel data regression models for Water at store-item level, for all
clusters, to explain the dependent variable of price elasticities. Note that the
significance level is set to 5%.
Ascension day 0.026 (0.004) 0.020 (0.005) 0.040 (0.005) 0.019 (0.008)
Back to school -0.005 (0.004) -0.020 (0.004) -0.011 (0.005) 0.006 (0.008)
Carnival -0.006 (0.005) -0.032 (0.006) -0.002 (0.006) -0.009 (0.010)
Easter 0.015 (0.004) 0.015 (0.004) 0.011 (0.005) 0.023 (0.007)
Father’s day -0.007 (0.004) -0.006 (0.004) -0.004 (0.005) -0.011 (0.008)
Autumn -0.009 (0.004) -0.001 (0.004) 0.009 (0.005) -0.008 (0.008)
Krokus 0.014 (0.004) -0.003 (0.004) 0.012 (0.005) 0.009 (0.008)
Liberation day 0.006 (0.004) 0.016 (0.005) -0.023 (0.005) -0.002 (0.008)
New year -0.025 (0.004) -0.027 (0.005) -0.041 (0.005) -0.030 (0.007)
Pre-Easter 0.018 (0.004) 0.021 (0.004) 0.013 (0.005) 0.019 (0.007)
Pre-Pentecost 0.027 (0.004) 0.041 (0.005) 0.012 (0.005) 0.032 (0.009)
Pre-Christmas 0.058 (0.008) 0.073 (0.009) 0.093 (0.010) 0.056 (0.015)
Queen’s day 0.018 (0.004) 0.016 (0.005) 0.031 (0.005) 0.001 (0.008)
Mother’s day -0.017 (0.006) -0.039 (0.006) -0.005 (0.007) -0.020 (0.012)
Olympus -0.001 (0.007) -0.013 (0.008) 0.021 (0.009) 0.042 (0.016)
Pentecost 0.023 (0.004) 0.025 (0.004) 0.029 (0.005) 0.028 (0.008)
Sinterklaas 0.012 (0.004) 0.045 (0.004) 0.001 (0.005) 0.017 (0.007)
Christmas 0.045 (0.006) 0.066 (0.007) 0.031 (0.007) 0.046 (0.011)
Price ratio competition -0.065 (0.032) -0.218 (0.039) -0.105 (0.038) -0.320 (0.060)
Price ratio regular price -0.089 (0.041) 0.563 (0.048) -0.363 (0.050) 0.339 (0.081)
Price ratio substitute 1 0.079 (0.037) -0.310 (0.040) 0.508 (0.044) -0.530 (0.077)
Price ratio substitute 2 -0.365 (0.024) 0.086 (0.027) -0.770 (0.030) 0.187 (0.052)
Discount price -0.015 (0.006) 0.005 (0.003) 0.019 (0.004) -0.020 (0.009)
Trips 0.006 (0.000) 0.008 (0.000) 0.010 (0.000) 0.000 (0.000)
Instore flyer 2.570 (0.445) 3.033 (0.460) 3.045 (0.475) 2.084 (0.819)
Online radio -0.135 (0.018) -0.158 (0.020) 0.047 (0.023) -0.050 (0.035)
Signing on shelf -0.177 (0.027) -0.303 (0.030) 0.095 (0.033) -0.174 (0.057)
Second placing -0.172 (0.392) 0.762 (0.425) -0.398 (0.504) 0.773 (0.821)
Openings flyer -0.444 (0.388) -1.578 (0.421) -0.726 (0.494) -0.864 (0.811)
National folder -1.504 (0.434) -1.773 (0.446) -1.706 (0.462) -1.631 (0.792)
Rain (x1000) -0.028 (0.004) -0.019 (0.004) -0.053 (0.005) -0.036 (0.008)
Temperature (x1000) 0.625 (0.015) 0.583 (0.019) 0.483 (0.020) 0.909 (0.027)
Seasonality 0.224 (0.004) 0.234 (0.005) 0.238 (0.005) 0.229 (0.006)
No. competitive stores in 15km -0.007 (0.001) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
City-rural dummy 0.054 (0.011) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Franchise dummy 0.000 (0.000) 0.000 (0.000) -0.074 (0.028) 0.000 (0.000)
Self-scan dummy 0.000 (0.000) 0.000 (0.000) 0.061 (0.019) 0.000 (0.000)
Table C.7: The table presents the coefficients (with standard errors in parentheses) of
the variables in the panel data regression models at store-item level, for all product
groups, to explain the dependent variable of price elasticities. Note that the
significance level is set to 5%.
Intercept 0.031 (0.054) 0.847 (0.097) 1.873 (0.037) -0.418 (0.082) -5.100 (0.044) -1.183 (0.024)
Ascension day 0.030 (0.003) 0.064 (0.004) 0.038 (0.003) 0.011 (0.003) 0.120 (0.006) 0.026 (0.003)
Back to school 0.009 (0.003) -0.003 (0.003) -0.024 (0.003) 0.004 (0.003) -0.056 (0.005) -0.012 (0.003)
Carnival 0.016 (0.003) 0.017 (0.004) 0.063 (0.004) -0.015 (0.004) -0.038 (0.006) -0.013 (0.003)
Easter 0.013 (0.003) 0.028 (0.003) 0.055 (0.003) -0.052 (0.003) 0.110 (0.005) 0.015 (0.002)
Father’s day -0.030 (0.003) 0.011 (0.003) -0.005 (0.003) -0.074 (0.003) 0.086 (0.005) -0.006 (0.003)
Autumn 0.025 (0.003) -0.037 (0.003) -0.047 (0.003) -0.017 (0.003) 0.057 (0.005) -0.004 (0.003)
Krokus -0.005 (0.003) 0.017 (0.003) 0.015 (0.003) 0.034 (0.003) 0.011 (0.005) 0.009 (0.002)
Liberation day -0.013 (0.003) 0.008 (0.004) 0.023 (0.003) -0.012 (0.003) 0.011 (0.005) 0.003 (0.003)
New year 0.029 (0.003) -0.054 (0.003) 0.061 (0.003) -0.043 (0.004) 0.009 (0.005) -0.034 (0.003)
Pre-Easter -0.012 (0.003) 0.034 (0.003) 0.113 (0.003) -0.059 (0.003) 0.128 (0.005) 0.019 (0.002)
Pre-Pentecost -0.023 (0.003) 0.022 (0.004) 0.013 (0.003) 0.005 (0.004) 0.111 (0.006) 0.033 (0.003)
Pre-Christmas 0.044 (0.005) -0.010 (0.007) 0.072 (0.006) 0.069 (0.006) 0.085 (0.011) 0.082 (0.005)
Queen’s day -0.006 (0.003) 0.055 (0.004) 0.005 (0.003) -0.019 (0.004) 0.025 (0.005) 0.018 (0.003)
Mother’s day 0.050 (0.004) 0.077 (0.005) 0.007 (0.005) -0.016 (0.005) -0.007 (0.007) -0.026 (0.004)
Olympus 0.096 (0.004) 0.183 (0.006) 0.119 (0.004) 0.062 (0.004) -0.150 (0.007) 0.003 (0.005)
Pentecost 0.003 (0.003) 0.041 (0.003) 0.064 (0.003) 0.044 (0.003) 0.027 (0.005) 0.027 (0.003)
Sinterklaas 0.054 (0.003) -0.033 (0.003) -0.065 (0.003) -0.026 (0.003) 0.029 (0.005) 0.022 (0.003)
Christmas 0.059 (0.004) -0.003 (0.005) 0.039 (0.005) -0.023 (0.005) 0.138 (0.008) 0.046 (0.004)
Price ratio competition -0.258 (0.035) -0.955 (0.023) -0.292 (0.051) -0.231 (0.056) 0.762 (0.059) -0.162 (0.021)
Price ratio regular price 1.023 (0.043) -0.481 (0.038) -3.179 (0.065) 0.075 (0.052) 3.353 (0.078) 0.220 (0.027)
Price ratio substitute 1 -0.731 (0.010) -0.316 (0.015) -0.288 (0.033) 0.052 (0.028) 1.697 (0.025) -0.086 (0.023)
Price ratio substitute 2 -0.984 (0.022) -0.498 (0.013) 0.239 (0.021) -0.395 (0.027) -2.845 (0.025) -0.188 (0.016)
Price ratio complement 1 -0.017 (0.011) 0.000 (0.000) 0.530 (0.024) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Discount price -0.005 (0.000) 0.019 (0.002) 0.052 (0.001) 0.016 (0.002) 0.047 (0.004) 0.008 (0.002)
Trips 0.004 (0.000) 0.000 (0.000) 0.014 (0.000) 0.018 (0.000) 0.022 (0.000) 0.005 (0.000)
Instore flyer -0.362 (0.029) -1.014 (0.073) 0.177 (0.039) -0.347 (0.081) 0.841 (0.106) 3.166 (0.269)
Online radio 0.061 (0.020) 0.000 (0.000) 0.000 (0.000) 0.346 (0.022) -0.517 (0.026) -0.109 (0.012)
Signing on shelf 0.635 (0.026) 2.839 (0.064) -0.339 (0.093) -0.782 (0.074) 2.307 (0.205) -0.154 (0.013)
Openings flyer -1.779 (0.084) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) -1.086 (0.126) -0.827 (0.067)
Store radio 4.652 (0.326) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Entrance poster -2.965 (0.336) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Second placing 0.000 (0.000) 0.552 (0.027) 1.011 (0.081) 0.517 (0.068) 0.347 (0.178) 0.000 (0.000)
National folder 0.879 (0.082) -5.877 (0.194) 0.000 (0.000) -0.999 (0.119) 0.000 (0.000) -1.944 (0.261)
Dynamic newsletter 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 1.825 (0.044) 0.000 (0.000) 0.000 (0.000)
Rain (x1000) 0.013 (0.003) -0.066 (0.003) -0.049 (0.003) -0.007 (0.003) 0.079 (0.005) -0.035 (0.002)
Temperature (x1000) 0.405 (0.008) 0.630 (0.010) 0.000 (0.000) 0.282 (0.011) -0.040 (0.020) 0.709 (0.010)
Seasonality 0.152 (0.004) 0.220 (0.003) 0.135 (0.004) 0.088 (0.004) 0.283 (0.007) 0.217 (0.002)
Percentage high education -0.212 (0.077) -0.242 (0.051) -0.211 (0.052) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Percentage high income -0.323 (0.078) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Franchise dummy -0.060 (0.020) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) -0.043 (0.015) -0.036 (0.010)
Percentage non-western immigrants 0.593 (0.221) 0.839 (0.168) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.369 (0.096)
Percentage western immigrants -1.449 (0.225) -0.658 (0.212) 0.000 (0.000) -0.790 (0.144) 0.000 (0.000) 0.000 (0.000)
Region South 0.000 (0.000) -0.066 (0.016) -0.166 (0.014) 0.000 (0.000) 0.000 (0.000) -0.043 (0.009)
Average family size 0.000 (0.000) 0.102 (0.037) 0.000 (0.000) -0.144 (0.033) 0.000 (0.000) 0.000 (0.000)
Percentage unemployed 0.000 (0.000) 0.000 (0.000) -0.858 (0.363) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
City-rural dummy 0.000 (0.000) 0.000 (0.000) -0.036 (0.016) 0.000 (0.000) 0.000 (0.000) 0.044 (0.009)
Percentage regular visitor competitor 0.000 (0.000) 0.000 (0.000) -0.093 (0.037) 0.062 (0.028) 0.105 (0.033) 0.000 (0.000)
Percentage high social class 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.254 (0.046) 0.000 (0.000) 0.000 (0.000)
Surface in sqm 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) -0.011 (0.002) 0.000 (0.000) -0.006 (0.001)
Self-scan dummy 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.036 (0.014) 0.043 (0.009)
R² 0.463 0.560 0.541 0.427 0.552 0.660
Table C.8: The table presents the coefficients (with standard errors in parentheses) of
the variables in the panel data regression models at store-item level, for all clusters,
to explain the dependent variable of price elasticities. Note that the significance level
is set to 5%.
Ascension day 0.052 (0.003) 0.049 (0.003) 0.036 (0.004) 0.043 (0.005)
Back to school 0.006 (0.003) 0.007 (0.003) -0.018 (0.004) -0.009 (0.005)
Carnival -0.003 (0.004) -0.007 (0.003) 0.005 (0.005) 0.001 (0.006)
Easter -0.001 (0.003) 0.019 (0.003) 0.008 (0.004) 0.006 (0.004)
Father’s day -0.023 (0.003) -0.009 (0.003) -0.038 (0.004) -0.029 (0.005)
Autumn -0.006 (0.003) 0.003 (0.003) -0.006 (0.004) -0.007 (0.005)
Krokus 0.002 (0.003) -0.010 (0.003) 0.018 (0.004) 0.008 (0.005)
Liberation day -0.015 (0.003) -0.014 (0.003) -0.012 (0.004) -0.020 (0.005)
New year -0.004 (0.003) 0.004 (0.003) 0.015 (0.004) -0.006 (0.005)
Pre-Easter -0.023 (0.003) 0.044 (0.003) -0.028 (0.004) 0.002 (0.005)
Pre-Pentecost -0.009 (0.003) 0.024 (0.003) -0.013 (0.004) 0.005 (0.005)
Pre-Christmas 0.031 (0.006) 0.073 (0.006) 0.048 (0.008) 0.064 (0.009)
Queen’s day 0.015 (0.003) 0.030 (0.003) 0.013 (0.004) 0.008 (0.005)
Mother’s day 0.016 (0.004) 0.013 (0.004) 0.017 (0.006) 0.009 (0.007)
Olympus 0.030 (0.005) 0.019 (0.004) 0.044 (0.006) 0.028 (0.008)
Pentecost 0.044 (0.003) 0.036 (0.003) 0.053 (0.004) 0.038 (0.005)
Sinterklaas -0.027 (0.003) -0.001 (0.003) -0.002 (0.004) 0.005 (0.005)
Christmas 0.050 (0.004) 0.065 (0.004) 0.082 (0.006) 0.051 (0.007)
Price ratio competition -0.823 (0.054) -0.197 (0.053) -1.040 (0.070) -0.798 (0.095)
Price ratio regular price 1.847 (0.074) 1.303 (0.067) 2.197 (0.100) 1.762 (0.123)
Price ratio substitute 1 -1.083 (0.039) 0.316 (0.033) -1.183 (0.049) -0.387 (0.064)
Price ratio substitute 2 -0.371 (0.038) -1.165 (0.033) -0.411 (0.050) -0.823 (0.060)
Price ratio complement 1 -0.644 (0.027) -0.477 (0.023) -0.442 (0.034) -0.591 (0.044)
Discount price -0.002 (0.002) -0.011 (0.001) 0.002 (0.002) -0.007 (0.003)
Trips 0.013 (0.000) 0.002 (0.000) 0.011 (0.000) 0.006 (0.000)
Rain (x1000) -0.002 (0.001) -0.015 (0.003) -0.011 (0.003) -0.002 (0.001)
Temperature (x1000) 0.000 (0.000) 0.500 (0.010) 0.327 (0.015) 0.430 (0.015)
Seasonality 0.194 (0.005) 0.205 (0.005) 0.126 (0.006) 0.219 (0.006)
Instore flyer -0.446 (0.120) -0.469 (0.139) 0.000 (0.000) -0.786 (0.247)
Signing on shelf 1.227 (0.098) 0.191 (0.096) 0.958 (0.059) 0.686 (0.186)
Openings flyer 1.429 (0.695) 0.000 (0.000) 0.000 (0.000) -0.923 (0.260)
Dynamic newsletter 1.568 (0.138) 1.463 (0.158) 0.968 (0.067) 1.583 (0.266)
Entrance poster -1.998 (0.688) -13.168 (2.328) 0.000 (0.000) 0.000 (0.000)
National folder -1.203 (0.140) -0.640 (0.120) 0.000 (0.000) -0.551 (0.216)
Buy-1-get-1-free -0.618 (0.171) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Second placing 0.000 (0.000) 0.952 (0.130) 0.000 (0.000) 0.713 (0.237)
Online radio 0.000 (0.000) -0.244 (0.029) 0.000 (0.000) 0.000 (0.000)
Store radio 0.000 (0.000) 12.317 (2.326) 0.000 (0.000) 0.000 (0.000)
Franchise dummy -0.020 (0.008) -0.026 (0.010) 0.000 (0.000) 0.000 (0.000)
No. competitive stores in 15km -0.003 (0.001) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Percentage high education -0.105 (0.035) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Percentage unemployed -0.410 (0.202) 0.789 (0.202) 0.000 (0.000) 0.000 (0.000)
Self-scan dummy -0.026 (0.008) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Percentage regular visitor competitor -0.050 (0.023) 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Percentage high social class 0.000 (0.000) 0.000 (0.000) 0.125 (0.042) 0.000 (0.000)
Percentage non-western immigrants 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 1.276 (0.360)
City-rural dummy 0.000 (0.000) 0.000 (0.000) 0.000 (0.000) 0.040 (0.023)
R² 0.622 0.649 0.651 0.802
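
The tables in this appendix report each coefficient with its standard error in parentheses. As a reading aid, the following minimal Python sketch shows how such an entry can be checked against the 5% significance level using a two-sided z-test under a normal approximation. This approximation, the helper names (`parse_cell`, `z_statistic`, `is_significant`), and the treatment of `0.000 (0.000)` entries as variables presumably not retained in a model are illustrative assumptions, not the procedure used in this thesis.

```python
import re
from typing import Optional, Tuple

# Two-sided 5% critical value of the standard normal distribution.
# Assumption: a normal approximation is adequate for these large panels.
Z_CRITICAL_5PCT = 1.96

# Matches table entries of the form "coef (se)", e.g. "-0.823 (0.054)".
CELL_PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)\s*\(\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_cell(cell: str) -> Tuple[float, float]:
    """Split a 'coef (se)' table entry into (coefficient, standard error)."""
    match = CELL_PATTERN.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"not a 'coef (se)' entry: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def z_statistic(coef: float, se: float) -> Optional[float]:
    """Return coef / se, or None when the printed standard error is zero.

    A printed standard error of 0.000 is either rounded to zero or belongs
    to an entry such as '0.000 (0.000)' (assumed here to mark a variable
    that is not part of the model), so no ratio can be formed from it.
    """
    if se == 0:
        return None
    return coef / se


def is_significant(cell: str) -> Optional[bool]:
    """Hypothetical helper: True if |z| exceeds the 5% critical value."""
    coef, se = parse_cell(cell)
    z = z_statistic(coef, se)
    if z is None:
        return None
    return abs(z) > Z_CRITICAL_5PCT
```

For example, the Pentecost entry 0.044 (0.003) in the first column of Table C.8 gives z ≈ 14.7, well above 1.96, whereas the Carnival entry -0.003 (0.004) in the same column gives z = -0.75 and does not pass this check.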