
Understanding Echo Chambers in E-commerce Recommender Systems

Yingqiang Ge∗† (Rutgers University, yingqiang.ge@rutgers.edu), Shuya Zhao∗ (New York University, sz2257@nyu.edu), Honglu Zhou (Rutgers University, honglu.zhou@rutgers.edu), Changhua Pei (Alibaba Group, changhuapei@gmail.com), Fei Sun (Alibaba Group, ofey.sunfei@gmail.com), Wenwu Ou (Alibaba Group, santong.oww@taobao.com), Yongfeng Zhang (Rutgers University, yongfeng.zhang@rutgers.edu)

ABSTRACT

Personalized recommendation benefits users in accessing content of interest effectively. Current research on recommender systems mostly focuses on matching users with proper items based on user interests. However, little effort has been devoted to understanding how recommendations influence user preferences and behaviors, e.g., if and how recommendations result in echo chambers. Extensive efforts have been made to examine the phenomenon in online media and social network systems. Meanwhile, there are growing concerns that recommender systems might lead to the self-reinforcement of users' interests due to narrowed exposure of items, which may be a potential cause of echo chambers. In this paper, we aim to analyze the echo chamber phenomenon in Alibaba Taobao — one of the largest e-commerce platforms in the world. Echo chamber means the effect of user interests being reinforced through repeated exposure to similar contents. Based on this definition, we examine the presence of echo chamber in two steps. First, we explore whether user interests have been reinforced. Second, we check whether the reinforcement results from the exposure of similar contents. Our evaluations are grounded in robust metrics, including cluster validity and statistical significance. Experiments are performed on extensive collections of real-world data consisting of user clicks, purchases, and browse logs from Alibaba Taobao. Evidence suggests a tendency of echo chamber in user click behaviors, while it is relatively mitigated in user purchase behaviors. Insights from the results can guide the refinement of recommendation algorithms in real-world e-commerce systems.

CCS CONCEPTS
• Information systems → Recommender systems; Web log analysis; Test collections.

KEYWORDS
E-commerce; Recommender Systems; Echo Chamber; Filter Bubble

ACM Reference Format:
Yingqiang Ge, Shuya Zhao, Honglu Zhou, Changhua Pei, Fei Sun, Wenwu Ou, and Yongfeng Zhang. 2020. Understanding Echo Chambers in E-commerce Recommender Systems. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20), July 25–30, 2020, Virtual Event, China. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3397271.3401431

∗ Co-first authors with equal contributions.
† This work was done when Yingqiang Ge worked as an intern at Alibaba.

1 INTRODUCTION

Recommender systems (RS) came into play with the rise of online platforms, e.g., social networking sites, online media, and e-commerce [16, 18, 19]. Intelligent algorithms with the ability to offer personalized recommendations are increasingly used to help consumers find the content that best matches their needs and preferences, in the form of products, news, services, and even friends [1, 49, 50]. Despite the significant convenience that RS has brought, the outcome of personalized recommendation — especially how it reshapes social mentality and public recognition, which could potentially reconfigure society, politics, labor, and ethics — remains unclear.

Extensive attention has been drawn to this front, giving rise to two coined terms, echo chamber and filter bubble. Both effects might occur after the use of personalized recommenders and entail far-reaching implications. Echo chamber describes the rise of social communities whose members share similar opinions within the group [41], while filter bubble [36], the phenomenon of an overly narrow set of recommendations, has been blamed for isolating users in information echo chambers [1].

Owing to the irreversible and striking impact that the internet has had on mass communication, echo chambers and filter bubbles are appearing in online media and social networking sites, such as MovieLens [33], Pandora [1], YouTube [23], Facebook [37], and Instagram [39]. Significant research efforts have been put forward to examine the two phenomena in online media and social networks [4, 6, 7, 14, 20, 30]. Recently, researchers have concluded that the decisions made by RS can influence user beliefs and preferences, which in turn affect the user feedback (e.g., the click and purchase behavior received by the learning system), and this kind of user feedback loop might lead to echo chambers and filter bubbles [26]. Moreover, the two concepts are not isolated, since filter bubble is a potential cause of echo chamber [1, 12].


In this work, we are primarily concerned with the existence and the characteristics of echo chambers in real-world e-commerce systems. We define echo chamber as the effect of users' interests being reinforced due to repeated exposure to similar items or categories of items, thereby generalizing the definition in [26]. This is because users' consumption preferences are so versatile and diverse that they cannot simply be classified into positive or negative directions, as is done for political opinions [40]. Based on the above definition of echo chamber, we formulate the research in two steps by answering the following two related research questions:

• RQ1: Does the recommender system, to some extent, reinforce user click/purchase interests?
• RQ2: If user interests are indeed strengthened, is it caused by RS narrowing down the scope of items exposed to users?

To measure the effect of recommender systems on users, we first follow the idea introduced in [33] and separate all users into categories based on how often they actually "take" the recommended items. This separation helps us compare recommendation followers against a control group, namely the recommendation ignorers. The remaining problem is how to measure the effect of echo chamber on each group. Users on social network platforms have direct ways to interact with other users, through actions such as friending, following, and commenting [38]. By analogy, users in a recommender system interact with other users indirectly through the recommendations offered by the platform, since recommendation lists are usually generated by considering the user's previous preferences and the preferences of similar users (i.e., collaborative filtering). Due to the absence of an explicit network of user-user interactions, which is naturally and commonly available in social networks, we decide to measure echo chamber in e-commerce at the population level. This is because users who share similar interaction records (e.g., clicking the same products) will be closely located in a latent space, and the clusters of these users in that space, along with their temporal changes, can serve as signals to detect echo chamber. Finally, we measure the content diversity of the recommendation lists for each group to see whether the recommender system narrows down the scope of items exposed to users, so as to answer RQ2.

The key contributions of our paper can be summarized as follows:

• We study the echo chamber effect at a population level by clustering different user groups and measuring the shifts in user interests with cluster validity indexes.
• We design a set of controlled trials between recommendation followers and ignorers, and employ a wide range of technical metrics to measure the echo chamber effect, providing a broader picture.
• We conduct our experiments on real-world data from Alibaba Taobao — one of the largest e-commerce platforms in the world. Our analytical results, grounded in reliable validity metrics, suggest a tendency of echo chamber in user click behaviors and a relatively mitigated effect in user purchase behaviors.

2 RELATED WORK

Today's recommender systems are criticized for bringing the dangerous byproducts of echo chamber and filter bubble. Sunstein argued that personalized recommenders would fragment users, making like-minded users aggregate [41]. The existing views or interests of these users would be reinforced and amplified, since "group polarization often occurs because people are telling one another what they know" [41, 42]. Pariser later described filter bubble as the effect of recommenders isolating users from diverse content and trapping them in an unchanging environment [36]. Though both are concerned with the harmful effects that recommenders may pose, echo chamber emphasizes the polarized environment, while filter bubble lays stress on the undiversified environment.

Researchers have expressed concerns about the two effects and attempted to formulate a richer understanding of their potential characteristics [4, 10, 23, 32, 39]. Considering echo chambers a significant threat to modern society, as they might lead to polarization and radicalization [11], Risius et al. analyzed news "likes" on Facebook and distinguished different types of echo chambers [37]. Mohseni et al. reviewed news feed algorithms as well as methods for fake news detection, focusing on the unwanted outcomes of echo chamber and filter bubble after using personalized content selection algorithms [30]; they argued that personalized news feeds might cause polarized social media and the spread of fake content.

Another genre of research aims to identify strategies to mitigate the potential issues of echo chamber and filter bubble, or to design new recommenders that alleviate such effects [2, 3, 13, 17, 22, 35]. Badami et al. proposed a new recommendation model for combating over-specialization in polarized environments, after finding that matrix factorization models are easier to learn in polarized environments and, in turn, encourage filter bubbles that reinforce polarization. Tintarev et al. attempted to use visual explanations, i.e., chord diagrams and bar charts, to address the problems [43].

A certain amount of work focuses on detecting or measuring echo chamber and filter bubble, questioning whether they exist at all [1, 4, 15, 27, 31, 33]. For example, Hosanagar et al. used data from an online music service to find out whether personalization is in fact fragmenting the population, and concluded that it is not [24]; they claimed personalization is a tool that helps users widen their interests, which in turn creates commonality with others. Sasahara et al. suggested that echo chambers are somewhat inevitable given the mechanisms at play in social media, specifically basic influence and unfriending [38]; their simulation dynamics showed that a social network rapidly devolves into segregated, homogeneous, and polarized communities, even with minimal amounts of influence and unfriending.

Despite the reasonableness of prior works, severe limitations exist, making the claims only plausible. One major issue is that most existing works draw conclusions by means of simulation, or rely on self-defined networks and measurements with simplified dynamics [6, 9, 15, 20, 26, 31]. Built upon subjective assumptions, whether such modeling and analysis can reflect the truth is dubious [37]. On the other hand, many prior works confound the meanings of the two effects, or solely examine one of them without consideration of the other [17, 23, 27, 28, 37]. Exceptions such as [26] disentangle echo chamber from filter bubble, but suffer from the previously mentioned deficiency, i.e., reliance on simulation and simplified artificial settings.


With the desire to address these limitations, we aim to explore the existence of echo chamber in real-world e-commerce systems while investigating filter bubble by differentiating it as a potential cause of echo chamber. To the best of our knowledge, this is the first work that utilizes real-world recommendation and user-item interaction data from a large-scale e-commerce platform, with solid validity metrics, instead of artificial, well-controlled experimental settings. We do not introduce any prior assumptions that might be unreliable, and we draw our analysis from a latent space without relying on any explicit pre-built networks.

3 DATA COLLECTION AND ANALYSIS

3.1 Data Collection
We first collected 86,192 users' accessing logs over five months, spanning Jan. 1, 2019 to May 31, 2019, from Taobao. There exist three types of accessing logs: browse log, click log, and purchase log.

Browse Log records the recommendations that have been browsed by each user, including the timestamp, page view id, user id, recommended item id, item position in the recommendation list, and whether the item was clicked or not. The items recommended per page are known as a page view (PV) in e-commerce.

Click Log records each user's click behaviors in the whole electronic market, which means that user click behaviors outside the recommendation list are also recorded (i.e., the user may initiate a search and click an item). It includes the timestamp, PV id, user id, user profile, clicked item id, and the item price.

Purchase Log has the same format as the click log, except that it records users' purchase behaviors. Since a purchase is a more expensive action than a click for users, the sparsity of the purchase log is much higher than that of the click log.

We extract a dataset with the above components (browse log, click log, and purchase log) for the following three reasons:

• E-commerce is a complicated scenario where users have multiple types of interactions with items, such as browsing, clicking, purchasing, and rating. The effect of RS might differ across them, because different interactions induce different costs for users: browsing and clicking merely cost time, while purchasing costs time and money.
• Clicking and purchasing are used to represent a consumer's implicit feedback in RS, which contains rich information about consumer preferences. Browsing, however, contains much more noise, as the user may be indifferent to most of the browsed items, i.e., the user neither likes nor dislikes them.
• In RS, ratings are users' explicit feedback on items. However, since the platform automatically fills in a 5-star rating (the highest score) if a user purchased an item without leaving a rating, the ratings may not reflect users' true preferences; thus, we do not use ratings in this work.

We further remove each user's logs in the first two months to make sure that all users have had enough time to get familiar with the e-commerce platform, and that the RS has received sufficient interactions from consumers to understand their preferences well.

3.2 Identifying Recommendation Takers
The purpose of our work is to explore the temporal effect of recommendation systems on users, especially to investigate the existence and the characteristics of the echo chamber effect. In order to study the effect of "taking" recommendations, we need to classify the users in the dataset into users who "follow" recommendations and users who do not. For consistency, in this paper we call those who "follow" recommendations the Following Group and those who do not the Ignoring Group. Inspired by the experimental setting of [33], we draw comparisons between these two groups of consumers to explore the effect of recommendation systems on users.

The first step is to classify users into the two groups, and there are several different approaches to this classification. One straightforward approach is to use the ratio between the number of clicked items and the number of browsed items, which we can calculate for each user i based on his or her browsing history. However, consider the extreme case where user i viewed each item only once, clicked most of them, but never came back to the platform to use the recommendation system again. This would place the user in the Following Group because the ratio is close to one, yet the user is misclassified because he or she actually abandoned the recommender system. Clearly, this approach cannot capture the long-term influence of RS on consumers.

To fulfill the need for long-term observation, we adopt the "block" idea [33] in our classification task. We first identify clicked PVs: a clicked PV is a recommendation list on a single page in which the consumer clicked at least one displayed item. Then we compute the number of clicked PVs over the total number of PVs for a given consumer, and define this ratio as the PVR (page view ratio). The intuition behind this design is that the effect RS imposes on consumers depends on the frequency with which consumers are exposed to it, as well as consumers' responses to the recommendations (e.g., clicks). Once we have calculated the PVRs for all users, we sort the users from the lowest to the highest PVR score; the result is shown in Figure 1.

Figure 1: The users sorted by PVR from the lowest to the highest score. Each x-axis index refers to a unique user. The two split points represent 20% (blue) and 80% (red) users.

Based on this figure, we define users who took recommendations in at most 20% of their PVs as the Ignoring Group, and users who took recommendations in at least 80% of their PVs as the Following Group, which gives us 6,183 followers and 6,979 ignorers in total.
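To make the split concrete, the following is a minimal sketch of the PVR computation and the 20%/80% cut, assuming the browse log is loaded as a pandas DataFrame; the column names user_id, pv_id, and clicked (0/1) are hypothetical stand-ins, since the paper does not specify the exact schema.

```python
import pandas as pd

def split_by_pvr(browse_log: pd.DataFrame):
    # A PV counts as "clicked" if at least one recommended item in it was clicked.
    clicked_pv = browse_log.groupby(["user_id", "pv_id"])["clicked"].max()
    # PVR: fraction of a user's PVs that contain at least one click.
    pvr = clicked_pv.groupby("user_id").mean()
    following = pvr[pvr >= 0.8].index  # took recommendations in >= 80% of PVs
    ignoring = pvr[pvr <= 0.2].index   # took recommendations in <= 20% of PVs
    return following, ignoring
```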


3.3 User Interaction Blocks
In order to examine the temporal effect of recommender systems on users, we divide the user-item interaction history (browse, click, or purchase) into discrete intervals for each user. We follow the "blocks of interaction" idea of [33] to divide the interaction history of a user into intervals, which ensures that all users have the same amount of interaction with the recommender system within an interaction block.

We define an interval as a block consisting of n consecutive interactions, where n is a constant decided by experimental studies. Different interactions occur at different frequencies: the number of browsed items is much higher than the number of clicked items, and the number of clicked items is in turn higher than the number of purchased items, so the length of the block (i.e., n) may vary with the interaction type. We set the length of the interval to n = 200 for browsed items (a browsing block), n = 100 for clicked items (a clicking block), and n = 10 for purchased items (a purchasing block). If there are not enough interactions to constitute the last block, we drop it, so that all blocks have the same number of interactions. Meanwhile, to capture the temporal effect of RS, we only keep users who have at least three intervals in the three months, for each type of user-item interaction. Finally, as shown in Table 1, we have 7,477 users to examine the echo chamber effect on click behaviors, 3,557 users for purchase behaviors, and 7,417 users for browsing behaviors.

Table 1: Statistics of each user group.

                    Click   Purchase   Browse
  Following group   5,025   2,099      5,507
  Ignoring group    2,452   1,458      1,910
  All users         7,477   3,557      7,417
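As an illustration of the blocking scheme above, a sketch of the chunking step might look as follows; the function and the list-based representation are ours, not the authors' code.

```python
def make_blocks(interactions, n):
    """Split one user's time-ordered interactions into blocks of n items;
    the incomplete final block is dropped, and users with fewer than
    three blocks are discarded upstream (Section 3.3)."""
    return [interactions[i:i + n] for i in range(0, len(interactions) - n + 1, n)]

# n = 200 for browse, n = 100 for click, n = 10 for purchase blocks.
blocks = make_blocks(list(range(350)), n=100)  # -> 3 full blocks; 50 leftovers dropped
```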
Considering that the browse log is potentially noisy (i.e., the user may be indifferent to browsed items, see Section 3.1), and that clicking and purchasing are commonly used to represent users' implicit feedback to RS, in the following we use click and purchase behaviors to represent users' preferences on items (for detecting the echo chamber effect), and examine the temporal changes in the content diversity of recommended items via browsing behaviors (a potential cause of echo chamber).
3.4 User Embeddings
As user interests are closely related to user-item interactions, we argue that the items a user clicked reflect his or her click interests, and the items a user purchased represent his or her purchase interests. However, using only discrete indexes to denote the interacted items is not sufficient to represent user interests, since we cannot know the collaborative relations between different items from the indexes alone. Following the basic idea of collaborative filtering, we use the user-item interaction information to train an embedding model.

Items are encoded into item embeddings based on one of the state-of-the-art models [46]. To cluster and compare the items of different users at different times, we need to guarantee that the item embeddings are stable across the period of time under investigation. For this purpose, the embeddings are trained on all of the collected data up to May 31, 2019, the last day covered by our dataset. This is for two reasons: (1) since the training data contains all of the user-item interactions, it helps to learn more accurate embeddings; and (2) since the training procedure includes all items under consideration, we can guarantee that all embeddings are learned in the same space. After that, we use average pooling over the item embeddings to compute the user embeddings. Specifically, we use the average of the embeddings of the items that the user clicked (or purchased) within a user-item interaction block to represent the user's click preferences (or purchase preferences) during that period of time. In this way, user embeddings and item embeddings lie in the same representation space.
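A minimal sketch of this average-pooling step, assuming the pretrained item embeddings of [46] are exported as a NumPy matrix indexed by item id (an assumption about storage, not stated in the paper):

```python
import numpy as np

def user_embedding(item_emb: np.ndarray, block_items: list) -> np.ndarray:
    """Average-pool the embeddings of the items a user clicked (or purchased)
    within one interaction block; the result lies in the same latent space
    as the item embeddings."""
    return item_emb[block_items].mean(axis=0)  # item_emb has shape (num_items, D)
```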


4 MEASURES FOR ECHO CHAMBERS

To answer RQ1, we propose to study the reinforcement of user interests at a population level. The pattern of reinforcement could appear in a simple scenario where some members highly support one opinion while others believe in a competing opinion; the phenomenon is then reflected as a dense distribution on the two sides of the opinion axis. However, user interest in e-commerce is much more complicated and cannot be simply classified into positive and negative. What we observe is that users congregate into multiple groups in terms of distinct preferences. As a result, we implement clustering on user embeddings and measure the change in user interests with cluster validity indexes (more details can be found in Section 4.2). We measure the changes in terms of clustering on the embeddings at the beginning and at the end of the user interaction record, i.e., we compute the user embeddings respectively for the first and the last interaction block and measure the changes. To be clear, we refer to these two blocks as the "first block" and the "last block".

To answer RQ2, we propose to measure the content diversity of recommendation lists at the beginning and the end; more details can be found in Section 4.3. We examine whether there exists a trend of recommendation systems narrowing down the contents provided to the users. Before we cluster the Following Group and the Ignoring Group, respectively, we need to know whether the two groups are clusterable and what the appropriate number of clusters is for each group. Thus, we first examine the clustering tendency and select the proper clustering settings, introduced in Section 4.1.

4.1 Measuring Clusters

4.1.1 Clustering Tendency.
Assessing clustering tendency evaluates whether there exist meaningful clusters in the dataset before applying clustering methods. We use the Hopkins statistic (H) [5, 29] to measure the tendency, since it examines the spatial randomness of the data by testing the given dataset against a uniformly random-distributed dataset. The value of H ranges from 0 to 1: a result close to 1 indicates a highly clustered dataset, while a result around 0.5 indicates that the data is random.

Let $X \in \mathbb{R}^D$ be the given dataset of $N$ elements, and $Y \in \mathbb{R}^D$ a uniformly random dataset of $M$ ($M \ll N$) elements with the same variation as $X$. We draw a random sample $\{x_1^D, x_2^D, \ldots, x_M^D\}$ from $X$, and let $s_i^D$ and $t_i^D$ be the distances from $x_i^D$ and $y_i^D$ to their nearest neighbors in $X$, respectively. The Hopkins statistic is computed as follows:

$$H = \frac{\sum_{i=1}^{M} t_i^D}{\sum_{i=1}^{M} s_i^D + \sum_{i=1}^{M} t_i^D} \qquad (1)$$

The results are shown in Table 3. The Hopkins statistic examines the datasets before applying further measurement to them. The characteristics of a clusterable dataset (H > 0.5) can be observed under its optimal setting in K-means clustering. Then we can select the proper number of clusters for each user group.
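Eq. (1) can be computed along the following lines; drawing the uniform points from the bounding box of X is one common reading of "the same variation as X", so treat this as a sketch rather than the authors' exact procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hopkins(X: np.ndarray, m: int, seed: int = 0) -> float:
    """Hopkins statistic of Eq. (1): ~1 means clustered, ~0.5 means random."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    # s_i: distance from a sampled real point to its nearest *other* point in X
    # (the second neighbor, since the first neighbor is the point itself).
    s = nn.kneighbors(X[rng.choice(n, size=m, replace=False)])[0][:, 1]
    # t_i: distance from a uniform random point in X's bounding box
    # to its nearest neighbor in X.
    Y = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    t = nn.kneighbors(Y, n_neighbors=1)[0][:, 0]
    return t.sum() / (s.sum() + t.sum())
```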
4.1.2 Clustering Settings.
We use the Bayesian Information Criterion (BIC) to determine the number of clusters for each group. Due to the high-dimensional characteristics of user embeddings, it is hard to choose the optimal K (i.e., the number of clusters) via common k-selection techniques such as the elbow method or the average silhouette method. Instead, we use model selection to compare the clustering results under different Ks and choose BIC, which aims to select the model with maximum likelihood. Its revised formula for partition-based clustering suits our task well, and a penalty term in the formula avoids overfitting:

$$\mathrm{BIC} = \sum_{i=1}^{K} \left( n_i \log\frac{n_i}{N} - \frac{n_i D \log 2\pi\Sigma}{2} - \frac{D(n_i - 1)}{2} \right) - \frac{K(D+1)\log N}{2} \qquad (2)$$

where the variance is defined as $\Sigma = \frac{1}{N-K}\sum_{i=1}^{K}\sum_{j=1}^{n_i} \|x_j - c_i\|^2$. The K-class clustering has $N$ points $x_j \in X^D$, and $c_i$ is the center of the $i$-th cluster with size $n_i$, $i = 1, \ldots, K$. BIC evaluates the likelihood of different clustering settings; in our case, we use it to determine the number of clusters K. The K of the maximum BIC is the optimal number of clusters: we pick the corresponding K (i.e., K*) of the first decisive local maximum (i.e., BIC*).
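A sketch of the BIC computation of Eq. (2) on top of scikit-learn's KMeans; the scan range and seed are illustrative, empty clusters are not handled, and the plain maximum is used in place of the paper's "first decisive local maximum".

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_bic(X: np.ndarray, k: int) -> float:
    """BIC of a K-means partition under the spherical model of Eq. (2)."""
    n, d = X.shape
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sizes = np.bincount(km.labels_, minlength=k)
    sigma = km.inertia_ / (n - k)  # pooled within-cluster variance (Sigma in Eq. 2)
    log_lik = np.sum(sizes * np.log(sizes / n)
                     - sizes * d * np.log(2 * np.pi * sigma) / 2
                     - d * (sizes - 1) / 2)
    return log_lik - k * (d + 1) * np.log(n) / 2

# K* is the K maximizing BIC over a scan, e.g.:
# k_star = max(range(2, 31), key=lambda k: kmeans_bic(user_emb, k))
```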
4.2 Measuring Reinforcement of User Interests

We use cluster validity [44] to compare the user embeddings of the two user groups and observe the changes in clustering through different months. Originally, this technique is known as the procedure for evaluating how a clustering algorithm performs on a given dataset: it evaluates results under different parameter settings via a set of cluster validity indexes. These indexes can be grouped into two types, internal indexes (Section 4.2.1) and external indexes (Section 4.2.2). External indexes are based on ground-truth clustering information, which is not always available for a dataset. On the contrary, internal indexes can evaluate the clustering without knowing the optimal classification. We choose one of each type to measure the temporal changes in clustering in both user groups.

4.2.1 Internal Validity Indexes.
A good clustering is required to satisfy several properties, such as compactness, connectedness, and spatial separation [21]. One type of internal index evaluates to what extent the clusters satisfy these properties; a prominent example is the Calinski-Harabasz index [8]. Another type is applied to crisp or fuzzy clustering [34, 47]. Since we want to explore how user interests shift at the population level, we apply the former type of internal index to the clustering results of the user embeddings, in order to detect the polarization tendency in user preferences by tracking how the index changes over time.

The Calinski-Harabasz ($CH_K$) index scores a clustering by the variation ratio between the sum-of-squares between clusters ($SSB_K$) and the sum-of-squares within clusters ($SSW_K$) under K-class clustering. Based on this, we can compare the clustering of the same group at different times under the same setting; a higher score indicates a better clustering. Let $N$ denote the size of the dataset $\{x_1, \ldots, x_N\}$ and $K$ the number of clusters. The centroids of clusters are denoted $C_i$, $i = 1, 2, \ldots, K$. A data point $x_j$ belongs to cluster $p_j$ with the corresponding cluster centroid $C_{p_j}$, where $j = 1, 2, \ldots, N$ and $p_j \in \{1, 2, \ldots, K\}$. The Calinski-Harabasz index is calculated as follows:

$$CH_K = \frac{SSB_K}{SSW_K} \cdot \frac{N-K}{K-1} \qquad (3)$$

where $SSW_K = \sum_{j=1}^{N} \|x_j - c_{p_j}\|^2$, $SSB_K = \sum_{i=1}^{K} \|c_i - \bar{X}\|^2$, and $\bar{X}$ represents the mean of the whole dataset. Under this definition, an ideal clustering result means that elements within a cluster congregate and elements between clusters disperse, leading to a high $CH_K$ value. Intuitively, we can assign users to clusters and calculate the CH index for this clustering based on the user embeddings. After a certain period of time, the user embeddings change due to the users' new interactions, and we can use the new embeddings to recalculate the CH index without changing the users' cluster assignment. By comparing the CH index before and after the user embeddings change, we can evaluate to what extent the user preferences have changed (see Figure 2, details to be introduced later). Furthermore, based on the user IDs, we can track how each user's preference moved in the latent space.

4.2.2 External Validity Indexes.
We use external validity indexes to measure the similarity between the "first block" and the "last block" embeddings in terms of clustering. This kind of index utilizes ground-truth class information to evaluate clustering results: a clustering close to the optimal clustering has a high index score. In other words, external indexes compute the similarity between the given clustering and the optimal clustering.

External validity indexes are constructed on the basis of contingency tables [48]. A contingency table is a matrix containing the inter-relation between two partitions of a set of $N$ points. Partitions $\mathcal{P} = \{P_1, P_2, \ldots, P_{K_1}\}$ of $K_1$ clusters and $\mathcal{Q} = \{Q_1, Q_2, \ldots, Q_{K_2}\}$ of $K_2$ clusters give us a $K_1 \times K_2$ contingency table ($C_{PQ}$) consisting of the number of common points ($n_{ij}$) in $P_i$ and $Q_j$:

$$C_{PQ} = \begin{pmatrix} n_{11} & n_{12} & \cdots & n_{1K_2} \\ n_{21} & n_{22} & \cdots & n_{2K_2} \\ \vdots & \vdots & \ddots & \vdots \\ n_{K_1 1} & n_{K_1 2} & \cdots & n_{K_1 K_2} \end{pmatrix} \qquad (4)$$

where $P_i$ and $Q_j$ are the clusters in $\mathcal{P}$ and $\mathcal{Q}$ with sizes $p_i$ and $q_j$, $i = 1, 2, \ldots, K_1$, $j = 1, 2, \ldots, K_2$. Therefore, we have $p_i = \sum_{j=1}^{K_2} n_{ij}$, $q_j = \sum_{i=1}^{K_1} n_{ij}$, and $\sum_{i=1}^{K_1}\sum_{j=1}^{K_2} n_{ij} = N$.

The techniques for comparing clusters for external validation are divided into three groups: pair-counting, set-matching, and information-theoretic [45]. We use the Adjusted Rand Index (ARI) [25], a pair-counting index that counts the pairs of data points on which two clusterings agree or disagree. In this way, we can evaluate the portion of users shifting to another cluster in K-means clustering over time.


As a representative pair-counting based measure, ARI satisfies the three crucial properties of a clustering comparison measure [45]: the metric property, normalization, and the constant baseline property. It can be computed as follows:

$$ARI = \frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_i \binom{p_i}{2}\sum_j \binom{q_j}{2}\right] / \binom{N}{2}}{\frac{1}{2}\left[\sum_i \binom{p_i}{2} + \sum_j \binom{q_j}{2}\right] - \left[\sum_i \binom{p_i}{2}\sum_j \binom{q_j}{2}\right] / \binom{N}{2}} \qquad (5)$$

In our experiment, ARI measures the similarity between two different clusterings (on two datasets) with the same clustering setting (K) for the same user group. Traditionally, it evaluates different clusterings under different clustering settings on the same dataset. The similarity between the embeddings of the "first block" and the "last block" shows how much the clustering changes under the same K. The higher the ARI score, the higher the similarity between the clusterings at the beginning and the end, and the fewer the changes in the latent space. The difference in ARI helps us understand how the Following Group acts under the influence of RS. Since the distribution of user interest embeddings changes under external conditions, such as sales campaigns and the release of new products, the group that has fewer changes implies stable interests in certain types of items, showing a tendency of reinforcement. Unlike the comparison in $CH_K$, a higher ARI implies that the preferences of the Following Group are relatively reinforced.
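Combining Sections 4.2.1 and 4.2.2, the two validity measurements might be sketched as follows. Note that scikit-learn's calinski_harabasz_score uses the standard size-weighted between-cluster term, and the rows of the two embedding matrices are assumed to be aligned by user.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, calinski_harabasz_score

def validity_shift(first_emb: np.ndarray, last_emb: np.ndarray, k: int):
    """Internal (CH) and external (ARI) comparison of first vs. last block."""
    first_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(first_emb)
    # CH: keep the first-block partition fixed and re-score it on the
    # last-block embeddings (Section 4.2.1); the drop measures temporal shift.
    ch_drop = (calinski_harabasz_score(first_emb, first_labels)
               - calinski_harabasz_score(last_emb, first_labels))
    # ARI: cluster each block separately under the same K and compare the
    # two partitions (Section 4.2.2); higher ARI -> fewer users changed cluster.
    last_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(last_emb)
    ari = adjusted_rand_score(first_labels, last_labels)
    return ch_drop, ari
```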
4.3 Measuring the Changes of Content Diversity
To answer the second question, we measure the content diversity of recommendation lists. To detect how diverse a list of recommendations is, we compute the pairwise distances of the item embeddings and use their average to represent the content diversity of the list [33]. We collect the first N items recommended to users as the first recommendation list, and the last N items as the last recommendation list. The Euclidean distance is computed between two item embeddings of dimension D. Let $v_i$ be an item embedding vector, $v_i \in \mathbb{R}^D$, $i = 1, \ldots, N$, and let $v_i^d$ be the value of $v_i$ on the $d$-th dimension. Then we have the distance:

$$distance_{v_i, v_j} = \sqrt{\sum_{d=1}^{D}(v_i^d - v_j^d)^2} \qquad (6)$$

The smaller the distance, the smaller the difference between the two items. We take the average of the pairwise Euclidean distances within the block to measure its content diversity [51], and then use the temporal changes in content diversity to examine the effect of the recommender system on different user groups.
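Eq. (6) averaged over all pairs reduces to a one-liner with SciPy; this is a sketch of the measurement, with item_emb holding the embeddings of the N recommended items in one block.

```python
import numpy as np
from scipy.spatial.distance import pdist

def content_diversity(item_emb: np.ndarray) -> float:
    """Average pairwise Euclidean distance (Eq. 6) over the items in a
    recommendation block; larger means more diverse recommendations."""
    return pdist(item_emb, metric="euclidean").mean()
```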
5 ANALYZING ECHO CHAMBER

In this section, we first present the description of the data after pre-processing in Section 5.1. Then, we present our clustering settings (i.e., the number of clusters selected) in Section 5.2. We measure the cluster validity to examine the reinforcement in the Following Group, answering RQ1 in Section 5.3, and we evaluate the shifts in content diversity of recommendation lists so as to answer RQ2. In the following, we refer to the user embeddings based on clicked and purchased items as "click embeddings" and "purchase embeddings" respectively, for simplicity.

5.1 Data Description
As introduced in Section 3, experiments are performed on an extensive collection of real-world data with user click, purchase, and browse logs. The code of our entire experiment is released in our GitHub repository (https://github.com/szhaofelicia/EchoChamberInEcommerce). After completing the data pre-processing in Section 3, e.g., identifying recommendation followers and creating user-item interaction blocks, the detailed statistics of the final dataset for each type of user-item interaction are shown in Table 2.

Table 2: Statistics of experiment data.

  Dataset        Statistic              Numerical Values
  Click Log      Num of click logs      7,386,783
                 Num of users           7,477
                 Num of items           2,686,591
  Purchase Log   Num of purchase logs   98,135
                 Num of users           3,557
                 Num of items           71,973
  Browse Log     Num of browse logs     6,225,301
                 Num of users           7,417
                 Num of items           5,077,268

Note that the Following Group (i.e., the recommendation takers) and the Ignoring Group differ in size after the above pre-processing procedure. Since the size of a dataset affects cluster analysis results, we resize the larger user group — the Following Group — in all types of user actions to avoid this influence. We sample from the Following Group so that the Following Group and the Ignoring Group have an equal number of users. We apply this operation (denoted as resize-sampling) to each kind of user interaction log and generate three pairs of equal-sized user groups for our experiments.

Additionally, we take a random sample of p% of users (p-sampling) from the dataset in each computation, and repeat every experiment 50 times to calculate the statistical significance of each measure. We use 10% for p-sampling in the Hopkins statistic, which is large enough to represent the distribution of a dataset, and a ratio of 80% for the other indexes. We report the average of the 50 experiments as the final result. Consequently, for the Following Group, two sampling steps – resize-sampling and p-sampling – precede each computation. For instance, to compute cluster indexes for the Following Group on the click logs 50 times, each time we resize the Following Group to 2,452 users and then take another sample of 80%. The final size is 1,962 users, the same as the size of the Ignoring Group after p-sampling.

5.2 Clustering Settings

5.2.1 Measuring the Cluster Tendency.
We examine the clustering tendency based on the Hopkins statistic for the user embeddings of both the Following Group and the Ignoring Group; the results are shown in Table 3. Both groups of user embeddings are clusterable (H > 0.5), so cluster analysis can generate meaningful results in the following experiments. Meanwhile, we also observe that the clustering tendency of each group is not stable, and we use the p-value of a t-test to examine statistical significance, since the temporal change of the H score is quite small.
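The repeated-sampling protocol could be sketched as below; the paper does not state which t-test variant is used, so the independent two-sample test shown here is an assumption.

```python
import numpy as np
from scipy import stats

def repeated_scores(emb: np.ndarray, measure, p: float = 0.8,
                    trials: int = 50, seed: int = 0) -> np.ndarray:
    """Apply a measure to p-samples of one group, repeated 50 times (Section 5.1)."""
    rng = np.random.default_rng(seed)
    n = emb.shape[0]
    return np.array([measure(emb[rng.choice(n, size=int(p * n), replace=False)])
                     for _ in range(trials)])

# Significance of the between-group difference over the 50 trials,
# e.g. with an independent two-sample t-test (one plausible choice):
# t, p_value = stats.ttest_ind(scores_following, scores_ignoring)
```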


Figure 2: CH score under different K selected using BIC. Panels: (a) Click – Following Group; (b) Click – Ignoring Group; (c) Purchase – Following Group; (d) Purchase – Ignoring Group.

Table 3: Hopkins statistic.

  Action     User type               Amount   First Block   Last Block   P-value
  Click      All users               4904     0.7742        0.7713       4.33e−9
             Following               2452     0.7756        0.7746       3.05e−2
             Ignoring                2452     0.7728        0.7680       1.53e−21
             Between-group p-value   4904     1.58e−8       2.88e−28
  Purchase   All users               2916     0.7264        0.7279       1.41e−2
             Following               1458     0.7223        0.7248       1.25e−6
             Ignoring                1458     0.7305        0.7310       0.27
             Between-group p-value   2916     2.65e−28      1.02e−24
Comparing the changes of the H score between the first and last blocks, the clustering tendency decreases in click embeddings but increases in purchase embeddings. Furthermore, the clustering tendency of purchase embeddings is less changeable and even slightly increases, because there are fewer local shifts in the latent space of purchase embeddings compared with the temporal changes in click embeddings. This may be attributed to the fact that users' tastes reflected in purchased items are relatively stable, since users cannot buy whatever they want: they need to pay the price for it.

5.2.2 Selecting the Number of Clusters.
Since we have shown that the Following Group and the Ignoring Group are clusterable, we now use BIC to detect the optimal number of clusters (K*) for each group of embeddings. The average BIC curves are plotted in Figure 3. We do not force the clustering on each dataset to use the same number of clusters, in consideration of the underestimation caused by inappropriate K settings. Clustering settings such as K have to fit the datasets well to guarantee optimal clustering results; inaccurate results might lead to overestimating or underestimating the echo chamber effect.

We set the K of the maximum BIC as the optimal number of clusters K*. With the click embeddings, K* is 24 for the Following Group (Figure 3(a)) and 20 for the Ignoring Group (Figure 3(b)). With the purchase embeddings, K* is 11 for the Following Group (Figure 3(c)) and 9 for the Ignoring Group (Figure 3(d)). Intuitively, we could directly compare the user groups at their own K*, but the local areas around the maximum of the curves are quite flat. We therefore believe that measurements around K* give more reliable and plausible results than the measurement exactly at K*, so we execute the experiments in the range [K*−5, K*+5] and use the average of the results to examine the changes of clustering via the cluster validity indexes introduced next. Finally, the Following Group uses K in the ranges [19, 29] and [6, 16] for the click and purchase embeddings respectively, and the Ignoring Group uses K in the ranges [15, 25] and [4, 14].

5.3 Results Analysis

RQ1: Does RS reinforce user click or purchase interests?

5.3.1 Internal Validity Indexes.
As introduced in Section 4.2.1, CH can measure the extent of variation of the within-group clustering at different times. We first use CH to examine the tendency of reinforcement in user interests. The average scores of the CH index are plotted in Figure 2. Both groups have a drop in CH after three months at all Ks, and CH also decreases as K becomes larger, both in the first blocks and in the last blocks. The common decreasing trend over time in the two groups can be attributed to how we compute CH in the last blocks. As mentioned in Section 4.2.1, we assign the clustering partition of the first blocks to the last blocks, which means that the same user has the same cluster label at the beginning and at the end. The decrease in CH suggests that the temporal shifts in the user embeddings have made this assignment less suitable: the ideal clustering partition of the first blocks can no longer serve as the ideal clustering partition of the last blocks. Besides the effect of RS, the changes in user embeddings can also result from other factors; for instance, user interest can vary a lot with changes in external conditions in e-commerce, such as sales campaigns. These temporal shifts of user embeddings in the latent space are reflected as the decrease in CH.

In practice, we can hardly avoid this "natural" reduction caused by the e-commerce platform. We therefore evaluate the difference between the two user groups to isolate the effect that comes from the RS. We compute the temporal decreases in CH for each group with K in [K*−5, K*+5] (see Table 4). As shown in the table, the drops of CH at K* are 48.22 and 50.73 for the Following Group and the Ignoring Group in click embeddings, and the average drops of CH in [K*−5, K*+5] are 48.41 and 50.95 respectively. Purchase embeddings show similar results: the decreases of CH for the Following Group are 32.74 at K* and 33.26 on average, and the reductions for the Ignoring Group are 35.45 and 37.29. We further check the statistical significance of the difference between the two groups, finding that all differences are significant at the 95% confidence level (i.e., p-value less than 0.05). Overall, CH drops more slowly in the Following Group, showing a more stable tendency than in the Ignoring Group.


Figure 3: Bayesian Information Criterion (BIC). The K of the maximum BIC is the optimal number of clusters. Panels: (a) Click – Following Group; (b) Click – Ignoring Group; (c) Purchase – Following Group; (d) Purchase – Ignoring Group.

Table 4: Calinski-Harabasz score. The corresponding Ks for each group are introduced in Section 5.2.2.

          Click                             Purchase
          Following  Ignoring  P-value      Following  Ignoring  P-value
  K*−5    53.50      58.14     3.62e−41     41.76      52.72     6.94e−60
  K*−4    52.21      56.51     2.10e−39     39.30      47.35     3.54e−56
  K*−3    51.12      54.95     1.58e−40     37.30      43.41     1.44e−50
  K*−2    50.10      53.54     4.79e−35     35.60      40.21     3.50e−48
  K*−1    49.19      52.02     7.64e−31     34.13      37.70     7.47e−39
  K*      48.22      50.73     4.29e−28     32.74      35.45     3.55e−29
  K*+1    47.17      49.38     9.78e−22     31.34      33.62     3.96e−24
  K*+2    46.45      48.00     3.92e−15     30.09      32.06     8.70e−23
  K*+3    45.74      46.89     8.67e−11     28.88      30.54     4.99e−19
  K*+4    44.78      45.69     1.36e−7      27.78      29.14     2.69e−18
  K*+5    44.05      44.62     2.61e−4      26.97      28.00     4.78e−14
  AVE     48.41      50.95     5.97e−32     33.26      37.29     1.06e−46

Table 5: ARI scores. As with CH_K, the corresponding Ks for each group are introduced in Section 5.2.2.

          Click                             Purchase
          Following  Ignoring  P-value      Following  Ignoring  P-value
  K*−5    0.1136     0.0877    1.34e−33     0.0969     0.0829    3.70e−8
  K*−4    0.1099     0.0856    1.85e−32     0.0825     0.0677    1.67e−10
  K*−3    0.1060     0.0829    1.23e−33     0.0725     0.0686    2.77e−2
  K*−2    0.1040     0.0811    4.19e−34     0.0697     0.0712    0.35
  K*−1    0.0996     0.0780    2.88e−32     0.0639     0.0650    0.52
  K*      0.0974     0.0756    5.52e−32     0.0615     0.0635    0.23
  K*+1    0.0949     0.0734    4.49e−41     0.0585     0.0634    9.17e−4
  K*+2    0.0929     0.0717    1.62e−33     0.0555     0.0598    4.12e−3
  K*+3    0.0900     0.0698    1.13e−35     0.0524     0.0583    1.88e−5
  K*+4    0.0890     0.0687    1.10e−39     0.0511     0.0552    1.48e−3
  K*+5    0.0871     0.0670    4.20e−34     0.0489     0.0510    0.13
  AVE     0.0986     0.0765    2.28e−51     0.0648     0.0642    0.53

Accordingly, the Ignoring Group, which falls faster in CH and disperses to a wide range of the latent space, appears to receive a milder reinforcement of user preference.

The smaller dispersion of the Following Group in the latent space is possibly due to multiple factors. On the one hand, the Following Group might have more users who hold on to their previous preferences in items; on the other hand, the Following Group has fewer changes in their interests than the Ignoring Group does. Either reason leads to the conclusion that user interest in the Following Group has a strengthening trend over time, so that its dispersion in the latent space is suppressed to some extent.

5.3.2 External Validity Indexes.
We also examine the temporal changes in clustering via the external validity index, ARI. Unlike CH, which uses the same labels on two datasets, ARI compares the different clusterings of the first and the last block and measures the similarity between the clusterings. We plot the similarities (ARI) at different Ks in Figure 4 and list the average results in Table 5. In the click embeddings, the Following Group has a higher ARI than the Ignoring Group (average ARI of 0.0986 and 0.0765 respectively, with a p-value of 2.28e−51). In the purchase embeddings, the difference between the two groups is not statistically significant; the p-value for the difference of the average ARI is 0.53. A similar observation appears in the curves of Figure 4: the curve in Figure 4(a) is higher than the curve in Figure 4(b), but the curves in Figure 4(c) and Figure 4(d) almost overlap. Looking at each pair of ARIs of the two user groups in the purchase embeddings, the differences at half of the Ks in [K*−5, K*+5] are not significant.

To sum up, the Following Group has fewer temporal shifts in click embeddings but no evident difference in purchase embeddings. In other words, in terms of click interests, the partitions at the beginning and the end of the Following Group are more similar, indicating more connections. In purchase interests, the clustering at the end in both groups does not show the trend of sticking to the previous clustering. The larger changes appearing in both groups in purchase embeddings could be caused by the fact that users have fewer choices to purchase because of objective constraints, such as their incomes and the item prices. RS cannot "force" users to buy items they cannot afford; thus, even if user interest has been reinforced, the shifts in preferences might not appear in purchase behaviors. In click behaviors, however, the Following Group seems to strengthen their preferences under the effect of RS, since users are free to click items they are interested in, and their interests presented in click embeddings are not otherwise constrained. The group with higher ARI has fewer changes, which means they stick to the items they interacted with before and intensify their preferences. This evidence confirms the conclusion in Section 5.3.1 that there exists a tendency of reinforcement in user interests in the Following Group.

RQ2: If user interests are strengthened, is it caused by RS narrowing down the scope of items exposed to users?

After an affirmative answer to RQ1, we examine RQ2 to explore a potential cause of the reinforcement in user interests: narrowed contents offered to users. To do so, we measure the content diversity of recommendation lists at the beginning and the end.


Figure 4: ARI under different Ks, which are selected by BIC. Panels: (a) Click – Following Group; (b) Click – Ignoring Group; (c) Purchase – Following Group; (d) Purchase – Ignoring Group.

Figure 5: Recommendation content diversity (average pairwise distance between item embeddings) in the two user groups. Panels: (a) First Block – Following Group; (b) Last Block – Following Group; (c) First Block – Ignoring Group; (d) Last Block – Ignoring Group.

The distributions of content diversity (the average pairwise distance of item embeddings) in the first and last blocks are plotted in Figure 5, and the corresponding average content diversities are listed in Table 6.

Table 6: The content diversity of recommended items.

                          Amount   First    Last     Within-group p-value
  All users               3820     1.0969   1.0937   6.10e−11
  Following group         1910     1.0945   1.0882   1.95e−20
  Ignoring group          1910     1.0992   1.0989   0.67
  Between-group p-value   3820     2.13e−12 2.16e−56

These distributions are approximately normal, and the first blocks have a larger density around higher content diversity than the last blocks do. Also, the distributions disperse over time, lowering the average of the whole group. The average content diversity gives a consistent observation. Looking at the overall temporal changes, the content diversity among all users falls from 1.0969 to 1.0937, with a p-value of 6.10e−11. Even though the drop is tiny, it indicates that both groups go through a trend of narrowing of the content displayed to users. Additionally, the content diversity of the Following Group has a larger reduction, from 1.0945 to 1.0882, than that of the Ignoring Group. By contrast, the decrease in the Ignoring Group is negligible, given the high p-value of 0.67. This is because RS learns more about followers from their interactions, such as clicks and purchases; thus, it is more likely for RS to provide items similar to what those users have previously interacted with.

In e-commerce, users affect the recommendations exposed to them through their actions, and their actions are influenced in return. These procedures form a feedback loop, which strengthens the personalization of recommendations and shrinks the scope of the content offered to users. As a result, the filter bubble effect occurs in the Following Group. Conversely, the Ignoring Group provides only minimal information about their tastes, since they do not interact with the RS much; hence, RS recommends items from a broad scope to explore these users' preferences. The difference between the two groups is statistically significant at all times. The Ignoring Group has a higher diversity of 1.0992 in the beginning, and the difference is further enlarged in the end, since the content diversity in the Following Group drops considerably. The scope of items recommended to the Following Group has thus been repeatedly narrowed down, strengthening user interests in this group as a consequence.

As we claimed in the answer to RQ1, the reinforcement in preferences — the echo chamber effect — is reflected in the temporal shifts of user embeddings in clustering. In particular, echo chamber appears in both user click interests and user purchase interests, but the effect on the latter is rather slight. However, on other RS platforms, such as movie recommendation [33], the opposite observation appears in the Following Group, indicating that RS helps users explore more items and mitigates the reduction in content diversity. One possible reason is that, unlike products in e-commerce, promotional campaigns for movies mostly focus on commercial films; therefore, movie recommendation platforms can still fill the recommendation list with niche movies and slow down the reduction in content diversity.

6 CONCLUSIONS AND FUTURE WORK

In this paper, we examine and analyze the echo chamber effect on a real-world e-commerce platform. We found that a tendency of echo chamber exists in personalized e-commerce RS in terms of user click behaviors, while on user purchase behaviors this tendency is mitigated.


items raised by personalized recommendation algorithms brings (2018), 260–275.


consistent content to the Following Group, resulting in the echo [24] Kartik Hosanagar, Daniel Fleder, Dokyun Lee, and Andreas Buja. 2013. Will the
global village fracture into tribes? Recommender systems and their effects on
chamber effect as a reinforcement of user interests. This is one of consumer fragmentation. Management Science 60, 4 (2013), 805–823.
our first steps towards socially responsible AI in online e-commerce [25] Lawrence Hubert and Phipps Arabie. 1985. Comparing partitions. Journal of
Classification 2, 1 (01 Dec. 1985), 193–218.
environments. Based on our observations and findings, in the future, [26] Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, and Pushmeet Kohli.
we will develop refined e-commerce recommendation algorithms 2019. Degenerate feedback loops in recommender systems. In Proceedings of the
to mitigate the echo chamber effects, so as to benefit online users 2019 AAAI/ACM Conference on AI, Ethics, and Society. 383–390.
[27] Xuehan Jiang. 2018. A community-evolution based approach for detecting the
for more informed, effective, and friendly recommendations. echo chamber effect in recommender systems. (2018).
REFERENCES
[1] David Paul Allen, Henry Jacob Wheeler-Mackta, and Jeremy R Campo. 2017. The Effects of Music Recommendation Engines on the Filter Bubble Phenomenon. (2017).
[2] Arda Antikacioglu and R Ravi. 2017. Post processing recommender systems for diversity. In KDD. ACM, 707–716.
[3] Mahsa Badami, Olfa Nasraoui, and Patrick Shafto. 2018. PrCP: Pre-recommendation Counter-Polarization. In KDIR. 280–287.
[4] E. Bakshy, S. Messing, and L. A. Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 6239 (May 2015), 1130–1132.
[5] Amit Banerjee and Rajesh N. Dave. 2004. Validating clusters using the Hopkins statistic. IEEE International Conference on Fuzzy Systems 1 (2004), 149–153.
[6] Dimitrios Bountouridis, Jaron Harambam, Mykola Makhortykh, Mónica Marrero, Nava Tintarev, and Claudia Hauff. 2019. SIREN: A Simulation Framework for Understanding the Effects of Recommender Systems in Online News Environments. In ACM Conference on Fairness, Accountability, and Transparency. ACM, 150–159.
[7] Laura Burbach, Patrick Halbach, Martina Ziefle, and André Calero Valdez. 2019. Bubble Trouble: Strategies Against Filter Bubbles in Online Social Networks. In International Conference on Human-Computer Interaction. Springer, 441–456.
[8] Tadeusz Caliński and J. Harabasz. 1974. A Dendrite Method for Cluster Analysis. Communications in Statistics - Theory and Methods 3, 1 (1974), 1–27.
[9] Allison J.B. Chaney, Brandon M. Stewart, and Barbara E. Engelhardt. 2018. How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility. RecSys (2018).
[10] James N Cohen. 2018. Exploring Echo-Systems: How Algorithms Shape Immersive Media Environments. Journal of Media Literacy Education 10, 2 (2018), 139–151.
[11] Pranav Dandekar, Ashish Goel, and David T Lee. 2013. Biased assimilation, homophily, and the dynamics of polarization. Proceedings of the National Academy of Sciences 110, 15 (2013), 5791–5796.
[12] Abhisek Dash, Animesh Mukherjee, and Saptarshi Ghosh. 2019. A Network-centric Framework for Auditing Recommendation Systems. In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications. IEEE, 1990–1998.
[13] Carlos Manuel Moita de Figueiredo. 2014. Emotions and recommender systems: A social network approach. (2014).
[14] Seth Flaxman, Sharad Goel, and Justin M Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public Opinion Quarterly 80, S1 (2016), 298–320.
[15] Daniel M Fleder and Kartik Hosanagar. 2007. Recommender systems and their impact on sales diversity. In ACM Conference on Electronic Commerce. ACM.
[16] Zuohui Fu, Yikun Xian, Ruoyuan Gao, Jieyu Zhao, Qiaoying Huang, Yingqiang Ge, Shuyuan Xu, Shijie Geng, Chirag Shah, Yongfeng Zhang, and Gerard de Melo. 2020. Fairness-Aware Explainable Recommendation over Knowledge Graphs. In Proceedings of SIGIR 2020. ACM, New York, NY, USA.
[17] Mingkun Gao, Hyo Jin Do, and Wai-Tat Fu. 2018. Burst Your Bubble! An Intelligent System for Improving Awareness of Diverse Social Opinions. In 23rd International Conference on Intelligent User Interfaces. ACM, 371–383.
[18] Yingqiang Ge, Shuyuan Xu, Shuchang Liu, Zuohui Fu, Fei Sun, and Yongfeng Zhang. 2020. Learning Personalized Risk Preferences for Recommendation. SIGIR (2020).
[19] Yingqiang Ge, Shuyuan Xu, Shuchang Liu, Shijie Geng, Zuohui Fu, and Yongfeng Zhang. 2019. Maximizing marginal utility per dollar for economic recommendation. In The World Wide Web Conference. 2757–2763.
[20] Daniel Geschke, Jan Lorenz, and Peter Holtz. 2019. The triple-filter bubble: Using agent-based modelling to test a meta-theoretical framework for the emergence of filter bubbles and echo chambers. British Journal of Social Psychology 58, 1 (2019), 129–149.
[21] Julia Handl, Joshua Knowles, and Douglas B. Kell. 2005. Computational cluster validation in post-genomic data analysis. Bioinformatics 21, 15 (2005), 3201–3212.
[22] Natali Helberger, Kari Karppinen, and Lucia D'acunto. 2018. Exposure diversity as a design principle for recommender systems. Information, Communication & Society 21, 2 (2018), 191–207.
[23] Martin Hilbert, Saifuddin Ahmed, Jaeho Cho, Billy Liu, and Jonathan Luu. 2018. Communicating with algorithms: a transfer entropy analysis of emotions-based escapes from online echo chambers. Communication Methods and Measures 12, 4 (2018), 260–275.
[24] Kartik Hosanagar, Daniel Fleder, Dokyun Lee, and Andreas Buja. 2013. Will the global village fracture into tribes? Recommender systems and their effects on consumer fragmentation. Management Science 60, 4 (2013), 805–823.
[25] Lawrence Hubert and Phipps Arabie. 1985. Comparing partitions. Journal of Classification 2, 1 (Dec. 1985), 193–218.
[26] Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, and Pushmeet Kohli. 2019. Degenerate feedback loops in recommender systems. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 383–390.
[27] Xuehan Jiang. 2018. A community-evolution based approach for detecting the echo chamber effect in recommender systems. (2018).
[28] Lanu Kim, JD West, and Katherine Stovel. 2017. Echo chambers in science? In American Sociological Association (ASA) Annual Meeting.
[29] Richard G. Lawson and Peter C. Jurs. 1990. New index for clustering tendency and its application to chemical problems. Journal of Chemical Information and Computer Sciences 30, 1 (1990), 36–41.
[30] Sina Mohseni and Eric Ragan. 2018. Combating Fake News with Interpretable News Feed Algorithm. arXiv preprint arXiv:1811.12349 (2018).
[31] Judith Möller, Damian Trilling, Natali Helberger, and Bram van Es. 2018. Do not blame it on the algorithm: an empirical assessment of multiple recommender systems and their impact on content diversity. Information, Communication & Society 21, 7 (2018), 959–977.
[32] Namjun Cha, Hosoo Cho, Sangman Lee, and Junseok Hwang. 2019. Effect of AI Recommendation System on the Consumer Preference Structure in e-Commerce: Based on Two Types of Preference. In 2019 21st International Conference on Advanced Communication Technology (ICACT). IEEE, 77–80.
[33] Tien T Nguyen, Pik-Mai Hui, F Maxwell Harper, Loren Terveen, and Joseph A Konstan. 2014. Exploring the filter bubble: the effect of using recommender systems on content diversity. In Proceedings of the 23rd WWW. ACM, 677–686.
[34] Malay Kumar Pakhira, Sanghamitra Bandyopadhyay, and Ujjwal Maulik. 2004. Validity index for crisp and fuzzy clusters. Pattern Recognition 37 (2004), 487–501.
[35] Zachary A. Pardos and Weijie Jiang. 2019. Combating the Filter Bubble: Designing for Serendipity in a University Course Recommendation System. IntRS'19: Joint Workshop on Interfaces and Human Decision Making for Recommender Systems (2019).
[36] Eli Pariser. 2011. The filter bubble: What the Internet is hiding from you. Penguin UK.
[37] Marten Risius, Okan Aydinguel, and Maximilian Haug. 2019. Towards an Understanding of Conspiracy Echo Chambers on Facebook. (2019).
[38] Kazutoshi Sasahara, Wen Chen, Hao Peng, Giovanni Luca Ciampaglia, Alessandro Flammini, and Filippo Menczer. 2019. On the Inevitability of Online Echo Chambers. arXiv preprint arXiv:1905.03919 (2019).
[39] Ana-Andreea Stoica and Augustin Chaintreau. 2019. Hegemony in Social Media and the effect of recommendations. In Companion Proceedings of The 2019 World Wide Web Conference. ACM, 575–580.
[40] Cass R. Sunstein. 2007. Republic.com 2.0. Princeton University Press. http://www.jstor.org/stable/j.ctt7tbsw
[41] Cass R Sunstein. 2009. Going to extremes: How like minds unite and divide. Oxford University Press.
[42] Cass R Sunstein. 2018. Republic: Divided democracy in the age of social media. Princeton University Press.
[43] Nava Tintarev, Shahin Rostami, and Barry Smyth. 2018. Knowing the unknown: visualising consumption blind-spots in recommender systems. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing. ACM, 1396–1399.
[44] Michalis Vazirgiannis. 2009. Clustering Validity. Springer US, Boston, MA, 388–393. https://doi.org/10.1007/978-0-387-39940-9_616
[45] Nguyen Xuan Vinh, Julien Epps, and James Bailey. 2009. Information Theoretic Measures for Clusterings Comparison: Is a Correction for Chance Necessary? In Proceedings of the 26th ICML (ICML '09). ACM, New York, NY, USA, 1073–1080.
[46] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. Proceedings of the 24th ACM SIGKDD (2018).
[47] Weina Wang and Yunjie Zhang. 2007. On fuzzy cluster validity indices. Fuzzy Sets and Systems 158 (2007), 2095–2117. https://doi.org/10.1016/j.fss.2007.03.004
[48] Junjie Wu, Jian Chen, Hui Xiong, and Ming Xie. 2009. External validation measures for K-means clustering: A data distribution perspective. Expert Systems with Applications 36 (2009), 6050–6061.
[49] Yongfeng Zhang, Min Zhang, Yiqun Liu, Shaoping Ma, and Shi Feng. 2013. Localized matrix factorization for recommendation based on matrix block diagonal forms. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 1511–1520.
[50] Guanjie Zheng, Fuzheng Zhang, Zihan Zheng, Yang Xiang, Nicholas Jing Yuan, Xing Xie, and Zhenhui Li. 2018. DRN: A deep reinforcement learning framework for news recommendation. In Proceedings of the 2018 World Wide Web Conference. 167–176.
[51] C.-N. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. 2005. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web. ACM, 23–32.
