
IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 9, NO. 3, MAY/JUNE 2016, p. 433

Online Learning in Large-Scale Contextual Recommender Systems

Linqi Song, Cem Tekin, and Mihaela van der Schaar, Fellow, IEEE

Abstract—In this paper, we propose a novel large-scale, context-aware recommender system that provides accurate recommendations, scalability to a large number of diverse users and items, and differential services, and that does not suffer from "cold start" problems. Our proposed recommendation system relies on a novel algorithm which learns online the item preferences of users based on their click behavior, and constructs online item-cluster trees. Recommendations are then made by choosing an item-cluster level and then selecting an item within that cluster as a recommendation for the user. This approach significantly improves the learning speed when the number of users and items is large, while still providing high recommendation accuracy. Each time a user arrives at the website, the system makes a recommendation based on estimates of item payoffs, exploiting past context arrivals in a neighborhood of the current user's context. It exploits the similarity of contexts to learn how to make better recommendations even when the number and diversity of users and items are large. This also addresses the cold start problem, by using the information gained from similar users and items to make recommendations for new users and items. We theoretically prove that the proposed algorithm for item recommendations converges to the optimal item recommendations in the long run. We also bound the probability of making a suboptimal item recommendation for each user arriving to the system while the system is learning. Experimental results show that our approach outperforms the state-of-the-art algorithms by over 20 percent in terms of click-through rate.

Index Terms—Recommender systems, online learning, clustering algorithms, multi-armed bandit

1 INTRODUCTION

Online websites currently provide a vast number of products or services to their users. Recommender systems have emerged as an essential method to assist users in finding desirable products or services from large sets of available products or services [1], [2]: movies at Netflix, products at Amazon, news articles at Yahoo!, advertisements at Google, reviews on Yelp, etc. Moreover, the Yahoo! Developer Network, Amazon's Amazon Web Services, Google's Google Developers, and numerous other major companies also offer Web services that operate on their data, and recommendation ability is often provided to developers as an essential service functionality.

The products or services provided by recommender systems are referred to generically as items [1], [2]. The performance, or payoff, of a recommendation is based on the response of the user to the recommendation. For example, in news recommendations, payoffs are measured by users' click behaviors (e.g., 1 for a click and 0 for no click), and the average payoff when a webpage is recommended is known as the Click-Through Rate (CTR) [6]. The goal of recommendations is to maximize the payoffs over all users that come to the website.

Although recommender systems have been deployed in many application areas during the past decade, the tremendous increase in both the number and diversity of users as well as items poses several key new challenges, including dealing with heterogeneous user characteristics and item sets, the existence of different user types such as registered and non-registered users, the large number of items and users, and the cold start problem. We address all these challenges by designing online learning algorithms with three properties: contextualization, differential services, and scalability.

(a) Contextualization has the potential to better predict or anticipate the preferences of users. First-generation recommender systems consider the expected payoff of each recommendation to be a function of the item only [4], [6], [8], [9]. However, such recommender systems model the preferences of users with limited accuracy and thus have limited performance. Several recent works [5], [18], [23] show that incorporating the context in which a recommendation is made can greatly improve the accuracy of predicting how users respond to various recommendations. Moreover, contextualization allows the recommender to learn together for groups of users having similar contexts, which significantly speeds up the learning process.

The authors are with the Department of Electrical Engineering, University of California-Los Angeles, Los Angeles, CA 90095. E-mail: songlinqi@ucla.edu, cmtkn@ucla.edu, mihaela@ee.ucla.edu.
Manuscript received 7 May 2013; revised 20 Sept. 2014; accepted 25 Oct. 2014. Date of publication 29 Oct. 2014; date of current version 15 June 2016.
For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TSC.2014.2365795
1939-1374 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

The context is defined generally as an aggregate of various categories that describe the situations or circumstances of the user when a recommendation is made [18], [19], such as time, location, search query, purchase history, etc. When contexts are incorporated into the recommender system, the expected payoff of a recommendation can be modeled as a function of the item and the context. If the number of contexts, X, is finite and small, we can run X recommenders, one for each context, to tackle this issue. However, when there is a large or infinite number of contexts, this method fails, since the number of recommenders increases significantly. Hence, designing and building effective contextual

TABLE 1
Comparison with Existing Algorithms

Work                                | Contexts | Refinement in the context space | Solving "scalability" | Solving "cold start" | Requiring training data | Differential services
[2], [4], [6], [8], [9], [11], [16] | No       | -                               | No                    | No                   | Yes                     | No
[5], [21]                           | Yes      | Partition                       | No                    | No                   | Yes                     | No
[23], [24]                          | Yes      | Linear payoff assumption        | No                    | Yes                  | No                      | No
[25]                                | Yes      | Partition                       | No                    | Yes                  | Yes                     | No
[27], [28]                          | No       | -                               | Yes                   | Yes                  | No                      | No
[29]                                | Yes      | Partition                       | Yes                   | Yes                  | Yes                     | No
Our work                            | Yes      | An adaptive neighborhood        | Yes                   | Yes                  | No                      | Yes

recommender systems with large or infinite numbers of contexts remains a key challenge.

In this paper we propose methods for contextualization which help the recommender to learn from those of its past experiences that are relevant to the current user characteristics, in order to recommend an item to the current user.

(b) Differential services are important for online services, including recommender systems. Existing works evaluate recommender systems based on their average performance [22], [23], [29]. However, it may also be desirable for service providers to build customer loyalty by providing registered users with a higher quality of service (in this case, better recommendations) than the quality provided to anonymous users using the system. Hence, designing recommender systems which have the ability to provide differential services to various users represents a key challenge.

(c) Scalability is a key challenge faced by a recommender system with a large number of items and/or contexts. If not designed properly, the learning speed and the implementation complexity of such systems can suffer significantly [3], [7], [10]. Hence, the recommender system needs to easily scale to accommodate a large number of items and contexts.

Another important challenge is cold start, which appears in practice when the system has a very diverse set of items and very limited or no a priori information about the items that the users prefer [3], [7]. In this case, recommendation algorithms [2], [6], [9] fail to recommend the items that are most liked by users, due to the limited information about users' preferences for specific items. This problem is also often referred to as the "coverage" problem [3], [7]. New items are unlikely to be recommended and new users are unlikely to be given good recommendations due to the limited interaction histories of these new items/users. Therefore, dealing with cold start remains a key challenge for large-scale recommender systems. Two important features of our proposed algorithm address the issue of cold start: contextualization and continuous exploration. By contextualization, our algorithm clusters the users according to their contexts and the items according to their features, such that past information gained from users with similar contexts and items with similar features can be used to solve the cold start problem when a new user arrives or a new item is introduced. Moreover, our algorithm explores continuously, i.e., it does not explore only initially based on limited data and then exploit for the rest of the users. It explores and exploits in an alternating fashion, and keeps learning as new users come. This continuous exploration addresses the cold start problem, since it helps the recommender to learn over time the payoffs of new items and the preferences of new users that are not similar to the previous ones.

In this paper, we propose a novel, large-scale contextual recommender system which is able to address all of the above-mentioned challenges. Our system is based on a novel contextual Multi-Armed Bandit (MAB) approach. When a user with a specific context arrives, the recommender utilizes information from past arrivals within a neighborhood of the current context (i.e., "similar" to the current context). Based on the payoff estimates obtained through these past arrivals, the recommendation is made at an item-cluster level, instead of an item level. Our approach controls the size of the context neighborhood and the set of clusters to obtain sufficiently accurate and efficient payoff estimates. To learn about new items and contexts, our algorithms use a differential method for exploration: the system explores more often when the users have a lower priority, at the risk of providing lower-quality recommendations, and explores less, in order to provide consistently high-quality recommendations, when the users have a higher priority. After each recommendation, the realized payoff is received by the recommender system. Each item cluster corresponds to an action (referred to as an arm in the bandit framework) for our learning algorithm; hence learning is performed at the cluster level, and this solves the scalability problem.

The goal of learning is to minimize the regret, which is the difference between the optimal expected total payoff up to time T, which could be achieved if all information (i.e., the expected payoff of each item/cluster in a certain context) were known, and the expected total payoff gained up to time T through our learning algorithm, in which the information is incomplete.

The remainder of this paper is organized as follows. In Section 2, we discuss the related works and highlight the differences from our work. Section 3 describes the system model of the recommender system. Section 4 introduces the adaptive clustering based contextual MAB recommendation algorithm. Section 5 provides simulation results. Section 6 concludes the paper.

2 RELATED WORKS

A plethora of prior works exists on recommendation algorithms. We compare our work with existing works in Table 1. We categorize these algorithms based on the following characteristics: their consideration of contexts; their refinement of the context space (if any); their ability to address the issues of "scalability", "cold start", and "differential services"; and their ability to operate without a period of training. The two main categories are filtering-based and reinforcement learning methods [3]. Filtering-based approaches, such as collaborative filtering [7], [8], content-based filtering [2], [9] and hybrid approaches [10], [11], employ the historical data of users' feedback to calculate the future payoffs of users based on some prediction functions. Collaborative filtering approaches predict users' preferences based on the preferences of their "peer" users, who have shown similar interest in other items as observed through historical information [7], [8]. Content-based filtering methods help a given user find the items that are similar to the items selected in the past [2], [9]. Hybrid approaches combine the collaborative filtering and content-based filtering approaches to gather more historical information in the user-item space when a new user or a new item appears in the system [10], [11]. However, filtering-based approaches alone are unable to tackle the issues of cold start and scalability. Although the hybrid approaches can alleviate the cold start problem to some extent, owing to the consideration of both similar items and similar users when a new item or user appears, they cannot solve the cold start problem when no similar items/users of a new item/user exist in the system.

Reinforcement learning methods, such as MAB [23], [24], [25] and Markov Decision Processes (MDPs) [26], are widely used to derive algorithms for recommending items. MDP-based learning approaches model the last k selections of a user as the state and the available items as the action set, aiming at maximizing the long-term payoff [26]. However, the state set grows fast as the number of items increases, thereby resulting in very slow convergence rates. In [32], "state aggregation" methods are used to solve this MDP problem with a large state set, but the incurred regret grows linearly in time. MAB-based approaches [12], [13], [15], [17], such as ε-greedy [17] and UCB1 [15], provide convergence to the optimum: not only asymptotic convergence, but also a bound on the rate of convergence for any time step. They do this by balancing exploration and exploitation, where exploration involves learning about new items' payoffs for a particular user by recommending new items, while exploitation means recommending the best items based on the observations made so far.

To further improve the effectiveness of recommendations, contexts are considered in recent recommender systems [18], [21], [22]. For instance, it is important to consider the search queries and the purchase histories (which can be considered as the contexts) of users when making news article recommendations. Moreover, in many applications, such as news recommendations, the recommender system only observes the features of anonymous users, without knowing the exact users. In this case, the users' features (cookies) can also be considered as contexts. The context-aware collaborative-filtering (CACF) and contextual MAB methods have been studied to utilize such information [21], [23], [25], [29]. The CACF algorithm extends the collaborative filtering method to a contextual setting by considering different weights for different contexts [21]. In [23], [24], the payoff of a recommendation is modeled as a linear function of the context with an unknown weight vector, and the LinUCB algorithm is proposed to solve news article recommendation problems. However, the linearity may not hold in practice for some recommender systems [22], [25], [29]. In a more general setting [25], the context space is modeled as a bounded metric space in which the payoff function is assumed to satisfy a Lipschitz condition, instead of being a linear function of contexts. In this context space, partition-based algorithms are proposed to solve the contextual recommendation problem [25]. These works [18], [22], [25], however, do not take into account the large item space, which can cause scalability problems in practice.

A different strand of related works studies recommender systems that have a large number of items to recommend [3], [7], [27], [28]. In [28], the item space is partitioned into subspaces, and online learning is performed in each subspace. Alternatively, in [27], the problem is solved by considering a static clustering of items. These works [27], [28] can solve the scalability problem by reducing the number of choices of each recommendation. However, these works [3], [7], [27], [28] do not incorporate contexts in the recommendations. In contrast, we consider the contexts when making recommendations.

Building large-scale contextual recommender systems is more challenging. Few related works tackle both problems in an integrated manner. The most relevant theoretical works related to our proposed solution are [25], [29]; however, there are significant differences between our work and these existing works. First, the works in [25], [29] are theoretical and do not consider the key characteristics and requirements of recommender systems. For example, the algorithm proposed in [29] requires knowledge of the Lipschitz constants, which makes the algorithm data-dependent (different datasets may have different Lipschitz constants, based on the characteristics of the data) and hence difficult to implement in practice; the algorithm in [25] requires knowing the covering constants of the context and item spaces. In contrast, our learning algorithm does not need the Lipschitz constant or covering constants to operate; these constants are only used to analyze the performance of the algorithm (i.e., to show the regret). Second, the models in [25], [29] consider only two spaces: the context space and the item space; in our work, we consider the user space, the context space, and the item space. Hence, we can provide differential services for different types of users by choosing different trade-offs between exploitation and exploration. Third, in [29], combining the items and contexts greatly increases the space dimension, resulting in a highly complex online algorithm. In our work, we greatly reduce the dimension of the spaces by considering the context space and the item space separately. We build the item-cluster tree offline, and only perform online calculations in the context space, resulting in lower-complexity online learning compared with that in [29].

In existing works [25], [27], [28], [29], the algorithms operate by fixing a partition of the context space. In contrast, our approach selects a dynamic subspace in the context space each time, to ensure a sufficiently accurate estimate of the expected payoffs with respect to the current context, with a precise control of the size of the context

Fig. 2. Construction of item-cluster tree.


Fig. 1. Contextual recommender system model.

subspace. Moreover, our approach can provide differential services (different instantaneous performance) to ensure the satisfaction of higher-type users, which are not considered in existing works [4], [5], [6], [8], [9], [21], [23], [27], [28], [29]. A detailed comparison with the existing works is presented in Table 1.

3 RECOMMENDER SYSTEM MODEL

We consider the contextual recommender system shown in Fig. 1, which operates in discrete time slots t = 1, 2, .... At each time slot, the system operates as follows: (1) a user with a specific context arrives; (2) the recommender system observes the context, and recommends to the user an item from the set of items, based on the current context as well as the historical information about users, contexts, items, and payoffs of former recommendations; and (3) the payoff of this recommendation, whose expectation depends on the item and the context, is received by the recommender system according to the user's response.

Users: We formally model the set of users as U, which can be either a finite set (i.e., U = {1, 2, ..., U}) or an infinite set (i.e., U = {1, 2, ...}, e.g., a news recommender system frequented by an infinite number of users). When a user u ∈ U arrives, the system can observe the user's type: important or ordinary user, registered or anonymous user, etc. We divide the users into S types, and denote the set of type-s users, s ∈ {1, 2, ..., S}, by U^s. Based on the users' types, the recommender system can provide differential services for users.

Contexts: We model the context set X as a d_C-dimensional continuous space, and the context x ∈ X is a d_C-dimensional vector. For simplicity of exposition, we assume that the context space is X = [0, 1]^{d_C}, but our results hold for any bounded d_C-dimensional context space. In the context space, the similarity distance s_C is defined as a metric s_C : X × X → [0, ∞). A smaller s_C(x, x') indicates that the two contexts x and x' exhibit higher similarity. In the Euclidean space X, the metric s_C can be the Euclidean norm or any other norm. Note that we will add the subscript t to user u and context x when referring to the user and context associated with time period t.

Items: We denote the set of items by I = {1, 2, ..., I}. In the item space, the similarity distance is defined as a metric s_I : I × I → R, which is based on the features of the items and known to the recommender system, as in [10], [22], [29]. A smaller s_I(i, i') indicates that the two items i and i' exhibit higher similarity. For example, in a news recommender system, an item is a news item, and the similarity between two news items depends on their features: whether they are in the same category (e.g., Local News), whether they have the same keywords (e.g., Fire), etc.

Since the number of items in the system is large, we cluster them based on the metric s_I in the item space, in order to reduce the possible options for recommendation. We use an item-cluster tree to represent the hierarchical relationship among item clusters; an example is shown in Fig. 2. In an item-cluster tree, each leaf represents an item and each node represents an item cluster. The depth of a node is defined as the length of the path to the root node, and the depth of the tree is defined as the maximum depth over all nodes in the tree. We denote the depth of the tree by L. We define the collection of nodes at depth 0 ≤ l ≤ L of the tree as layer l, and a node¹ at layer l is denoted by C_{l,k}, where k ∈ {1, 2, ..., K_l} and K_l is the number of nodes at depth l. There is only one node C_{0,1} at layer 0, i.e., the root node. The I nodes at layer L are the leaves of the tree, each of which is a single-element cluster that contains only one item. The hierarchical structure of the cluster tree implies that for a child node C_{l+1,k'} of node C_{l,k}, we have i ∈ C_{l,k} for all items i ∈ C_{l+1,k'}.

The cluster size s(C_{l,k}) is defined as the maximum distance between two items in that cluster, namely, s(C_{l,k}) = max_{i,i' ∈ C_{l,k}} {s_I(i, i')}. In an item-cluster-tree structure, a general tree metric can be defined as a mapping from the set of depths of the nodes/clusters in the tree to the maximum size of clusters, i.e., s_T : N → R. Thus, s_T(l) denotes the maximum cluster size at depth l, namely, s_T(l) ≥ max_{i,i' ∈ C_{l,k}, 1 ≤ k ≤ K_l} {s_I(i, i')} for any cluster at depth l. For a tree metric, the maximum cluster size at depth l is not smaller than that at depth l + 1, i.e., s_T(l) ≥ s_T(l + 1). We can see from Fig. 2 that item 6 and item 7 are in the same cluster C_{2,3}; item 6 and item 9 are in the same cluster C_{1,2}; and the metric between item 6 and item 7 is smaller than that between item 6 and item 9, namely, s_I(6, 7) < s_I(6, 9). In Section 4, we will provide an algorithm to construct such item-cluster trees. Using the constructed item-cluster tree, we can control the size and number of the clusters when the recommendations are made.

1. Note that each node of the tree corresponds to an item cluster, so we will use the terms "node" and "cluster" and their notations interchangeably.

The expected payoff of a recommendation depends on the context and the item. For a user with context x ∈ X, the payoff of recommending item i is denoted by a random variable r_{i,x} ∈ [0, 1], whose distribution is fixed but unknown to the recommender system. The expectation of r_{i,x} is denoted by μ_{i,x}. Given the number of items n_{l,k} in cluster C_{l,k}, the expected payoff of cluster C_{l,k} for context x is denoted by μ_{l,k,x} = Σ_{i ∈ C_{l,k}} μ_{i,x} / n_{l,k}. Only the payoffs of the recommended items can be observed by the recommender system and used for further recommendations.

Context-aware collaborative filtering algorithms consider that an item has similar expected payoffs in similar contexts [18], [19], [21], [22], while contextual bandit algorithms use a Lipschitz condition to model the similar expected payoffs of an item in two similar contexts [25], [29]. As in [25], [29], we formalize this in terms of a Lipschitz condition as follows.

Assumption 1 (Lipschitz condition for contexts). Given two contexts x, x' ∈ X, we have |μ_{i,x} − μ_{i,x'}| ≤ L_C s_C(x, x') for any item i, where L_C is the Lipschitz constant in the context space.

Similarly to [28], [29], we model the similar payoffs of similar items in the same context using a Lipschitz condition, described in Assumption 2.

Assumption 2 (Lipschitz condition for items). Given two items i, i' ∈ I, we have |μ_{i,x} − μ_{i',x}| ≤ L_I s_I(i, i') for any context x ∈ X, where L_I is the Lipschitz constant in the item space.

Based on Assumption 2, in an item-cluster tree with tree metric s_T(·), we have |μ_{i,x} − μ_{i',x}| ≤ L_I s_T(l) for any x ∈ X and i, i' ∈ C_{l,k}. Note that the Lipschitz constants L_C and L_I are not required to be known by our recommendation algorithm. They are only used in quantifying its performance.

4 RECOMMENDATION ALGORITHM

In this section, we first propose the item-cluster tree construction method to cluster the items, and then propose the Adaptive Clustering Recommendation (ACR) algorithm and show its performance bound in terms of regret.

4.1 Construction of Item-Cluster Tree (CON-TREE)

We first cluster items using an item-cluster tree, such that the recommendations are made at a cluster level instead of an item level, which reduces the possible recommendation options (arms) and solves the scalability problem. To explicitly characterize the hierarchy of the item-cluster tree, we denote the tree by a tuple <L, {K_l}_{0≤l≤L}, {Chd(C_{l,k})}_{0≤l≤L, 1≤k≤K_l}, {Prt(C_{l,k})}_{0≤l≤L, 1≤k≤K_l}>, where L is the depth of the tree; K_l = {C_{l,k} : 1 ≤ k ≤ K_l} is the cluster set at layer l, and K_l = |K_l| is the cardinality of K_l; Chd(C_{l,k}) is the set of child clusters of cluster C_{l,k} (a subset of K_{l+1} for 0 ≤ l < L, and ∅ for l = L); and Prt(C_{l,k}) is the set containing the parent cluster of cluster C_{l,k} (a one-element subset of K_{l−1} for 0 < l ≤ L, and ∅ for l = 0).

In this paper, we adopt an exponential tree metric: s_T(l) = b^l, where b ∈ (0.5, 1) is the constant base. Our goal is to construct an item-cluster tree that fulfils this tree metric, based on an arbitrary given metric among items.² The constructed item-cluster tree will be used in the recommendation algorithm. The Construction of Item-Cluster Tree (CON-TREE) algorithm is described in Algorithm 1, and an example is shown in Fig. 2.

Algorithm 1. Construction of Item-Cluster Tree
Input: item set I = {1, 2, ..., I}, base b, depth L of the tree.
Initialization: Set L := min{L, L_b}, where L_b is the maximum l such that b^{l−1} ≥ min_{i,i' ∈ I, s_I(i,i') ≠ 0} s_I(i, i').
1: set the node set at layer L as K_L := {C_{L,1}, C_{L,2}, ..., C_{L,K_L}}, where C_{L,k} := {k}, ∀ 1 ≤ k ≤ K_L, and K_L = I.
2: set the representative point set at layer L as RP_L := {i_{L,1}, i_{L,2}, ..., i_{L,K_L}}, where i_{L,k} := k, ∀ 1 ≤ k ≤ K_L.
3: set the child sets of nodes at layer L as empty sets: Chd(C_{L,k}) := ∅, ∀ 1 ≤ k ≤ K_L.
4: for l = (L−1) : 0 do
5:   set k := 0. // counter for the number of clusters at layer l
6:   while RP_{l+1} ≠ ∅ do
7:     set k := k + 1.
8:     arbitrarily select an element i_{l+1,k̂} ∈ RP_{l+1}.
9:     find the subset SK of K_{l+1} such that SK := {C_{l+1,k'} : 1 ≤ k' ≤ K_{l+1}, i_{l+1,k'} ∈ RP_{l+1}, s_I(i_{l+1,k̂}, i_{l+1,k'}) ≤ (1−b)b^l / (1+b)}.
10:    set C_{l,k} := ∪_{C_{l+1,k'} ∈ SK} C_{l+1,k'}.
11:    set Chd(C_{l,k}) := SK.
12:    set Prt(C_{l+1,k'}) := C_{l,k}, ∀ C_{l+1,k'} ∈ SK.
13:    set i_{l,k} := i_{l+1,k̂}.
14:    remove the representative points of the selected clusters from RP_{l+1}: RP_{l+1} := RP_{l+1} \ {i_{l+1,k'} : C_{l+1,k'} ∈ SK}.
15:  end while
16:  set K_l := k.
17: end for
18: set Prt(C_{0,1}) := ∅.

The algorithm first sets each leaf at depth L to be a single-item cluster, and then operates from the leaves to the root by combining clusters that are "similar". The similarity between two clusters is evaluated through the distance between the representative points of the clusters. If the distance between two representative points is below a layer-varying threshold, the two clusters are considered similar and are given the same parent cluster.

A key property of the CON-TREE algorithm is that it not only constructs an item-cluster tree that fulfils the exponential tree metric, but also bounds the maximum number of item clusters at each layer. This is important because we can balance the learning speed and the accuracy by adjusting the number of clusters and the cluster size. We formalize this property in the following theorem.

Theorem 1. The item-cluster tree constructed via the CON-TREE algorithm has the following properties: (1) the constructed tree, with a depth of at most L_b, fulfils the exponential tree metric, where L_b is the maximum l such that b^{l−1} ≥ min_{i,i' ∈ I, s_I(i,i') ≠ 0} s_I(i, i'); and (2) the maximum number of clusters at layer l of this tree is bounded by c_I ((1+b)/(1−b))^{d_I} (1/b)^{l d_I}, where c_I and d_I are the covering constant and covering dimension of the item space.³

Proof. See Appendix B, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TSC.2014.2365795. □

2. Item-cluster trees that fulfil a certain tree metric are not unique; however, they all provide the same performance bound for the recommendations, as shown in our theorems.

Fig. 3. Context arrivals and refinement in the context space.
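The CON-TREE construction described in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration using flat lists of clusters per layer rather than explicit node objects with Chd/Prt pointers; the toy items, scalar features, and metric are our own assumptions, not from the paper:

```python
import math

def con_tree(items, s_I, b, L):
    """Bottom-up item-cluster tree construction (a sketch of Algorithm 1).

    items: list of item ids; s_I(i, j): item metric; b in (0.5, 1): base;
    L: target depth. Returns layers[l] = list of clusters at depth l.
    """
    # Cap the depth at L_b, the largest l with b^(l-1) >= smallest nonzero distance.
    nonzero = [s_I(i, j) for i in items for j in items if s_I(i, j) != 0]
    if nonzero:
        L_b = math.floor(1 + math.log(min(nonzero), b))
        L = min(L, max(L_b, 0))
    layers = {L: [[i] for i in items]}       # layer L: one leaf cluster per item
    reps = {L: list(items)}                  # each leaf represents itself
    for l in range(L - 1, -1, -1):
        clusters, new_reps = [], []
        remaining = list(range(len(layers[l + 1])))   # unmerged child clusters
        while remaining:
            k_hat = remaining[0]                      # pick a representative
            threshold = (1 - b) * b ** l / (1 + b)    # layer-varying threshold
            merged = [k for k in remaining
                      if s_I(reps[l + 1][k_hat], reps[l + 1][k]) <= threshold]
            clusters.append([i for k in merged for i in layers[l + 1][k]])
            new_reps.append(reps[l + 1][k_hat])       # parent keeps the representative
            remaining = [k for k in remaining if k not in merged]
        layers[l], reps[l] = clusters, new_reps
    return layers

# Toy example: four items with scalar features; s_I is the absolute difference.
features = {1: 0.0, 2: 0.05, 3: 0.5, 4: 0.55}
s_I = lambda i, j: abs(features[i] - features[j])
layers = con_tree([1, 2, 3, 4], s_I, b=0.8, L=2)
print(layers[2])   # [[1], [2], [3], [4]]
print(layers[1])   # [[1, 2], [3, 4]]
```

As in the pseudocode, each parent inherits the representative point of the first child it absorbs, and the threshold (1−b)b^l/(1+b) shrinks with depth, so deeper layers merge less aggressively.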

We can see that according to the definition of the exponential tree metric, the proposed CON-TREE algorithm ensures that the cluster size at layer $l$ is bounded by $b^l$, and from Theorem 1, the number of clusters at layer $l$ is bounded by $O(b^{-l d_I})$. If $l$ is larger, then the cluster size at layer $l$ is smaller, but the number of clusters at layer $l$ is larger, and vice versa. This implies that choosing clusters with larger depth $l$ will make more specific recommendations to the users, but it will require learning more about the specific characteristics of items, since the number of clusters is larger. We also note that the item-cluster tree can be partitioned by the set of clusters $\mathcal{K}_l$ at layer $l$, which contains all the items, and no two clusters contain the same item. Thus any item is included in exactly one of the clusters at the same layer.

Note that since the CON-TREE algorithm needs up to $K_l^2$ calculations at layer $l$, the computational complexity of the CON-TREE algorithm is bounded by $O(\sum_{l=0}^{L} b^{-2 l d_I})$. Since it is implemented offline, the CON-TREE algorithm does not affect the computational complexity of the online recommendation.

4.2 Adaptive Clustering Recommendation

We have previously proposed a one-layer clustering recommendation algorithm (OLCR) in [30] to solve the contextual recommendation problem with a large number of items. However, the OLCR algorithm is based on a fixed clustering method, and a disadvantage of OLCR is that it is difficult to decide the optimal size of the clusters. In this work, we propose an adaptive clustering based ACR algorithm that can adaptively select the cluster size in order to balance between making specific recommendations and reducing the number of clusters. We choose the "optimal item" as the benchmark, which is defined as $i^*(t) = \arg\max_{i \in \mathcal{I}} \mu_{i,x_t}$ with respect to the context $x_t$ arriving at time $t$. The regret of learning up to time $T$ of an algorithm $\pi$, which selects the $\pi_K(t)$ cluster at layer $\pi_L(t)$ (i.e., cluster $C_{\pi_L(t),\pi_K(t)}$), given the context at time $t$, is defined as the difference of expected payoffs from choosing item $i^*(t)$, namely,

$R_\pi(T) = \sum_{t=1}^{T} E\left[\mu_{i^*(t),x_t} - \mu_{\pi_L(t),\pi_K(t),x_t}\right]$.  (1)

For different types of users, the recommender system may provide differential services in order to maintain the higher type users' satisfaction. We assume that type $s$ users have a certain fixed probability $p_s$ of arriving at each time. For higher type users, the recommender system explores less, and for lower type users, the recommender system explores more. In such a way, the average per-period regrets of higher type users are expected to be smaller than those of the lower type users. Formally, we define the instantaneous regret of type $s$ users as $IR_\pi^s(t) = E[\mu_{i^*(t),x_t} - \mu_{\pi_L(t),\pi_K(t),x_t} \mid u_t \in \mathcal{U}^s]$ for an algorithm $\pi$, where the expectation is taken with respect to the context arrival given the current user type.

We first present the ACR algorithm in Fig. 3, Fig. 4 and Algorithm 2. Initially, the item-cluster tree that fulfills the exponential tree metric has been built offline through the CON-TREE algorithm. We denote the epochs by $l = 0, 1, 2, \ldots$, and epoch $l$ contains $2^l$ time slots. In epoch $l$, the ACR algorithm chooses the cluster set $\mathcal{K}_{\min\{l,L\}}$ that contains all the clusters at layer $\min\{l,L\}$, as shown in Fig. 4. Note that we will denote by $l(t)$ the layer selection corresponding to time slot $t$. As the epochs go on, the algorithm selects an item cluster from a cluster set that contains a larger number of clusters, but these clusters have smaller cluster sizes. Hence, the algorithm can make more specific recommendations to users over time.

Algorithm 2. Adaptive Clustering Recommendation Algorithm

Input: Item-cluster tree, periods $T$.
Initialization: initial partition $\mathcal{K} := \mathcal{K}_0 = \{C_{0,1}\}$, epoch $E := \min\{L, \lfloor \log_2 T \rfloor\}$.
1: for $l = 0 : E - 1$ do
2:   Set partition $\mathcal{K} := \mathcal{K}_l = \{C_{l,k} : 1 \le k \le K_l\}$.
3:   for $t = 2^l : 2^{l+1} - 1$ do
4:     Observe the user's type $s$ and the context $x_t$. Find a ball $B(x_t, r_t)$ with radius $r_t = t^{(\alpha-1)/d_C}$.
5:     Select the cluster with maximum index $k^* = \arg\max_{k' \in \mathcal{K}} I_{l,k'}^s(t)$, with ties broken arbitrarily.
6:     Randomly select a child node $C_{l+1,\hat{k}}$ of $C_{l,k^*}$.
7:     Randomly recommend an item in cluster $C_{l+1,\hat{k}}$.
8:     Receive the reward: $r_t$.
9:     Record $\{t, x_t, C_{l,k^*}, C_{l+1,\hat{k}}, r_t\}$.
10:   end for
11: end for
12: Set $l := E$, and set partition $\mathcal{K} := \mathcal{K}_E = \{C_{E,k} : 1 \le k \le K_E\}$.
13: for $t = 2^E : T$ do
14:   Observe the context $x_t$. Find a ball $B(x_t, r_t)$ with radius $r_t = t^{(\alpha-1)/d_C}$.
15:   Select the cluster with maximum index $k^* = \arg\max_{k' \in \mathcal{K}} I_{l,k'}^s(t)$, with ties broken arbitrarily.
16:   Randomly recommend an item in cluster $C_{l,k^*}$.
17:   Receive the reward: $r_t$.
18:   Record $\{t, x_t, C_{l,k^*}, r_t\}$.
19: end for

Each time, a user with context $x_t$ arrives, and the algorithm chooses a relevant ball $B(x_t, r_t)$ in the context space,

3. See Appendix A, available in the online supplemental material, for the definitions of covering dimension and covering constant.
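The epoch structure of Algorithm 2 (epoch $l$ contains $2^l$ slots, and the layer used in epoch $l$ is $\min\{l, L\}$, capped at $E = \min\{L, \lfloor \log_2 T \rfloor\}$) can be sketched as follows. This is our own minimal illustration, not the authors' implementation, and the helper name `layer_at_time` is assumed.

```python
from math import floor, log2

def layer_at_time(t, L, T):
    # Epoch l covers slots t = 2^l, ..., 2^{l+1} - 1, so the epoch index of
    # slot t is floor(log2(t)); the layer used is capped at
    # E = min(L, floor(log2(T))).
    E = min(L, floor(log2(T)))
    return min(floor(log2(t)), E)

# As the epochs go on, deeper layers (smaller, more specific clusters)
# are used; after epoch E the layer stays fixed.
T, L = 100, 4
schedule = [layer_at_time(t, L, T) for t in range(1, T + 1)]
print(schedule[:10])  # → [0, 1, 1, 2, 2, 2, 2, 3, 3, 3]
```

The doubling epoch lengths mean the algorithm spends exponentially more time on deeper (finer) layers, matching the claim that recommendations become more specific over time.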

Fig. 4. Recommender system based on ACR algorithm.

whose center is $x_t$ and radius is $r_t = t^{(\alpha-1)/d_C}$, as shown in Fig. 3. It then selects a cluster in $\mathcal{K}_{l(t)}$, by calculating and comparing the index of each cluster at the current layer $l(t)$ based on the past history and current context. Subsequently, it randomly selects a child cluster at layer $l(t)+1$ (if available) and randomly recommends an item in the selected child cluster. Note that in the algorithm, to extract information of a former epoch, we record two cluster selections each time:^4 one is the cluster $C_{l,k}$ at layer $l$, and the other one is the cluster $C_{l+1,\hat{k}}$ at layer $l+1$, which is a randomly selected child cluster of $C_{l,k}$.

The index represents a combination of exploration and exploitation, and its calculation is the main part of this algorithm. The set of past time periods when the contexts arrived at the relevant ball $B(x_t, r_t)$ is denoted by $\mathcal{T}(B(x_t,r_t)) = \{\tau : \tau < t, x_\tau \in B(x_t, r_t)\}$. The number of times that cluster $C_{l,k}$ has been selected in that ball is denoted by $N_{l,k,t} = \sum_{\tau \in \mathcal{T}(B(x_t,r_t))} I\{\pi_L(\tau) = l, \pi_K(\tau) = k\}$. Therefore, the index for cluster $C_{l,k}$ can be written as $I_{l,k}^s(t) = \bar{r}_{l,k,t} + \sqrt{A_s \ln t / N_{l,k,t}}$, where $A_s$ is the exploration-exploitation trade-off factor for type $s$ users, and $\bar{r}_{l,k,t} = (\sum_{\tau \in \mathcal{T}(B(x_t,r_t))} r_\tau I\{\pi_L(\tau) = l, \pi_K(\tau) = k\})/N_{l,k,t}$ is the sample-average payoff in the relevant ball. Without loss of generality, we assume that the exploration-exploitation trade-off factor $A_s$ decreases as the user's type increases, namely, $A_1 \ge A_2 \ge \cdots \ge A_S$.

We define the suboptimal cluster set at time $t$ as $\mathcal{S}(t) = \{(l(t),k) : k \in \mathcal{K}_{l(t)},\ \mu_{l(t),k_t^*,t} - \mu_{l(t),k,t} > 4 L_C r_t\}$, where $k_t^* = \arg\max_k \mu_{l(t),k,t}$. We also define the event $\mathcal{W}_{k,t}^s = \{N_{l(t),k,t} < \frac{36 A_s \ln t}{(\mu_{l(t),k_t^*,t} - \mu_{l(t),k,t} + 2 L_C r_t)^2}\}$. The following theorems prove a bound on the instantaneous and long-term regrets of the ACR algorithm.

Theorem 2. If $1/(1+d_C) < \alpha < 1$ and the full exploration assumption holds: $\Pr\{\pi(t) = (l(t),k) \mid u_t \in \mathcal{U}^s, \mathcal{W}_{k,t}^s, (l(t),k) \in \mathcal{S}(t)\} \ge 1 - t^{-\eta}$ for any time $t$, where $\eta > 0$ is a parameter, then for independent and identically distributed (i.i.d.) user and context arrivals, the instantaneous regret $IR_{ACR}^s(t)$ of the type $s$ user in period $t$ is bounded by

$IR_{ACR}^s(t) \le 2 K_{l(t)} t^{-2A_1} + \frac{24 A_1 K_{l(t)} c_C \ln t}{L_C}\, t^{-(1-\alpha)\alpha/d_C} + 4 L_C t^{-(1-\alpha)/d_C} + 2 L_I t^{\ln b/\ln 2}$, for $s = 1$;

$IR_{ACR}^s(t) \le 2 K_{l(t)} t^{-2A_s} + 4 L_C t^{-(1-\alpha)/d_C} + 2 L_I t^{\ln b/\ln 2} + K_{l(t)} \sum_{s'=1}^{s-1} \left[1 - \sum_{s''=1}^{s'} p_{s''}\, p_{B(x_t,r_t)} (1 - t^{-\eta})\right]^{\tau_{s'+1} - \tau_{s'}}$, for $s \ne 1$,

where $\tau_{s'} = \min\{\tilde{t} : \frac{A_{s'} \ln \tilde{t}}{A_s \ln t} < (1 + \frac{3 r_{\tilde{t}}}{r_t})^2\}$, for $s' = 1, 2, \ldots, s-1$, and $p_{B(x_t,r_t)} = \int_{x \in B(x_t,r_t)} f(x)\,dx$.

Proof. See Appendix C, available in the online supplemental material. □

We can see that type 1 users have a higher regret than other users, because they need to explore more than others. For type $s$ users, we can see from Theorem 2 that as the percentage of lower type users ($s' < s$) increases, the instantaneous regret $IR_{ACR}^s(t)$ decreases, because these lower type users will explore more to benefit the type $s$ user. We can also see that as the ratio of exploration-exploitation trade-off factors ($A_{s'}/A_s$, $s' < s$) increases, $\tau_{s'}$ increases, and hence the regret $IR_{ACR}^s(t)$ decreases. This implies that a bigger ratio $A_{s'}/A_s$ represents a higher differential service level.

From Theorem 2, we can observe that the instantaneous regret $IR_{ACR}^s(t)$ sublinearly decreases with $t$, namely, $IR_{ACR}^s(t) = O(t^{-\gamma})$, where $0 < \gamma < 1$. The instantaneous regret measures the performance loss in terms of the average payoff due to recommending a suboptimal item for the user. CTR is a common performance measure in applications such as news and advertisement recommendations [6], [23]. The regret bound on the CTR is $O(t^{-\gamma})$. Another interesting performance metric is the root-mean-square error (RMSE), which is commonly used in rating systems to measure the difference of the rating prediction from the user's actual rating [19], [20]. The regret can be converted to a regret bound on the RMSE. To do this, it is sufficient to consider a transformation in which the reward is considered as the negative mean-square of the suboptimality in CTR resulting from suboptimal item recommendations. From Theorem 2, it can be shown that the RMSE is $O(t^{-\gamma/2})$.

Theorem 3. For $1/(1+d_C) < \alpha < 1$ and i.i.d. user arrivals, the regret $R_{ACR}(T)$ up to time $T$ of the ACR algorithm is bounded by

4. Note that this can be generalized to arbitrary levels ahead, like children of children, but the computational complexity and memory requirements of each epoch grow.
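The index $I_{l,k}^s(t)$ used in step 5 of Algorithm 2 combines the sample-average payoff in the relevant ball with a UCB-style exploration bonus. Below is a minimal sketch of its computation; this is our own illustration, and the history record layout as well as the convention of giving never-selected clusters an infinite index are assumptions.

```python
from math import dist, log, sqrt

def cluster_index(history, x_t, r_t, layer, k, t, A_s):
    """UCB-style index for cluster C_{l,k}: sample-average reward inside the
    relevant ball B(x_t, r_t) plus the exploration bonus sqrt(A_s ln t / N)."""
    # Past periods whose context fell into the relevant ball.
    ball = [h for h in history if h["tau"] < t and dist(h["x"], x_t) <= r_t]
    # Selections of cluster (layer, k) among those periods.
    picks = [h for h in ball if h["layer"] == layer and h["k"] == k]
    if not picks:
        return float("inf")  # assumed convention: try unexplored clusters first
    N = len(picks)
    r_bar = sum(h["r"] for h in picks) / N
    return r_bar + sqrt(A_s * log(t) / N)

history = [
    {"tau": 1, "x": (0.10, 0.20), "layer": 1, "k": 0, "r": 1.0},
    {"tau": 2, "x": (0.12, 0.21), "layer": 1, "k": 0, "r": 0.0},
    {"tau": 3, "x": (0.90, 0.90), "layer": 1, "k": 0, "r": 1.0},  # outside ball
]
print(cluster_index(history, (0.1, 0.2), 0.1, 1, 0, t=4, A_s=1.0))
```

A larger trade-off factor $A_s$ inflates the bonus term, so clusters with few samples in the ball are chosen more often; this is the single knob that later produces the differential service levels.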

TABLE 2
Comparison with Existing Solution

Algorithm | Regret | Space complexity | Time complexity
[29] | $O(T^{(d_C+d_I+1)/(d_C+d_I+2)} \ln T)$ | $O(I + T + T^\epsilon)$ | $O(T^{1+2\epsilon} + IT)$
ACR | $O(T^{(d_C+d_I+1)/(d_C+d_I+2)} \ln T)$ | $O(\sum_{l=0}^{E} K_l + T)$ | $O(T^2 + K_E T)$

$R_{ACR}(T) \le 2 K_E \sum_{s=1}^{S} p_s \sum_{t=1}^{\infty} t^{-2A_s} + \frac{2 L_I T^{1+\ln b/\ln 2}}{1 + \ln b/\ln 2} + \frac{24 A_1 K_E c_C (1.386 + \ln T)\, T^{1-(1-\alpha)(1+d_C)/d_C}}{L_C (2^{(1-\alpha)(1+d_C)/d_C} - 1)} + \frac{48 A_1 K_E c_C \ln 2\; T^{1-(1-\alpha)(1+d_C)/d_C}}{L_C (2^{(1-\alpha)(1+d_C)/d_C} - 1)^2} + \frac{4 L_C d_C T^{1-(1-\alpha)/d_C}}{d_C - 1 + \alpha}$,  (2)

where $E = \min\{L, \lfloor \log_2 T \rfloor\}$ and $c_C$ is the covering constant of the context space $\mathcal{X}$.

Proof. See Appendix C, available in the online supplemental material. □

From Theorem 3, we observe that as long as the minimum exploration-exploitation trade-off factor satisfies $A_S \ge \frac{d_I \ln(1/b)}{2 \ln 2}$, the regret^5 $R_{ACR}(T)$ is sublinear in $T$, since from Theorem 1, $K_{l(t)}$ can be bounded through $K_{l(t)} \le c_I (\frac{1+b}{1-b})^{d_I} (\frac{1}{b})^{d_I \log_2 t} = c_I (\frac{1+b}{1-b})^{d_I} t^{d_I \ln(1/b)/\ln 2}$. If the system does not consider different user types, then all $A_s = 1$, and the regret can be simplified to $O(T^{(d_C+d_I+1)/(d_C+d_I+2)} \ln T)$, by setting $\alpha = (d_I+2)/(d_I+d_C+2)$ and $b = 2^{-1/(d_I+d_C+2)}$.

4.3 Theoretical Comparison

In this section, we compare the computational complexity (time complexity), the memory required (space complexity), and the performance bound with the benchmark in [29], as shown in Table 2. For the ACR algorithm, the computational complexity is $O(t + K_{l(t)})$ at each time $t$. In this case, the computational complexity up to time $T$ is $O(T^2 + K_E T)$, which is polynomial in $T$. The space complexity of the ACR algorithm is $O(\sum_{l=0}^{E} K_l + T)$. The ACR algorithm achieves a regret $O(T^{(d_C+d_I+1)/(d_C+d_I+2)} \ln T)$. Actually, the regret of the ACR algorithm is potentially better than $O(T^{(d_C+d_I+1)/(d_C+d_I+2)} \ln T)$, because $K_E$ is often much smaller than $c_I (\frac{1+b}{1-b})^{d_I} T^{d_I \ln(1/b)/\ln 2}$.

We also note that when the dimensions of the context and item spaces are small, the algorithm in [29] has a lower time complexity than our algorithm ($O(T^{1+2\epsilon} + IT) < O(T^2 \ln T + K_E T)$ when $\epsilon < 0.5$).^6 But when the dimensions of the context and item spaces are big, our algorithms achieve a lower time complexity ($O(T^2 \ln T + K_E T) < O(T^{1+2\epsilon} + IT)$ when $\epsilon > 0.5$).

5. At the beginning of the online learning periods, the algorithm needs to explore more, hence incurring higher regret. To tackle this, we can use prior information of payoffs for each item in different contexts.
6. In [29], $T^\epsilon$ is the number of balls to be selected, $\epsilon \in (0,1)$, and $\epsilon$ increases as the dimensions of the context and item spaces increase.

5 EXPERIMENTAL RESULTS

In this section, we evaluate the performance of our algorithms through experiments, using the Yahoo! Today Module dataset.

5.1 Description of the Dataset

The Yahoo! Today Module dataset^7 is based on the webpage recommendations of Yahoo! Front Page, which is a prominent Internet news website of Yahoo!. This dataset contains around 40 million instances collected from Yahoo! Front Page during May 1 to 10, 2009. Each instance of the dataset includes (1) recommended item IDs and features, (2) contexts, and (3) click behavior of users. For each user, the five-dimensional context vector describes the user's features, and is obtained from a higher dimensional raw feature vector of over 1,000 categorical components by applying dimensionality reduction techniques [23]. The raw feature vector includes: 1) demographic information: gender and age; 2) geographic features; and 3) behavioral categories: about 1,000 binary categories that summarize the user's consumption history. The high dimensional raw feature vector is then converted to a five-dimensional context feature vector [23]. The number of items in this dataset is 271, all of which are used in our experiments.

The special data collection method [23] allows evaluating online algorithms on this dataset without introducing bias. Essentially, each instance in the dataset contains only the click behavior of a user for one item that is recommended to the user at the time of data collection. When collecting the dataset, the item recommendations are done completely independently of the context of the user and the previous item recommendations. As explained in [23], this removes any bias in data collection that might result from the algorithm used to make item recommendations when collecting the data.

Similar to [23], we use this offline dataset to evaluate our online learning algorithms. When simulating our algorithms, we randomly pick users from this dataset. Since this dataset is very large (around 40 million instances), for a user that has a context $x$, it is possible to find 271 instances, each of which contains a different item recommendation, a context that is within $\varepsilon$ of $x$ in terms of the Euclidean distance (in our experiment, we use $\varepsilon = 0.01$), and the user's click behavior for that item recommendation. In this way, the online algorithm is evaluated by using the users and their click behaviors in this dataset. Note that a very similar simulation method is also considered in [23] to run the online LinUCB algorithm on this offline dataset. In our numerical results we consider $T = 70000$ user arrivals.

5.2 Comparison with Existing Context-Free Recommendation Algorithms

To show the importance of contexts on the payoffs, we compare the performances of our proposed ACR algorithm and our previously proposed OLCR algorithm [30] with those of context-free algorithms. The performances of the algorithms

7. http://labs.yahoo.com/Academic_Relations
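The offline evaluation method of Section 5.1 replays a logged instance only when its context lies within $\varepsilon = 0.01$ of the current context and its logged recommendation matches the item chosen by the online algorithm. A simplified sketch of this matching step follows; it is our own illustration with toy data, not the authors' evaluation code, and the instance layout is assumed.

```python
from math import dist

def replay_click(dataset, x, recommended_item, eps=0.01):
    """Return the logged click (0/1) of an instance whose context is within
    eps of x in Euclidean distance and whose logged recommendation matches
    the item chosen by the online algorithm; None if no such instance."""
    for inst in dataset:
        if inst["item"] == recommended_item and dist(inst["x"], x) <= eps:
            return inst["click"]
    return None  # no matching logged instance; skip this arrival

# Toy "logged" instances standing in for the Yahoo! Today Module data.
dataset = [
    {"x": (0.10, 0.20), "item": 7, "click": 1},
    {"x": (0.10, 0.20), "item": 3, "click": 0},
]
print(replay_click(dataset, (0.105, 0.199), 7))  # → 1
```

Because the logged recommendations were made uniformly at random, accepting only exact item matches inside the ε-ball yields an unbiased estimate of the online algorithm's CTR.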

TABLE 3
Comparison with Context-Free Algorithms

                           Percentage of user arrivals (K = 50)     Percentage of user arrivals (K = 150)
                           20%    40%    60%    80%    100%         20%    40%    60%    80%    100%
CTR   Random               0.026  0.028  0.027  0.026  0.026        0.026  0.028  0.027  0.026  0.026
      UCB1                 0.028  0.032  0.030  0.029  0.029        0.029  0.030  0.029  0.028  0.029
      ε-greedy             0.030  0.029  0.029  0.029  0.029        0.032  0.031  0.031  0.031  0.031
      OLCR                 0.028  0.033  0.037  0.037  0.038        0.025  0.028  0.031  0.034  0.034
      ACR                  0.039  0.044  0.046  0.049  0.049        0.039  0.044  0.046  0.049  0.049
Gain  OLCR over Random     8%     18%    37%    42%    46%          -4%    0%     15%    31%    31%
      OLCR over UCB1       0%     3%     23%    28%    31%          -14%   -7%    7%     21%    17%
      OLCR over ε-greedy   -7%    14%    28%    28%    31%          -22%   -10%   0%     10%    10%
      ACR over Random      50%    57%    70%    88%    88%          50%    57%    70%    88%    88%
      ACR over UCB1        39%    38%    53%    69%    69%          34%    47%    59%    75%    69%
      ACR over ε-greedy    30%    52%    59%    69%    69%          22%    42%    48%    58%    58%

are evaluated through the CTR up to time $t$. We compare against the following three context-free algorithms.

- Random: The algorithm randomly selects an item each time. This can be seen as the benchmark for the other algorithms.
- UCB1 [15]: This is a classical MAB algorithm, which performs well in the case of learning the best item without exploiting the context information.
- ε-greedy [15], [17]: This is another widely-used algorithm to perform online recommendations. In time period $t$, the item cluster with the highest payoff is selected with probability $1 - \varepsilon_t$, and other item clusters are randomly selected with probability $\varepsilon_t$, where $\varepsilon_t$ is of order $1/t$, such that the exploration rate is inversely proportional to the number of users.

In the simulations, depending on the number of clusters, we consider two scenarios. In the first scenario, we choose K = 50 item clusters, and in the second scenario, we choose K = 150 item clusters for all the algorithms except the ACR algorithm, which can adaptively choose the number of item clusters over time. For the ACR algorithm, we do not distinguish users (i.e., $A_s = 1$) and choose the parameter b = 0.75.

The comparison results are given in Table 3. We can see that both the OLCR algorithm and the ACR algorithm significantly outperform the context-free algorithms, as expected. The simulation results show that the OLCR algorithm achieves a 10-31 percent performance gain, in terms of CTR up to time $T$, over the UCB1 and ε-greedy algorithms. The CTR of the ACR algorithm is 58-69 percent higher than those of the UCB1 and ε-greedy algorithms. We also notice that the UCB1 and ε-greedy algorithms learn faster than the OLCR and ACR algorithms. This is because the context-free algorithms estimate the CTR of an item by averaging the clicks over all users, while the contextual algorithms estimate the CTR of an item based on specific user contexts. However, the contextual algorithms can make more accurate estimations of the expected CTRs of items by employing the contextual information when the number of user arrivals is sufficiently large. Hence, the context-free algorithms converge faster (in less than 20 percent of user arrivals) than the contextual algorithms, and the contextual algorithms can achieve a higher CTR than the context-free algorithms in the long run.

5.3 Comparison with Existing Contextual Recommendation Algorithms

In this section, we compare the CTRs of the OLCR and ACR algorithms with those of existing contextual recommendation algorithms. We compare against the following algorithms.

- CACF [21]: The algorithm exploits the information of historical selections of item clusters to predict the current payoff given the current context arrival. Note that this algorithm only performs exploitation, without any exploration. Thus, it has a "cold start" problem, and we use the first 5 percent of data instances to train the algorithm.
- Hybrid-ε [22]: This algorithm combines the contexts and ε-greedy algorithms together, by extracting information within a small region of the context space and by running ε-greedy algorithms in that small region.
- LinUCB [23]: This algorithm assumes that the payoffs of items are linear in contexts. It calculates the index of each arm by adding an upper bound term to a linear combination of contexts observed in previous periods, and then recommends the arm with the maximum index.
- Contextual-zooming [29]: This algorithm combines the context space and the item space together and selects a ball in this joint space each time, based on the past context arrivals and item selections.

In this section we consider two fixed numbers of item clusters for the algorithms that do not have the item clustering property (i.e., all algorithms except contextual-zooming and ACR): K = 50 and K = 150. For the ACR algorithm, we do not distinguish users (i.e., $A_s = 1$) and choose the parameter b = 0.75. We can categorize the contextual algorithms into two classes depending on whether an exploitation-exploration tradeoff is considered: the CACF algorithm does not perform exploration, while the Hybrid-ε, LinUCB, contextual-zooming, OLCR, and ACR algorithms consider both exploitation and exploration.

The simulation results are given in Table 4. We can see that the ACR algorithm significantly outperforms the existing contextual algorithms. The simulation results show that the CTRs up to time $T$ obtained by the OLCR algorithm are

TABLE 4
Comparison with Contextual Algorithms

                                    Percentage of user arrivals (K = 50)     Percentage of user arrivals (K = 150)
                                    20%    40%    60%    80%    100%         20%    40%    60%    80%    100%
CTR   CACF                          0.028  0.029  0.027  0.029  0.029        0.028  0.029  0.028  0.029  0.029
      Hybrid-ε                      0.034  0.035  0.034  0.033  0.033        0.033  0.034  0.032  0.033  0.033
      LinUCB                        0.035  0.039  0.040  0.041  0.041        0.032  0.037  0.039  0.039  0.039
      Contextual-zooming            0.037  0.039  0.038  0.038  0.038        0.037  0.039  0.038  0.038  0.038
      OLCR                          0.028  0.033  0.037  0.037  0.038        0.025  0.028  0.031  0.032  0.032
      ACR                           0.039  0.044  0.046  0.049  0.049        0.039  0.044  0.046  0.049  0.049
Gain  OLCR over CACF                0%     14%    37%    28%    31%          -11%   -3%    11%    10%    10%
      OLCR over Hybrid-ε            -18%   -6%    9%     12%    15%          -24%   -18%   -3%    -3%    -3%
      OLCR over LinUCB              -20%   -15%   -8%    -10%   -7%          -22%   -24%   -21%   -18%   -18%
      OLCR over contextual-zooming  -24%   -15%   -3%    -3%    0%           -32%   -28%   -18%   -16%   -16%
      ACR over CACF                 39%    52%    70%    69%    69%          39%    52%    64%    69%    69%
      ACR over Hybrid-ε             15%    26%    35%    48%    48%          18%    29%    44%    48%    48%
      ACR over LinUCB               11%    13%    15%    20%    20%          22%    19%    18%    26%    26%
      ACR over contextual-zooming   5%     13%    21%    29%    29%          5%     13%    21%    29%    29%

31 and 10 percent higher than those obtained by the CACF algorithm, 15 percent higher and 3 percent lower than those obtained by the Hybrid-ε algorithm, 7 and 18 percent lower than those obtained by the LinUCB algorithm, and the same and 16 percent lower than those obtained by the contextual-zooming algorithm, in the scenarios of K = 50 and K = 150, respectively. The CTRs up to time $T$ obtained by the ACR algorithm are 69 and 69 percent higher than those obtained by the CACF algorithm, 48 and 48 percent higher than those obtained by the Hybrid-ε algorithm, 20 and 26 percent higher than those obtained by the LinUCB algorithm, and 29 and 29 percent higher than those obtained by the contextual-zooming algorithm.

In fact, our ACR algorithm aggregates information in an adaptive ball of the context space each time, while the Hybrid-ε algorithm considers a fixed small region, which leads to less efficient learning. The LinUCB algorithm assumes a linear payoff function, which can result in inaccurate estimation of the CTR, while our proposed algorithms do not make such an assumption. The performance of the contextual-zooming algorithm is affected by the Lipschitz constant included in the algorithm, which may not fit the real data very well. The CACF algorithm learns faster (it converges in less than 20 percent of user arrivals) than our proposed ACR and OLCR algorithms, but its learning accuracy is low due to the fact that the CACF algorithm only performs exploitation, while our algorithms perform explorations that cause a lower speed of learning in the short run but a higher accuracy of recommendation in the long run. Note that the proposed ACR algorithm outperforms the OLCR algorithm, because the ACR algorithm exploits the item-cluster tree to adaptively cluster the items, while the OLCR algorithm does not use this tree structure and has fixed clusters.

5.4 Differential Services

In this section, we show how the recommender system provides differential services according to users' types. We assume that the users in the system are divided into two types: registered users and anonymous users. For the registered users, we aim to provide a higher quality recommendation. In the simulation, we choose the ACR algorithm with parameter b = 0.75, and compare the average per-period CTR of the two user types.

We show the average per-period CTR (normalized with respect to the anonymous users) of the two user types in Fig. 5. We can see that the average per-period CTR of registered users is around 4 to 11 percent higher than that of the anonymous users when the ratio of exploration-exploitation trade-off factors $A_2/A_1$ (registered user/anonymous user) varies from 0.2 to 0.8, in the scenarios of 10 and 30 percent registered users. We can also see that as the ratio of exploration-exploitation trade-off factors increases, the CTR gain of the registered users over the anonymous users decreases by 5-6 percent. In fact, as the ratio of exploration-exploitation trade-off factors increases, the recommendation strategies for the two types of users become similar to each other.

5.5 Scalability of Algorithms

In practice, the number of items is large, which requires that the algorithms have the ability to scale up, i.e., the learning speed and total CTR should not degrade as the number of items increases. In order to show this, we increase the number of items in the system by duplicating items to form a large item set. Since our proposed algorithms and the contextual-zooming algorithm make recommendations at the cluster level, the increase in items in the system only affects the number of items in a cluster, and does not affect the performances of the algorithms. For the CACF, Hybrid-ε, and LinUCB algorithms, when they do not cluster the items and

Fig. 5. Normalized per-period CTR comparison between registered users and anonymous users, with percentage p of registered users.

Fig. 6. Comparison of scalable performances of algorithms.

learn the CTR separately for each item, there are some performance losses when the number of items grows.

The simulation results are shown in Fig. 6. We use a scale factor $s$ to denote the scalability of the system (i.e., the number of items is $sK$, where $K$ is the number of items without being scaled up). We calculate the learning gain of the evaluated algorithm over the random algorithm (i.e., $(CTR_{\pi,sK} - CTR_{random})/CTR_{random}$, where $CTR_{\pi,sK}$ denotes the CTR of the algorithm $\pi$ with a scale factor $s$), and normalize it with respect to that of the ACR algorithm as a performance measure to evaluate the scalability of the algorithm. We can see that when the number of items becomes 100 times the current number, the performance loss in terms of the normalized learning gain against the random algorithm is 63, 61, and 67 percent for the CACF, Hybrid-ε and LinUCB algorithms, respectively. The long-term CTRs of the contextual-zooming algorithm and our proposed OLCR and ACR algorithms do not change, as expected, implying that our algorithms are scalable.

5.6 Robustness of Algorithms

In this section, we evaluate the robustness of the algorithms when the input parameters vary. Note that, to construct the item-cluster tree in our proposed approach, the theoretically optimal value of b (b ∈ (0.5, 1)) given in Section 4.3 (i.e., $b = 2^{-1/(d_I+d_C+2)} = 0.94$) may be conservatively large, and so optimizing this parameter may result in higher total payoffs. In particular, we compare the CTR of the ACR algorithm when the base b changes. Among the existing works, the LinUCB algorithm [23] needs an input parameter a > 0 (a is used to control the size of the confidence interval) to run the algorithm, and the contextual-zooming algorithm [29] needs the Lipschitz constant to run the algorithm. So we also show the effect of these parameters on the performances of the LinUCB and contextual-zooming algorithms.

Simulation results are shown in Figs. 7, 8, and 9. In Fig. 7, we can see that for the ACR algorithm, the CTR difference is less than 5 percent when the base b changes from 0.50 to 0.95. This is because the base b controls the trade-off between the number of clusters (learning speed) and the cluster size (accuracy). When b is large, learning is fast but less accurate; when b is small, learning is slow but more accurate. Hence, an optimal b exists to balance the speed of learning and the accuracy. For the LinUCB algorithm and the contextual-zooming algorithm, since the input parameters have a range from 0 to infinity, we test several possible values to evaluate the robustness of the algorithms. We can see from Fig. 8 that the LinUCB algorithm has more than 17 percent performance loss, in terms of CTR, compared to its maximum, which corresponds to the optimal value of a. We can also see from Fig. 9 that the contextual-zooming algorithm has more than 18 percent performance loss, in terms of CTR, when the Lipschitz constant is not estimated correctly. Moreover, since the ranges of the input parameter a and the Lipschitz constant are from 0 to infinity, it is difficult to find the optimal values of these parameters without training data. Hence, we can see that the LinUCB algorithm and the contextual-zooming algorithm are less robust to the input parameters than the ACR algorithm.

5.7 User Arrival Dependent Performance Variation

In this section, we evaluate the robustness of our ACR algorithm to different types of user arrivals. We randomly generate 10 sequences of user arrivals, and evaluate the variation in the CTR of the ACR algorithm over these sequences. To evaluate the variations of the CTR in the different

Fig. 7. The impact of b on the ACR algorithm.
Fig. 8. The impact of a on the LinUCB algorithm.
Fig. 9. The impact of the Lipschitz constant on the contextual-zooming algorithm.
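The scalability metric of Section 5.5, the learning gain over the random benchmark normalized by the ACR gain, can be computed as in the following sketch. The CTR numbers here are made up purely for illustration.

```python
def learning_gain(ctr_alg, ctr_random):
    # Gain over the random benchmark: (CTR_alg - CTR_random) / CTR_random.
    return (ctr_alg - ctr_random) / ctr_random

# Made-up CTRs at some scale factor s (for illustration only).
ctr_random = 0.026
ctrs = {"ACR": 0.049, "LinUCB": 0.033}
acr_gain = learning_gain(ctrs["ACR"], ctr_random)
normalized = {name: learning_gain(c, ctr_random) / acr_gain
              for name, c in ctrs.items()}
print(normalized)  # ACR normalizes to 1.0 by construction
```

A value well below 1 after scaling up the item set indicates the kind of performance loss reported for the CACF, Hybrid-ε, and LinUCB algorithms.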

TABLE 5
The Variation of CTRs in Different User Arrival Processes

Cases                         Case 1  Case 2  Case 3  Case 4  Case 5  Case 6  Case 7  Case 8  Case 9  Case 10
Percentage of CTR variation   4.4%    -3.6%   -4.0%   5.1%    -5.5%   -0.5%   5.1%    -5.2%   4.8%    -0.5%
compared to average CTR

cases, we first calculate the average CTR up to time $T$ over the 10 cases and normalize it to 1. We then compare the CTR variation of each case with the average CTR. The experimental results are shown in Table 5. For a specific sequence of user arrivals, a negative variation means the CTR is lower than the average CTR, while a positive variation means the CTR is higher than the average CTR. We can see from Table 5 that the variation of the CTR up to time $T$ obtained by our proposed algorithm is less than 5.5 percent.

6 CONCLUSIONS AND FUTURE WORKS

In this paper, we propose a contextual MAB based clustering approach to design and deploy recommender systems for a large number of users and items, while taking into consideration the context in which the recommendation is made. Our proposed ACR algorithm makes use of an adaptive item clustering method to improve the learning speed. More importantly, our algorithm can address the cold start problem and provide differential services to users of different types. Theoretical results show that the algorithm can achieve a sublinear regret, while our simulations show that our algorithm outperforms the state-of-the-art algorithms by 20 percent in terms of average CTRs.

Understanding the utility and impact of the proposed approach in the current practice of Web services and recommendations requires further study. Nevertheless, our online learning algorithm for recommendation systems is general and can potentially be used in numerous other application domains. However, each of these domains has its own unique set of contextual information, user characteristics, recommendation payoffs, and associated challenges. For example, users' demographics, environmental variables (e.g., time, location), and devices used (e.g., cellphones, laptops, etc.) can be potential contexts, while the recommendation payoff can be the rating(s) given by users to the recommendations they receive. Hence, this will require adaptation of the proposed algorithm to the specific domain in which the recommendation system needs to operate.

While general and applicable to many domains, our approach also has several limitations. We mention here a

REFERENCES
[1] P. Resnick and H. R. Varian, "Recommender systems," Commun. ACM, vol. 40, no. 3, pp. 56-58, 1997.
[2] M. Balabanović and Y. Shoham, "Fab: Content-based, collaborative recommendation," Commun. ACM, vol. 40, no. 3, pp. 66-72, 1997.
[3] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions," IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734-749, Jun. 2005.
[4] X. Chen, Z. Zheng, X. Liu, Z. Huang, and H. Sun, "Personalized QoS-aware web service recommendation and visualization," IEEE Trans. Services Comput., vol. 6, no. 1, pp. 35-47, First Quarter 2013.
[5] Z. Zheng, H. Ma, M. R. Lyu, and I. King, "QoS-aware Web service recommendation by collaborative filtering," IEEE Trans. Services Comput., vol. 4, no. 2, pp. 140-152, Apr.-Jun. 2011.
[6] J. Liu, P. Dolan, and E. R. Pedersen, "Personalized news recommendation based on click behavior," in Proc. 15th Int. Conf. Intell. User Interfaces, 2010, pp. 31-40.
[7] X. Su and T. M. Khoshgoftaar, "A survey of collaborative filtering techniques," Adv. Artif. Intell., vol. 2009, p. 421425, 2009.
[8] H. Kim, J. K. Kim, and Y. U. Ryu, "Personalized recommendation over a customer network for ubiquitous shopping," IEEE Trans. Services Comput., vol. 2, no. 2, pp. 140-151, Apr.-Jun. 2009.
[9] M. J. Pazzani and D. Billsus, "Content-based recommendation systems," in The Adaptive Web, vol. 4321. Berlin, Germany: Springer, 2007, pp. 325-341.
[10] R. Burke, "Hybrid recommender systems: Survey and experiments," User Model. User-Adapted Interaction, vol. 12, no. 4, pp. 331-370, 2002.
[11] K. Yoshii, M. Goto, K. Komatani, T. Ogata, and H. G. Okuno, "An efficient hybrid music recommender system using an incrementally trainable probabilistic generative model," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 2, pp. 435-447, Feb. 2008.
[12] H. Liu, K. Liu, and Q. Zhao, "Logarithmic weak regret of non-Bayesian restless multi-armed bandit," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2011, pp. 1968-1971.
[13] W. Dai, Y. Gai, B. Krishnamachari, and Q. Zhao, "The non-Bayesian restless multi-armed bandit: A case of near-logarithmic regret," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2011, pp. 2940-2943.
[14] K. Wang and L. Chen, "On optimality of myopic policy for restless multi-armed bandit problem: An axiomatic approach," IEEE Trans. Signal Process., vol. 60, no. 1, pp. 300-309, Jan. 2012.
[15] P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multi-armed bandit problem," Mach. Learn., vol. 47, pp. 235-256, 2002.
[16] Y. Deshpande and A. Montanari, "Linear bandits in high dimension and recommendation systems," in Proc. 50th Annu. Allerton Conf. Commun., Control, Comput., 2012, pp. 1750-1754.
[17] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge, U.K.: Cambridge Univ. Press, 2006.
[18] G. Adomavicius and A. Tuzhilin, "Context-aware recommender
few. First, we do not consider specific challenges associated systems,” in Recommender Systems Handbook. New Yrok, NY, USA:
with implementing the proposed system in a distributed Springer, 2011, pp. 217–253.
manner, as several Web services and applications may [19] G. Adomavicius, R. Sankaranarayanan, S. Sen, and A. Tuzhilin,
“Incorporating contextual information in recommender systems
require. However, this is essential and future work will using a multidimensional approach,” ACM Trans. Inf. Syst.,
need to consider such distributed solutions for implement- vol. 23, no. 1, pp. 103–145, 2005.
ing recommender systems. Second, recommender systems [20] A. Said, S. Berkovsky, E. W. De Luca, and J. Hermanns,
are increasingly co-evolving together with social networks: “Challenge on context-aware movie recommendation:
CAMRa2011,” in Proc. 5th ACM Conf. Recommender Syst., 2011,
users often recommend items they liked or purchased to pp. 385–386.
others in their social network. Hence, understanding how [21] A. Chen, “Context-aware collaborative filtering system: Predicting
such recommendation systems and social networks co- the user’s preference in the ubiquitous computing environment,”
evolve and interact with each other forms another interest- in Location-and Context-Awareness. Berlin, Germany: Springer,
2005, pp. 244–253.
ing, yet challenging future work.
Linqi Song received the BE and ME degrees in electrical engineering from Tsinghua University, Beijing, China, in 2006 and 2009, respectively. He is currently working toward the PhD degree in the Electrical Engineering Department at the University of California, Los Angeles. His research interests include machine learning, optimization, and communication networks.

Cem Tekin received the BSc degree in electrical and electronics engineering from the Middle East Technical University, Ankara, Turkey, in 2008, the MSE degree in electrical engineering: systems, the MS degree in mathematics, and the PhD degree in electrical engineering: systems from the University of Michigan, Ann Arbor, in 2010, 2011, and 2013, respectively. He is a postdoctoral scholar at the University of California, Los Angeles. His research interests include machine learning, multi-armed bandit problems, data mining, cognitive radio, and networks. He received the University of Michigan Electrical Engineering Departmental Fellowship in 2008, and the Fred W. Ellersick Award for the Best Paper in MILCOM 2009.

Mihaela van der Schaar (F'10) is a Chancellor's professor of electrical engineering at the University of California, Los Angeles. She is a distinguished lecturer of the Communications Society for 2011-2012 and the editor in chief of IEEE Transactions on Multimedia. She received an NSF CAREER Award (2004), the Best Paper Award from the IEEE Transactions on Circuits and Systems for Video Technology (2005), the Okawa Foundation Award (2006), the IBM Faculty Award (2005, 2007, 2008), the Most Cited Paper Award from EURASIP: Image Communications Journal (2006), the Gamenets Conference Best Paper Award (2011), and the 2011 IEEE Circuits and Systems Society Darlington Award Best Paper Award. She holds 33 granted US patents. She is a fellow of the IEEE. For more information about her research visit: http://medianetlab.ee.ucla.edu/.