Online Learning in Large-Scale Contextual Recommender Systems
Abstract—In this paper, we propose a novel large-scale, context-aware recommender system that provides accurate
recommendations, scales to a large number of diverse users and items, offers differential services, and does not suffer from "cold
start" problems. Our proposed recommendation system relies on a novel algorithm which learns online the item preferences of users based
on their click behavior, and constructs online item-cluster trees. The recommendations are then made by choosing an item-cluster level
and then selecting an item within that cluster as a recommendation for the user. This approach is able to significantly improve the
learning speed when the number of users and items is large, while still providing high recommendation accuracy. Each time a user
arrives at the website, the system makes a recommendation based on the estimations of item payoffs by exploiting past context arrivals
in a neighborhood of the current user’s context. It exploits the similarity of contexts to learn how to make better recommendations even
when the number and diversity of users and items is large. This also addresses the cold start problem by using the information gained
from similar users and items to make recommendations for new users and items. We theoretically prove that the proposed algorithm for
item recommendations converges to the optimal item recommendations in the long run. We also bound the probability of making a
suboptimal item recommendation for each user arriving to the system while the system is learning. Experimental results show that our
approach outperforms the state-of-the-art algorithms by over 20 percent in terms of click-through rate (CTR).
1 INTRODUCTION
TABLE 1
Comparison with Existing Algorithms
…recommender systems with large or infinite numbers of contexts remains a key challenge. In this paper we propose methods for contextualization, which help the recommender to learn from those past experiences that are relevant to the current user characteristics when recommending an item to the current user.

(b) Differential services are important for online services, including recommender systems. Existing works evaluate recommender systems based on their average performance [22], [23], [29]. However, it may also be desirable for service providers to build customer loyalty by providing registered users with a higher quality of service (in this case, better recommendations) than the quality provided to anonymous users of the system. Hence, designing recommender systems which have the ability to provide differential services to various users represents a key challenge.

(c) Scalability is a key challenge faced by a recommender system with a large number of items and/or contexts. If not designed properly, the learning speed and the implementation complexity of such systems can suffer significantly [3], [7], [10]. Hence, the recommender system needs to easily scale to accommodate a large number of items and contexts.

Another important challenge is cold start, which appears in practice when the system has a very diverse set of items and very limited or no a priori information about the items that the users prefer [3], [7]. In this case, recommendation algorithms [2], [6], [9] fail to recommend the items that are most liked by users, due to the limited information about the users' preferences for specific items. This problem is also often referred to as the "coverage" problem [3], [7]. New items are unlikely to be recommended and new users are unlikely to be given good recommendations, due to the limited interaction histories of these new items/users. Therefore, dealing with cold start remains a key challenge for large-scale recommender systems. Two important features of our proposed algorithm address the issue of cold start: contextualization and continuous exploration. By contextualization, our algorithm clusters the users according to their contexts and the items according to their features, such that past information gained from users with similar contexts and items with similar features can be used to solve the cold start problem when a new user arrives or a new item is introduced. Moreover, our algorithm explores continuously: it does not explore only initially based on limited data and then exploit for the rest of the users; it explores and exploits in an alternating fashion, and keeps learning as new users arrive. This continuous exploration addresses the cold start problem, since it helps the recommender to learn over time the payoffs of new items and the preferences of new users that are not similar to the previous ones.

In this paper, we propose a novel, large-scale contextual recommender system which is able to address all of the above mentioned challenges. Our system is based on a novel contextual Multi-Arm Bandit (MAB) approach. When a user with a specific context arrives, the recommender utilizes information from past arrivals within a neighborhood of the current context (i.e., "similar" to the current context). Based on the payoff estimations through these past arrivals, the recommendation is made at an item-cluster level, instead of an item level. Our approach controls the size of the context neighborhood and the set of clusters to obtain sufficiently accurate and efficient payoff estimations. To learn about new items and contexts, our algorithms use a differential method for exploration: the system explores more often when the users have a lower priority, at the risk of providing lower quality recommendations, and explores less, in order to provide consistently high quality recommendations, when the users have a higher priority. After each recommendation, the realized payoff is received by the recommender system. Each item cluster corresponds to an action (referred to as an arm in the bandit framework) for our learning algorithm; hence, learning is performed at the cluster level, and this solves the scalability problem.

The goal of learning is to minimize the regret, which is the difference between the optimal expected total payoff up to time T, which could be achieved if all information (i.e., the expected payoff of each item/cluster in a certain context) were known, and the expected total payoff gained up to time T through our learning algorithm, in which the information is incomplete.
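Using the expected-payoff notation mu_{i,x} introduced in Section 3, one way to write this regret formally (a standard contextual-bandit formulation, stated here for concreteness rather than quoted from the paper) is

  R(T) = \sum_{t=1}^{T} \max_{i \in I} \mu_{i, x_t} - \mathbb{E}\left[ \sum_{t=1}^{T} r_{i_t, x_t} \right],

where x_t is the context arriving at time t, i_t is the item (or cluster) recommended by the algorithm, and the expectation is taken over the realized payoffs and the algorithm's randomization.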
The remainder of this paper is organized as follows. In Section 2, we discuss the related works and highlight the differences from our work. Section 3 describes the system model of the recommender system. Section 4 introduces the adaptive clustering based contextual MAB recommendation algorithm. Section 5 provides simulation results. Section 6 concludes the paper.

2 RELATED WORKS

A plethora of prior works exists on recommendation algorithms. We compare our work with existing works in Table 1. We categorize these algorithms based on the following characteristics: their consideration of contexts; their refinement of the context space (if any); their ability to address the issues of "scalability", "cold start", and "differential services"; and their ability to operate without a period of training. The two main categories are filtering-based and reinforcement learning methods [3]. Filtering-based approaches, such as collaborative filtering [7], [8], content-based filtering [2], [9] and hybrid approaches [10], [11], employ the historical data of users' feedback to calculate the future payoffs of users based on some prediction functions. Collaborative filtering approaches predict users' preferences based on the preferences of their "peer" users, who have shown similar interests in other items in the historical record [7], [8]. Content-based filtering methods help a given user find the items that are similar to the items selected in the past [2], [9]. Hybrid approaches combine the collaborative filtering and content-based filtering approaches to gather more historical information in the user-item space when a new user or a new item appears in the system [10], [11]. However, filtering-based approaches alone are unable to tackle the issues of cold start and scalability. Although hybrid approaches can alleviate the cold start problem to some extent, by considering both similar items and similar users when a new item or user appears, they cannot solve the cold start problem when no similar items/users of a new item/user exist in the system.

Reinforcement learning methods, such as MABs [23], [24], [25] and Markov Decision Processes (MDPs) [26], are widely used to derive algorithms for recommending items. MDP-based learning approaches model the last k selections of a user as the state and the available items as the action set, aiming at maximizing the long-term payoff [26]. However, the state set grows fast as the number of items increases, resulting in very slow convergence rates. In [32], "state aggregation" methods are used to solve this MDP problem with a large state set, but the incurred regret grows linearly in time. MAB-based approaches [12], [13], [15], [17], such as ε-greedy [17] and UCB1 [15], provide not only asymptotic convergence to the optimum, but also a bound on the rate of convergence for any time step. They do this by balancing exploration and exploitation, where exploration involves learning about the payoffs of new items for a particular user by recommending new items, while exploitation means recommending the best items based on the observations made so far.
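For concreteness, the UCB1 index of [15] illustrates this balance (a standard formula, reproduced here for the reader's convenience rather than taken from this paper): arm i is assigned the index

  I_{i,t} = \bar{r}_{i,t} + \sqrt{ \frac{2 \ln t}{n_{i,t}} },

where \bar{r}_{i,t} is the sample-mean payoff of arm i and n_{i,t} is the number of times arm i has been selected up to time t; the square-root term inflates the index of rarely tried arms, which forces exploration.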
To further improve the effectiveness of recommendations, contexts are considered in recent recommender systems [18], [21], [22]. For instance, it is important to consider the search queries and the purchase histories (which can be considered as the contexts) of users when making news article recommendations. Moreover, in many applications, such as news recommendations, the recommender system only observes the features of anonymous users, without knowing the exact users. In this case, the users' features (cookies) can also be considered as contexts. The context-aware collaborative-filtering (CACF) and the contextual MAB methods have been studied to utilize such information [21], [23], [25], [29]. The CACF algorithm extends the collaborative filtering method to a contextual setting by considering different weights for different contexts [21]. In [23], [24], the payoff of a recommendation is modeled as a linear function of the context with an unknown weight vector, and the LinUCB algorithm is proposed to solve news article recommendation problems.
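In the notation of [23] (restated here as a sketch, with theta_a the unknown weight vector of arm a), this linearity assumption reads

  \mathbb{E}[ r_{t,a} \mid x_{t,a} ] = x_{t,a}^{\top} \theta_a,

and LinUCB recommends the arm maximizing x_{t,a}^{\top} \hat{\theta}_a + \alpha \sqrt{ x_{t,a}^{\top} A_a^{-1} x_{t,a} }, where \hat{\theta}_a is the ridge-regression estimate, A_a is the corresponding design matrix, and alpha > 0 controls the confidence width (the same parameter alpha discussed in Section 5.6).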
However, the linearity may not hold in practice for some recommender systems [22], [25], [29]. In a more general setting [25], the context space is modeled as a bounded metric space in which the payoff function is assumed to satisfy a Lipschitz condition, instead of being a linear function of contexts. In this context space, partition based algorithms are proposed to solve the contextual recommendation problems [25]. These works [18], [22], [25], however, do not take into account the large item space, which can cause scalability problems in practice.

A different strand of related works studies recommender systems that have a large number of items to recommend [3], [7], [27], [28]. In [28], the item space is partitioned into subspaces, and online learning is performed in each subspace. Alternatively, in [27], the problem is solved by considering a static clustering of items. These works [27], [28] can solve the scalability problem by reducing the number of choices of each recommendation. However, these works [3], [7], [27], [28] do not incorporate contexts in the recommendations. In contrast, we consider the contexts when making recommendations.

Building large-scale contextual recommender systems is more challenging, and few related works tackle both problems in an integrated manner. The most relevant theoretical works related to our proposed solution are [25], [29]; however, there are significant differences between our work and these existing works. First, the works in [25], [29] are theoretical and do not consider the key characteristics and requirements of recommender systems. For example, the algorithm proposed in [29] requires knowledge of the Lipschitz constants, which makes the algorithm data-dependent (different datasets may have different Lipschitz constants based on the characteristics of the data) and hence difficult to implement in practice; the algorithm in [25] requires knowing the covering constants of the context and item spaces. In contrast, our learning algorithm does not need the Lipschitz constant or covering constants to operate; these constants are only used to analyze the performance of the algorithm (i.e., to show the regret). Second, the models in [25], [29] consider only two spaces: the context space and the item space; in our work, we consider the user space, the context space, and the item space. Hence, we can provide differential services for different types of users by choosing different trade-offs between exploitation and exploration. Third, in [29], combining the items and contexts greatly increases the space dimension, resulting in a highly-complex online algorithm. In our work, we greatly reduce the dimension of the spaces by considering the context space and the item space separately. We build the item-cluster tree offline, and only perform online calculations in the context space, resulting in lower-complexity online learning compared with that in [29].

In existing works [25], [27], [28], [29], the algorithms operate by fixing a partition of the context space. In contrast, our approach selects a dynamic subspace in the context space each time, to ensure a sufficiently accurate estimation of the expected payoffs with respect to the current context, with precise control of the size of the context subspace. Moreover, our approach can provide differential services (different instantaneous performance) to ensure the satisfaction of higher type users, which are not considered in existing works [4], [5], [6], [8], [9], [21], [23], [27], [28], [29]. A detailed comparison with the existing works is presented in Table 1.
3 RECOMMENDER SYSTEM MODEL

We consider the contextual recommender system shown in Fig. 1, which operates in discrete time slots t = 1, 2, .... At each time slot, the system operates as follows: (1) a user with a specific context arrives; (2) the recommender system observes the context, and recommends to the user an item from the set of items, based on the current context as well as the historical information about users, contexts, items, and payoffs of former recommendations; and (3) the payoff of this recommendation, whose expectation depends on the item and context, is received by the recommender system according to the user's response.

Users: We formally model the set of users as U, which can be either a finite set (i.e., U = {1, 2, ..., U}) or an infinite set (i.e., U = {1, 2, ...}, e.g., a news recommender system frequented by an infinite number of users). When a user u ∈ U arrives, the system can observe the user's type: important or ordinary user, registered or anonymous user, etc. We divide the users into S types, and denote the set of type s ∈ {1, 2, ..., S} users by U_s. Based on the users' types, the recommender system can provide differential services for users.

Contexts: We model the context set X as a d_C-dimensional continuous space, and the context x ∈ X is a d_C-dimensional vector. For simplicity of exposition, we assume that the context space is X = [0, 1]^{d_C}, but our results will hold for any bounded d_C-dimensional context space. In the context space, the similarity distance s_C is defined as a metric s_C : X × X → [0, ∞). A smaller s_C(x, x') indicates that the two contexts x and x' exhibit higher similarity. In the Euclidean space X, the metric s_C can be the Euclidean norm or any other norm. Note that we will add the subscript t to user u and context x when referring to the user and context associated with time period t.

Items: We denote the set of items by I = {1, 2, ..., I}. In the item space, the similarity distance is defined as a metric s_I : I × I → R, which is based on the features of the items and known to the recommender system, as in [10], [22], [29]. A smaller s_I(i, i') indicates that the two items i and i' exhibit higher similarity. For example, in a news recommender system, an item is a news item, and the similarity between two news items depends on their features: whether they are in the same category (e.g., Local News), whether they have the same keywords (e.g., Fire), etc.

Since the number of items in the system is large, we cluster them based on the metric s_I in the item space, in order to reduce the possible options for recommendation. We use an item-cluster tree to represent the hierarchical relationship among item clusters, and an example is shown in Fig. 2. In an item-cluster tree, each leaf represents an item and each node represents an item cluster. The depth of a node is defined as the length of the path to the root node, and the depth of the tree is defined as the maximum depth of all nodes in the tree. We denote the depth of the tree by L. We define the collection of nodes at depth 0 ≤ l ≤ L of the tree as layer l, and a node¹ at layer l is denoted by C_{l,k}, where k ∈ {1, 2, ..., K_l} and K_l is the number of nodes at depth l. There is only one node C_{0,1} at layer 0, i.e., the root node. The I nodes at layer L are the leaves of the tree, each of which is a single-element cluster that contains only one item. The hierarchical structure of the cluster tree implies that for a child node C_{l+1,k'} of node C_{l,k}, we have i ∈ C_{l,k} for all items i ∈ C_{l+1,k'}.

The cluster size s(C_{l,k}) is defined as the maximum distance between two items in that cluster, namely, s(C_{l,k}) = max_{i,i' ∈ C_{l,k}} {s_I(i, i')}. In an item-cluster-tree structure, a general tree metric can be defined as a mapping from the set of depths of the nodes/clusters in the tree to the maximum size of clusters at that depth, i.e., s_T : N → R. Thus, s_T(l) denotes the maximum cluster size at depth l, namely, s_T(l) ≥ max_{i,i' ∈ C_{l,k}, 1 ≤ k ≤ K_l} {s_I(i, i')} for any cluster at depth l. For a tree metric, the maximum cluster size at depth l is not smaller than that at depth l + 1, i.e., s_T(l) ≥ s_T(l + 1). We can see from Fig. 2 that item 6 and item 7 are in the same cluster C_{2,3}; item 6 and item 9 are in the same cluster C_{1,2}; and the metric between item 6 and item 7 is smaller than that between item 6 and item 9, namely, s_I(6, 7) < s_I(6, 9). In Section 4, we will provide an algorithm to construct such item-cluster trees. Using the constructed item-cluster tree, we can control the size and number of the clusters when the recommendations are made.

1. Note that each node of the tree corresponds to an item cluster, so we will use the terms "node" and "cluster" and their notations interchangeably.
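To make the cluster-size and tree-metric definitions concrete, the following is a minimal Python sketch; the item metric s_I is a stand-in for whatever feature-based distance a deployment actually uses, and the layer structure is assumed to use consecutive depths:

    from itertools import combinations

    def cluster_size(cluster, s_I):
        """Size s(C) of an item cluster: the maximum pairwise distance
        s_I(i, i') over items in the cluster."""
        if len(cluster) < 2:
            return 0.0
        return max(s_I(i, j) for i, j in combinations(cluster, 2))

    def is_tree_metric(s_T, layers, s_I):
        """Check that s_T(l) dominates every cluster size at depth l and is
        non-increasing in l. layers maps each depth l (assumed consecutive)
        to the list of clusters at that depth."""
        depths = sorted(layers)
        dominates = all(s_T(l) >= cluster_size(C, s_I)
                        for l in depths for C in layers[l])
        monotone = all(s_T(l) >= s_T(l + 1) for l in depths[:-1])
        return dominates and monotone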
The expected payoff of a recommendation depends on the context and the item. For a user with context x ∈ X, the payoff of recommending item i is denoted by a random variable r_{i,x} ∈ [0, 1], whose distribution is fixed but unknown to the recommender system. The expectation of r_{i,x} is denoted by μ_{i,x}. Given the number of items n_{l,k} in cluster C_{l,k}, the expected payoff of cluster C_{l,k} for context x is denoted by

  \mu_{l,k,x} = \frac{1}{n_{l,k}} \sum_{i \in C_{l,k}} \mu_{i,x}.

Only the payoffs of the recommended items can be observed by the recommender system and can be used for further recommendations.

Context-aware collaborative-filtering algorithms consider that an item has similar expected payoffs in similar contexts [18], [19], [21], [22], while contextual bandit algorithms use a Lipschitz condition to model the similar expected payoffs of an item in two similar contexts [25], [29]. As in [25], [29], we formalize this in terms of a Lipschitz condition as follows.
Assumption 1 (Lipschitz condition for contexts). Given two contexts x, x' ∈ X, we have |μ_{i,x} − μ_{i,x'}| ≤ L_C s_C(x, x') for any item i, where L_C is the Lipschitz constant in the context space.

Similarly to [28], [29], we model the similar payoffs of similar items in the same context using a Lipschitz condition, described in Assumption 2.

Assumption 2 (Lipschitz condition for items). Given two items i, i' ∈ I, we have |μ_{i,x} − μ_{i',x}| ≤ L_I s_I(i, i') for any context x ∈ X, where L_I is the Lipschitz constant in the item space.

Based on Assumption 2, in an item-cluster tree with tree metric s_T(·), we have |μ_{i,x} − μ_{i',x}| ≤ L_I s_T(l) for any x ∈ X and i, i' ∈ C_{l,k}. Note that the Lipschitz constants L_C and L_I are not required to be known by our recommendation algorithm. They will only be used in quantifying its performance.
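Combining Assumptions 1 and 2 through the triangle inequality (a direct consequence of the two assumptions, not an additional one) yields, for any contexts x, x' ∈ X and any items i, i' ∈ C_{l,k},

  |\mu_{i,x} - \mu_{i',x'}| \le |\mu_{i,x} - \mu_{i,x'}| + |\mu_{i,x'} - \mu_{i',x'}| \le L_C \, s_C(x, x') + L_I \, s_T(l),

which is the joint smoothness that makes it possible to estimate a cluster's payoff at the current context from past arrivals with similar contexts.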
We now describe how to construct an item-cluster tree that fulfils this tree metric, based on an arbitrarily given metric among items.² The constructed item-cluster tree will be used in the recommendation algorithm. The Construction of Item-Cluster Tree algorithm is described in Algorithm 1, and an example is shown in Fig. 2.

Algorithm 1. Construction of Item-Cluster Tree
Input: item set I = {1, 2, ..., I}, base b, depth L of the tree.
Initialization: Set L := min{L, L_b}, where L_b is the maximum l such that b^{l−1} ≥ min_{i,i' ∈ I : s_I(i,i') ≠ 0} s_I(i, i').
1: set the node set at layer L as K_L := {C_{L,1}, C_{L,2}, ..., C_{L,K_L}}, where C_{L,k} := {k}, ∀ 1 ≤ k ≤ K_L, and K_L = I.
2: set the representative point set at layer L as RP_L := {i_{L,1}, i_{L,2}, ..., i_{L,K_L}}, where i_{L,k} := k, ∀ 1 ≤ k ≤ K_L.
3: set the child sets of nodes at layer L as empty sets: Chd(C_{L,k}) := ∅, ∀ 1 ≤ k ≤ K_L.
4: for l = (L−1):0 do
5:   set k := 0. // counter for the number of clusters at layer l
6:   while RP_{l+1} ≠ ∅ do
7:     set k := k + 1.
8:     arbitrarily select an element i_{l+1,k̂} ∈ RP_{l+1}.
9:     find the subset SK of K_{l+1} such that SK := {C_{l+1,k'} : 1 ≤ k' ≤ K_{l+1}, i_{l+1,k'} ∈ RP_{l+1}, s_I(i_{l+1,k̂}, i_{l+1,k'}) ≤ (1−b)b^l / (1+b)}.
10:    set C_{l,k} := ∪_{C_{l+1,k'} ∈ SK} C_{l+1,k'}.
11:    set Chd(C_{l,k}) := SK.
12:    set Prt(C_{l+1,k'}) := C_{l,k}, ∀ C_{l+1,k'} ∈ SK.
13:    set i_{l,k} := i_{l+1,k̂}.
14:    remove the representative points of the selected clusters from RP_{l+1}: RP_{l+1} := RP_{l+1} \ {i_{l+1,k'} : C_{l+1,k'} ∈ SK}.
15:  end while
16:  set K_l := k.
17: end for
23: set Prt(C_{0,1}) := ∅.
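A compact Python rendering of Algorithm 1 may help; it is a sketch under the paper's notation (representative points, child links, base b), not the authors' reference implementation, and it keeps parent pointers implicitly through the children links:

    def build_item_cluster_tree(items, s_I, b, L):
        """Greedy bottom-up construction of an item-cluster tree, following
        Algorithm 1. Assumes L has already been truncated to min{L, L_b}
        as in the algorithm's initialization. items: list of item ids;
        s_I: item metric; 0 < b < 1."""
        # Layer L: one singleton cluster per item; each item is its own
        # representative point (lines 1-3).
        layer = [{"items": {i}, "rep": i, "children": []} for i in items]
        tree = {L: layer}
        for l in range(L - 1, -1, -1):          # line 4
            remaining = list(layer)             # layer-(l+1) clusters not yet covered
            new_layer = []
            while remaining:                    # line 6
                pivot = remaining[0]            # line 8: arbitrary representative
                radius = (1 - b) * b ** l / (1 + b)
                group = [c for c in remaining
                         if s_I(pivot["rep"], c["rep"]) <= radius]   # line 9
                new_layer.append({                                   # lines 10-13
                    "items": set().union(*(c["items"] for c in group)),
                    "rep": pivot["rep"],
                    "children": group,
                })
                covered = {id(c) for c in group}                     # line 14
                remaining = [c for c in remaining if id(c) not in covered]
            tree[l] = new_layer                                      # line 16
            layer = new_layer
        return tree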
… b^{l−1} ≥ min_{i,i' ∈ I : s_I(i,i') ≠ 0} s_I(i, i'), and (2) the maximum number of clusters at layer l of this tree is bounded by c_I ((1+b)/(1−b))^{d_I} (1/b)^{l d_I}, where c_I and d_I are the covering constant and covering dimension for the item space.³
Proof. See Appendix B, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TSC.2014.2365795. □

Fig. 3. Context arrivals and refinement in the context space.
3. See Appendix A, available in the online supplemental material, for the definitions of covering dimension and covering constant.

Each time, a user with context x_t arrives, and the algorithm chooses a relevant ball B(x_t, r_t) in the context space,
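Here B(x_t, r_t) is presumably the usual metric ball induced by the context metric s_C, i.e.,

  B(x_t, r_t) = \{ x \in X : s_C(x, x_t) \le r_t \},

so that the past arrivals falling inside the ball are the "similar" contexts used for payoff estimation.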
TABLE 3
Comparison with Context-Free Algorithms
…are evaluated through the CTR up to time t. We compare against the following three context-free algorithms (a sketch of the ε-greedy baseline follows this list).

Random: The algorithm randomly selects an item each time. This can be seen as the benchmark for the other algorithms.

UCB1 [15]: This is a classical MAB algorithm, which performs well in the case of learning the best item without exploiting the context information.

ε-greedy [15], [17]: This is another widely-used algorithm for performing online recommendations. In time period t, the item cluster with the highest estimated payoff is selected with probability 1 − ε_t, and other item clusters are randomly selected with probability ε_t, where ε_t is of order 1/t, such that the exploration rate is inversely proportional to the number of users.
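The following minimal Python sketch illustrates the ε-greedy baseline just described; the constant c and the payoff table are illustrative placeholders, not parameters reported in the paper:

    import random

    def epsilon_greedy_select(t, mean_payoff, clusters, c=1.0):
        """Select an item cluster at time t: exploit the empirically best
        cluster with probability 1 - eps_t, explore uniformly otherwise."""
        eps_t = min(1.0, c / t)          # exploration rate of order 1/t
        if random.random() > eps_t:      # exploit
            return max(clusters, key=lambda k: mean_payoff[k])
        return random.choice(clusters)   # explore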
In the simulations, we consider two scenarios depending on the number of clusters. In the first scenario, we choose K = 50 item clusters, and in the second scenario, we choose K = 150 item clusters, for all the algorithms except the ACR algorithm, which can adaptively choose the number of item clusters over time. For the ACR algorithm, we do not distinguish users (i.e., A_s = 1) and choose the parameter b = 0.75.

The comparison results are given in Table 3. We can see that both the OLCR algorithm and the ACR algorithm significantly outperform the context-free algorithms, as expected. The simulation results show that the OLCR algorithm achieves a 10-31 percent performance gain, in terms of CTR up to time T, over the UCB1 and ε-greedy algorithms. The CTR of the ACR algorithm is 58-69 percent higher than those of the UCB1 and ε-greedy algorithms. We also notice that the UCB1 and ε-greedy algorithms learn faster than the OLCR and ACR algorithms. This is because the context-free algorithms estimate the CTR of an item by averaging the clicks over all users, while the contextual algorithms estimate the CTR of an item based on specific user contexts. However, the contextual algorithms can make more accurate estimations of the expected CTRs of items by employing the contextual information when the number of user arrivals is sufficiently large. Hence, the context-free algorithms converge faster (in less than 20 percent of user arrivals) than the contextual algorithms, but the contextual algorithms achieve a higher CTR than the context-free algorithms in the long run.

5.3 Comparison with Existing Contextual Recommendation Algorithms

In this section, we compare the CTRs of the OLCR and ACR algorithms with those of the existing contextual recommendation algorithms. We compare against the following algorithms.

CACF [21]: The algorithm exploits the information of historical selections of item clusters to predict the current payoff given the current context arrival. Note that this algorithm only exploits and never explores. Thus, it has a "cold start" problem, and we use the first 5 percent of data instances to train the algorithm.

Hybrid-ε [22]: This algorithm combines contexts and ε-greedy algorithms, by extracting information within a small region of the context space and by running ε-greedy algorithms in that small region.

LinUCB [23]: This algorithm assumes that the payoffs of items are linear in contexts. It calculates the index of each arm by adding an upper bound term to a linear combination of the contexts observed in previous periods, and then recommends the arm with the maximum index.

Contextual-zooming [29]: This algorithm combines the context space and the item space together, and each time selects a ball in this joint space based on the past context arrivals and item selections.

In this section, we consider two fixed numbers of item clusters, K = 50 and K = 150, for the algorithms that do not have the item clustering property (i.e., all algorithms except contextual-zooming and ACR). For the ACR algorithm, we do not distinguish users (i.e., A_s = 1) and choose the parameter b = 0.75. We can categorize the contextual algorithms into two classes, depending on whether an exploitation-exploration tradeoff is considered. The CACF algorithm does not explore, while the Hybrid-ε, LinUCB, contextual-zooming, OLCR, and ACR algorithms consider both exploitation and exploration.
TABLE 4
Comparison with Contextual Algorithms
The simulation results are given in Table 4. We can see that the ACR algorithm significantly outperforms the existing contextual algorithms. The simulation results show that the CTRs up to time T obtained by the OLCR algorithm are 31 and 10 percent higher than those obtained by the CACF algorithm, 15 percent higher and 3 percent lower than those obtained by the Hybrid-ε algorithm, 7 and 18 percent lower than those obtained by the LinUCB algorithm, and the same and 16 percent lower than those obtained by the contextual-zooming algorithm, in the scenarios of K = 50 and K = 150, respectively. The CTRs up to time T obtained by the ACR algorithm are 69 and 69 percent higher than those obtained by the CACF algorithm, 48 and 48 percent higher than those obtained by the Hybrid-ε algorithm, 20 and 26 percent higher than those obtained by the LinUCB algorithm, and 29 and 29 percent higher than those obtained by the contextual-zooming algorithm.

In fact, our ACR algorithm aggregates information in an adaptive ball of the context space each time, while the Hybrid-ε algorithm considers a fixed small region, which leads to inefficient learning. LinUCB assumes a linear payoff function, which can result in inaccurate estimation of the CTR, while our proposed algorithms do not make such an assumption. The performance of the contextual-zooming algorithm is affected by the Lipschitz constant included in the algorithm, which may not fit the real data very well. The CACF algorithm learns faster (it converges in less than 20 percent of user arrivals) than our proposed ACR and OLCR algorithms, but its learning accuracy is low because the CACF algorithm only exploits; our algorithms make explorations that cause a lower speed of learning in the short run but a higher accuracy of recommendation in the long run. Note that the proposed ACR algorithm outperforms the OLCR algorithm, because the ACR algorithm exploits the item-cluster tree to adaptively cluster the items, while the OLCR algorithm does not use this tree structure and has fixed clusters.

…with parameter b = 0.75, and compare the average per-period CTR of the two user types.

We show the average per-period CTR (normalized with respect to the anonymous users) of the two user types in Fig. 5. We can see that the average per-period CTR of registered users is around 4 to 11 percent higher than that of the anonymous users when the exploration-exploitation trade-off factor ratio A_2/A_1 (registered user/anonymous user) varies from 0.2 to 0.8, in the scenarios of 10 and 30 percent registered users. We can also see that as the trade-off factor ratio increases, the CTR gain of the registered users over the anonymous users decreases by 5-6 percent. In fact, as this ratio increases, the recommendation strategies for the two types of users become similar to each other.

5.5 Scalability of Algorithms

In practice, the number of items is large, which requires that the algorithms have the ability to scale up, i.e., the learning speed and total CTR should not degrade as the number of items increases. To show this, we increase the number of items in the system by duplicating items to form a large item set. Since our proposed algorithms and the contextual-zooming algorithm make recommendations at the cluster level, the increase in the number of items in the system only affects the number of items in a cluster, and does not affect the performances of the algorithms.
Fig. 6. Comparison of scalable performances of algorithms.

For the CACF, Hybrid-ε, and LinUCB algorithms, which do not cluster the items and learn the CTR separately for each item, there is some performance loss when the number of items grows. The simulation results are shown in Fig. 6. We use a scale factor s to denote the scalability of the system (i.e., the number of items is sK, where K is the number of items without being scaled up). We calculate the learning gain of the evaluated algorithm over the random algorithm (i.e., (CTR_{p,sK} − CTR_{random}) / CTR_{random}, where CTR_{p,sK} denotes the CTR of algorithm p with a scale factor s), and normalize it with respect to that of the ACR algorithm as a performance measure to evaluate the scalability of the algorithm. We can see that when the number of items becomes 100 times the current number, the performance loss in terms of the normalized learning gain against the random algorithm is 63, 61, and 67 percent for the CACF, Hybrid-ε, and LinUCB algorithms, respectively. The long-term CTRs of the contextual-zooming algorithm and our proposed OLCR and ACR algorithms do not change, as expected, implying that our algorithms are scalable.
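In the notation just introduced, the scalability measure can be written compactly (the normalization by the ACR gain is our reading of the description above):

  G_p(s) = \frac{CTR_{p,sK} - CTR_{random}}{CTR_{random}}, \qquad \tilde{G}_p(s) = \frac{G_p(s)}{G_{ACR}(s)},

so an algorithm p whose \tilde{G}_p(s) stays near its unscaled value as s grows is scalable in this sense.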
5.6 Robustness of Algorithms

In this section, we evaluate the robustness of the algorithms when their input parameters vary. Note that, to construct the item-cluster tree in our proposed approach, the theoretically optimal value of b (b ∈ (0.5, 1)) given in Section 4.3 (i.e., b = 2^{−1/(d_I + d_C + 2)} ≈ 0.94) may be conservatively large, and so optimizing this parameter may result in higher total payoffs. In particular, we compare the CTR of the ACR algorithm as the base b changes. Among the existing works, the LinUCB algorithm [23] needs an input parameter α > 0 (α is used to control the size of the confidence interval) to run the algorithm, and the contextual-zooming algorithm [29] needs the Lipschitz constant to run the algorithm. Therefore, we also show the effect of these parameters on the performances of the LinUCB and contextual-zooming algorithms.

The simulation results are shown in Figs. 7, 8, and 9. In Fig. 7, we can see that for the ACR algorithm, the CTR difference is less than 5 percent when the base b changes from 0.50 to 0.95. This is because the base b controls the trade-off between the number of clusters (learning speed) and the cluster size (accuracy). When b is large, learning is fast but less accurate; when b is small, learning is slow but more accurate. Hence, an optimal b exists to balance the speed of learning and the accuracy. For the LinUCB algorithm and the contextual-zooming algorithm, since the input parameters range from 0 to infinity, we test several possible values to evaluate the robustness of the algorithms. We can see from Fig. 8 that the LinUCB algorithm has more than 17 percent performance loss, in terms of CTR, compared to its maximum, which corresponds to the optimal value of α. We can also see from Fig. 9 that the contextual-zooming algorithm has more than 18 percent performance loss, in terms of CTR, when the Lipschitz constant is not estimated correctly. Moreover, since the ranges of the input parameter α and the Lipschitz constant are from 0 to infinity, it is difficult to find the optimal values of these parameters without training data. Hence, we can see that the LinUCB algorithm and the contextual-zooming algorithm are less robust to their input parameters than the ACR algorithm.

5.7 User Arrival Dependent Performance Variation

In this section, we evaluate the robustness of our ACR algorithm to different types of user arrivals. We randomly generate 10 sequences of user arrivals, and evaluate the variation in the CTR of the ACR algorithm over these sequences. To evaluate the variations of CTR in different
TABLE 5
The Variation of CTRs in Different User Arrival Processes

Cases                          Case 1  Case 2  Case 3  Case 4  Case 5  Case 6  Case 7  Case 8  Case 9  Case 10
Percentage of CTR variation
compared to average CTR         4.4%   -3.6%   -4.0%    5.1%   -5.5%   -0.5%    5.1%   -5.2%    4.8%   -0.5%
[22] D. Bouneffouf, A. Bouzeghoub, and A. L. Gancarski, "Hybrid-ε-greedy for mobile context-aware recommender system," in Proc. 16th Adv. Knowl. Discovery Data Mining, 2012, pp. 468-479.
[23] L. Li, W. Chu, J. Langford, and R. E. Schapire, "A contextual-bandit approach to personalized news article recommendation," in Proc. 19th Int. Conf. World Wide Web, 2010, pp. 661-670.
[24] W. Chu, L. Li, L. Reyzin, and R. E. Schapire, "Contextual bandits with linear payoff functions," in Proc. Int. Conf. Artif. Intell. Statist., 2011, pp. 208-214.
[25] T. Lu, D. Pal, and M. Pal, "Contextual multi-armed bandits," in Proc. 13th Int. Conf. Artif. Intell. Statist., 2010, pp. 485-492.
[26] G. Shani, D. Heckerman, and R. I. Brafman, "An MDP-based recommender system," in Proc. 18th Conf. Uncertainty Artif. Intell., 2005, vol. 6, pp. 1265-1295.
[27] S. Pandey, D. Chakrabarti, and D. Agarwal, "Multi-armed bandit problems with dependent arms," in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 721-728.
[28] R. Kleinberg, A. Slivkins, and E. Upfal, "Multi-armed bandits in metric spaces," in Proc. 40th Annu. ACM Symp. Theory Comput., 2008, pp. 681-690.
[29] A. Slivkins, "Contextual bandits with similarity information," in Proc. 24th Annu. Conf. Learn. Theory, 2011, pp. 679-701.
[30] L. Song, C. Tekin, and M. van der Schaar, "Clustering based online learning in recommender systems: A bandit approach," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2014, pp. 4528-4532.
[31] N. Ailon and M. Charikar, "Fitting tree metrics: Hierarchical clustering and phylogeny," in Proc. 46th Annu. IEEE Symp. Found. Comput. Sci., 2005, pp. 73-82.
[32] Z. Ren and B. H. Krogh, "State aggregation in Markov decision processes," in Proc. 41st IEEE Conf. Decision Control, 2002, pp. 3819-3824.
[33] E. Chlebus, "An approximate formula for a partial sum of the divergent p-series," Appl. Math. Lett., vol. 22, no. 5, pp. 732-737, 2009.

Cem Tekin received the BSc degree in electrical and electronics engineering from the Middle East Technical University, Ankara, Turkey, in 2008, and the MSE degree in electrical engineering: systems, the MS degree in mathematics, and the PhD degree in electrical engineering: systems from the University of Michigan, Ann Arbor, in 2010, 2011, and 2013, respectively. He is a postdoctoral scholar at the University of California, Los Angeles. His research interests include machine learning, multi-armed bandit problems, data mining, cognitive radio, and networks. He received the University of Michigan Electrical Engineering Departmental Fellowship in 2008, and the Fred W. Ellersick Award for the Best Paper in MILCOM 2009.

Mihaela van der Schaar (F'10) is a Chancellor's professor of electrical engineering at the University of California, Los Angeles. She is a distinguished lecturer of the Communications Society for 2011-2012 and the editor in chief of IEEE Transactions on Multimedia. She received an NSF CAREER Award (2004), the Best Paper Award from the IEEE Transactions on Circuits and Systems for Video Technology (2005), the Okawa Foundation Award (2006), the IBM Faculty Award (2005, 2007, 2008), the Most Cited Paper Award from the EURASIP Image Communications journal (2006), the Gamenets Conference Best Paper Award (2011), and the 2011 IEEE Circuits and Systems Society Darlington Best Paper Award. She holds 33 granted US patents. She is a fellow of the IEEE. For more information about her research, visit http://medianetlab.ee.ucla.edu/.