Communityrs
Communityrs
ABSTRACT this case, it becomes hard to find similar users to the target user, in
Recommender systems have become de facto tools for suggesting order to generate recommendations based on their ratings.
items that are of potential interest to users. Predicting a user’s rat- The increasing popularity of online social networks offer chances
ing on an item is the fundamental recommendation task. Tradi- to improve the accuracy of rating predictions; as sociologists pos-
tional methods that generate predictions by analyzing the user-item tulate, people tend to relate to people with similar preferences and
rating matrix perform poorly when the matrix is sparse. Recent ap- people that influence each other become more similar [27]. [2] also
proaches use data from social networks to improve accuracy. How- confirm that a social network provides an independent source of in-
ever, most of the social-network based recommender systems only formation which can be exploited to improve the quality of rating
consider direct friendships and they are less effective when the tar- predictions. Based on the rationale that a user’s interest is similar to
geted user has few social connections. In this paper, we propose or influenced by her/his friends, several social-based recommender
two alternative models that incorporate the overlapping commu- systems [17, 16, 7, 18, 15, 30, 12] have recently been proposed. Ex-
nity regularization into the matrix factorization framework. Our periments demonstrate that making recommendations based on the
empirical study on four real datasets shows that our approaches ratings of the users socially connected to the target user improves
outperform the state-of-the-art algorithms in both traditional and traditional techniques, especially when the user-item rating matrix
social-network based recommender systems regarding both cold- is sparse. However, in the literature, the most effective social-based
start users and normal users. recommenders systems [7, 18, 30] only consider direct friendships
in the network. As shown in our empirical study, they become less
effective for target users who have few ratings (rating-cold-start
1. INTRODUCTION users) or few social connections (social-cold-start users).
Recommender systems have become essential tools for suggest-
ing items of potential interest to users and they have successfully
been deployed in the industry, with applications such as movie rec-
ommendations (Netflix), product recommendations (Amazon), and
music recommendations (Last.fm).
The various definitions of the recommendation problem all boil
down to predicting the ratings of a target user on items (e.g., movies)
that the user has not rated before (e.g., unwatched movies). Specifi-
cally, consider a set of m users and a set of n items in a rating-based
recommender system: each user u can rate any item by giving it a
score. Given a target user u, for each item i that u has not rated,
the system predicts the rating, based on the existing ratings of other
users. Then, the unrated items with high predicted rating scores are
offered as suggestions to u.
Traditional recommender systems [9, 3, 11, 24, 10, 5, 26, 31, 14] Figure 1: Communities in Shelfari
are effective for target users who have rated many items, since it is In this paper, we exploit information about the communities formed
easy to find other users that have rated these items. However, they by users in social networks, to improve the recommendation accu-
perform poorly for cold-start users who have very few ratings; in racy. Social network users tend to establish relationships with peo-
∗
ple who share similar interests with them. For example, Figure 1 il-
Work supported by grant HKU 715413E from Hong Kong RGC. lustrates user communities in a book recommender system1 , based
on different topics. The members in the same community usually
Permission to make digital or hard copies of all or part of this work for personal or share characteristics and can be an alternative of direct friends for
classroom use is granted without fee provided that copies are not made or distributed social recommender systems. Aiming at solving the problems of
for profit or commercial advantage and that copies bear this notice and the full cita- social-based recommenders discussed in the previous paragraph,
tion on the first page. Copyrights for components of this work owned by others than
we propose two models that incorporate the overlapping commu-
ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re-
publish, to post on servers or to redistribute to lists, requires prior specific permission
nity regularization into the matrix factorization framework differ-
and/or a fee. Request permissions from Permissions@acm.org. ently. The communities are detected based on the social network
RecSys’15, September 16–20, 2015, Vienna, Austria. structure; a user may belong to multiple communities with differ-
c 2015 ACM. ISBN 978-1-4503-3692-5/15/09 ...$15.00. 1
DOI: http://dx.doi.org/10.1145/2792838.2800171. http://www.shelfari.com
27
1
0.4 0.8 1 2 3 4 5 1 2 3 4 5
5 1 2 3 4 5
2
1 1 1
0.1 0.1
2 2 2
4 4 4
0.7 5 5 5
4 3
(a) Social Graph (b) Adjacency Matrix (c) Rating Matrix (d) Output
Figure 2: An Example of Social Recommender System
ent interests. One of our models (MFC) ensures that the distance collaborative filtering (CF). Two classes of CF methods are widely
between the latent feature vectors of users u and u0 is low if u and used. Memory-based methods predict ratings for the target user
u0 belong to the same community c. Our other model (MFC+ ) based on the ratings of similar users [9] or the computed infor-
forces the user latent feature vectors to be close to those of her/his mation of items similar to those chosen by the target user [3, 24].
communities. Empirical studies on four real datasets show that our Model-based methods make predictions using a trained compact
approaches outperform the state-of-the-art traditional and social- model from the user-item rating matrix. Various training models
based recommenders by 6%-42% for general users. Moreover, we have been investigated, such as the clustering model [10], the as-
put emphasis on cold-start users. While the problem of rating-cold- pect models [5, 26], the Bayesian hierarchical model [31], and the
start users is studied in previous research on social recommender ranking model [14]. None of these traditional approaches take so-
systems, social-cold-start users are ignored by previous work. Our cial network data into account.
methods consider both cases and beat baselines by 7%-32% for
rating-cold-start users and 4%-37% for social-cold-start users. 3.2 Social-based Approaches
The rest of the paper is organized as follows. Section 2 formally Most of the social-based approaches [17, 16, 7, 18, 15, 30] fol-
defines the problem. Section 3 reviews major rating prediction low the matrix factorization (MF) framework [11, 23], due to its
approaches in the literature. Section 4 introduces our MFC and effectiveness and efficiency in dealing with large user-item rating
MFC+ models. The results of our empirical study are reported in matrices. Let R be an m × n matrix with the ratings of m users
Section 5. We conclude in Section 6. on n items. The basic MF method shown in Figure 3(a) predicts
the rating matrix R by multiplying a d-rank user-specific matrix
2. PROBLEM DEFINITION U ∈ Rd×m with a d-rank item-specific matrix V ∈ Rd×n , i.e.,
In a traditional ratings-based recommender system, there are m R ≈ U T V , where d min(m, n). Column vectors Ui and Vi
users {u1 , · · · um } and n items {v1 , · · · vn }. The users’ ratings on represent the d-dimensional latent feature vectors of user ui and
items form a m × n rating matrix R = [rij ] where rij is the rating item vj , respectively. The latent vectors can be learnt by minimiz-
of user ui on item vj . Typically, 5-scale or 10-scale rating systems ing the following sum-of-squared-errors objective function with
are used. two quadratic regularization terms to avoid overfitting:
In a social recommender system, we have a social network graph m n
1 XX R λU λV
where each node represents a user and edges model the social rela- Iij (Rij − UiT Vj )2 + kU k2F + kV k2F ,
tionships between users (e.g., friendship or influence). The social 2 i=1 j=1 2 2
graph can also be modeled by a m×m adjacency matrix S = [sij ],
where sij represents the similarity between users ui and uj or how where k·k2F denotes the Frobenius norm and Iij R
is equal to 1 if user
much user ui trusts user uj . Figure 2 shows a toy example of a so- ui rated item vj and equal to 0 otherwise. Gradient descent can be
cial recommender system with 5 users and 5 items. Figure 2(a) is applied to find a local minimum. Having latent feature vectors Ui
the social network graph and Figure 2(b) is the corresponding adja- and Vj , the unknown rating on item vj for user ui is predicted as
cency matrix S = [sij ] where a positive sij indicates a social edge R̂ij = UiT Vj .
between user ui and user uj . Figure 2(c) is an exemplary user- SoRec [17] extends the basic MF model by integrating the social
item rating matrix where questionmarks are unknown ratings. Fig- network, as shown in Figure 3(b). T is the matrix representation of
T T
ure 2(d) illustrates the possible output of the social recommender the social network; Iik =1 if users ui and uk are friends and Iik =0
system where any unknown ratings are predicted. otherwise. Matrix T is factorized into a user-specific matrix U and
The basic task of a social recommender is as follows: given the a factor-specific matrix W . The latent feature vectors of users are
user-item rating matrix R = [rij ] and the adjacency matrix S = learnt based on both the rating and social network matrices. The
[sij ], predict an unknown rating rij for user ui on item vj . objective function to be minimized is:
m n m m
3. RELATED WORK 1 XX R
Iij (Rij − UiT Vj )2 +
λT X X T
Iik (Tik − UiT Wk )2
This section reviews important rating prediction approaches in 2 i=1 j=1 2 i=1
k=1
traditional and social-based recommender systems.
λU λV λW
+ kU k2F + kV k2F + kW k2F .
3.1 Traditional Approaches 2 2 2
One of the most commonly-used and successfully-deployed rat- Later, as shown in Figure 3(c), STE [16] modified the basic MF
ing prediction approaches in traditional recommender systems is model so that each rating Rij in the user-item matrix R reflects (i)
28
j i j i i1 1
i2
2
ik .
.
* ** .
ij i=1,…,m ij i=1,…,m
j=1,…,n j=1,…,n k
u1,…,uk N(i)
ij 1 k
user ui ’s favor on item vj and (ii) the favors of user ui ’s friends MFC+ are presented in Sections 4.2 and 4.3. The time complexity
(N (i) indicates ui ’s friends) on item vj . The objective function is: is analyzed in Sections 4.4.
m n
1 X X R X 2 4.1 Overlapping Community
Iij Rij − αUiT Vj + (1 − α) Tik UkT Vj Community structures are quite common in social networks. The
2 i=1 j=1
uk ∈N (i)
users in the same community share characteristics (e.g., they may
λU λV have common locations, interests, occupations, etc.). In some real
+ kU k2F + kV k2F ,
2 2 applications (e.g., Douban which is an online recommender system
for book, movie and music and Shelfari which is a recommender
where T is the matrix representation of the social network, N (i) is
system for book), there are some manually formed communities
the friend set of user ui , and α controls the effect of friends on the
which represents members’ interests. It is natural that a user may
rating estimation.
belong to multiple communities and overlapping communities can
A recent model, SocialMF [7], learns the latent feature vectors of
represent the users’ diverse characteristics. For example, a user
users based on the latent feature vectors of their friends, as shown in
may join a group for mystery novels and a group for historical nov-
Figure 3(d). SR model [18] improves SocialMF by treating friends
els at the same time.
with dissimilar tastes differently, so as to consider the diversity of
We derive the rating vector of a community as the mean vector
each user’s friends. SR’s objective function is:
of the rating vectors of all the users in the community. We adopt the
m n widely used Pearson Correlation Coefficient (PCC) [22] to measure
1 XX R λU λV
Iij (Rij − UiT Vj )2 + kU k2F + kV k2F the similarity Sij between two users based on their rating vectors.
2 i=1 j=1 2 2 The interest Zih of a user ui on community ch is the PCC between
m the rating vectors of user ui and community ch . We map PCC into
βX X
+ Sik kUi − Uk k2F , range [0, 1] using function f (x) = (x + 1)/2. PCC is defined as:
2 i=1
uk ∈N (i) P
p∈P (xip − xi )(xjp − xj )
simij = qP ,
where Sik is the similarity between users ui and his/her friend uk .
qP
p∈P (xip − xi )
2
p∈P (xjp − xj )
2
Concerning the social-cold-start users who have few friends in the
social network, in the SR+ model [15], the latent feature vectors where P is the intersection of the two vectors.
of users depend on the latent feature vectors of both their friends
and the users with high similarities. However, SR+ requires an a 4.2 MFC Model
priori similarity threshold. The CircleCon model [30] refines So- The SR model [18, 15] improves ratings prediction by imposing
cialMF by considering category-specific friends and the intuition similarity constraints between users and their friends. Given the
is that a user may trust different subsets of users regarding differ- social network, model SR can be easily extended by considering
ent domains. GSBM [8] extends the mixed membership stochastic all the members in the community where the target user belongs to.
blockmodel [1] to capture both the social relations and the rating However, taking the individual influence based on the user-to-user
behavior for groups of users and items. However, its rating predic- similarity constraints may cause overfitting.
tion accuracy is worse than that of SocialMF.
PSLF [25] is a unified probabilistic model for social recommen-
dation. It extracts the social factor vectors of users from the so-
cial network based on the mixture membership stochastic block-
model [1] and integrates them into the user-item space. i1 i1
In summary, most of the social-based approaches only consider
direct friendships in the social network. They become less effective i1 ik
U
when a user has few social connections. 1 k i1
ih i1
IZATION
We propose two models, MFC and MFC+ , that incorporate the
overlapping community information as regularization terms into
U
the widely used MF framework. We first introduce the concept of 1 k ih
29
Our proposed MFC model, shown in Figure 4, injects commu-
1
nity interest constraints into the SR model. Our motivation is that
users who belong to different communities should be treated differ- 11
ently, as opposed to the SR model, which treats them equally. If the
j i i1 1 k1 k
target user is more interested in music than sports, users belonging C
to the music community should be weighed higher. In MFC, the 1 k 1
30
Yelp Flixster Douban Dianping
1000 1000 1000 1000
Number of Ratings
Number of Ratings
Number of Ratings
100 100 100 100
20 20 20 20
10 10 10 10
1 1 1 1
1 10 20 100 500 1000 1 10 20 100 500 1000 1 10 20 100 500 1000 1 10 20 100 500 1000
Number of Friends Number of Friends Number of Friends Number of Friends
31
• CESNA is one of the few overlapping community detection million US dollar was awarded to a team for reducing the RMSE by
methods that consider both structure and node attributes [29]. 10% compared to the state-of-the-art. As another evidence, the im-
Since CESNA also takes user attributes into account, we use provements reported in previous work are around 5% for SocialMF
a binary category vector from users to rated items to repre- compared with STE in dataset Flixster and 1.42%-2.33% for SR
sent each user’s attributes. For example, if user ui rated item compared with STE in dataset Douban.
vj of which the categories are Fast Food and Ice Cream then CircleCon divides direct friends into different circles and each
we set the corresponding entities of his/her binary category circle corresponds to one category. When making prediction, only
vector to 1. It has a linear runtime in the network size. one circle is considered. To compare our approaches with Circle-
Con, we evaluate the prediction results of category Restaurants in
We use the fast clique percolation algorithm6 introduced in [21] datasets Yelp and Dianping since datasets Douban and Flixster do
for CPM and the implementations from Stanford Network Analysis not have the information about item category. Restaurants is the
Project7 for BIGCLAM and CESNA. For each user who does not largest category in both two datasets. Table 3 displays the results
belong to any community, we form a simple community by taking for Restaurants category. According to the results, our approaches
the user and all his/her direct friends. outperform CircleCon, since CircleCon has the same limitation as
does SR that only direct friends are considered.
5.5 Parameter Settings From experiment results, we can also find that our methods out-
We performed 5-fold cross-validation on the training set to em- perform baselines no matter which overlapping community detec-
pirically tune parameters, so that each method achieves its own best tion methods is employed. MFCb and MFC+ b are slightly better
result. than MFCp and MFC+ p . When additional information about users
λU and λV are set to 0.01 for all methods. In SR and SR+ , β is is available, MFCc and MFC+ c perform much better than MFCb ,
set to 0.35, 1.1, 0.6 and 0.55 on Yelp, Flixster, Douban, and Dian- MFC+ +
b , MFCp and MFCp . It is reasonable because CESNA con-
ping, respectively. SR+ requires an additional parameter θ which siders both node attributes and structure while CPM and BIGCLAM
is hard to know a priori and we adopt 0.75, used in the original pa- only take structure into account. Our methods are independent of
per, in our evaluation. For CircleCon, β are set to 20 for Yelp and the communitiy detection methods and the reduction of RMSE may
Dianping. For MFC, λZ is set to 0.0001 on Douban dataset and go up if better community detection method is used. When explicit
0.001 on other datasets. For MFC+ , λZ is set to 0.1 for Douban, community structure is known, our models are expected to perform
0.5 for other datasets. We list the results for d = 10 where d is even better since they outperform baselines though current commu-
the dimensionality of the latent space. As d varies, our methods nity structure is generated from community detection methods.
outperform the competitors consistently.
For community detection approaches BIGCLAM and CESNA, 5.7 Performance on Cold-Start Users
default parameters are used. For CPM, we report the results when
In this section, we consider two types of cold-start users in or-
k = 3. When k is less than 6, our methods beat SR method and
der to evaluate the performance of our approaches:rating cold-start
achieve their best result when k = 3. When k exceeds 6, the num-
users, who have few ratings, and social cold-start users, who have
ber of communities that can be detected begins to decline sharply
few social connections. Recommender systems must be capable
and our methods degrade to BaseMF.
of matching the characteristics of an item against relevant features
5.6 Performance on All Users in the user’s profile. In order to do this, it must first construct
a sufficiently-detailed model of the user’s tastes and preferences
We report the results of all tested users in Table 2. The standard
through preference elicitation. Rating cold-start users is challeng-
deviations of the results are around 0.0015. The notation p, b, c
ing for traditional recommender systems, because there is not enough
denote that community structures are generated from CPM, BIG-
information about them that can be utilized to generate a recom-
CLAM and CESNA, respectively. The percentages are the highest
mendation. A motivation behind social recommender systems is to
improvements of MFC or MFC+ over the competitors. For exam-
utilize social information to improve prediction accuracy for rating
ple, the best results in Flixster (1.0134) is achieved by MFC+ b and
cold-start users. Previous studies [17, 7] illustrate the effectiveness
the percentages in that row mean the percentages of the improve-
of taking direct friendships into account when making predictions
ments of MFC+ b over the baselines. Note that datasets Douban and
for rating cold-start users. However, social cold-start users are ig-
Flixster do not contain additional information about item category.
nored in most evaluations of social recommender systems. On av-
Therefore, CESNA cannot be used in Douban and Flixster and the
erage, 44.70% of users are rating cold-start users and 50.62% of
corresponding results are denoted by “None".
them are social cold-start users in four datasets used in our experi-
Our proposed models beat all competitors in all the four datasets,
ments (i.e., users in red regions in Figure 6). Due to the significant
since the community information plays a positive role in rating pre-
number of both rating cold-start and social cold-start users, the ef-
dictions. The SR model only considers direct friends, hence the
fectiveness of any social recommendation approach is important for
predictions for the users having few friends may not be accurate.
both of these two types of users.
Also, some of the friends may have inconsistent interest with the
We assess the performance of our approaches for rating cold-
target user, causing the SR model to make inaccurate predictions.
start users with less than 5 ratings (following the setting in [7, 19,
Although the SR+ model takes highly similar users into account,
6]) and the results are reported in Table 4. We also evaluate the per-
our approaches further refine the similarities using community in-
formance for soical cold-start users (with less than 5 direct friends)
terest. The effectiveness has been proved by the result. Our propos-
and the result is reported in Table 5. CircleCon cannot model cir-
als beat the state-of-the-art SR model by 7%-16% and SR+ model
cle influence when user does not have enough friends who have
by 5%-11% on four datasets. As an indication about the signif-
rated items belonging to a specific category. CESNA fails to iden-
icance of this improvement, the well-known Netflix Prize8 of one
tify community for cold-start users due to the lack of their ratings.
6
http://github.com/aaronmcdaid/MaximalCliques Hence, their results are omitted. As expected, all methods perform
7
http://snap.stanford.edu/snap worse compared to the results in Table 2. However, our proposed
8
http://www.netflixprize.com methods still reduce RMSE by 5%-10% for rating cold-start users
32
Table 2: Performance Comparison
Dataset SCF BaseMF SR SR+ MFCp MFC+
p MFCb MFC+
b MFCc MFC+
c
1.4730 1.2498 1.2216 1.2032
Yelp 1.1618 1.1617 1.1543 1.1602 1.1324 1.1143
24.35% 10.84% 8.78% 7.39%
1.1761 1.1853 1.1041 1.0823
Flixster 1.0427 1.0436 1.0325 1.0134 None None
13.83% 14.50% 8.21% 6.37%
1.2788 1.0478 0.9441 0.9322
Douban 0.8952 0.8961 0.8776 0.8823 None None
31.37% 16.24% 7.04% 5.86%
1.3642 1.0598 0.9449 0.9012
Dianping 0.8678 0.8748 0.8812 0.8721 0.7932 0.8211
41.86% 25.16% 16.05% 11.98%
and 4%-10% for social cold-start users, compared with SR and 5.9 Comparison between MFC and MFC+
SR+ indicating that community regularization handles both rating From the experimental results, we can see that MFC and MFC+
cold-start users and social cold-start users better than baselines. have similar performance. Intuitively, MFC should perform better
when the communities include diverse users since it minimizes the
distance between community members, while MFC+ should out-
5.8 Impact of parameter λZ perform MFC when community members have consistent tastes,
Parameter λZ controls the influence of communities in the pre- because it minimizes the distances between members and commu-
diction process of MFC and MFC+ . A large λZ indicates strong nity center. To analyze differences in the performance of these two
influence of communities. Figure 7(a) show the impact of λZ over methods for different kinds of communities, we use Root Mean
dataset Dianping when CPM is employed and similar trends can be Square Distance (RMSD) for comparison. RMSD is defined as:
observed over other datasets. MFC+ achieves its best performance P
when λZ = 0.5, while the best results of MFC are when λZ is 2 ui ,uj ∈ch Sij
S̄h = ,
small (0.001). |Ch | (|Ch | − 1)
The two models exhibit different behaviors because MFC+ min-
s P
2 ui ,uj ∈ch (Sij − S̄h )2
imizes the distance between the target user and her/his interested RM SD = ,
communities, while MFC tries to minimize the distances between |Ch | (|Ch | − 1)
the target user and the users in the same communities. This means where |Ch | is the number of users in community ch and Sij is
that λZ is considered more times for MFC than for MFC+ in the the PCC defined in Section 4.1. Small RMSD means community
calculation of gradient descent, since the number of communities members have consistent tastes.
to which the target user belongs is generally much smaller than the Take for example dataset Dianping, where CPM is employed.
numbers of users in these communities. Thus, a large λZ can help From Figure 7(b), we can see that MFC+ performs better than MFC
MFC+ achieve a good result, while a small λZ can avoid the over- for users in communities with small RMSD. When RMSD exceeds
influence of communities to MFC. 0.3, the results of MFC are superior.
33
1.4 0.90
MFC MFC+ MFC MFC+
1.3 0.88
1.2
Average RMSE
RMSE 0.86
1.1 0.84
1.0 0.82
0.9 0.80
0.8 0.78
1e-4 0.001 0.01 0.1 0.5 1 5 [0, 0.1] (0.1, 0.2] (0.2, 0.3] (0.3, 0.4] (0.4, 0.5]
λZ RMSD
(a) (b)
Figure 7: (a) Impact of parameter λZ on RMSE (Dianping) when CPM is employed. (b) Performance of MFC and MFC+ over
different kinds of communities in Dianping when CPM is employed.
34