
Tag Recommendations
in Social Bookmarking Systems
Robert Jäschke (a,c), Leandro Marinho (b,d), Andreas Hotho (a), Lars Schmidt-Thieme (b), Gerd Stumme (a,c)

a Knowledge & Data Engineering Group (KDE), University of Kassel, Wilhelmshöher Allee 73, 34121 Kassel, Germany, http://www.kde.cs.uni-kassel.de
b Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Samelsonplatz 1, 31141 Hildesheim, Germany, http://www.ismll.uni-hildesheim.de
c Research Center L3S, Appelstr. 9a, 30167 Hannover, Germany, http://www.l3s.de
d Brazilian National Council for Scientific and Technological Development (CNPq) scholarship holder.
Abstract. Collaborative tagging systems allow users to assign keywords, so-called tags, to resources. Tags are used for navigation, finding resources and serendipitous browsing, and thus provide an immediate benefit for users. These systems usually include tag recommendation mechanisms easing the process of finding good tags for a resource, but also consolidating the tag vocabulary across users. In practice, however, only very basic recommendation strategies are applied.

In this paper we evaluate and compare several recommendation algorithms on large-scale real-life datasets: an adaptation of user-based collaborative filtering, a graph-based recommender built on top of the FolkRank algorithm, and simple methods based on counting tag occurrences. We show that both FolkRank and Collaborative Filtering provide better results than non-personalized baseline methods. Moreover, since methods based on counting tag occurrences are computationally cheap, and thus usually preferable for real-time scenarios, we discuss simple approaches for improving the performance of such methods. We show how a simple recommender based on counting tags from users and resources can perform almost as well as the best recommender.

Keywords: Folksonomies, Recommender Systems, Social Bookmarking, Ranking

AI Communications
ISSN 0921-7126, IOS Press. All rights reserved

1. Introduction

Folksonomies are web-based systems that allow users to upload their resources and to label them with arbitrary words, so-called tags. The systems can be distinguished according to what kind of resources are supported. Flickr(1), for instance, allows the sharing of photos, del.icio.us(2) the sharing of bookmarks, CiteULike(3) and Connotea(4) the sharing of bibliographic references, and last.fm(5) the sharing of music listening habits. BibSonomy(6) allows sharing bookmarks and BibTeX-based publication entries simultaneously.

At their core, these systems are all very similar. Once a user is logged in, he can add a resource to the system and assign arbitrary tags to it. The collection of all his assignments is his personomy; the collection of all personomies constitutes the folksonomy. The user can explore his personomy, as well as the whole folksonomy, in all dimensions: for a given user one can see all resources he has uploaded, together with the tags he has assigned to them; when clicking on a resource one sees which other users have uploaded this resource and how they tagged it; and when clicking on a tag one sees who assigned it to which resources. Based on the tags that are assigned to a resource, users are able to search and find their own or other users' resources within such systems.

To support users in the tagging process and to expose different facets of a resource, most of the systems offered some kind of tag recommendations already at an early stage. Del.icio.us, for instance, had a tag recommender in June 2005 at the latest,(7) and also included resource recommendations.(8) However, no algorithmic details were published. We assume that these recommendations basically provide those tags which were most frequently assigned to the resource (called most popular tags by resource in the sequel).
1 http://flickr.com
2 http://del.icio.us
3 http://www.citeulike.org
4 http://www.connotea.org
5 http://www.last.fm
6 http://www.bibsonomy.org
7 http://www.socio-kybernetics.net/saurierduval/archive/2005_06_01_archive.html
8 http://blog.del.icio.us/blog/2005/08/people_who_like.html

As of today, nobody has empirically shown the benefit of recommenders in such systems. In this paper, we will evaluate a tag recommender based on Collaborative Filtering (introduced in Section 3.1), a graph-based recommender using our ranking algorithm FolkRank (see Section 3.2), and several simpler approaches based on tag counts (Section 3.3). In Section 4, we discuss the computational costs of the different algorithms. The quality of the resulting recommendations is evaluated on three real-world folksonomy datasets from del.icio.us, BibSonomy(9) and last.fm (Sections 5 and 6). In the next section we start by recalling the basics and discussing related work.

The results presented in this article build upon results presented at the 18th European Conference on Machine Learning (ECML) / 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) 2007 [17].

2. Recommending Tags: Problem Definition and State of the Art

In this section we formalize the notion of folksonomies, formulate the tag recommendation problem, and briefly describe the state of the art on tag recommendations in folksonomies.

2.1. A Formal Model for Folksonomies

Formally, a folksonomy is a tuple $F := (U, T, R, Y)$ where

- $U$, $T$, and $R$ are finite sets, whose elements are called users, tags and resources, resp., and
- $Y$ is a ternary relation between them, i.e., $Y \subseteq U \times T \times R$, whose elements are called tag assignments (tas for short).(10)

Users are typically described by their user ID, and tags may be arbitrary strings. What is considered a resource depends on the type of system. For instance, in del.icio.us, the resources are URLs, in BibSonomy URLs or publication references, and in last.fm, the resources are artists.

For convenience we also define, for all $u \in U$ and $r \in R$, $T(u, r) := \{t \in T \mid (u, t, r) \in Y\}$, i.e., $T(u, r)$ is the set of all tags that user $u$ has assigned to resource $r$. The set of all posts of the folksonomy is then $P := \{(u, S, r) \mid u \in U, r \in R, S = T(u, r), S \neq \emptyset\}$. Thus, each post consists of a user, a resource and all tags that this user has assigned to that resource.
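The formal model translates directly into a small data structure. The following minimal Python sketch (the names `tags_of` and `posts` are ours, not from the paper) illustrates how $T(u, r)$ and the post set $P$ derive from the tag assignment relation $Y$:

```python
# A toy folksonomy: Y is a set of (user, tag, resource) triples.
Y = {
    ("anna", "web", "http://example.org"),
    ("anna", "tools", "http://example.org"),
    ("bob", "web", "http://example.org"),
}

def tags_of(Y, u, r):
    """T(u, r): the set of all tags that user u assigned to resource r."""
    return {t for (v, t, s) in Y if v == u and s == r}

def posts(Y):
    """P: one (user, tags, resource) post per (u, r) pair with a non-empty tag set."""
    pairs = {(u, r) for (u, _, r) in Y}
    return {(u, frozenset(tags_of(Y, u, r)), r) for (u, r) in pairs}

print(tags_of(Y, "anna", "http://example.org"))  # {'web', 'tools'}
print(len(posts(Y)))                             # 2 posts
```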

2.2. Problem Definition

Most recommender systems are typically used to call users' attention to new objects they do not know yet and have not rated in the past. This is often due to the fact that there is no repeat-buying in domains like books, movies, or music, in which these systems typically operate. In social bookmarking systems, on the contrary, re-occurring tags are an essential feature for structuring the knowledge of a user or a group of users, and have to be considered by a tag recommender. This means that the fact that a tag has already been used to annotate a resource does not exclude the possibility of recommending the same tag for a different resource of the same user. Overall, recommending tags can serve various purposes, such as: increasing the chances of getting a resource annotated, reminding a user what a resource is about, and consolidating the vocabulary across the users.

Recommender systems (RS) in general recommend interesting or personalized information objects to users based on explicit or implicit ratings. Usually RS predict ratings of objects or suggest a list of new objects that the user hopefully will like the most. The task of a tag recommender system is to recommend, for a given user $u \in U$ and a given resource $r \in R$ with $T(u, r) = \emptyset$, a set $\tilde{T}(u, r) \subseteq T$ of tags. In many cases, $\tilde{T}(u, r)$ is computed by first generating a ranking on the set of tags according to some quality or relevance criterion, from which then the top $n$ elements are selected.

Notice that the notion of tag relevance in folksonomies can assume different perspectives, i.e., a tag can be judged relevant to a given resource according to the society's point of view, through the opinion of experts in the domain, or based on the personal profile of an individual user. For all the evaluated algorithms, we concentrate here on measuring the individual notion of tag relevance, i.e., the degree of likeliness of a user for a certain set of tags, given a new or untagged resource.

9 We make the BibSonomy dataset publicly available for research purposes to stimulate research in the area of folksonomy systems (details in Section 5.1).
10 In the original definition [15], we additionally introduced a subtag/supertag relation, which we omit here. The version used here is known in Formal Concept Analysis [11] as a triadic context [19,28].

2.3. Related Work

General overviews on the rather young area of folksonomy systems and their strengths and weaknesses are given in [13,20,22]. In [23], Mika defines a model of semantic-social networks for extracting lightweight ontologies from del.icio.us. Recently, work on more specialized topics has been presented, such as structure mining on folksonomies (e.g., to visualize trends [9] and patterns [27] in users' tagging behavior), ranking of folksonomy contents [15], analyzing the semiotic dynamics of the tagging vocabulary [7], and the dynamics and semantics of collaborative tagging [12].

The literature concerning the problem of tag recommendations in folksonomies is still sparse. The existing approaches usually lie in the collaborative filtering and information retrieval areas. In [24,6], algorithms for tag recommendations are devised based on content-based filtering techniques. Xu et al. [32] introduce a collaborative tag suggestion approach based on the HITS algorithm [18]. A goodness measure for tags, derived from collective user authorities, is iteratively adjusted by a reward-penalty algorithm. Benz et al. [3] introduce a collaborative approach for bookmark classification based on a combination of nearest-neighbor classifiers. There, a keyword recommender plays the role of a collaborative tag recommender, but it is just a component of the overall algorithm, and therefore there is no information about its effectiveness alone. Basile et al. [1] suggest an architecture for an intelligent tag recommender system. In [10,31,29] the problem of tag-aware resource recommendations is investigated. The standard tag recommenders, in practice, are services that provide the most popular tags used for a particular resource. This is usually done by means of tag clouds, where the most frequently used tags are depicted in a larger font or otherwise emphasized.

The approaches described above address important aspects of the problem, but they still diverge on the notion of tag relevance and the evaluation protocol used. In [32,1], e.g., no quantitative evaluation is presented, while in [24], the notion of tag relevance is not entirely defined by the users but partially by experts. Furthermore, most of them make use of some content information which is specific to the particular type of resource of the system. It is certainly interesting to exploit content information, but since folksonomies can support different types of resources, e.g., audio, image, text, or video, one would need to write specific recommenders suited for each distinct content type. In this paper we are particularly interested in generic algorithms that can be applied to folksonomies disregarding the domain and the kind of resource supported.

Most recently, the ECML PKDD 2008 Discovery Challenge(11) has addressed the problem of tag recommendations in folksonomies.
3. Recommendation Algorithms
In this section we present the three classes of recommendation algorithms we will evaluate in the following sections: a straightforward adaptation of Collaborative Filtering [4,25] based on user-tag and user-resource projections, two adaptations of PageRank [5] for folksonomies, and various methods based on counting the most popular tags.
3.1. Collaborative Filtering
Due to its simplicity and promising results, Collaborative Filtering (CF) has been one of the most dominant methods used in recommender systems. In the
next section we recall the basic principles and then
present the details of the adaptation to folksonomies.
3.1.1. Basic Collaborative Filtering Principle

The idea is to suggest new objects or to predict the utility of a certain object based on the opinion of like-minded users [26]. In CF, for $m$ users and $n$ objects, the user profiles are represented in a user-object matrix $X \in \mathbb{R}^{m \times n}$. The matrix can be decomposed into row vectors:

$$X := [\vec{x}_1, \ldots, \vec{x}_m]^T \quad \text{with} \quad \vec{x}_u := [x_{u,1}, \ldots, x_{u,n}], \text{ for } u := 1, \ldots, m,$$

where $x_{u,o}$ indicates that user $u$ rated object $o$ by $x_{u,o} \in \mathbb{R}$. Each row vector $\vec{x}_u$ thus corresponds to a user profile representing the object ratings of a particular user. This decomposition leads to user-based CF, in contrast to item-based algorithms (see [8]).(12)
Now one can compute, for a given user $u$, the recommendation as follows. First, based on the matrix $X$ and for a given $k$, the set $N_u^k$ of the $k$ users that are most similar to user $u \in U$ is computed:

$$N_u^k := \underset{v \in U \setminus \{u\}}{\operatorname{argmax}^k} \operatorname{sim}(\vec{x}_u, \vec{x}_v),$$

where the superscript in the argmax function indicates the number $k$ of neighbors to be returned, and sim is regarded (in our setting) as the cosine similarity measure, i.e., $\operatorname{sim}(\vec{x}_u, \vec{x}_v) := \frac{\langle \vec{x}_u, \vec{x}_v \rangle}{\lVert \vec{x}_u \rVert \, \lVert \vec{x}_v \rVert}$.

Then, for a given $n \in \mathbb{N}$, the top $n$ recommendations consist of a list of objects ranked by decreasing frequency of occurrence in the ratings of the neighbors (see Eq. (1) below for the folksonomy case).

11 http://www.kde.cs.uni-kassel.de/ws/rsdc08/
12 We also measured the performance of item-based CF. Since precision and recall of its recommendations were worse than those of user-based CF for all datasets, we decided to present the latter type of CF as the baseline for CF algorithms.

3.1.2. Collaborative Filtering for Recommending Tags in Folksonomies

Because of the ternary relational nature of folksonomies, traditional CF cannot be applied directly, unless we reduce the ternary relation $Y$ to a lower dimensional space [21]. To this end we consider as matrix $X$ alternatively the two 2-dimensional projections $\pi_{UR} Y \in \{0,1\}^{|U| \times |R|}$ with $(\pi_{UR} Y)_{u,r} := 1$ if there exists $t \in T$ s.t. $(u, t, r) \in Y$ and 0 else, and $\pi_{UT} Y \in \{0,1\}^{|U| \times |T|}$ with $(\pi_{UT} Y)_{u,t} := 1$ if there exists $r \in R$ s.t. $(u, t, r) \in Y$ and 0 else (Figure 1). The projections preserve the user information, and lead to recommender systems based on occurrence or non-occurrence of resources or tags, resp., with the users. This approach is similar to recommenders that are based on web log data. Notice that here we have two possible setups in which the $k$-neighborhood $N_u^k$ of a user $u$ can be formed, by considering either the resources or the tags as objects.

Fig. 1. Projections of Y into the user-resource and user-tag spaces.

Having defined the matrix $X$, and having decided whether to use $\pi_{UR} Y$ or $\pi_{UT} Y$ for computing user neighborhoods, we have the required setup to apply Collaborative Filtering. For determining, for a given user $u$, a given resource $r$, and some $n \in \mathbb{N}$, the set $\tilde{T}(u, r)$ of $n$ recommended tags, we first compute $N_u^k$ as described above, followed by

$$\tilde{T}(u, r) := \underset{t \in T}{\operatorname{argmax}^n} \sum_{v \in N_u^k} \operatorname{sim}(\vec{x}_u, \vec{x}_v)\, \delta(v, t, r), \qquad (1)$$

where $\delta(v, t, r) := 1$ if $(v, t, r) \in Y$ and 0 else.
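To make the procedure concrete, the following is a minimal sketch of the CF tag recommender of Eq. (1) on the user-tag projection with cosine similarity; the function name, the set-of-triples input, and the dictionary bookkeeping are our own simplifications, not the authors' implementation:

```python
import math
from collections import defaultdict

def cf_recommend(Y, u, r, k=10, n=10):
    """User-based CF tag recommendation (Eq. 1) on the user-tag projection."""
    # Projection pi_UT Y: a binary profile over tags for each user.
    profile = defaultdict(set)
    for (v, t, s) in Y:
        profile[v].add(t)

    def sim(a, b):  # cosine similarity of the binary profile vectors
        common = len(profile[a] & profile[b])
        return common / math.sqrt(len(profile[a]) * len(profile[b])) if common else 0.0

    # Neighborhood N_u^k, restricted to the users who tagged resource r.
    V = {v for (v, _, s) in Y if s == r and v != u}
    neighbors = sorted(V, key=lambda v: sim(u, v), reverse=True)[:k]

    # Score each tag by similarity-weighted occurrence on r (Eq. 1).
    score = defaultdict(float)
    for (v, t, s) in Y:
        if s == r and v in neighbors:
            score[t] += sim(u, v)
    return sorted(score, key=score.get, reverse=True)[:n]
```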


3.2. A Graph-Based Approach

The web search algorithm PageRank [5] reflects the idea that a web page is important if there are many pages linking to it, and if those pages are important themselves.(13) In [15], we employed the same underlying principle for Google-like search and ranking in folksonomies. The key idea of our FolkRank algorithm is that a resource which is tagged with important tags by important users becomes important itself. The same holds, symmetrically, for tags and users. We thus have a graph of vertices which mutually reinforce each other by spreading their weights. In this section we briefly recall the principles of the FolkRank algorithm and explain how we use it for generating tag recommendations.

Because of the different nature of folksonomies compared to the web graph (undirected triadic hyperedges instead of directed binary edges), PageRank cannot be applied directly to folksonomies. In order to employ a weight-spreading ranking scheme on folksonomies, we overcome this problem in two steps. First, we transform the hypergraph into an undirected graph. Then we apply a differential ranking approach that deals with the skewed structure of the network and the undirectedness of folksonomies, and which allows for topic-specific rankings.


3.2.1. Folksonomy-Adapted PageRank

First we convert the folksonomy $F = (U, T, R, Y)$ into an undirected tri-partite graph $G_F = (V, E)$. The set $V$ of nodes of the graph consists of the disjoint union of the sets of tags, users and resources (i.e., $V = U \,\dot{\cup}\, T \,\dot{\cup}\, R$). All co-occurrences of tags and users, users and resources, and tags and resources become edges between the respective nodes, i.e., each triple $(u, t, r)$ in $Y$ gives rise to the three undirected edges $\{u, t\}$, $\{u, r\}$, and $\{t, r\}$ in $E$.
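A small sketch of this conversion, assuming that users, tags, and resources have disjoint identifiers (so the disjoint union needs no extra bookkeeping):

```python
def folksonomy_graph(Y):
    """Convert the tag assignments Y into the undirected tri-partite graph G_F."""
    edges = set()
    for (u, t, r) in Y:
        # Each triple (u, t, r) contributes the edges {u,t}, {u,r}, and {t,r}.
        edges.add(frozenset((u, t)))
        edges.add(frozenset((u, r)))
        edges.add(frozenset((t, r)))
    nodes = {x for e in edges for x in e}
    return nodes, edges
```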
Like PageRank, we employ the random surfer model, which is based on the idea that an idealized random web surfer normally follows links (e.g., from a resource page to a tag or a user page), but from time to time jumps to a new node without following a link. This results in the following definition. The rank of the vertices of the graph is computed with the weight-spreading computation

$$\vec{w}_{t+1} \leftarrow d A^T \vec{w}_t + (1 - d)\,\vec{p}, \qquad (2)$$


where $\vec{w}$ is a weight vector with one entry for each node in $V$, $A$ is the row-stochastic version of the adjacency matrix(14) of the graph $G_F$ defined above, $\vec{p}$ is the random surfer vector (which we use as preference vector in our setting), and $d \in [0, 1]$ determines the strength of the influence of $\vec{p}$. By normalization of the vector $\vec{p}$, we enforce the equality $\lVert \vec{w} \rVert_1 = \lVert \vec{p} \rVert_1$.(15) This ensures that the weight in the system will remain constant. The rank of each node is its value in the limit $\vec{w} := \lim_{t \to \infty} \vec{w}_t$ of the iteration process.

13 This idea was extended in a similar fashion to bipartite subgraphs of the web in HITS [18] and to n-ary directed graphs in [30].
14 $a_{ij} := \frac{1}{\deg(i)}$ if $\{i, j\} \in E$ and 0 else.
15 ...together with the condition that there are no rank sinks, which holds trivially in the undirected graph $G_F$.
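A minimal sketch of the iteration of Eq. (2), applying the row-stochastic matrix $A$ implicitly by letting every node spread its weight uniformly over its neighbors (the parameter names and the fixed iteration count are our assumptions):

```python
def adapted_pagerank(nodes, edges, pref, d=0.7, iters=10):
    """Weight-spreading iteration of Eq. (2): w <- d * A^T * w + (1 - d) * p."""
    neighbors = {v: set() for v in nodes}
    for e in edges:
        a, b = tuple(e)
        neighbors[a].add(b)
        neighbors[b].add(a)
    total = sum(pref.values())
    p = {v: pref[v] / total for v in nodes}  # normalized preference vector
    w = dict(p)                              # start with ||w||_1 = ||p||_1 = 1
    for _ in range(iters):
        nxt = {v: (1 - d) * p[v] for v in nodes}
        for v in nodes:
            if neighbors[v]:
                share = d * w[v] / len(neighbors[v])  # row-stochastic spreading
                for x in neighbors[v]:
                    nxt[x] += share
        w = nxt
    return w
```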
For a global ranking, one will choose $\vec{p} = \vec{1}$, i.e., the vector composed of 1's. In order to generate recommendations, however, $\vec{p}$ can be tuned by giving a higher weight to the user node and to the resource node for which one currently wants to generate a recommendation. The recommendation $\tilde{T}(u, r)$ is then the set of the top $n$ nodes in the ranking, restricted to tags.

As the graph $G_F$ is undirected, most of the weight that went through an edge at moment $t$ will flow back at $t + 1$. The results are thus rather similar (but not identical, due to the random surfer) to a ranking that is simply based on edge degrees. In the experiments presented below, we will see that this version performs reasonably, but not exceptionally. This is in line with our observation in [15] which showed that the topic-specific rankings are biased by the global graph structure. As a consequence, we developed in [15] the following differential approach.

3.2.2. FolkRank: Topic-Specific Ranking

The undirectedness of the graph $G_F$ makes it very difficult for nodes other than those with high edge degree to become highly ranked, no matter what the preference vector is. This problem is solved by the differential approach in FolkRank, which computes a topic-specific ranking of the elements in a folksonomy. In our case, the topic is determined by the user/resource pair $(u, r)$ for which we intend to compute the tag recommendation.

1. Let $\vec{w}^{(0)}$ be the fixed point from Equation (2) with $\vec{p} = \vec{1}$.
2. Let $\vec{w}^{(1)}$ be the fixed point from Equation (2) with $\vec{p} = \vec{1}$, but $\vec{p}[u] = 1 + |U|$ and $\vec{p}[r] = 1 + |R|$.
3. $\vec{w} := \vec{w}^{(1)} - \vec{w}^{(0)}$ is the final weight vector.

Thus, we compute the winners and losers of the mutual reinforcement of nodes when a user/resource pair is given, compared to the baseline without a preference vector. We call the resulting weight $\vec{w}[x]$ of an element $x$ of the folksonomy the FolkRank of $x$.(16)

For generating a tag recommendation for a given user/resource pair $(u, r)$, we compute the ranking as described and then restrict the result set $\tilde{T}(u, r)$ to the top $n$ tag nodes.
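Under the same assumptions, and reusing the `adapted_pagerank` function from the sketch above, the differential FolkRank computation and the restriction to tag nodes might look as follows (again a sketch, not the authors' implementation):

```python
def folkrank_recommend(nodes, edges, users, resources, tags, u, r, n=10):
    """FolkRank: difference between a preference-biased and an unbiased ranking."""
    base_pref = {v: 1.0 for v in nodes}             # p = 1 (global baseline)
    w0 = adapted_pagerank(nodes, edges, base_pref)  # step 1: baseline fixed point
    pref = dict(base_pref)
    pref[u] = 1.0 + len(users)                      # step 2: boost the user node
    pref[r] = 1.0 + len(resources)                  #         and the resource node
    w1 = adapted_pagerank(nodes, edges, pref)
    folkrank = {v: w1[v] - w0[v] for v in nodes}    # step 3: differential weights
    return sorted(tags, key=lambda t: folkrank[t], reverse=True)[:n]
```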

3.3. Most Popular Tags

In this section we introduce methods based on tag counts. In the sequel we will see that these methods are particularly cheap to compute and therefore might be good candidates for online computation of recommendations.

For convenience, we define, for a user $u \in U$, the set of all his tag assignments $Y_u := Y \cap (\{u\} \times T \times R)$. The sets $Y_r$ (for any resource $r \in R$) and $Y_t$ (for any tag $t \in T$) are defined accordingly. Similarly, we define, for $t \in T$ and $u \in U$, $Y_{t,u} := Y \cap (\{u\} \times \{t\} \times R)$, and define $Y_{t,r}$ accordingly. Finally, we define, for a user $u \in U$, the set of all his tags $T_u := \{t \in T \mid \exists r \in R : (u, t, r) \in Y\}$. The set $T_r$ (for any resource $r \in R$) is defined accordingly.

3.3.1. Variants of Most Popular Tags

1. Recommending the most popular tags of the folksonomy is the most simplistic approach. It recommends, for any user $u \in U$ and any resource $r \in R$, the same set:
$$\tilde{T}(u, r) := \underset{t \in T}{\operatorname{argmax}^n}(|Y_t|).$$
This approach suffers only minimally from cold-start problems.

2. Tags that globally are most specific to the resource will be recommended when using the most popular tags by resource:
$$\tilde{T}(u, r) := \underset{t \in T}{\operatorname{argmax}^n}(|Y_{t,r}|).$$

16 In [15] we showed that $\vec{w}$ indeed provides valuable results on a large-scale real-world dataset, while $\vec{w}^{(1)}$ provides an unstructured mix of topic-relevant elements with elements having high edge degree. In [16], we applied this approach for detecting trends over time in folksonomies.

3. Since users might have specific interests for which they already tagged several resources, using the most popular tags by user is another option:
$$\tilde{T}(u, r) := \underset{t \in T}{\operatorname{argmax}^n}(|Y_{t,u}|).$$

As we will see later (cf. Sec. 6), none of the aforementioned methods alone will in general provide the best recommendations. Nevertheless, the simplicity and cost efficiency of algorithms based on tag counts make them a favored approach for use in existing folksonomy systems. Therefore, we experimented with a mix of the recommendations generated by variants 2 and 3, which we call most popular tags mix in the following sections.

3.3.2. Mix of Most Popular Tags Recommenders

The main idea of this approach is to recommend a mix of the most popular tags of the user with the most popular tags of the resource. The simplest way to mix the tags is to add their counts and then sort them by their count:
$$\tilde{T}(u, r) := \underset{t \in T}{\operatorname{argmax}^n}(|Y_{t,r}| + |Y_{t,u}|).$$
This way of mixing will be called most popular tags mix 1:1, since we just add the counts as they are. For instance, if the resource has been tagged four times with web by other users and the user has used the tag web six times on other resources, the tag web would get a count of ten.

Although this method already yields good results (as we will show in the results Section 6), the influence of the user-based recommendation will be very small compared to the resource-based recommendation if many people have tagged this resource. Vice versa, if a user has tagged many resources, his most popular tags might have counts that are much higher than the counts provided by the resources. Hence, we introduced another mix variant, where the tag counts of the two participating sets are normalized and weighted before they are added. We define as normalization function, for each tag $t \in T_r$:

$$\operatorname{norm}_r(t) := \frac{|Y_{t,r}| - \min_{t' \in T_r} |Y_{t',r}|}{\max_{t' \in T_r} |Y_{t',r}| - \min_{t' \in T_r} |Y_{t',r}|}. \qquad (3)$$

For $t \in T_u$, the normalization $\operatorname{norm}_u(t)$ is defined in an analogous fashion. After normalization, the weights of all tags in $T_r$ and $T_u$ lie between zero and one, with the most popular tag(s) having weight 1 and the least important tag(s) having weight 0. A pre-defined factor $\rho \in [0, 1]$ allows us to balance the influence of the user and the resource:
$$\tilde{T}(u, r) := \underset{t \in T}{\operatorname{argmax}^n}\big(\rho \operatorname{norm}_r(t) + (1 - \rho) \operatorname{norm}_u(t)\big).$$
We call this method the most popular tags $\rho$-mix.

Note that the most popular tags 0-mix is just the most popular tags by user strategy, since the normalization does not change the order of the tags. Similarly, the most popular tags 1-mix is just the most popular tags by resource strategy. Note, however, that due to normalization the most popular tags 0.5-mix is not identical to the most popular tags mix 1:1! In Section 6 we will analyze how well different values of $\rho$ perform and find the best value for the examined datasets.
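A sketch of the $\rho$-mix on the same set-of-triples representation used in the earlier sketches (the helper names are ours):

```python
from collections import Counter

def rho_mix_recommend(Y, u, r, rho=0.6, n=10):
    """Most popular tags rho-mix: blend of normalized user and resource counts."""
    user_counts = Counter(t for (v, t, s) in Y if v == u)  # |Y_{t,u}| over T_u
    res_counts = Counter(t for (v, t, s) in Y if s == r)   # |Y_{t,r}| over T_r

    def normalize(counts):  # min-max normalization as in Eq. (3)
        if not counts:
            return {}
        lo, hi = min(counts.values()), max(counts.values())
        if hi == lo:  # all counts equal; avoid division by zero
            return {t: 1.0 for t in counts}
        return {t: (c - lo) / (hi - lo) for t, c in counts.items()}

    norm_u, norm_r = normalize(user_counts), normalize(res_counts)
    score = {t: rho * norm_r.get(t, 0.0) + (1 - rho) * norm_u.get(t, 0.0)
             for t in set(norm_u) | set(norm_r)}
    return sorted(score, key=score.get, reverse=True)[:n]
```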


4. Computational Costs


In an online scenario, where tag recommendations should be given to the user while he tags a resource, one must consider the computational costs of the algorithm used. Hence, in this section we briefly discuss the costs of the algorithms proposed so far. We will see that the methods described in the preceding section are especially cheap to compute and therefore might be good candidates for real-time computation of recommendations, if they can provide useful recommendations. Here we want to estimate the complexity of recommending $n$ tags for a given user-resource pair $(u, r)$ using the proposed solutions.


4.1. Collaborative Filtering

The computational complexity of the CF algorithm depends on three steps:

1. Computation of projections: In order to compose the projections, we need to determine only the resource and tag co-occurrences with the set of users $V \subseteq U$ that have tagged the active resource $r \in R$. For that, we need to do a linear scan over $Y$, resulting in a complexity of $O(|Y|)$. However, with appropriate index structures, which allow accessing the tag assignments of $u$ (or $r$) efficiently, this reduces to $O(\log(|R|) + |Y_u||V| \log(|U|))$.


2. Neighborhood formation: In traditional user-based CF algorithms, the computation of the neighborhood $N_u$ is usually linear in the number of users, as one needs to compute the similarity of the active user with all the other users in the database. However, in CF-based tag recommendations we are only interested in the subset $V$ of users that tagged the active resource. Thus, the upper bound on the complexity of this step would be $O(|V|Z)$, as we need to compute $|V|$ similarities, each requiring $Z$ operations. In the worst case $|V| = |U|$, but this rarely occurs in practice. In addition, we need to sort the similarities to compute the $N_u$ nearest users. Therefore the complexity of this step is $O(|V|(Z + \log(|N_u|)))$.

3. Recommendations: In order to compute the top-$n$ recommendations for a given $(u, r)$ pair, we need to (i) count the tag occurrences of the nearest users $N_u$, weighted by their similarities (see Eq. (1)), and (ii) sort the tags based on their weight, which results in a complexity of $O(|Y_u||N_u| \log(n))$.

Hence, the whole complexity given the three steps above is $O(\log(|R|) + |Y_u||V| + |V|(Z + \log(|N_u|)) + |Y_u||N_u| \log(n))$, which can be simplified to $O(|V|(2|Y_u| + \log(|V|)) + |Y_u| \log(n)) \subseteq O(|V||Y_u|)$, since $|N_u| \leq |V|$ and $Z \leq |Y_u|$.

4.2. The Graph-Based Approach

One iteration of the adapted PageRank requires the computation of $d A^T \vec{w} + (1 - d)\vec{p}$, with $A \in \mathbb{R}^{s \times s}$ where $s := |U| + |T| + |R|$. If $t$ marks the number of iterations, the complexity would therefore be $(s^2 + s)t \in O(s^2 t)$. However, since $A$ is sparse, it is more efficient to go linearly over all tag assignments in $Y$ to compute the product $A^T \vec{w}$. Together with the costs of adding the preference vector $\vec{p} \in \mathbb{R}^s$, this results in a complexity of $O((|Y| + s)t)$. After rank computation we have to sort the weights of the tags to collect the top $n$ tags; thus the final complexity of the adapted PageRank for top-$n$ tag recommendation is $O((|Y| + s)t + |T| \log(n))$.

For FolkRank, one has to compute the baseline $\vec{w}^{(0)}$ only once (and update it on a regular basis); hence, these costs do not really add to the costs of computing one recommendation. However, the baseline $\vec{w}^{(0)}$ has to be subtracted from $\vec{w}^{(1)}$, which costs at most $|T|$ operations (since we are only interested in the weights of the tags). Thus, the costs of FolkRank are $O((|Y| + s)t + |T| \log(n) + |T|)$, which can be simplified to $O((|Y| + s)t)$, since $|T|$ is small compared to $|Y|$.

4.3. Most Popular Tags

If we want to compute, for a given pair $(u, r)$, the most popular tags of the user $u$ (or the resource $r$), we need to linearly scan $Y$ to calculate the occurrence counts for $u$'s tags (or $r$'s tags) and afterwards sort the gathered tags by their count. This would result in a complexity of $O(|Y| + |T_u| \log(n))$ (or $O(|Y| + |T_r| \log(n))$). Nevertheless (as for CF), with efficient index structures to access $T_u$ (or $T_r$) this reduces to $O(\log(|U|) + |Y_u| + |T_u| \log(n))$ (or $O(\log(|R|) + |Y_r| + |T_r| \log(n))$).

For the most popular tags mixes we have to consider both of these costs and additionally add the costs to normalize the tags, which includes finding the tags with the highest and lowest counts. This results in a complexity of $O(\log(|U|) + |Y_u| + \log(|R|) + |Y_r| + |T_u| + |T_r| + (|T_u| + |T_r|) \log(n))$. With $|T_u| \leq |Y_u|$ the costs are at most $O(4|Y_u| + 2|Y_u| \log(n)) \subseteq O(|Y_u|)$.

4.4. Comparison

Since $Y_u$ is only a small part of $Y$, CF and the most popular methods are much cheaper to compute than FolkRank, which in each iteration has to scan $Y$. Additionally, both methods do not need any iteration. Comparing CF and the most popular mixes requires estimating the size of the set $V$ of users which have tagged a particular resource. This certainly depends on the resource at hand, but on average the factor $|V|$ in the CF costs will be larger than the constant factors of $|Y_u|$ in the most popular mix costs. In general, both methods have similar costs, with some advantage on the side of the mixes.

5. Evaluation Procedure

In order to evaluate the quality of the recommendations of the different algorithms, we ran experiments on three real-world datasets. In this section we first describe the datasets we used and how we prepared the data, then the methodology deployed to measure the performance, and finally which algorithms we used, together with their specific settings. The results will be discussed in Section 6.

5.1. Datasets


To evaluate the proposed recommendation techniques we have chosen datasets from three different folksonomy systems: del.icio.us, BibSonomy and last.fm. They have different sizes, different resources, and are probably used by different people. Therefore we assume that our observations will also be significant for other social bookmarking systems. Table 1 gives an overview of the datasets. For all datasets we disregarded whether tags were lower or upper case, since this is the behaviour of most systems when querying them for posts tagged with a certain tag (although they often store the tags as entered by the user).

Del.icio.us. One of the first and most popular folksonomy systems is del.icio.us,(17) which has existed since the end of 2003. It allows users to tag bookmarks (URLs) and had, according to its blog, around 1.5 million users in February 2007. We used a dataset from del.icio.us we obtained from July 27 to 30, 2005 [15].

BibSonomy. This system allows users to manage and annotate bookmarks and publication references simultaneously. Since three of the authors have participated in the development of BibSonomy,(18) we were able to create a complete snapshot of all users, resources (both publication references and bookmarks) and tags publicly available at April 30, 2007, 23:59:59 CEST.(19) From the snapshot we excluded the posts from the DBLP computer science bibliography(20) since they are automatically inserted, all owned by one user, and all tagged with the same tag (dblp). Therefore they do not provide meaningful information for the analysis.

Last.fm. Audioscrobbler(21) is a database that tracks listening habits. The user profiles are built through the use of the company's flagship product, last.fm,(22) a system that provides personalized radio stations for its users and updates their profiles using the music they listen to. Audioscrobbler exposes large portions of data through their web services API. The data were gathered during July 2006, partly through the web services API (collecting user nicknames), partly by crawling the last.fm site. Here the resources are artist names, whose spellings are already normalized by the system.
17 http://del.icio.us
18 http://www.bibsonomy.org
19 On request to bibsonomy@cs.uni-kassel.de a snapshot of BibSonomy is available for research purposes.
20 http://www.informatik.uni-trier.de/~ley/db/
21 http://www.audioscrobbler.net
22 http://www.last.fm

Table 1
Characteristics of the used datasets.

dataset     |    |U| |     |T| |       |R| |        |Y| |       |P| | date       | k_max
del.icio.us | 75,245 | 456,697 | 3,158,435 | 17,780,260 | 7,698,653 | 2005-07-30 | 77
BibSonomy   |  1,037 |  28,648 |    86,563 |    341,183 |    96,972 | 2007-04-30 |  7
last.fm     |  3,746 |  10,848 |     5,197 |    299,520 |   100,101 | 2006-07-01 | 20

Table 2
Characteristics of the p-cores at level k.

dataset     | k  |    |U| |    |T| |    |R| |       |Y| |       |P|
del.icio.us | 10 | 37,399 | 22,170 | 74,874 | 7,487,319 | 3,055,436
BibSonomy   |  5 |    116 |    412 |    361 |    10,148 |     2,522
last.fm     | 10 |  2,917 |  2,045 |  1,853 |   219,702 |    75,565

5.2. Core Computation

Many recommendation algorithms suffer from sparse data and will thus produce bad recommendations on the long tail of items which were used by only few users. We follow the conventional approach and restrict the evaluation to the dense part of the folksonomy. To this end, we adapt the notion of a p-core [2] to tri-partite hypergraphs. The p-core at level $k$ is a subset of the folksonomy with the property that each user, tag and resource occurs in at least $k$ posts. For the del.icio.us dataset we will later see that using the core will (except for the adapted PageRank) not change the relative performance differences of the algorithms.

To construct the p-core, recall that a folksonomy $(U, T, R, Y)$ can be formalized equivalently as an undirected tri-partite hypergraph $G = (V, E)$ with $V = U \,\dot{\cup}\, T \,\dot{\cup}\, R$ and $E = \{\{u, t, r\} \mid (u, t, r) \in Y\}$. First we define, for a subset $V'$ of $V$ (with $V' = U' \,\dot{\cup}\, T' \,\dot{\cup}\, R'$ and $U' \subseteq U$, $T' \subseteq T$, $R' \subseteq R$), the function

$$\operatorname{posts}(v, V') = \begin{cases} \{(v, S, r) \mid r \in R', S = T_{V'}(v, r)\} & \text{if } v \in U', \\ \{(u, v, r) \in Y \mid u \in U', r \in R'\} & \text{if } v \in T', \\ \{(u, S, v) \mid u \in U', S = T_{V'}(u, v)\} & \text{if } v \in R', \end{cases} \qquad (4)$$

which assigns to each $v \in V'$ the set of all posts in which $v$ occurs. Here, $T_{V'}(u, r)$ is defined as in Section 2.1, but restricted to the subgraph $(V', E')$, with $E'$ containing all edges from $E$ whose nodes are contained in $V'$. Let $p(v, V') := |\operatorname{posts}(v, V')|$. The p-core at level $k \in \mathbb{N}$ is then the subgraph of $(V, E)$ induced by $V'$, where $V'$ is a maximal subset of $V$ such that, for all $v \in V'$, $p(v, V') \geq k$ holds.

Since $p(v, V')$ is, for all $v$, a monotone function in $V'$, the p-core at any level $k$ is unique [2], and we can use the algorithm presented in [2] for its computation. An overview of the p-cores we used for our datasets is given in Table 2. For BibSonomy, we used $k = 5$ instead of 10 because of its smaller size. The largest $k$ for which a p-core exists is listed, for each dataset, in the last column of Table 1.

Although the p-core as defined above breaks the symmetry of the hypergraph structure (contrary to tags, for users and resources the p-degree is not the same as the natural degree in the graph), it is the natural definition for our recommender scenario. We have also performed the evaluation on the symmetric variant of $p$ (with lines 1 and 3 in Equation (4) modified similarly to line 2), with rather similar results.

5.3. Evaluation Methodology


5.3.1. Evaluation Measures


To evaluate the recommenders we used a variant of the leave-one-out hold-out estimation [14] which we call LeavePostOut. In all datasets, we picked randomly, for each user $u$, one resource $r_u$ which he had posted before. The task of the recommenders was then to predict the tags the user assigned to $r_u$, based on the folksonomy $(U, T, R, Y')$ with $Y' := Y \setminus (\{u\} \times T(u, r_u) \times \{r_u\})$.

As performance measures we use precision and recall, which are standard in such scenarios [14]. For $(U, T, R, Y')$, $u$, and $r_u$ as defined above, precision and recall of a recommendation $\tilde{T}(u, r_u)$ are defined as follows:

$$\operatorname{recall}(\tilde{T}(u, r_u)) = \frac{|T(u, r_u) \cap \tilde{T}(u, r_u)|}{|T(u, r_u)|}, \qquad (5)$$

$$\operatorname{precision}(\tilde{T}(u, r_u)) = \frac{|T(u, r_u) \cap \tilde{T}(u, r_u)|}{|\tilde{T}(u, r_u)|}. \qquad (6)$$


For each dataset, we averaged these values over all its users:

$$\operatorname{recall} = \frac{1}{|U|} \sum_{u \in U} \operatorname{recall}(\tilde{T}(u, r_u)), \qquad (7)$$

$$\operatorname{precision} = \frac{1}{|U|} \sum_{u \in U} \operatorname{precision}(\tilde{T}(u, r_u)). \qquad (8)$$

This process was repeated ten times for each dataset, each time with another resource per user, to further minimize the variance. In the sequel, the listed recall and precision values are thus always the averages over all ten runs.
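The protocol and the measures of Eqs. (5)-(8) fit in a few lines; in the following sketch, `recommender` stands for any of the algorithms above, assumed to map a folksonomy $Y'$ and a pair $(u, r)$ to a list of $n$ tags (the signature is our assumption):

```python
import random

def leave_post_out(Y, recommender, n=10):
    """One LeavePostOut run: averaged precision and recall (Eqs. 5-8)."""
    users = {u for (u, _, _) in Y}
    prec_sum = rec_sum = 0.0
    for u in users:
        r = random.choice(sorted({s for (v, _, s) in Y if v == u}))  # pick r_u
        held_out = {t for (v, t, s) in Y if v == u and s == r}       # T(u, r_u)
        Y_prime = {(v, t, s) for (v, t, s) in Y if not (v == u and s == r)}
        predicted = set(recommender(Y_prime, u, r, n=n))
        hits = len(held_out & predicted)
        rec_sum += hits / len(held_out)
        prec_sum += hits / len(predicted) if predicted else 0.0
    return prec_sum / len(users), rec_sum / len(users)
```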

5.3.2. Settings of the Algorithms

For each of the algorithms of our evaluation, we will now briefly describe the specific settings used to run it.

Collaborative Filtering UT. For this Collaborative Filtering variant the neighborhood is computed based on the user-tag matrix $\pi_{UT} Y$. The only parameter to be tuned in the CF-based algorithms is the number $k$ of nearest neighbors. For that, multiple runs were performed where $k$ was successively incremented in steps of 10 until a point where no more improvements in the results were observed. The best values for $k$ were 80 for del.icio.us, 20 for BibSonomy and 60 for the last.fm dataset.

Collaborative Filtering UR. Here the neighborhood is computed based on the user-resource matrix $\pi_{UR} Y$. For this approach the best values for $k$ were 100 for del.icio.us, 30 for BibSonomy and 100 for the last.fm dataset.

Adapted PageRank. With the parameter $d = 0.7$ we stopped computation after 10 iterations. In $\vec{p}$, we gave higher weights to the user $u$ and the resource $r_u$ at hand: while each user, tag and resource got a preference weight of 1, $u$ and $r_u$ got preference weights of $1 + |U|$ and $1 + |R|$, resp.

FolkRank. The same parameters and preference weights were used as in the adapted PageRank.

Most Popular Tags / Most Popular Tags by Resource / Most Popular Tags by User. These three approaches have no parameters. They were applied as described in Section 3.3.

Most Popular Tags Mix. We computed these recommendations for all $\rho \in \{0, 0.1, \ldots, 0.9, 1\}$. We will show in Section 6.1.1 that $\rho = 0.6$ is the most suitable of these values (at least on del.icio.us and BibSonomy), so that the comparison with the other algorithms will be done with this setting only.

It is important to notice that not all algorithms necessarily have maximal coverage, i.e., can always recommend $n$ tags. Since FolkRank and most popular tags are the only algorithms with maximal coverage, the evaluation could be perturbed if the other algorithms cannot fill the list up to the given $n$. Hence, whenever the recommendation list of an algorithm is not filled up to $n$, we complete the remaining entries with tags taken from the most popular tags that are not already in the list.


6. Results

In this section we present and describe the results of the evaluation. We will see that all three datasets show the same overall behavior: most popular tags is outperformed by all other approaches; the CF-UT algorithm performs slightly better than, and the CF-UR approach approximately as well as, the most popular tags by resource, while FolkRank uniformly provides significantly better results. The results for most popular tags by user and the most popular tags 0.6-mix differ among the datasets, however. We will further elaborate on this later.

There are two types of diagrams. The first type of diagram (e.g., Figure 3) shows in a straightforward manner how the recall depends on the number of recommended tags. The other diagrams are the usual precision-recall plots. Here a data point on a curve stands for the number of tags recommended (starting with the highest ranked tag on the left of the curve and ending with ten tags on the right). Hence, the steady decrease of all curves in those plots means that the more tags of the recommendation are regarded, the better the recall and the worse the precision will be.

Since we averaged for each dataset over ten runs, we added error bars showing the standard deviation to the plots. However, except for the BibSonomy dataset, the standard deviation is so small that the error bars are mostly hidden by the data point symbols.


6.1. Del.icio.us

Due to the fact that the dataset from del.icio.us is by far the largest of the three we considered, we will discuss the results in more detail.

6.1.1. Determining ρ for the Most Popular Mix

Before comparing the different algorithms described in the previous sections, we focus on finding an appropriate $\rho$ for the most popular $\rho$-mix recommender on the del.icio.us p-core at level 10. Therefore, we varied $\rho$ in 0.1-steps from 0 to 1 and plotted the resulting precision and recall; for comparison purposes we also added the plot of the most popular tags mix 1:1 recommender.

As can be seen in Figure 2, the most popular tags by user ($\rho = 0$) recommender performs worse than the most popular tags by resource ($\rho = 1$) recommender for all numbers of recommended tags. All mixed versions perform better than most popular tags by user, and all mixed versions with $\rho \geq 0.5$ perform better than most popular tags by resource. The best performance is obtained for $\rho = 0.6$ for the top three recommendations and $\rho = 0.7$ for more than three recommendations. We conclude that the tags which other users used for that resource are better recommendations than the most popular tags of the user. Nevertheless, adding a small amount of popular tags of the user to the tags from the resource increases both precision and recall.

We observed a similar precision/recall behaviour for the different values of $\rho$ on the non-pruned del.icio.us data as well as on the BibSonomy dataset. (The results are not shown here because of space restrictions.) For the following evaluations we decided therefore to include the results of the most popular tags 0.6-mix recommendations only, since for the top recommendations they have the best recall and precision, and for more tags they are still very close to the best results.

Fig. 2. Precision and recall of most popular tags mix 1:1 and most popular tags $\rho$-mix for $\rho \in \{0, 0.1, \ldots, 0.9, 1\}$ on the del.icio.us p-core at level 10.

6.1.2. Comparison of Algorithms on the p-core at Level 10

Figure 3 shows how the recall increases when more tags of the recommendation are used. All algorithms perform significantly better than the baseline most popular tags and the most popular tags by user strategy, whereas it is much harder to beat the most popular tags by resource. The most apparent result is that the graph-based FolkRank recommendations have superior recall, independent of the number of regarded tags. The top 10 tags given by FolkRank contained on average 80% of the tags the users decided to attach to the selected resource. The second best results come from the most popular tags 0.6-mix, followed by the Collaborative Filtering approach based on users' tag similarities.

The idea to suggest the most popular tags by resource results in a recall which is very similar to that of the CF recommender based on users' resource similarities; both perform worse than the aforementioned approaches. Between most popular tags by resource and most popular tags lie the adapted PageRank, which is biased towards the high-degree nodes, as discussed in Section 3.2.1, and the most popular tags by user recommendations, which again do not perform so well.
The precision-recall plot in Figure 4 extends Figure 3 with the precision measure. It again clearly reveals the quality of the recommendations given by FolkRank compared to the other approaches. Its precision values are systematically above those of the other approaches. For its top recommendations, FolkRank reaches a precision of 58.7%.

A post in del.icio.us contains only 2.45 tags on average. A precision of 100% can therefore not be reached when recommending ten tags. This explains the poor precision of less than 20% for all approaches when recommending ten tags. However, from a subjective point of view, the additional "wrong" tags may even be considered as highly relevant, as the following example shows, where the user tnash has tagged the page http://www.ariadne.ac.uk/issue43/chudnov/ with the tags semantic, web, and webdesign. Since that page discusses the interaction of publication reference management systems in the web by the OpenURL standard, the tags recommended by FolkRank (openurl, web, webdesign, libraries, search, semantic, metadata, social-software, sfx, seo) are adequate and capture not only the user's point of view that this is a webdesign-related issue in the semantic web, but also provide him with more specific tags like libraries or metadata. The CF based on users' tag similarities recommends very similar tags (openurl, libraries, social-software, sfx, metadata, me/toread, software, myndsi, work, 2read). The additional tags may thus encourage users to use more tags and/or tags from a different viewpoint for describing resources, and thus lead to converging vocabularies.

Fig. 3. Recall for del.icio.us p-core at level 10.

The essential point in this example is, however, that FolkRank is able to predict, in addition to globally relevant tags, the exact tags of the user, which CF could not. This is due to the fact that FolkRank considers, via the hypergraph structure, also the vocabulary of the user himself, which CF does not do. It was this observation that motivated the creation of the most popular tags mix recommender, where we (in contrast to CF) include also the user's tags in the recommendations. As the diagrams show, we succeeded and could obtain results better than those of CF and only slightly worse than those of FolkRank.

The standard deviation for the ten runs of all algorithms on this dataset is below 3‰ for both precision and recall.

6.1.3. Comparison of Algorithms on the Unpruned Dataset

We conclude the evaluation on del.icio.us with results on the unpruned del.icio.us dataset, see Figure 5. Due to the long tail of users and resources which occur in only one post, we regarded only resources and users with at least two posts. Otherwise, most of the algorithms would not be able to produce recommendations. Apart from the adapted PageRank, the results are similar to the results on the p-core at level 10, with an overall decrease of both precision and recall. The only algorithm which seems to profit from the remaining long tail is the adapted PageRank. This is likely due to the fact that the many tags in the long tail together are able to outbalance, to a certain degree, the strong influence of the nodes with high edge degree. Nevertheless, it cannot reach the performance of FolkRank or the most popular tags 0.6-mix.

The standard deviation for the ten runs of all algorithms on this dataset is below 2% for both precision and recall.

6.2. BibSonomy

For this dataset, the results have a much larger standard deviation, as can be seen from the error bars in Figure 6. This is due to the fact that every run averages over 116 users only (cf. Table 2) and thus the performance of the ten runs differs more.

Fig. 4. Recall and Precision for del.icio.us p-core at level 10.

Nevertheless, the tendency of the performance of the different methods is similar to the performance on the other datasets. FolkRank provides on average the best precision and recall, followed by the most popular tags 0.6-mix recommender. Both Collaborative Filtering algorithms and most popular tags by resource show similar results for higher numbers of tags.


6.3. Last.fm

On this dataset, FolkRank again outperforms the other approaches. Here, its recall is considerably higher than on the other datasets, see Figure 7. Even when just two tags are recommended, the recall is close to 60% and goes up to 92% for 10 tags. The standard deviation for the ten runs of all algorithms on this dataset is below 7‰ for both precision and recall.

The most surprising observation is, though, that here most popular tags by user is considerably better than most popular tags by resource and even Collaborative Filtering, such that it is the second best algorithm after FolkRank. An explanation could be the average number of tags a user has in this dataset (cf. Table 3). Compared to the del.icio.us and BibSonomy datasets, the average here is much lower, at around twelve tags. Additionally, the average number of tags per resource in the last.fm dataset is much higher than in the other two datasets and in particular higher than the average number of tags per user (in contrast to the other two datasets, where it is the other way around). Hence, if a user has on average only twelve tags, proposing tags he used earlier instead of tags other users attached to the resource provides a better chance of suggesting the tags the user finally chose. Needless to say, it would be interesting to know why the averages on the last.fm dataset are so different from the other datasets. This could depend on the rather limited domain of the resources which can be tagged in last.fm, but might also result from the crawling strategy which was deployed to gather this dataset.

Due to the different performance of the most popular tags by user/resource recommendations, the performance of the most popular tags $\rho$-mix, of course, differs significantly from the results on the other datasets. A comparison (not shown here) of different values for $\rho$ showed that the most popular tags $\rho$-mix on this dataset mostly performed worse than most popular tags by user (although always better than most popular tags by resource).

Fig. 5. Recall and Precision for del.icio.us.
Table 3
Average number of tags per user and tags per resource.

dataset                        | $\frac{1}{|U|}\sum_{u \in U} |T_u|$ | $\frac{1}{|R|}\sum_{r \in R} |T_r|$
del.icio.us p-core at level 10 | 59.18 | 25.87
BibSonomy p-core at level 5    | 31.85 | 14.14
last.fm p-core at level 10     | 11.84 | 44.19

7. Conclusion

The presented results show that the graph-based approach of FolkRank is able to provide tag recommendations which are significantly better than those of approaches based on tag counts, and even better than those of state-of-the-art recommender systems like Collaborative Filtering. The tradeoff is, though, as discussed in Section 4, that the computation of FolkRank recommendations is cost-intensive, so that one might prefer less expensive methods to recommend tags in a social bookmarking system.

The most popular tags $\rho$-mix approach proposed in this work has proven to be a solution for this problem. It provides results which almost reach the level of FolkRank but which are extremely cheap to generate. Especially the possibility to use index structures (which databases of social bookmarking services typically provide anyway) makes this approach a good choice for online recommendations.

Finally, despite its simplicity and non-personalized nature, the most popular tags approach achieved reasonable precision and recall on the small datasets (last.fm and BibSonomy), which indicates its adequacy for the cold-start problem.

Fig. 6. Recall and Precision for BibSonomy p-core at level 5.

8. Summary and Future Work

In this paper we presented three classes of algorithms for tag recommendations in folksonomies: straightforward Collaborative Filtering adaptations based on projections, adaptations of the well-known PageRank algorithm, and simpler methods based on tag counts. We conducted experiments on three real-life datasets, showed that FolkRank outperforms the other methods, and that the most popular tags $\rho$-mix provides a good tradeoff between recommendation performance and computational costs. The main conclusions of our experiments were that the exploitation of the hypergraph structure in FolkRank yields a significant advantage, and that simple methods based on tag counts can provide recommendations nearly as good as the best results with only minimal computational costs.

Currently, our approach for FolkRank always returns a fixed number of tags, often yielding low precision. Future work will include a method to determine a good cut-off point automatically. This, of course, is a problem which is worth looking at for other methods, too.

Particularly appealing would be the inclusion of the most popular tags $\rho$-mix method into some implemented social bookmarking system. Since three of the authors are involved in the development of BibSonomy, chances are good that this recommendation method will soon assist the users during tagging. It will be interesting to analyse its user acceptance.

Future work further includes improving the most popular tags $\rho$-mix method. One idea is to study different normalization aspects, like the introduction of a frequency-dependent normalization. This would allow us to incorporate the differences in the tag frequency distributions of users and resources.

Fig. 7. Recall and Precision for Last.fm p-core at level 10.

Acknowledgements

Part of this research was funded by the EU in the Nepomuk(23) (FP6-027705), Tagora(24) (FP6-2005-34721), and X-Media(25) (IST-FP6-026978) projects.

23 http://nepomuk.semanticdesktop.org
24 http://www.tagora-project.eu
25 http://www.x-media-project.org

References

[1] Pierpaolo Basile, Domenico Gendarmi, Filippo Lanubile, and Giovanni Semeraro. Recommending smart tags in a social bookmarking system. In Bridging the Gap between Semantic Web and Web 2.0 (SemNet 2007), pages 22-29, 2007.
[2] V. Batagelj and M. Zaveršnik. Generalized cores, 2002. cs.DS/0202039, http://arxiv.org/abs/cs/0202039.
[3] D. Benz, K. Tso, and L. Schmidt-Thieme. Automatic bookmark classification: A collaborative approach. In Proceedings of the Second Workshop on Innovations in Web Infrastructure (IWI 2006), Edinburgh, Scotland, 2006.
[4] John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43-52, 1998.
[5] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7):107-117, April 1998.
[6] Andrew Byde, Hui Wan, and Steve Cayzer. Personalized tag recommendations via tagging and content-based similarity metrics. In Proceedings of the International Conference on Weblogs and Social Media, March 2007.
[7] Ciro Cattuto, Vittorio Loreto, and Luciano Pietronero. Collaborative tagging and semiotic dynamics, May 2006. http://arxiv.org/abs/cs/0605015.
[8] Mukund Deshpande and George Karypis. Item-based top-n recommendation algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
[9] M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, and A. Tomkins. Visualizing tags over time. In Proc. of the 15th International WWW Conference, Edinburgh, Scotland, 2006.
[10] Claudiu S. Firan, Wolfgang Nejdl, and Raluca Paiu. The benefit of using tag-based profiles. In 5th Latin American Web Congress, October 31 - November 2, 2007, Santiago de Chile, 2007.

[11] Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foundations. Springer, Berlin Heidelberg, 1999.
[12] H. Halpin, V. Robu, and H. Shepard. The dynamics and semantics of collaborative tagging. In Proceedings of the 1st Semantic Authoring and Annotation Workshop (SAAW'06), 2006.
[13] Tony Hammond, Timo Hannay, Ben Lund, and Joanna Scott. Social Bookmarking Tools (I): A General Review. D-Lib Magazine, 11(4), April 2005.
[14] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst., 22(1):5-53, 2004.
[15] Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme. Information retrieval in folksonomies: Search and ranking. In York Sure and John Domingue, editors, The Semantic Web: Research and Applications, volume 4011 of Lecture Notes in Computer Science, pages 411-426, Heidelberg, June 2006. Springer.
[16] Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme. Trend detection in folksonomies. In Yannis S. Avrithis, Yiannis Kompatsiaris, Steffen Staab, and Noel E. O'Connor, editors, Proc. First International Conference on Semantics And Digital Media Technology (SAMT), volume 4306 of LNCS, pages 56-70, Heidelberg, Dec 2006. Springer.
[17] Robert Jäschke, Leandro Balby Marinho, Andreas Hotho, Lars Schmidt-Thieme, and Gerd Stumme. Tag recommendations in folksonomies. In Joost N. Kok, Jacek Koronacki, Ramon López de Mántaras, Stan Matwin, Dunja Mladenić, and Andrzej Skowron, editors, Knowledge Discovery in Databases: PKDD 2007, 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, volume 4702 of Lecture Notes in Computer Science, pages 506-514, Berlin, Heidelberg, 2007. Springer.
[18] Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604-632, 1999.
[19] F. Lehmann and R. Wille. A triadic approach to formal concept analysis. In G. Ellis, R. Levinson, W. Rich, and J. F. Sowa, editors, Conceptual Structures: Applications, Implementation and Theory, volume 954 of Lecture Notes in Computer Science. Springer, 1995.
[20] Ben Lund, Tony Hammond, Martin Flack, and Timo Hannay. Social Bookmarking Tools (II): A Case Study - Connotea. D-Lib Magazine, 11(4), April 2005.
[21] Leandro Balby Marinho and Lars Schmidt-Thieme. Collaborative tag recommendations. In Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation (GfKl), Freiburg. Springer, 2007.
[22] Adam Mathes. Folksonomies - Cooperative Classification and Communication Through Shared Metadata, December 2004. http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html.
[23] Peter Mika. Ontologies Are Us: A Unified Model of Social Networks and Semantics. In Yolanda Gil, Enrico Motta, V. Richard Benjamins, and Mark A. Musen, editors, ISWC 2005, volume 3729 of LNCS, pages 522-536, Berlin Heidelberg, November 2005. Springer-Verlag.
[24] Gilad Mishne. AutoTag: a collaborative approach to automated tag assignment for weblog posts. In WWW '06: Proceedings of the 15th International Conference on World Wide Web, pages 953-954, New York, NY, USA, 2006. ACM Press.
[25] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, pages 175-186, Chapel Hill, North Carolina, 1994. ACM.
[26] Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In World Wide Web, pages 285-295, 2001.
[27] Christoph Schmitz, Andreas Hotho, Robert Jäschke, and Gerd Stumme. Mining association rules in folksonomies. In V. Batagelj, H.-H. Bock, A. Ferligoj, and A. Žiberna, editors, Data Science and Classification: Proc. of the 10th IFCS Conf., Studies in Classification, Data Analysis, and Knowledge Organization, pages 261-270, Berlin, Heidelberg, 2006. Springer.
[28] Gerd Stumme. A finite state model for on-line analytical processing in triadic contexts. In Bernhard Ganter and Robert Godin, editors, ICFCA, volume 3403 of Lecture Notes in Computer Science, pages 315-328. Springer, 2005.
[29] Karen Tso-Sutter, Leandro Balby Marinho, and Lars Schmidt-Thieme. Tag-aware recommender systems by fusion of collaborative filtering algorithms. In Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC '08), Edinburgh, Scotland, 2007.
[30] W. Xi, B. Zhang, Y. Lu, Z. Chen, S. Yan, H. Zeng, W. Ma, and E. Fox. Link fusion: A unified link analysis framework for multi-type interrelated data objects. In Proc. 13th International World Wide Web Conference, New York, 2004.
[31] Yanfei Xu, Liang Zhang, and Wei Liu. Cubic analysis of social bookmarking for personalized recommendation. In Frontiers of WWW Research and Development - APWeb 2006, pages 733-738, 2006.
[32] Zhichen Xu, Yun Fu, Jianchang Mao, and Difu Su. Towards the semantic web: Collaborative tag suggestions. In Proceedings of the Collaborative Web Tagging Workshop at WWW 2006, Edinburgh, Scotland, 2006.
