=Paper=
{{Paper
|id=Vol-1922/paper3
|storemode=property
|title=Leveraging User-Interactions for Time-Aware Tag Recommendations
|pdfUrl=https://ceur-ws.org/Vol-1922/paper3.pdf
|volume=Vol-1922
|authors=Daniel Zoller,Stephan Doerfel,Christian Pölitz,Andreas Hotho
|dblpUrl=https://dblp.org/rec/conf/recsys/ZollerDPH17
}}
==Leveraging User-Interactions for Time-Aware Tag Recommendations==
Daniel Zoller, University of Würzburg, Data Mining and Information Retrieval (DMIR) Group, zoller@informatik.uni-wuerzburg.de

Stephan Doerfel, Micromata GmbH, Kassel, s.doerfel@micromata.de

Christian Pölitz, University of Würzburg, Data Mining and Information Retrieval (DMIR) Group, poelitz@informatik.uni-wuerzburg.de

Andreas Hotho, University of Würzburg, Data Mining and Information Retrieval (DMIR) Group, and L3S Research Center, hotho@informatik.uni-wuerzburg.de
RecTemp Workshop @ ACM RecSys ’17, Como, Italy. Copyright © 2017 for this paper by its authors. Copying permitted for private and academic purposes.

ABSTRACT

For the popular task of tag recommendation, various (complex) approaches have been proposed. Recently, however, research has focused on heuristics with low computational effort; in particular, a time-aware heuristic called BLL has been shown to compare well to various state-of-the-art methods. Here, we follow up on these results by presenting another time-aware approach that leverages user-interaction data, is easily interpretable and computable on the fly, and can successfully be combined with BLL. We investigate the influence of time as a parameter in that approach, and we demonstrate the effectiveness of the proposed method using two datasets from the popular public social tagging system BibSonomy.

KEYWORDS

tag recommendation; user behavior; time-aware

1 INTRODUCTION

Tagging allows users to easily organize and share resources online. Users add keywords, called “tags”, to resources and store them together, thus enabling themselves and others to retrieve these resources using those tags as cues. Tagging is the basis of social bookmarking systems, such as Delicious,¹ BibSonomy,² or Flickr.³ However, it has also long found its way as a secondary feature into many applications, like web shops, wikis, blogs, or libraries.

¹ https://del.icio.us (for storing web links)

² https://www.bibsonomy.org (for storing web links and publications)

³ https://www.flickr.com (for storing images)

The task of recommending tags has proven to be a fruitful line of research. Many approaches (see Section 3) utilize complex or computationally costly models, often relying on the full data collected in the system. However, [10] already showed that simple and fast popularity-based heuristics can achieve results comparable to those of more difficult and more expensive state-of-the-art methods.

Recently, [19] confirmed this tendency in a large-scale experiment on multiple datasets, using a time-aware heuristic that transfers the base-level learning (BLL) equation to social tagging. BLL scores a tag based on the time that has passed since it was last used, taking all of a user’s previous tag assignments into account. Next to its strong performance, the method has another significant advantage over comparably effective methods: the recommendations can be computed online, and no offline training is required. The latter can be a decisive factor for system operators, especially where tagging is only a secondary feature and expensive offline training seems out of proportion due to the amount of required resources (hardware, time, and expertise).

In this paper, we follow the incentive to build such lightweight but effective methods. To that end, we use BLL and augment it with a time-aware heuristic relying on a new source of data that is usually available in all tagging systems and can be exploited easily and on the fly: the interactions of users with the system. We exploit the context of requested pages in a tagging system to generate tags as recommendation candidates. We demonstrate that for the selection of those requests, time is a critical factor. We evaluate the success of our approach on a dataset of the public real-world tagging system BibSonomy.

Contributions: The main contributions of this investigation are:

(1) We describe a heuristic to harvest tag recommendation candidates from user interactions with the system before posting a resource.

(2) We test two variants of a tag recommender that utilize the extracted recommendation candidates, one using a fixed number of the most recent interactions and one using specific time-windows over which interactions are considered.

(3) We evaluate the overall gain in performance for recommendations of the interaction-based heuristics, as well as a hybridized version in combination with BLL.

Our results demonstrate that the time-aware approach provides better suggestions than the variant using a fixed number of previous interactions. Furthermore, a hybrid combining the time-aware, interaction-based recommender with BLL outperforms plain BLL. Moreover (like plain BLL), our proposed recommender is independent of the tagged resources (making no use of their contents) and respects the requirement of easy, online computability. We expect our approach to be relevant for the tag recommender community, as well as for operators of web systems who want to support their users with tag recommendations without spending effort and resources on costly optimization procedures.
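The heuristics summarised in the contributions, and defined in Sections 2.1 and 2.2, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `(timestamp, tags)` request format, and the choice of seconds as the time unit for BLL are assumptions made only for this example. Frequency is counted as the number of selected requests a tag occurs in, which is one possible reading of "frequency among the interactions".

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def select_requests(requests, post_time, window=None, n=None):
    """Choose the requests considered for one post.

    requests: list of (timestamp, tags) pairs (hypothetical format).
    window:   timedelta, time-window variant (LR_t).
    n:        int, fixed-number variant (LR_n).
    """
    earlier = sorted((r for r in requests if r[0] < post_time),
                     key=lambda r: r[0])
    if window is not None:
        return [r for r in earlier if post_time - r[0] <= window]
    return earlier[-n:] if n else []


def rank_candidates(selected):
    """Order tags by the number of selected requests they occur in;
    ties are broken lexicographically."""
    counts = Counter(tag for _, tags in selected for tag in set(tags))
    return sorted(counts, key=lambda tag: (-counts[tag], tag))


def bll_scores(assignments, post_time, d=0.5):
    """Softmax-normalised BLL scores from (tag, timestamp) assignments.

    Ages are measured in seconds, which is an assumption of this sketch.
    """
    sums = {}
    for tag, ts in assignments:
        age = (post_time - ts).total_seconds()
        if age > 0:  # only assignments made before the current post
            sums[tag] = sums.get(tag, 0.0) + age ** (-d)
    raw = {tag: math.log(s) for tag, s in sums.items()}
    norm = sum(math.exp(v) for v in raw.values())
    return {tag: math.exp(v) / norm for tag, v in raw.items()}


def hybrid(interaction_ranked, bll_ranked):
    """Interaction-based tags first, then BLL tags not already listed."""
    seen = set(interaction_ranked)
    return list(interaction_ranked) + [t for t in bll_ranked if t not in seen]


# Toy data, invented for illustration only.
t0 = datetime(2016, 1, 1, 12, 0, 0)
log = [
    (t0 - timedelta(days=2), ["python"]),
    (t0 - timedelta(seconds=50), ["recsys", "tagging"]),
    (t0 - timedelta(seconds=20), ["recsys"]),
]
history = [
    ("python", t0 - timedelta(seconds=100)),
    ("python", t0 - timedelta(seconds=400)),
    ("web", t0 - timedelta(seconds=25)),
]

lr_t = rank_candidates(select_requests(log, t0, window=timedelta(minutes=1)))
lr_n = rank_candidates(select_requests(log, t0, n=3))
scores = bll_scores(history, t0)
bll = sorted(scores, key=lambda tag: (-scores[tag], tag))

print(lr_t)               # ['recsys', 'tagging']
print(lr_n)               # ['recsys', 'python', 'tagging']
print(hybrid(lr_t, bll))  # ['recsys', 'tagging', 'web', 'python']
```

In this toy run, the one-minute window ignores the request from two days earlier, whereas the fixed-number variant still picks it up. The hybrid simply appends the BLL tags that the interaction-based list does not already contain.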
RecTemp Workshop @ ACM RecSys ’17, August 2017, Como, Italy Daniel Zoller, Stephan Doerfel, Christian Pölitz, and Andreas Hotho
The remainder of this paper is structured as follows: First, we introduce our recommender (Section 2), which leverages user interaction in a tagging system and incorporates the aspect of time. Then, we discuss related work in Section 3. We describe the dataset and experimental setup in Section 4. Next, we present our evaluation results for time-based and user interaction-based recommenders in Section 5. Finally, we conclude the paper in Section 7, after discussing our findings in Section 6.

2 RECOMMENDERS

Tag recommenders provide recommendations during the process of posting a resource. When a user u is posting a resource r, the recommender will provide a list of tags that u might want to assign to r. For that task, we propose and describe in this section a new time-aware, interaction-driven recommender which exploits the previous interactions of the active user (the one to provide recommendations for) with the tagging system. Lastly, we describe a hybrid approach, combining our approach with BLL (Trattner et al. [19]).

2.1 Time-Aware Interaction-Driven Recommenders

In tagging systems, and more so in systems where tagging is a secondary feature, a user’s interactions with the system are much more diverse than just adding and tagging new resources. Still, tag recommendation approaches commonly focus only on the result of these interactions (the tagged resources) to compute candidates to recommend. In this paper, we propose going beyond the tagging activities alone and considering the context of all interactions that a user has with the system. To generate meaningful recommendations for a resource r, we focus especially on the most recent interactions, as they are likely to belong to the same context as r. The rationale behind the approach is that, by browsing and searching resources, users reveal their current interests, which are useful for selecting fitting tags. A user’s interactions occur in the requests she sends to the system. Therefore, we call our recommender approach Last Requests (LR). To select the requests from which candidate tags are computed, we distinguish two options: First, a time-window based variant LR_t: When user u stores a new resource r, recommendations are computed using all previous requests that u made during the time frame of length t immediately preceding the current posting of r. In the second variant, called LR_n, only the last n interactions of u are considered. While the first variant is explicitly time-aware and thus more likely to capture interactions relevant in the context of the active post, the second variant is more broadly applicable, as users do not necessarily interact with the system prior to posting a new resource (e.g., when using a posting tool, like a bookmarklet or browser add-on). In such a case, older but perhaps still relevant information could be used.

In both variants, recommendation candidates are derived from each of the considered interactions. To yield a ranked list of recommendations, the resulting tags are ordered by their frequency among the interactions.⁴ We describe methods for deriving recommendable tags from interactions in Section 4.3. Both recommenders can compute the candidates more efficiently than other recommenders that leverage all previously used tags of a user, because they only compute frequencies on a set of tags which is (usually much) smaller than the complete list of a user’s previously used tags, let alone the complete historical posting data in the system.

⁴ In the case of ties, we return the tags in lexicographic order.

2.2 Hybrid with BLL

Neither of the proposed variants of the tag recommender can provide results for new users that post resources immediately after registration without interacting with the system. Also, it is not unusual that users do not interact with the system before they store a new resource. This fact also impedes the time-aware variant LR_t. To compensate, we combine each of the interaction-based methods with BLL into a hybrid recommender. BLL has been shown to outperform other (time-dependent) approaches (see Section 3). It uses all tags previously used by the active user and computes a recency-based ranking on them: Let T_u be the set of tags previously used by user u and time_p(p) the timestamp when user u stored post p. Further, let Y_{t,u} be the set of tag assignments for tag t of the user u (i.e., we add a tuple (u, r, t) to the corresponding set every time a user u annotates a resource r with tag t), and let time(y) be the timestamp of the tag assignment y ∈ Y_{t,u}. The BLL score of each tag t in T_u is calculated as

<math>\ln\Bigl(\sum_{y \in Y_{t,u}} \bigl(\mathrm{time}_p(p) - \mathrm{time}(y)\bigr)^{-d}\Bigr)</math>

and normalized by the softmax function over all scores. We set d = 0.5 for our evaluation, the setting which obtained the best results in [19].

Our interaction-based recommender approaches are combined with BLL in the following way: First, the interaction-based method is used to compute a ranked list L_1 of candidates (possibly of length 0). Similarly, BLL also yields a ranked list L_2 of candidate tags. From L_2, we remove all tags that also occur in L_1. Then the two lists are concatenated, such that the final ranked list of recommendations contains the suggestions from L_1 followed by those from L_2. We denote the two resulting hybrids by LR_t + BLL and LR_n + BLL.

3 RELATED WORK

Recommending tags can serve various purposes, such as increasing the chance of getting a resource annotated, reminding a user what a resource is about, and consolidating the vocabulary across users. Furthermore, as Sood et al. [18] pointed out, tag recommenders lower the effort of annotation by changing the process from a generation to a recognition task: rather than “inventing” tags, the user only needs to select some of the recommended tags.

Since the emergence of social bookmarking, the topic of tag recommendations has raised considerable interest among researchers. As for all recommender domains, tag recommender algorithms can roughly be classified into three classes: Content-based algorithms use the content of resources, for instance to compute similarities between items and to present items that are similar to the ones the active user previously liked. Collaborative algorithms make use of the relations between the users and the items, for instance by identifying similar users and suggesting items similar users liked. The third class are algorithms that exploit both data sources, sometimes called hybrid recommenders. A major drawback of content-based approaches is that the usability of a resource’s content depends on the type of resource. For example, when the tagged resources are textual, using words from those resources (e.g., from the title, like Lipczak et al. [11] did) can be successful. However, the same is much harder when the resources are images. Moreover, even for textual resources, the implementation of the recommendation (e.g., the word selection strategy) is resource dependent. Thus, since
our goal is to create a simple and highly versatile recommender, we focus only on the second class of recommender algorithms: those that exploit the folksonomy structure between users, tags, and resources, which exists in any tagging system.

An evaluation of collaborative algorithms, such as collaborative filtering, the FolkRank algorithm [9], and simpler, popularity-based methods was performed in [10] on various datasets. FolkRank outperformed the other methods. However, the hybrid heuristic-based recommender that combined users’ frequently used tags with tags that were frequently used to annotate the resource came second. Rendle and Schmidt-Thieme [16] produced recommendations with a statistical method based on factor models. They factorized the folksonomy structure to find latent interactions between users, resources and tags. Using a variant of the stochastic gradient descent algorithm, the authors optimized an adaptation of the Bayesian Personalized Ranking criterion [15]. Seitlinger et al. [17] proposed an approach that simulates human category learning in a three-layer connectionist network. In the input layer, Latent Dirichlet Allocation is used to characterize the resource (and user). [14] introduced a slightly different folksonomy graph model in which edges are weighted and directed. On the resulting graph, PageRank is used to produce a ranking of tags. [12] proposed ‘TagRank’, a variant of topic-sensitive PageRank upon a tag-tag correlation graph, which they integrate into a hybrid with collaborative filtering and popularity-based algorithms. The selection of the algorithms for the hybrid is guided by a greedy algorithm. A drawback of all the presented algorithms is their reliance on complex methodology that uses the full corpus of folksonomy data to learn a recommendation model. While they are suitable approaches to boost performance, they also require a lot of effort in terms of additional computation time, hardware, implementation (e.g., additional data structures, methods to update the trained models), and expertise. Because a folksonomy changes over time, the learned models must be updated regularly to fit the current data.

In [21], the authors introduce GIRP, a temporal tag usage pattern model. It uses an exponential function that considers the first- and last-time usage of a tag. A short-term interests model is proposed in [20], recommending the most popular tags of users based on recent data, that is, data from a time-window of fixed length (one day or higher). It is found that a window of 30 days works best on the overall BibSonomy dataset. Recently, Trattner et al. [19] presented a comprehensive study of various tag recommender strategies, including their own development based on a model of human memory (BLL). In contrast to GIRP [21], BLL models the temporal tag usage using a power function rather than an exponential function (see Section 2.2). They compare BLL with other methods (including several of those mentioned above) and find that BLL performs better than the time-dependent algorithm GIRP and other methods based on matrix factorization. Only more computationally expensive models achieve a higher F-score on the evaluated datasets. Most of these models extend basic models by re-ranking the tag candidates by the semantic context of the resource.

In this work, we assume the perspective of a tagging system operator or, respectively, the operator of a system that includes tagging as a secondary feature. We aim at supporting the tagging process with as little cost as possible while still delivering good results. Following the strong results of BLL in [19], and given its low computational effort and the convenient fact that it requires neither extra data structures nor precomputed values (see Section 2.2), we use this approach as our baseline. Our method is similar to that of Yin et al. [20]; however, instead of using tags of the previous posts, we use tags extracted from previous user interactions with the tagging system. We will show that the time-frame for collecting such interactions is critical and that time-frames of less than a day are worth considering.

4 EXPERIMENTAL SETUP

In this section, we introduce the tagging system BibSonomy, whose data we use for our study. We further describe the datasets with all preprocessing steps, the experiments and their evaluation.

4.1 BibSonomy

BibSonomy [1] allows users to collect references to (scientific) publications and bookmarks to websites, and to annotate these with arbitrary keywords, so-called tags. While entering the metadata of a resource in a form, the user can also enter tags, which she can later use for retrieval. The system assists the user with a tag autocompletion relying on her previously used tags. Next to the possibility of filtering resources by tag and/or user, users can find new interesting resources through a full text search. Additionally, users can form groups in which they can share posts and literature. On group pages, all group members’ resources are displayed. Overview pages for websites and publications enable users to see who else bookmarked a specific resource and the tags they used to describe the resource. Detail pages for publications show the metadata that the user who saved the resource had entered for the publication. While browsing the system, a user can copy resources of other users into her own collection. Another feature allows users to group tags to concepts (e.g., the tags "time" and "tag" to the concept "recsys"). These concepts can be used for retrieving resources. BibSonomy is a popular target for spammers, that is, for users who store links to advertisements to promote their visibility. For that reason, users are classified by a learning algorithm and manually by the system’s administrators. For our analysis, we only used data generated by users that were not marked as spammers.

4.2 Datasets

Our experiments rely on two types of data gathered from the real-world tagging system BibSonomy: posts and user interactions. The latter type of data is rarely published due to privacy concerns. However, BibSonomy makes such data available to researchers in the form of collected HTTP-request server logs.⁵ Thus, at the moment, BibSonomy is the only source enabling the analyses presented here. The methodology, however, is transferable to other tagging systems.

⁵ http://www.kde.cs.uni-kassel.de/bibsonomy/dumps

Request Log Data: The request log data contains every web request any user made to the tagging system. We removed all non-human requests like redirects to other pages, or requests by bots or other applications (using the user agent information). Also, we only considered requests to HTML sites and excluded system pages (e.g., the login page). We used two different time frames for the evaluation: (i) from 2006-01-01 (the start date of BibSonomy) through 2011-12-31 (a dataset already used in behavioral analyses in previous work [5]) and (ii) a more recent share, ranging from 2014-07-01
through 2016-06-30. For the remainder of this paper, we refer to the older dataset as Bib1 and to the newer dataset as Bib2. The older dataset contains 2,074,182 and the newer 246,472 requests.

Content Data: The second type of data is the data generated by the users of the system by annotating resources with tags. We split the content dataset of BibSonomy into the same two time frames as the interaction data. We applied several cleaning steps to each dataset. First, we removed system tags (like myown or imported). To filter imports, we deleted all posts of a user that shared a posting date with another post.⁶ The remaining tags were normalized by decapitalizing and removing all non-alphanumeric characters from the tag string. We did not prune our datasets using a p-core, to avoid biasing the results (see Section 4.4). Furthermore, the goal of this research is to produce broadly applicable recommender strategies, but restricting the dataset to only its dense part would neglect new users, rare tags and rare resources, thus providing an incomplete impression of the overall performance. The statistics for the two cleaned content datasets can be found in Table 1.

⁶ Furthermore, we removed user accounts that are used by libraries, like DBLP, from the two BibSonomy datasets.

Table 1: Statistics about the content datasets, where |U| is the number of users, |P| the number of posts, |R| the number of resources, |T| the number of tags and |Y| the number of tag assignments.

Dataset   |U|     |P|       |R|       |T|      |Y|
Bib1      3,674   180,807   162,364   74,634   672,249
Bib2      1,912   53,098    46,441    33,937   207,709

4.3 Extraction of Tag Recommendation Candidates from User Interaction

While browsing in a tagging system, the user queries different page types. For extracting tag recommendation candidates from requests to BibSonomy, we use the following methods for the different page types. After the extraction, we also normalized the extracted tags as described in the previous section.

Tag Pages: In BibSonomy, users can restrict the global collection of resources or the collections of other users, groups and search results by tags on a separate tag page for the corresponding entity. All tag pages also support filtering by more than one tag. We represented a request to one of the tag pages with the specified tag(s).

User Pages: For user pages, we extracted the tags that the page’s owner had used for her own posts before the request was made. We only considered user pages where the logged-in user requested a user page of another user. This is the same representation that [13] used for their analysis.

Resource Pages: Also, like [13], we represent a publication or website overview page with the tags that any user used to describe the requested resource. We restrict the tags extracted from a details page to the page owner’s tags.

Concept Pages: Users can request concept pages for users, groups, or globally, showing resources tagged with the keywords the user, group or all users defined as subtags of the concept. We added all subtags of the concept to the set of considered tag candidates.

Search Pages: We represent a search request to BibSonomy with the terms of the search query after removing stop words using a multi-language stop word list.

External Referer: When users use external engines for searching and click on results linking to BibSonomy, the external source is logged as referer in the request logs. For candidate extraction, we tokenize the value of the ’q’ URL parameter of these requests (because it is the parameter most commonly used by search engines) and remove stop words.

Tags of a copied post: While browsing by tags or searching, the system presents the user with the resources that match her entered query. Next to every resource, the user can click on a copy button. This click is recorded in the request logs. We extract the tags of the copied resource to represent this type of request.

While some of the request types (e.g., concepts) are specific to BibSonomy, most of them are usually present in a tagging system, representing search options as well as the typical navigation paradigm in a folksonomy.

4.4 Evaluation

For tuning the parameters t and n of the two recommender heuristics, LR_t and LR_n, we split each of the two datasets into a validation and a test set: each part contains 50% of the posts. We use the validation set to determine the best parameters t and n. Then we use the test set to evaluate a hybrid with BLL (cf. Section 2.2).

The choice of the evaluation setup often has a strong influence on the experiments. For example, Cremonesi et al. [3] observed that different sampling strategies yield different outcomes, while Doerfel et al. [4] showed that different restrictions on the dataset can also lead to different (contradictory) results. Both suggest that the scenario should be selected such that it resembles reality as much as possible and that choices should be based on the use case rather than on issues like sparsity. Therefore, we adapt the rating-based temporal leave-one-out method introduced in [2] to our scenario. In our experiments, we consider each post as a test post. Moreover, we ensure that the algorithms only use data from before the creation of the test posts. More specifically, for each post p, we do the following: We select all posts (and requests) that have been created before p and use them to compute recommendations for p based on the user and resource of p. Then we compare the recommended tags to the actual tags of p and evaluate the number of correctly predicted tags. This scenario is the most realistic offline evaluation scenario as it considers each occasion for recommendations and uses only data resembling exactly the state of the system at the time the test post was actually created.

Metric: We use the standard information retrieval performance metric F-score for measuring the quality of the recommendations [7]. Online systems usually present users with only a limited number of tag recommendations while saving a new resource (often five tags). For that reason, we report F@5, that is, the F-score computed for the set of the first five suggested tags from the ranked list of recommendations (cf. Section 2.2). The parameters t and n are selected based on the best F@5 score found in experiments on the validation sets.

Significance: To test the obtained results for statistically significant differences, we use the Wilcoxon signed-rank test. Since we consider a large number of posts for the evaluation, we compute two versions, considering either the posts or the users as the population. For the latter case, we averaged the obtained F@5 scores over all posts per user in the test set, and we conduct the significance test based on these averages. To indicate statistical significance, we use the symbol * after a reported F-score when the user-based test
indicates significance, and we use + for the post-based test, in both Table 2: F-scores for LRn and LRt with their corresponding
cases using the α-level of 0.01. best configurations N and T . For comparison both BLL and
the hybrid recommender are reported for the subset. We also
5 RESULTS report the number of considered users |U | and posts |P |. Sym-
bols * and + indicate a statistically significant difference (see
In this section, we present the results of our experiments. In the first
Section 4.4 for details).
set of experiments, we use the validation datasets to tune our two
approaches LRt and LRn , using various settings for t and n, thus (a) Results for LRn . We report the significant difference of BLL com-
pared to both other methods.
including different sets of user interactions. In all these experiments,
we consider only those posts where the situation was suitable for Bib1 (N=6) Bib2 (N=7)
|U |
|U |
the respective approach, that is, where interactions with the system F@5 F@5
|P |
|P |
had been recorded. Thus, the number of considered posts for which
BLL 0.27*+/ *+ 0.32*+/ *+
an approach is tested varies from recommender type and setting.
62,541
17,788
1,998
LRn=N 0.13 0.10
509
On the same subsets of posts, we also evaluate BLL and the hybrid
with BLL for comparison. Eventually, in Section 5.2, we use the LRn=N + BLL 0.16 0.17
test datasets to evaluate the impact of our approach on the overall
performance, that is, we evaluate the recommender strategies using (b) Results for LRt . We report the significant difference of
LRt and LRt + BLL compared to BLL.
all posts without any restriction.
Bib1 (T=1m) Bib2 (T=1m)
5.1 Time-Aware Interaction-Driven
|U |
|U |
F@5 F@5
|P |
|P |
Recommenders BLL 0.31 0.35
6,974
910
163
First, we report the results for the recommenders LRt and LRn (see 0.32*+
927
LRt =T 0.33
Section 2.1). We report results averaged over all posts where the re-
LRt =T + BLL 0.42*+ 0.45+
spective heuristic was applicable (i.e., where there were observable
interactions), and we compare to BLL on the same set of posts.
In our first evaluation, we vary the parameter n – the number of recommender BLL hybrid
0.425
included previous interactions – of the LRn recommender from 1 to 0.400
10 on the validation set and report the results of the best parameter 0.375
F@5
0.350
in Table 2a. We find that on Bib1 and Bib2 similar configurations 0.325
(n = 6 for Bib1 and n = 7 for Bib2 ) worked best for LRn . Further, 0.300
1m 2m 3m 4m 5m
we can observe that LRn alone is clearly inferior to BLL in terms of 6,974/927 9,718/1,142 11,527/1,253 12,937/1,322 14,066/1,374
F@5. Combining results by concatenating the lists of recommended t (with |P| / |U|)
tags (LRn + BLL), as described in Section 2.2, can improve the F@5 Figure 1: F-scores for BLL and the hybrid LRt + BLL on the
score, but still cannot reach that of plain BLL. Our hypothesis is that test set of Bib1 . The time-window t ranges from one to five
the last requests are too far in the past and thus have no relevance minutes. For each t the number of posts and users in the
for the current post. Thus, the time of the considered interactions considered subset are reported.
is a critical factor.
Therefore, in Table 2b, we switch the mode of selecting requests to time-windows, using LRt. We vary the considered time-window for including requests from one minute to 30 days.7 Although the number of posts for which the heuristic is applicable grows with the selected time-window length, we observe decreasing scores for LRt on both datasets, which is further evidence for the above hypothesis. We find that using a time-window t of one minute yields the highest performance on (the respective validation sets of) both datasets, Bib1 and Bib2. Unlike LRn before, the time-aware interaction-based heuristic LRt yields F@5 scores comparable to BLL on both Bib1 and Bib2. When combining the two recommenders (LRt=T + BLL), the scores improve significantly, by ten percentage points over plain BLL on both sets. The improvements obtained for Bib1 are significant according to the Wilcoxon signed-rank test, considering both the users and the posts as entities. On Bib2, the significance of the hybrid’s improvement is confirmed when the posts are used as entities in the test.
7 30 days is the setting for which Yin et al. [20] report the best results on the overall dataset for their approach based on previous posts.
Figure 1 shows the trend of the F@5 scores calculated on BLL compared to the hybrid LRt + BLL for the time-windows ranging from one minute to five minutes. We observe that BLL remains roughly constant when t increases. On the other hand, the F@5 score of the combination of LRt and BLL decreases from 0.42 to 0.35 when we increase the time-window t from one minute to five minutes, but it outperforms BLL for every considered time-window. For the dataset Bib2, we find similar results, except that LRt decreases faster (figure omitted due to space limitations).
5.2 Overall Performance
In the previous section, we saw that the time-window based LRt recommender achieved better results than BLL on those subsets of the data where the respective heuristic was applicable. In this section, we evaluate the hybrid of LRt with BLL on the complete test sets to get an impression of its overall impact as an improvement over plain BLL. In the following, we run LRt with the parameters that produced the best results on the validation sets. Results are given in Table 3. The combination with the request-based LRt recommender improves the recommendation result by about three per cent on the Bib1 test set and one per cent on the Bib2 test set (both relative to plain BLL). A Wilcoxon signed-rank test, conducted on the average F-score of each user, indicates that the difference is significant for Bib1.
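The improvements of about three and one per cent are relative gains over plain BLL, not percentage points. The following small sketch recomputes both the absolute and the relative gain from the F@5 values reported in Table 3; the helper name `relative_improvement` and the dictionary layout are illustrative assumptions.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of `baseline`."""
    return (improved - baseline) / baseline


# F@5 values as reported in Table 3 (full BibSonomy test sets).
TABLE_3 = {
    "Bib1": {"BLL": 0.265, "LRt=1m + BLL": 0.274},
    "Bib2": {"BLL": 0.315, "LRt=1m + BLL": 0.318},
}

for dataset, scores in TABLE_3.items():
    baseline = scores["BLL"]
    hybrid_score = scores["LRt=1m + BLL"]
    absolute = hybrid_score - baseline
    relative = relative_improvement(baseline, hybrid_score)
    print(f"{dataset}: +{absolute:.3f} absolute, +{relative:.1%} relative")
```

With these inputs, the relative gains come out at roughly 3.4% on Bib1 and 1.0% on Bib2, while the absolute differences are 0.009 and 0.003 F@5 points.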
RecTemp Workshop @ ACM RecSys ’17, August 2017, Como, Italy Daniel Zoller, Stephan Doerfel, Christian Pölitz, and Andreas Hotho
Table 3: F@5 scores of the hybrid recommender and BLL on the full BibSonomy test sets. Symbols * and + indicate a statistically significant difference (see Section 4.4 for details).
                 Bib1      Bib2
BLL              0.265     0.315
LRt=1m + BLL     0.274*+   0.318+
Testing with all posts as entities, significance is confirmed for both Bib1 and Bib2. Thus, we can conclude that exploiting user interactions in very short time-windows immediately before posting a resource can boost the performance of the already well-performing recommender algorithm BLL.
6 DISCUSSION AND LIMITATIONS
In our studies, we found that the time-window based recommender LRt provides better recommendations than the recommender LRn, which uses the last n requests. We also saw that in those situations where it is applicable, its results are comparable to those of the more complex algorithm BLL. Combining the time-window variant with BLL into a hybrid significantly improves the performance. In the following, we discuss several aspects and limitations of our study.
Timing: Overall, we could demonstrate the applicability of our time-aware interaction-driven heuristic. In particular, we saw that short time spans immediately before the posting of a resource yield good recommendations. It seems that users store posts in bursts, each representing different aspects of their (shifting) interests. Thus, relying on very recent tags is a reasonable approach.
Applicability: The posts and users for which the time-window based recommender LRt can provide tag recommendations form only a relatively small subset of the data. One reason for this may be that BibSonomy offers browser extensions for posting resources, and thus a convenient means of posting to BibSonomy without visiting the system first.
Generalizability of the Results: Since we could evaluate our recommendation approach only on two BibSonomy datasets (due to the unavailability of suitable data from other systems), it remains an open research question how it would perform in other tagging systems. Heckner et al. [8] found that users of tagging systems with different resources tend to tag for different reasons: for example, Flickr (images) is used mostly for sharing, whereas Delicious (websites) is used mainly for retrieving resources from one’s own collection. Since the resource types in BibSonomy and Delicious are similar (references to documents; websites in both systems, publications only in BibSonomy), we hypothesize that they are often used for similar purposes. Also, most page types (e.g., a user page) exist in both systems. Thus, we would expect results of recommender algorithm experiments conducted on Delicious to be qualitatively similar to those found for BibSonomy. Another influence on the usage of tagging systems is the user interface, which varies from system to system. For example, BibSonomy always links the entities of the folksonomy, while other systems may exclude some links or place links at different positions within a page.
Transferability of the Approach: While the above-mentioned issues limit the generalizability of the results in our study, our methods of extracting tag recommendations from interactions can easily be adapted to other tagging systems.
Computational Cost: In contrast to many other methods (cf. Section 3), our heuristic does not require large user profiles, as it draws the recommendations only from the interactions in the tagging system directly preceding a new post. Moreover, the heuristic is easy to integrate into arbitrary tagging systems and requires only data collected from browsing activities. While, in our experiments, we made use of the request logs to exploit interactions, in a production environment tags can be derived directly from the interactions and stored in a temporary cache. Thus, the recommendations can be computed directly without accessing additional data sources. Because the heuristic relies solely on counting occurrences within small subsets of interactions, recommendations can be computed online without previous training. Consequently, they require only little effort, which makes the heuristic an ideal candidate for systems where tagging is included as a secondary feature and for quickly prototyping a new tagging system.
Explainability: Finally, it is worth pointing out that, in our approach, the choice of the recommended tags can be easily explained. Explanations are suitable for increasing users’ acceptance of the recommendations, particularly as the explanations reveal that no profiling of the user is necessary, since only the few most recent activities are exploited.
7 CONCLUSION
In this work, we have proposed a time-aware, interaction-driven tag recommendation heuristic, for the first time leveraging user interaction in a tagging system beyond the publicly visible posting. We have evaluated our approaches on data of the real-world tagging system BibSonomy. We have shown that time is a critical factor and that particularly interactions immediately before a new post are a good source for recommendable tags.
Finally, combining the time-aware approach with BLL leads to a hybrid recommender that outperforms both individual components. The result is an effective recommender system that is easy to implement, independent of the tagged resources, and requires no offline training. The approach is thus not only suitable for dedicated tagging systems, but also for broader web systems in which tagging is merely a secondary feature.
Future work: In this study, we have evaluated LRt and LRn with fixed parameters (time-window length or number of considered interactions) for all users. One opportunity to improve the results further would be an analysis of personalized parameters for each user or for different user groups. It would also be conceivable to use more refined methods to detect the user’s current context when posting a new resource. For example, by looking at the requests, it might be possible to distinguish situations where users do research regarding one specific topic from situations where users just “stumble” through the system, changing their focus based on what they find on each page. Another topic for future work is the tag candidate extraction. For this study, we extracted only the directly requested tags or query terms from a request. In tagging systems, one could leverage the semantic context of such tags (arising from the co-occurrence with other tags on the same resources). Finally, instead of exploiting retrieval interaction on the level of individual requests, one could attempt to identify sessions. These might yield a more comprehensive understanding of the current user context than considering individual requests independently.
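The counting-based time-window heuristic and the temporary cache mentioned under Computational Cost can be illustrated with a minimal sketch. It assumes that tags or query terms have already been extracted from each interaction; the class `RecentTagCache`, the function `hybrid`, and the fill-up combination with BLL are hypothetical illustrations and not the implementation evaluated in this paper. BLL recommendations are simply passed in as a precomputed ranked list.

```python
from collections import Counter, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, List, Tuple


class RecentTagCache:
    """Per-user cache of tags seen in recent interactions (hypothetical helper)."""

    def __init__(self, window: timedelta = timedelta(minutes=1)) -> None:
        self.window = window
        self._events: Dict[str, Deque[Tuple[datetime, str]]] = {}

    def record(self, user: str, when: datetime, tags: List[str]) -> None:
        """Store the tags extracted from one interaction.

        Assumes interactions of a user are recorded in chronological order.
        """
        events = self._events.setdefault(user, deque())
        for tag in tags:
            events.append((when, tag))

    def recommend(self, user: str, post_time: datetime, k: int = 5) -> List[str]:
        """Return the k most frequent tags within the window before post_time."""
        events = self._events.get(user, deque())
        # Evict interactions that are older than the time-window.
        while events and events[0][0] < post_time - self.window:
            events.popleft()
        counts = Counter(tag for when, tag in events if when <= post_time)
        return [tag for tag, _ in counts.most_common(k)]


def hybrid(lr_tags: List[str], bll_tags: List[str], k: int = 5) -> List[str]:
    """Fill up time-window recommendations with BLL ones (assumed combination)."""
    merged = list(lr_tags)
    for tag in bll_tags:
        if tag not in merged:
            merged.append(tag)
    return merged[:k]


# Usage with made-up interactions of a hypothetical user.
cache = RecentTagCache(window=timedelta(minutes=1))
now = datetime(2017, 8, 27, 12, 0, 0)
cache.record("alice", now - timedelta(minutes=10), ["java"])
cache.record("alice", now - timedelta(seconds=40), ["recsys", "tagging"])
cache.record("alice", now - timedelta(seconds=10), ["recsys"])
lr = cache.recommend("alice", now)
print(hybrid(lr, ["python", "recsys", "web"]))
```

In this example the interaction from ten minutes ago falls outside the one-minute window and is ignored, so only the tags seen in the last minute are counted. A user without cached interactions receives an empty list from `recommend`, which corresponds to the case where the heuristic is not applicable and only the BLL list remains.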
REFERENCES
[1] Dominik Benz, Andreas Hotho, Robert Jäschke, Beate Krause, Folke Mitzlaff, Christoph Schmitz, and Gerd Stumme. 2010. The social bookmark and publication management system BibSonomy. The VLDB Journal 19, 6 (2010), 849–875. DOI: https://doi.org/10.1007/s00778-010-0208-4
[2] Robin Burke. 2010. Evaluating the Dynamic Properties of Recommendation Algorithms. In Proceedings of the Fourth ACM Conference on Recommender Systems. ACM, New York, NY, USA, 225–228. DOI: https://doi.org/10.1145/1864708.1864753
[3] Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of Recommender Algorithms on Top-n Recommendation Tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems. ACM, New York, NY, USA, 39–46. DOI: https://doi.org/10.1145/1864708.1864721
[4] Stephan Doerfel, Robert Jäschke, and Gerd Stumme. 2016. The Role of Cores in Recommender Benchmarking for Social Bookmarking Systems. ACM Transactions on Intelligent Systems and Technology 7, 3 (February 2016), 40:1–40:33. DOI: https://doi.org/10.1145/2700485
[5] Stephan Doerfel, Daniel Zoller, Philipp Singer, Thomas Niebler, Andreas Hotho, and Markus Strohmaier. 2016. What Users Actually do in a Social Tagging System: A Study of User Behavior in BibSonomy. ACM Transactions on the Web 10, 2 (2016), 14:1–14:32. DOI: https://doi.org/10.1145/2896821
[6] Folke Eisterlehner, Andreas Hotho, and Robert Jäschke (Eds.). 2009. ECML PKDD Discovery Challenge 2009 (DC09). CEUR-WS.org, Vol. 497.
[7] Asela Gunawardana and Guy Shani. 2009. A Survey of Accuracy Evaluation Metrics of Recommendation Tasks. Journal of Machine Learning Research 10 (2009), 2935–2962.
[8] M. Heckner, M. Heilemann, and C. Wolff. 2009. Personal Information Management vs. Resource Sharing: Towards a Model of Information Behaviour in Social Tagging Systems. In Proceedings of the third AAAI Conference on Weblogs and Social Media. San Jose, CA, USA.
[9] Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme. 2006. Information Retrieval in Folksonomies: Search and Ranking. In The Semantic Web: Research and Applications: Third European Semantic Web Conference, ESWC 2006 Budva, Montenegro, June 11-14, 2006 Proceedings (LNCS), York Sure and John Domingue (Eds.), Vol. 4011. Springer Berlin Heidelberg, Berlin/Heidelberg, 411–426. DOI: https://doi.org/10.1007/11762256_31
[10] Robert Jäschke, Leandro Marinho, Andreas Hotho, Lars Schmidt-Thieme, and Gerd Stumme. 2008. Tag Recommendations in Social Bookmarking Systems. AI Communications 21, 4 (2008), 231–247. DOI: https://doi.org/10.3233/AIC-2008-0438
[11] Marek Lipczak, Yeming Hu, Yael Kollet, and Evangelos Milios. 2009. Tag Sources for Recommendation in Collaborative Tagging Systems, See [6], 157–172.
[12] Feichao Ma, Wenqing Wang, and Zhihong Deng. 2013. TagRank: A new tag recommendation algorithm and recommender enhancement with data fusion techniques. In Social Media Retrieval and Mining, Shuigeng Zhou and Zhiang Wu (Eds.). Communications in Computer and Information Science, Vol. 387. Springer, Berlin/Heidelberg, 80–91. DOI: https://doi.org/10.1007/978-3-642-41629-3_7
[13] Thomas Niebler, Martin Becker, Daniel Zoller, Stephan Doerfel, and Andreas Hotho. 2016. FolkTrails: Interpreting Navigation Behavior in a Social Tagging System. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, New York, NY, USA. DOI: https://doi.org/10.1145/2983323.2983686
[14] Maryam Ramezani. 2011. Improving graph-based approaches for personalized tag recommendation. Journal of Emerging Technologies in Web Intelligence 3, 2 (2011), 168–176. DOI: https://doi.org/10.4304/jetwi.3.2.168-176
[15] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence. AUAI Press, Arlington, Virginia, United States, 452–461.
[16] Steffen Rendle and Lars Schmidt-Thieme. 2009. Factor Models for Tag Recommendation in BibSonomy, See [6], 235–242.
[17] Paul Seitlinger, Dominik Kowald, Christoph Trattner, and Tobias Ley. 2013. Recommending tags with a model of human categorization. In Proceedings of the 22nd International Conference on Conference on Information and Knowledge Management. ACM, New York, NY, USA, 2381–2386. DOI: https://doi.org/10.1145/2505515.2505625
[18] Sanjay Sood, Sara Owsley, Kristian Hammond, and Larry Birnbaum. 2007. TagAssist: Automatic Tag Suggestion for Blog Posts. In Proceedings of the 1st International Conference on Weblogs and Social Media. Boulder, Colorado, USA.
[19] Christoph Trattner, Dominik Kowald, Paul Seitlinger, Tobias Ley, and Simone Kopeinik. 2016. Modeling Activation Processes in Human Memory to Predict the Use of Tags in Social Bookmarking Systems. Journal of Web Science 2, 1 (2016), 1–16. DOI: https://doi.org/10.1561/106.00000004
[20] Dawei Yin, Liangjie Hong, Zhenzhen Xue, and Brian D. Davison. 2011. Temporal Dynamics of User Interests in Tagging Systems. In Proceedings of the 25th AAAI Conference on Artificial Intelligence. AAAI.
[21] Lei Zhang, Jian Tang, and Ming Zhang. 2012. Integrating Temporal Usage Pattern into Personalized Tag Prediction. In Proceedings of the 14th Asia-Pacific International Conference on Web Technologies and Applications (APWeb’12). Springer-Verlag, Berlin, Heidelberg, 354–365. DOI: https://doi.org/10.1007/978-3-642-29253-8_30