<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Factor Models for Tag Recommendation in BibSonomy</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ste en Rendle</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lars Schmidt-Thieme</string-name>
          <email>schmidt-thiemeg@ismll.uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Machine Learning Lab, University of Hildesheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our approach to the ECML/PKDD Discovery Challenge 2009. Our approach is a pure statistical model taking no content information into account. It tries to nd latent interactions between users, items and tags by factorizing the observed tagging data. The factorization model is learned by the Bayesian Personal Ranking method (BPR) which is inspired by a Bayesian analysis of personalized ranking with missing data. To prevent over tting, we ensemble the models over several iterations and hyperparameters. Finally, we enhance the top-n lists by estimating how many tags to recommend.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In this paper, we describe our approach to task 2 of the ECML/PKDD Discovery
Challenge 2009. The setting of the challenge is personalized tag recommendation
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. An example is a social bookmark site where a user wants to tag one of his
bookmark and the tag recommender suggest the user a personalized list of tags
he might want to use for this item.
      </p>
      <p>
        Our approach to this problem is a pure statistical model using no content
information. It relies on a factor model related to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] where the model parameters
are optimized for the maximum likelihood estimator for personalized pairwise
ranking [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Furthermore, we use a smoothing method for reducing the variance
in the factor models. Finally, we provide a method for estimating how many tags
should be recommended for a given post. This method is model independent and
can be applied to any tag recommender.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Terminology and Formalization</title>
      <p>
        We follow the terminology of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]: U is the set of all users, I the set of all items/
resources and T the set of all tags. The tagging information of the past is
represented as the ternary relation S U I T . A tagging triple (u; i; t) 2 S means
that user u has tagged an item i with the tag t. The posts PS denotes the set of
all distinct user/ item combinations in S:
Where U^ , I^, T^U and T^I are feature matrices capturing the latent interactions.
They have the following types:
y^u;i;t =
      </p>
      <p>X u^u;f t^tU;f + X ^ii;f tt;f</p>
      <p>^I
f f
^</p>
      <p>U 2 RjUj k;
T^U 2 RjT j k;
^</p>
      <p>I 2 RjIj k;
T^I 2 RjT j k
Our models calculate an estimator Y^ for S. Given such a predictor Y^ the list
Top of the N highest scoring items for a given user u and an item i can be
calculated by:</p>
      <p>Top(u; i; N ) := argNmax y^u;i;t</p>
      <p>t2T
r^u;i;t := jft0 : y^u;i;t0 &gt; y^u;i;tgj
where the superscript N denotes the number of tags to return. Besides y^u;i;t we
also use the notation of a rank r^u;i;t which is the position of t in a post (u; i)
after sorting all tags by y^u;i;t:
3</p>
    </sec>
    <sec id="sec-3">
      <title>Factor Model</title>
      <p>
        Our factorization model (FM) captures the interactions between users and tags
as well as between items and tags. The model equation is given by:
(1)
(2)
Note that this model di ers from the factorization model in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] where the model
equation is the Tucker Decomposition.
3.1
      </p>
      <sec id="sec-3-1">
        <title>Optimization Criterion</title>
        <p>
          Our optimization criterion is an adaption of the BPR criterion (Bayesian
Personalized Ranking) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The criterion presented in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is derived for the task of item
recommendation. Adapted to tag recommendation, the optimization function for
our factor model is:
        </p>
        <p>BPR-Opt :=</p>
        <p>X</p>
        <p>X</p>
        <p>
          X
(jjU^ jj2 + jjI^jj2 + jjT^U jj2 + jjT^I jj2) (3)
That means BPR-Opt tries to optimize the pairwise classi cation accuracy
within observed posts. Note that it di ers from [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] by optimizing for pairwise
classi cation (log-sigmoid) instead of AUC (sigmoid).
3.2
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Learning</title>
        <p>
          The model is learned by the LearnBPR algorithm [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] which is a stochastic
gradient descent algorithm where cases are sampled by bootstrapping. In the
following, we show how we apply this generic algorithm to the task of optimzing
our model paramaters for the task of tag recommendation. The gradients of our
model equation (2) with respect to the model parameters = fU^ ; I^; T^U ; T^I g
are:
/
=
X
        </p>
        <p>X</p>
        <p>X</p>
        <p>X</p>
        <p>2
e (y^u;i;t+ y^u;i;t )
y^u;i;t )
That means, we only have to compute the derivations of our model equation
y^u;i;t with respect to each model parameter from = fU^ ; I^; T^U ; T^I g:</p>
        <p>y^u;i;t = ^ii;f
These derivations are used in the stochastic gradient descent algorithm shown
in gure 1.</p>
        <p>The method presented so far has the following hyperparameters:
{ 2 R+ learning rate
{ 2 R0+ regularization parameter
{ 2 R mean value for initialization of model parameters
{ 2 2 R0+ standard deviation for initialization of model parameters
{ k 2 N+ feature dimensionality of factorization
Reasonable values for all parameters can be searched on a holdout set. The
learning rate and the initialization parameters are only important for the learning
algorithm but are not part of the optimization criterion or model equation.
Usually, the values found for ; ; 2 on the holdout generalize well.</p>
        <p>In contrast to this, the regularization and dimensionality are more important
for the prediction quality. In general, when the regularization is chosen properly,
the higher the dimensionality the better. In our submitted result, we use an
ensemble of models with di erent regularization and dimensionality.
1: procedure LearnBPR(PS; U^ ; I^; T^U ; T^I )
2: draw U^ ; I^; T^U ; T^I from N ( ; 2)
3: repeat
4: draw (u; i; t+; t ) uniformly from PS
5: d y^u;i;t+ y^u;i;t
6: for f 2 1; : : : ; k do
7: u^u;f u^u;f +
8:
9:
10:
11:
12: t^tI ;f
13: end for
14: until convergence
15: return U^ ; I^; T^U ; T^I
16: end procedure
^ii;f
t^tU+;f
t^tU ;f
t^tI+;f
^ii;f +
t^tU+;f +
t^tU ;f +
t^tI+;f +
t^tI ;f +
e d
1+e d (t^tU+;f
e d
1+e d (t^tI+;f
e d
1+e d u^u;f +
e d
1+e d
e d
1+e d ^ii;f +
e d
1+e d
^ii;f +
u^u;f +
t^tU+;f</p>
        <p>t^tU ;f
t^tI+;f</p>
        <p>t^tI ;f
Ensembling factor models with di erent regularization and dimensionality is
supposed to remove variance from the ranking estimates. There are basically
two simple approaches of ensembling predictions y^ul;i;t of l models:
1. Ensemble of the value estimates y^ul;i;t:
2. Ensemble of the rank estimates r^ul;i;t:
y^uev;i;t :=</p>
        <p>X wl y^u;i;t</p>
        <p>l
l
y^uer;i;t :=</p>
        <p>X wl (jT j
l
l
r^u;i;t)
(4)
(5)</p>
        <p>That means tags with a high rank (low r^) will get a high score y^.
Where wl is the weighting parameter for each model.</p>
        <p>Whereas ensembling value estimates is e ective for models with predictions
on the same scale, rank estimates are favorable in cases where the y^ values of
the di erent models have no direct relationship.</p>
        <p>Ensembling Di erent Factor Models For our factor models the scales of y^
depend both on the dimensionality and the regularization parameter. Thus we
use the rank estimates for ensembling factor models with di erent dimensionality
and regularization. In our approach we use a dimensionality of k 2 f64; 128; 256g
and regularization of 2 f10 4; 5 10 5g. As the prediction quality of all of our
factor models are comparable, we have chosen identical weights wl = 1.
Ensembling Iterations Within each factor model we use a second ensembling
strategy to remove variance. Besides the hyperparameters, another problem is
the stopping criterion of the learning algorithm (see gure 1). We stop after
a prede ned number of iterations (2000) { we have chosen an iteration size of
10 jSj single draws. In our experiments the models usually converged already
after about 500 iterations but in the following iterations the ranking alternates
still a little bit. To remove the variance, we create many value estimates from
di erent iterations and ensemble them. I.e. after the rst 500 iterations we create
each 50 iterations a value estimate for each tag in all test posts and ensemble
these estimates with (4). Again there is no reason to favor an iteration over
another, so we use identical weights wl = 1. This gives the nal estimates for
each model. The models with di erent dimensionality and regularization are
ensembled as described above.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Baseline Models</title>
      <p>Besides our factorization model we also consider several baseline models and
ensembles of these models. The models we pick as baselines are most-popular by
item (mpi), most-popular by user (mpu), item-based knn (knni) and user-based
knn (knnu).</p>
      <p>The most-popular models are de ned as follows:
The k-nearest-neighbour models (knn) are de ned as follows:
y^um;pi;it = jfu0 2 U : (u0; i; t)gj
y^um;pi;ut = jfi0 2 I : (u; i0; t)gj</p>
      <p>y^ukn;in;ti =
y^ukn;in;tu =</p>
      <p>X
(u;i0;t)2S</p>
      <p>X
simi;i0
simu;u0
(u0;i;t)2S
fiI;t = jfu : (u; i; t) 2 Sgj
fuU;t = jfi : (u; i; t) 2 Sgj
To measure simi;i0 and simu;u0 respectively, we rst fold/ project the observed
data tensor in a two dimensional matrix F U and F I :
After the folding we apply cosine similarity to compare two tag vectors:</p>
      <p>We tried di erent weighted ensembles of the baseline models using the value
estimate ensembling method. Even though these ensembles produce quite good
results, in our experiments they did not outperform the factor models and
furthermore adding baselines to the factor models did not result in a signi cant
improvement of the factor models. Thus our nal submission only consists of
the factor models.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Adaptive List Length</title>
      <p>In contrast to the usual evaluation scheme of tag recommendation, in this
challenge the recommender was free to choose the length of the list of the
recommendations in a range from a length of 0 to 5. The evaluation functions are:</p>
      <p>Prec(Stest) :=
Recall(Stest) :=</p>
      <p>F1(Stest) :=</p>
      <p>avg
(u;i)2PStest</p>
      <p>avg
(u;i)2PStest
2 Prec(Stest) Recall(Stest)
Prec(Stest) + Recall(Stest)
j Top(u; i; min(5; #u;i)) \ ftj(u; i; t) 2 Stestgj</p>
      <p>min(5; #u;i)
j Top(u; i; min(5; #u;i)) \ ftj(u; i; t) 2 Stestgj
jftj(u; i; t) 2 Stestgj
Where #u;i is the number of tags the recommender estimates for a post.</p>
      <p>There are three simple ways to estimate #u;i:
{ Global estimate:
{ User estimate:
{ Item estimate:
#uG;i := jSj</p>
      <p>jPS j
#uU;i := jf(i0; t) : (u; i0; t) 2 Sgj</p>
      <p>jfi0 : (u; i0; t) 2 Sgj
#Iu;i := jf(u0; t) : (u0; i; t) 2 Sgj</p>
      <p>jfu0 : (u0; i; t) 2 Sgj
Based on these simple estimators, a combined post size can be produced by a
linear combination:
#uE;i :=
0 +</p>
      <p>G#uG;i +</p>
      <p>U #uU;i + I #Iu;i
In our approach we use #uE;i and optimize on the holdout set for maximal F1.
We found that choosing an adaptive length of the recommender list signi cantly
improved the results over a xed number.
6
6.1</p>
    </sec>
    <sec id="sec-6">
      <title>Experimental Results</title>
      <sec id="sec-6-1">
        <title>Sampling of Holdout Set</title>
        <p>
          As the test of the challenge was released two days before the submission
deadline, we tried to generate representative holdout-sets. We created two test sets,
one following the leave-one-post-per-user-out protocol [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and a second one by
uniformly sampling posts with the constraint that the dataset should remain a
2-core after moving a post into the test set. These two sets were used as holdout
sets for algorithm evaluation and hyperparameter selection. In the following, we
report results for the second holdout set, because its characteristics (in terms of
number of users, items and posts) are closer to the real test set.
6.2
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>Results</title>
        <p>The results of the method presented so far can be found in table 2 and 3. As
you can see, the single baseline models result in low quality but ensembles can
achieve a good quality. In contrast to this, our proposed factor models generate
better recommendations. The best possible ensemble (optimized on test!) of
the baselines achieves a score of 0.330 on the challenge set whereas our factor
ensemble (not optimized on test) results in 0.345.</p>
        <p>mpu mpi mp-ens knni knnu knn-ens knn+mp-ens
holdout 0.249 0.351 -/0.423 0.401 0.371 -/0.445 -/0.473
challenge 0.098 0.288 0.290/0.317 0.209 0.295 0.293/0.320 0.299/0.330</p>
        <p>An interesting nding is that the results on the challenge test set largely
differs from both of our holdout sets. But as all methods su er, we assume that the
tagging behavior in the challenge test set is indeed di erent from the one in the
training set. Especially, the baseline most-popular-by-user dropped largely from
24.9% to 9.8% { this might indicate that personalization is di cult to achieve on
single FM</p>
        <p>FM-ens FM-ens adaptive list length
holdout
challenge
0:495
the challenge test set using the provided training set. Non-personalized
methods or content-based methods could bene t from the di erence in both sets.
Also methods that can handle temporal changes in the tagging behaviour might
improve the scores.
7</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>
        In this paper, we have presented a factor model for the task of tag
recommendation. The model tries to describe the individual tagging behavior by four
low-dimensional matrices. The model parameters are optimized for the
personalized ranking criterion BPR-Opt [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The length of the recommended lists is
adapted both to the user and item. Our evaluation indicates that our approach
outperforms ensembles of baseline models which are known to give high quality
recommendations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
8
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The authors gratefully acknowledge the partial co-funding of their work through
the European Commission FP7 project MyMedia (www.mymediaproject.org)
under the grant agreement no. 215006. For your inquiries please contact
info@mymediaproject.org.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Jaeschke</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marinho</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hotho</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt-Thieme</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stumme</surname>
          </string-name>
          , G.:
          <article-title>Tag recommendations in social bookmarking systems</article-title>
          .
          <source>AICOM</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Rendle</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marinho</surname>
            ,
            <given-names>L.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nanopoulos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thieme</surname>
            ,
            <given-names>L.S.</given-names>
          </string-name>
          :
          <article-title>Learning optimal ranking with tensor factorization for tag recommendation</article-title>
          .
          <source>In: KDD '09:</source>
          <article-title>Proceeding of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining</article-title>
          , New York, NY, USA, ACM (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Rendle</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freudenthaler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gantner</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thieme</surname>
            ,
            <given-names>L.S.</given-names>
          </string-name>
          : Bpr:
          <article-title>Bayesian personalized ranking from implicit feedback</article-title>
          .
          <source>In: Proceedings of the 25th Conference on Uncertainty in Arti cial Intelligence (UAI</source>
          <year>2009</year>
          ).
          <article-title>(</article-title>
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>