<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Relational Classification for Personalized Tag Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leandro Balby Marinho</string-name>
          <email>marinho@ismll.uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christine Preisach</string-name>
          <email>preisach@ismll.uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lars Schmidt-Thieme</string-name>
          <email>schmidt-thieme@ismll.uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Systems and Machine Learning Lab (ISMLL) Samelsonplatz 1, University of Hildesheim</institution>
          ,
          <addr-line>D-31141 Hildesheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Folksonomy data is relational by nature, and therefore methods that directly exploit these relations are prominent for the tag recommendation problem. Relational methods have been successfully applied to areas in which entities are linked in an explicit manner, like hypertext documents and scientific publications. For approaching the graph-based tag recommendation task of the ECML PKDD Discovery Challenge 2009, we propose to turn the folksonomy graph into a homogeneous post graph and use relational classification techniques for predicting tags. Our approach features adherence to multiple kinds of relations, semi-supervised learning and fast predictions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>One might want tag recommendations for several reasons: for example, to simplify
the tagging process for the user, to expose different facets of a resource, or to help
the tag vocabulary converge. Since users are free to tag, i.e., the same resource
can be tagged differently by different people, it is important to personalize the
recommended tags for each individual user.</p>
      <p>
        Tagging data forms a ternary relation between users, resources and tags, unlike
typical recommender systems, in which the relation is usually a binary one between users
and resources. The best methods presented so far exploit this ternary relation to
compute tag predictions, either by means of tensor factorization [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or PageRank [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], applied to the
hypergraph induced by the ternary relational data. We, on the other hand, propose to
exploit the underlying relational graph between posts by means of relational
classification.
      </p>
      <p>In this paper we describe our approaches for addressing the graph-based tag
recommendation task of the ECML PKDD Discovery Challenge 2009. We present two
basic algorithms: PWA* (probabilistic weighted average), an iterative relational
classification algorithm enhanced with relaxation labelling, and WA* (weighted average),
an iterative relational classification method without relaxation labelling. These
methods feature adherence to multiple kinds of relations, training-free operation, fast predictions, and
semi-supervised classification. Semi-supervised classification is particularly appealing
because it allows us to potentially benefit from the information contained in the test dataset.
Furthermore, we propose to combine these methods through unweighted voting.</p>
      <p>The paper is organized as follows. Section 2 presents the notation used throughout
the paper. In Section 3 we show how we turned the folksonomy into a post relational
graph. Section 4 introduces the individual classifiers and the ensemble technique we
used. In Section 5 we elaborate on the evaluation and experiments conducted for tuning
the parameters of our models, and report the results obtained on the test dataset released
for the challenge. The paper closes with conclusions and directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Notation</title>
      <p>Folksonomy data usually comprises a set of users U, a set of resources R, a set of tags
T, and a set Y of ternary relations between them, i.e., Y ⊆ U × R × T.</p>
      <p>Let</p>
      <p>X := {(u, r) | ∃t ∈ T : (u, r, t) ∈ Y}
be the set of all unique user/resource combinations in the data, where each pair is called
a post. For convenience, let T(x = (u, r)) := {t ∈ T | (u, r, t) ∈ Y} be the set of all
tags assigned to a given post x ∈ X. We consider train/test splits based on posts, i.e.,
X_train, X_test ⊆ X, disjoint and covering all of X:</p>
      <p>X_train ∪̇ X_test = X</p>
      <p>For training, the learner has access to the set X_train of training posts and their true
tags T|_X_train. The tag recommendation task is then to predict, for a given x ∈ X_test, a set
T̂(x) ⊆ T of tags that are most likely to be used by the respective user for the respective resource.</p>
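As a concrete illustration of this notation, the post set X and the tag sets T(x) can be derived from the ternary relation Y as sketched below. All users, resources and tags here are made up for illustration, not taken from the challenge data.

```python
# Toy folksonomy: Y is a set of (user, resource, tag) assignments.
# All names are illustrative, not from the challenge dataset.
Y = {
    ("u1", "r1", "web"), ("u1", "r1", "ajax"),
    ("u1", "r2", "python"),
    ("u2", "r1", "web"),
}

# X: all unique (user, resource) pairs, i.e., the posts.
X = {(u, r) for (u, r, t) in Y}

def tags_of(post):
    """T(x): the set of tags assigned to post x = (u, r)."""
    u, r = post
    return {t for (u2, r2, t) in Y if (u2, r2) == (u, r)}

print(sorted(X))                       # three distinct posts
print(sorted(tags_of(("u1", "r1"))))   # tags of the post (u1, r1)
```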
    </sec>
    <sec id="sec-3">
      <title>Relation Engineering</title>
      <p>We propose to represent folksonomy data as a homogeneous, undirected relational
graph over the post set, i.e., G := (X, E), in which edges are annotated with a weight
w : X × X → ℝ denoting the strength of the relation. Besides making the input data
more compact – we have only a binary relation R ⊆ X × X between objects of the same
type – this representation allows us to trivially cast the problem of personalized tag
recommendation as a relational classification problem.</p>
      <p>
        Relational classifiers usually consider relational information in addition to the typical
attribute-value data of objects. A scientific paper, for example, can be connected to
another paper because both were written by the same author or because they share common
citations. It has been shown in many classification problems that relational classifiers
perform better than purely attribute-based classifiers [
        <xref ref-type="bibr" rid="ref1 ref4 ref6">1, 4, 6</xref>
        ].
      </p>
      <p>For convenience, let user(x) and res(x) denote the user and resource of post x, respectively.
In our case, we assume that posts are related to each other if they share the same
user: R_user := {(x, x') ∈ X × X | user(x) = user(x')}, the same resource: R_res :=
{(x, x') ∈ X × X | res(x) = res(x')}, or either the same user or the same resource:
R_user∪res := R_user ∪ R_res (see Figure 1). Thus, each post is connected to other posts
either through other users that tagged the same resource, or through the resources tagged by
the same user. Weights are discussed in Section 4.</p>
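A minimal sketch of how these relations can be materialized, with hypothetical posts; note that, as a practical choice not stated in the definitions above, the trivial self-pair (x, x) is excluded here.

```python
# Hypothetical posts, each a (user, resource) pair.
posts = [("u1", "r1"), ("u1", "r2"), ("u2", "r1"), ("u2", "r3")]

def user(x): return x[0]
def res(x): return x[1]

# R_user: pairs of posts sharing a user; R_res: pairs sharing a resource.
# Self-pairs (x, x) are excluded as a practical choice.
R_user = {(x, y) for x in posts for y in posts if x != y and user(x) == user(y)}
R_res  = {(x, y) for x in posts for y in posts if x != y and res(x) == res(y)}
R_both = R_user | R_res   # the combined relation R_user ∪ R_res

print(len(R_user), len(R_res), len(R_both))
```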
      <p>
        Note that it may happen that some of the related posts belong themselves to the
test dataset, allowing us to evtl. profit from the unlabeled information of test nodes
through, e.g., collective inference (see Section 4). Thus, differently from other
approaches (e.g., [
        <xref ref-type="bibr" rid="ref3 ref8">3, 8</xref>
        ]) that are only restricted to Xtrain, we can also exploit the set Xtest
of test posts, but of course not their associated true tags.
      </p>
      <p>Now, for a given x ∈ X_test, one can use the tagging information of related instances
to estimate T̂(x). A simple way to do that is, e.g., through the tag frequencies of related
posts:</p>
      <p>P(t|x) := |{x' ∈ N_x | t ∈ T(x')}| / |N_x|,   x ∈ X, t ∈ T   (1)</p>
      <p>where N_x is the neighborhood of x:</p>
      <p>N_x := {x' ∈ X | (x, x') ∈ R, T(x') ≠ ∅}   (2)</p>
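A small sketch of this frequency estimator: only neighbors with known (non-empty) tag sets contribute, and each contributes equally. Post identifiers and tags are made up.

```python
def tag_frequencies(x, neighbors, tags_of):
    """Estimate P(t|x) as the fraction of labeled neighbors of x that used t."""
    labeled = [n for n in neighbors[x] if tags_of[n]]   # T(x') must be non-empty
    counts = {}
    for n in labeled:
        for t in tags_of[n]:
            counts[t] = counts.get(t, 0) + 1
    return {t: c / len(labeled) for t, c in counts.items()}

tags_of = {"p1": {"web", "ajax"}, "p2": {"web"}, "p3": set()}  # p3 is unlabeled
neighbors = {"q": ["p1", "p2", "p3"]}
print(tag_frequencies("q", neighbors, tags_of))  # web: 2/2, ajax: 1/2
```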
      <p>In Section 4 we present the actual relational classifiers we used to approach
the challenge.</p>
    </sec>
    <sec id="sec-4">
      <title>Relational Classification for Tag Recommendation</title>
      <p>We extract the relational information by adapting simple statistical relational methods,
usually used for the classification of hypertext documents, scientific publications or movies,
to the tag recommendation scenario. The aim is to recommend tags to users by using the
neighborhood encoded in the homogeneous graph G = (X, E). In eq. (1) we described a
very simple method, where the probability for a tag t ∈ T given a node x
(post) is computed by counting the frequency of neighboring posts x' ∈ N_x that have
used the same tag t. In this case the strength of the relations is not taken into account,
i.e., all considered neighbors of x have the same influence on the probability of tag t
given x. This is not optimal: the more similar two posts are to each other, the
higher the weight of their edge should be.</p>
      <p>Hence, a more suitable relational method for tag recommendation is the
weighted average (WA), which sums up the weights of all posts x' ∈ N_x that share the
same tag t ∈ T and normalizes this by the sum over all weights in the neighborhood:</p>
      <p>P(t|x) = Σ_{x' ∈ N_x : t ∈ T(x')} w(x, x') / Σ_{x' ∈ N_x} w(x, x')   (3)</p>
      <p>Thus, WA only considers neighbors that belong to the training set.</p>
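A sketch of this weighted average, again with hypothetical posts and weights: compared to the frequency estimator, each neighbor's contribution is scaled by the edge weight w(x, x').

```python
def weighted_average(x, neighbors, tags_of, w):
    """WA: sum of edge weights of labeled neighbors using tag t,
    normalized by the total weight of all labeled neighbors."""
    labeled = [n for n in neighbors[x] if tags_of[n]]
    total = sum(w[(x, n)] for n in labeled)
    probs = {}
    for n in labeled:
        for t in tags_of[n]:
            probs[t] = probs.get(t, 0.0) + w[(x, n)]
    return {t: v / total for t, v in probs.items()}

tags_of = {"p1": {"web"}, "p2": {"web", "ajax"}}
neighbors = {"q": ["p1", "p2"]}
w = {("q", "p1"): 2.0, ("q", "p2"): 1.0}
print(weighted_average("q", neighbors, tags_of, w))  # web: 3/3, ajax: 1/3
```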
      <p>
        A more sophisticated relational method that takes probabilities into account is the
probabilistic weighted average (PWA); it calculates the probability of t given x by
building the weighted average of the tag probabilities of the neighbor nodes x' ∈ N_x:
      </p>
      <p>P(t|x) = Σ_{x' ∈ N_x} w(x, x') P(t|x') / Σ_{x' ∈ N_x} w(x, x')   (4)</p>
      <p>
        As mentioned before, we would additionally like to exploit the unlabeled information
contained in the graph and use the test nodes that have not been tagged yet but are related
to other nodes. This can be achieved by applying collective inference methods, i.e.,
iterative procedures which classify related nodes simultaneously and exploit relational
autocorrelation and unlabeled data. Relational autocorrelation is the correlation between
a variable of an entity and the same variable (here the class) of a related entity, i.e.,
connected entities are likely to have the same classes assigned. Collective classification is
semi-supervised by nature, since one exploits the unlabeled part of the data. One of these
semi-supervised methods is relaxation labeling [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]; it can be formally expressed as:
      </p>
      <p>P(t|x)^(i+1) = M((P(t|x')^(i))_{x' ∈ N_x})   (5)</p>
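A minimal relaxation-labelling loop for PWA, sketched under simplifying assumptions: labeled nodes keep fixed one-hot distributions, unlabeled nodes start from a prior and are re-estimated until the change falls below ε. All posts, weights and tags are hypothetical.

```python
def pwa(x, neighbors, P, w):
    """PWA: weighted average of the neighbors' tag distributions."""
    total = sum(w[(x, n)] for n in neighbors[x])
    dist = {}
    for n in neighbors[x]:
        for t, p in P[n].items():
            dist[t] = dist.get(t, 0.0) + w[(x, n)] * p / total
    return dist

def relaxation_labeling(unlabeled, neighbors, P, w, eps=1e-6, max_iter=100):
    """Re-estimate all unlabeled nodes simultaneously in each iteration,
    stopping on convergence or after max_iter iterations."""
    for _ in range(max_iter):
        new = {x: pwa(x, neighbors, P, w) for x in unlabeled}
        delta = max(abs(new[x].get(t, 0.0) - P[x].get(t, 0.0))
                    for x in unlabeled for t in new[x])
        P.update(new)
        if delta < eps:
            break
    return P

# Labeled posts with one-hot distributions; "q" starts from a uniform prior.
P = {"a": {"web": 1.0}, "b": {"ajax": 1.0}, "q": {"web": 0.5, "ajax": 0.5}}
neighbors = {"q": ["a", "b"]}
w = {("q", "a"): 3.0, ("q", "b"): 1.0}
P = relaxation_labeling(["q"], neighbors, P, w)
print(P["q"])  # web: 0.75, ajax: 0.25
```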
      <p>We first initialize the unlabeled nodes with the prior probability calculated on the
training set, then compute the probability of tag t given x iteratively using a relational
classification method M based on the neighborhood N_x in the inner loop. The procedure
stops when the algorithm converges (i.e., the difference of the tag probabilities between
iterations i and i + 1 is less than a very small ε) or a certain number of iterations is
reached.</p>
      <p>We used eq. (4) as the relational method inside the loop, where we do not require that
the neighbors x' are in the training set, but also use the probabilities of unlabeled
nodes. For PWA this means that in each iteration we collectively use the neighborhood
probabilities estimated in the previous iteration. PWA combined with collective
inference is denoted as PWA* in the following.</p>
      <p>
        For the weighted average we did not use relaxation labeling but applied a so-called
one-shot estimation [
        <xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>
        ]. Here we only use the neighbors with known classes, i.e., in the first
iteration we exploit only nodes from the training set, while in later iterations we also use
test nodes that have been classified in previous iterations. The procedure stops
when all test nodes could be classified or a specific number of iterations is reached.
Hence, the tag probabilities are not re-estimated as in relaxation labeling
but only estimated once. WA combined with the one-shot estimation procedure is
denoted as WA*.
      </p>
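A sketch of the one-shot estimation with hypothetical posts: in each sweep, only nodes whose neighborhoods already contain classified nodes (from the training set or earlier sweeps) receive a distribution, and once set it is never revised.

```python
def one_shot_wa(test_nodes, neighbors, known, w, max_iter=10):
    """WA*: classify each test node once; newly classified nodes become
    usable neighbors in later sweeps but are never re-estimated."""
    P = {}
    for _ in range(max_iter):
        available = set(known) | set(P)      # snapshot before this sweep
        progress = False
        for x in test_nodes:
            if x in P:
                continue
            nbrs = [n for n in neighbors[x] if n in available]
            if not nbrs:
                continue                     # no classified neighbor yet
            total = sum(w[(x, n)] for n in nbrs)
            dist = {}
            for n in nbrs:
                src = known.get(n) or P[n]   # training label or earlier estimate
                for t, p in src.items():
                    dist[t] = dist.get(t, 0.0) + w[(x, n)] * p / total
            P[x] = dist
            progress = True
        if not progress:
            break
    return P

known = {"a": {"web": 1.0}}                  # training post
neighbors = {"q1": ["a"], "q2": ["q1"]}      # q2 is only reachable via q1
w = {("q1", "a"): 1.0, ("q2", "q1"): 1.0}
P = one_shot_wa(["q1", "q2"], neighbors, known, w)
print(P)  # q1 classified in sweep 1, q2 in sweep 2
```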
      <p>
        Ensemble classification may lead to significant improvements in classification
accuracy, since uncorrelated errors made by the individual classifiers are removed by the
combination of different classifiers [
        <xref ref-type="bibr" rid="ref2 ref6">2, 6</xref>
        ]. Furthermore, ensemble classification reduces
variance and bias.
      </p>
      <p>We decided to combine WA* and PWA* through simple unweighted voting,
since voting performs particularly well when the results of the individual classifiers are
similar; as we will see in Section 5, WA* and PWA* yielded very similar results on our
holdout set.</p>
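This unweighted vote can be sketched as an arithmetic mean of the per-classifier distributions; the tag names and probabilities below are made up.

```python
def vote(distributions):
    """Unweighted voting: arithmetic mean of the tag-probability
    distributions produced by the individual classifiers."""
    tags = set().union(*(d.keys() for d in distributions))
    k = len(distributions)
    return {t: sum(d.get(t, 0.0) for d in distributions) / k for t in tags}

p_wa  = {"web": 0.8, "ajax": 0.2}   # hypothetical WA* output
p_pwa = {"web": 0.6, "ajax": 0.4}   # hypothetical PWA* output
print(vote([p_wa, p_pwa]))  # web: 0.7, ajax: 0.3
```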
      <p>After running the individual classifiers, we obtain a probability distribution from
each classifier K_l as output and build the arithmetic mean of the tag-assignment
probabilities for each test post and tag:</p>
      <p>P(t|x) = (1 / |K|) Σ_l P_{K_l}(t|x)   (6)</p>
      <p>The weight w in eq. (3) and (4) is an important factor in the estimation of tag
probabilities, since it describes the strength of the relation between x and x'. There are several
ways to estimate these weights:</p>
      <p>1. For two nodes (x, x') ∈ R_res, compute their similarity by representing x and x' as
user-tag profile vectors. Each component of the profile vector corresponds to the
count of co-occurrences between the user and a tag:
φ_user-tag(x) := (|Y ∩ ({user(x)} × R × {t})|)_{t ∈ T}</p>
      <p>2. Similarly to 1, for two nodes (x, x') ∈ R_user, the node similarity is computed by
representing x and x' as resource-tag profile vectors:
φ_res-tag(x) := (|Y ∩ (U × {res(x)} × {t})|)_{t ∈ T}</p>
      <p>3. Similar to 2, but x and x' are represented as resource-user profile vectors, where
each component corresponds to the count of co-occurrences between the resource and a
user:
φ_res-user(x) := (|Y ∩ ({u} × {res(x)} × T)|)_{u ∈ U}</p>
      <p>4. The same as in 1, but the node similarity is computed w.r.t. user-resource profile
vectors:
φ_user-res(x) := (|Y ∩ ({user(x)} × {r} × T)|)_{r ∈ R}</p>
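Weighting scheme 1 can be sketched as follows, with made-up data: each post x in an R_res pair is represented by the user-tag profile vector of user(x), and the edge weight is the cosine similarity of the two profiles.

```python
from math import sqrt

def user_tag_profile(u, Y, tags):
    """Scheme 1: component t counts how often user u used tag t in Y."""
    return [sum(1 for (u2, _, t2) in Y if u2 == u and t2 == t) for t in tags]

def cosine(a, b):
    """Cosine similarity of two profile vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    na, nb = sqrt(sum(p * p for p in a)), sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy tag assignments, not from the challenge data.
Y = {("u1", "r1", "web"), ("u1", "r2", "web"),
     ("u1", "r1", "ajax"), ("u2", "r3", "web")}
tags = ["ajax", "web"]
a = user_tag_profile("u1", Y, tags)   # [1, 2]
b = user_tag_profile("u2", Y, tags)   # [0, 1]
print(cosine(a, b))                   # 2 / sqrt(5)
```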
      <p>The edge weight is finally computed by applying the cosine similarity over the
desired profile vectors:</p>
      <p>sim(φ(x), φ(x')) := ⟨φ(x), φ(x')⟩ / (‖φ(x)‖ ‖φ(x')‖)</p>
      <p>In our experiments we used weighting scheme 1, since there are no new users in the
data and therefore we can always build user-tag profile vectors.</p>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>All the results presented in this section are reported in terms of the F1-score, the official
measure used by the graph-based tag recommendation task of the ECML PKDD
Discovery Challenge 2009. For a given x ∈ X_test, the F1-score is computed as follows:</p>
      <p>F1-score(T̂(x)) = 2 · Recall(T̂(x)) · Precision(T̂(x)) / (Recall(T̂(x)) + Precision(T̂(x)))</p>
      <sec id="sec-5-3">
        <title>Results</title>
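A per-post implementation sketch of this F1-score over predicted and true tag sets:

```python
def f1_score(predicted, true):
    """F1 for one test post, given predicted and true tag sets."""
    hits = len(predicted & true)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(true)
    return 2 * recall * precision / (recall + precision)

print(f1_score({"a", "b", "c"}, {"a", "b", "d"}))  # 2/3
```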
        <p>Although the methods presented in Section 4 usually do not have free parameters,
we realized that R_user and R_res can have a different impact on the recommendation
quality (cf. Figures 2 and 3), and therefore we introduced a parameter to reward the better
relation in R_user∪res by a factor c ∈ ℕ: if R_res yields better recommendations than R_user,
for example, all edge weights in R_user∪res that refer to R_res are multiplied by c.</p>
        <p>To find the best value of c we performed a greedy search over the factor range
{1, ..., 4} on a holdout set created by randomly selecting 800 posts from the training
data. Tables 1 and 2 show the characteristics of the training and test/holdout datasets,
respectively. Figure 2 presents the results of WA*-Full, i.e., WA* performed over R_user∪res,
for different c values on the holdout set according to the F1-score. We also plot the
results of WA*-Res and WA*-Usr (i.e., WA* on R_res and R_user, respectively).</p>
        <p>After finding the best c value on the holdout set, we applied WA*-Full, PWA*-Full,
and the ensemble (cf. eq. 6) to the challenge test dataset (see Figure 3). (Since the
results of PWA* and WA* are very similar, we report only on WA*.) Note that the
results on the challenge test dataset are much lower than those on the holdout set. This may
indicate either that our holdout set was not a good representative of the population or
that the challenge test dataset represents a concept drift. We plan to further investigate
the reasons underlying this large deviation.</p>
        <p>Table 1. Characteristics of the training dataset (BibSonomy): |U| = 1,185, |R| = 22,389, |T| = 13,276, |Y| = 253,615, |X| = 64,406.</p>
        <p>According to the rules of the challenge, the F1-score is measured over the Top-5
recommended tags, even though one is not forced to always recommend 5 tags. This is
an important remark because if one recommends more tags than the true number of tags
attached to a particular test post, one lowers precision. Therefore, we estimate the
number of tags to be recommended for each test post by taking the average number of
tags the test user has assigned to his resources. If a given test user has tagged his resources
with 3 tags on average, for example, we recommend the Top-3 tags delivered by our
algorithms for all test posts in which this user appears.</p>
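This length estimation can be sketched as follows; rounding to the nearest integer and capping at the challenge's Top-5 limit are our assumptions, not details stated above.

```python
def list_length(tag_counts, cap=5):
    """Estimate how many tags to recommend for a user from the average
    number of tags on that user's posts (rounding and cap are assumptions)."""
    avg = sum(tag_counts) / len(tag_counts)
    return max(1, min(cap, round(avg)))

def recommend(ranked_tags, tag_counts):
    """Truncate the ranked tag list to the estimated length."""
    return ranked_tags[:list_length(tag_counts)]

ranked = ["web", "ajax", "python", "css", "html"]   # hypothetical ranking
print(recommend(ranked, [3, 3, 3]))                 # Top-3 for this user
```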
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In this paper we proposed to approach the graph-based tag recommendation task of
the ECML PKDD Discovery Challenge 2009 by means of relational classification. We
first turned the usual tripartite graph of social tagging systems into a homogeneous post
graph, on which simple statistical relational methods can be easily applied. Our methods
are training-free, and the prediction runtime depends only on the number of neighbors
and tags, which is fast since the training data is sparse. The models we presented also
incorporate a semi-supervised component that can potentially benefit from test data. We
presented two relational classification methods, namely WA* and PWA*, and an ensemble
based on unweighted voting over the tag probabilities delivered by these methods.</p>
      <p>We also introduced a parameter in order to reward more informative relations, which
was learned through a greedy search in a holdout set.</p>
      <p>In future work we want to investigate new kinds of relations between the posts (e.g.
content-based), other ensemble techniques, and new methods for automatically learning
more informative weights.</p>
      <p>[Figure 2. Parameter tuning on the holdout test data: F1-score over Top-n (n = 1, ..., 5) for WA*-Full with c = 1, 2, 3.5, 4, and for WA*-Res and WA*-Usr.]</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work is supported by CNPq, an institution of the Brazilian Government for
scientific and technological development, and the X-Media project (www.x-media-project.org)
sponsored by the European Commission as part of the Information Society
Technologies (IST) programme under EC grant number IST-FP6-026978. The authors also
gratefully acknowledge the partial co-funding of their work through the European
Commission FP7 project MyMedia (www.mymediaproject.org) under grant agreement no.
215006. For inquiries please contact info@mymediaproject.org.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Soumen</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Byron E.</given-names>
            <surname>Dom</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Indyk</surname>
          </string-name>
          .
          <article-title>Enhanced hypertext categorization using hyperlinks</article-title>
          . In Laura M. Haas and Ashutosh Tiwary, editors,
          <source>Proceedings of SIGMOD98</source>
          , pages
          <fpage>307</fpage>
          -
          <lpage>318</lpage>
          . ACM Press, New York, US,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Thomas G.</given-names>
            <surname>Dietterich</surname>
          </string-name>
          .
          <article-title>Machine-learning research: Four current directions</article-title>
          .
          <source>The AI Magazine</source>
          ,
          <volume>18</volume>
          (
          <issue>4</issue>
          ):
          <fpage>97</fpage>
          -
          <lpage>136</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Robert</given-names>
            <surname>Jaeschke</surname>
          </string-name>
          , Leandro Marinho, Andreas Hotho, Lars Schmidt-Thieme, and
          <string-name>
            <given-names>Gerd</given-names>
            <surname>Stumme</surname>
          </string-name>
          .
          <article-title>Tag recommendations in social bookmarking systems</article-title>
          .
          <source>AI Communications</source>
          , pages
          <fpage>231</fpage>
          -
          <lpage>247</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Qing</given-names>
            <surname>Lu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lise</given-names>
            <surname>Getoor</surname>
          </string-name>
          .
          <article-title>Link-based classification using labeled and unlabeled data</article-title>
          .
          <source>ICML 2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>S.</given-names>
            <surname>Macskassy</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Provost</surname>
          </string-name>
          .
          <article-title>A simple relational classifier</article-title>
          .
          <source>In Proceedings of the 2nd Workshop on Multi-Relational Data Mining, KDD2003</source>
          , pages
          <fpage>64</fpage>
          -
          <lpage>76</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Christine</given-names>
            <surname>Preisach</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lars</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          .
          <article-title>Relational ensemble classification</article-title>
          .
          <source>In ICDM '06: Proceedings of the Sixth International Conference on Data Mining</source>
          , pages
          <fpage>499</fpage>
          -
          <lpage>509</lpage>
          , Washington, DC, USA,
          <year>2006</year>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Christine</given-names>
            <surname>Preisach</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lars</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          .
          <article-title>Ensembles of relational classifiers</article-title>
          .
          <source>Knowledge and Information Systems</source>
          ,
          <volume>14</volume>
          :
          <fpage>249</fpage>
          -
          <lpage>272</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Steffen</given-names>
            <surname>Rendle</surname>
          </string-name>
          , Leandro B.
          <string-name>
            <surname>Marinho</surname>
          </string-name>
          , Alexandros Nanopoulos, and
          <string-name>
            <given-names>Lars</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          .
          <article-title>Learning optimal ranking with tensor factorization for tag recommendation</article-title>
          .
          <source>In KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , pages
          <fpage>727</fpage>
          -
          <lpage>736</lpage>
          . ACM,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>