<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Content- and Graph-based Tag Recommendation: Two Variations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Johannes Mrosek</string-name>
          <email>mrosek@internet-sicherheit.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Bussmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hendrik Albers</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kai Posdziech</string-name>
          <email>kaip@gmx.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benedikt Hengefeld</string-name>
          <email>hengefeld@web.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nils Opperman</string-name>
          <email>nilsoppermann@web.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Robert</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gerrit Spira</string-name>
          <email>gerrit.spira@gmx.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FH Gelsenkirchen, Department of Computer Sciences</institution>
          ,
          <addr-line>Neidenburger Strasse 43, 45877 Gelsenkirchen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe two variants of our approach to tackle the task 1 &amp; 2 of the ECML PKDD Discovery Challenge 2009 where each contenter had to identify up to 5 tags for each resource of a given set of either bibtex-like references to publications or bookmarks. The quality of the results was measured against the tags that users of the data source (www.bibsonomy.org) had originally assigned to the resources (F1 measure). In our approach, we either generate tags (from the content of the given resource data or after crawling additional resources) or we request tags from tagging services. We call each of this tag sources a tag recommender. We then combine the results of the tag recommenders based on weighting factors. The weighting factors are determined experimentally by comparing generated and expected tags based on the available training data. This general idea is also used for the graph-based approach required to solve task 2. Here again, the nal tag recommendations are computed from the individual results of the di erent tag-recommending algorithms. In the preliminary result list, we ranked second for task 1 (Group 2) and nineth for task 2 (Group 1).</p>
      </abstract>
      <kwd-group>
        <kwd>content-based graph-based tag recommendation bibsonomy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Assigning tags to resources can be an e ective instrument to organize an
information space. Users interacting with this information space may utilize the tags
to identify relevant resources or groups of resources. User-driven tag assignment
is a popular way to ensure at least some sort of tag quality1. Often, the users
assigning tags are supported by algorithmic tag recommendation. This may be as
simple as o ering auto-complete elds for entering tags that display tags already
assigned by users, or it may be based on a full- edged analysis of the resource
1 in the sense that the tags indeed help users to nd or organize resources, for recent
results on tag quality compare [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
the user wants to tag. This analysis may present a set of automatically identi ed
tags to the user. Within the later context, compare [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we describe two
variants of our approach to content-based tag recommendation which we applied to
task 1 and 2 of the ECML PKDD Discovery Challenge 2009. We acted as two
teams (with completely independent implementations), each team deploying its
own variation of the overall approach { and each with di erent success. In the
following we rst describe the solution and results of group 12 for tasks 1. This
will be followed by a brief presentation of the di erences of the solution for task
1 of group 23 and their results. We conclude with details of the solution for task
2 of group 1.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Content-based Tag Recommendation</title>
      <p>This attempt on content-based tag recommendation uses di erent sources to
generate tag candidates. These candidates are combined on the basis of
appropriate weighting factors which have been assigned to the particular sources ex
ante. The rst kind of source are web services, which o er information for known
resources. This information contains tags which already have been assigned to
those resources. In this case we query del.icio.us and citeulike.org. The second
kind of source are the resources themselves. In case of bookmarks the content
of the according websites is crawled and analyzed. For bibtex entries the
information contained in the bibtex table is taken into account. So tag candidates
are determined without using any external service. The last source is also a web
service called tagthe.net. It already has an engine, which recommends tags for a
given URL.
2.1</p>
      <sec id="sec-2-1">
        <title>Harvesting Tag Candidates</title>
        <p>The rst step to a tag recommendation is the accumulation of candidates and
additional information from the several sources. For every URL in the bookmark
table, the del.icio.us service is queried. It can be accessed via a feed. The URL
md5 hash of the resource is inserted into a URL-pattern. When the resulting
URL is accessed, all available information is returned. It includes a count which
speci es how often the URL was posted, and a list of top tags assigned to it.
Each tag is supplemented with an information about the frequency with witch
it has been assigned to the given resource.</p>
        <p>In some cases it is possible to nd tag candidates for bibtex entries at
citeulike.org. The description or misc eld of the bibtex table often contain a
citeulikeid. With this id the according citeulike page can be called and crawled for the
assigned tags. These tags are already classi ed as "tag", "publisher" and
"author". Numerical information like a count cannot be obtained.</p>
        <p>A similar kind of data is provided by the service tagthe.net. It generates
classied tags for a submitted URL automatically and independent of the document
format behind it. The available categories are "tag", "author", "person" and
"location". Like citeulike.org, tagthe.net returns no numerical information. Anyway
this service can be seen as a backup system, because it can provide tag
candidates for every URL. Thus it is called for all URLs in the bookmark and the
bibtex table. Unfortunately there is no information about the algorithmics of
the service available.</p>
        <p>In addition to the citeulike-ids the bibtex table contains further interesting
information. The title, journal and description elds contain words that can be
potentially used as concise tags. Hence after comparison with a stop word list
all remaining words in these elds are interpreted as tag candidates.
The last source for tag candidates are the websites behind the URLs in the
bookmark table. After crawling and parsing the site's source code the words are
counted and checked against a stop word list. Furthermore they are classi ed by
the location of their appearance like in "meta keywords", "meta description",
"title" and "body".
2.2</p>
      </sec>
      <sec id="sec-2-2">
        <title>Selection Tags from the Candidates</title>
        <p>The recommendation system has a hierarchical structure. This means there is
one meta recommender which relies on several separate source recommenders,
one for each source described in section 1.1. These recommenders provide up to
20 tag candidates for a queried content id. Every tag candidate is complemented
by a score the according recommender assigns to it. This score can vary between
0 and 1. How the recommender constitutes that score depends on the source
and is described in section 1.3. After the source recommenders have made their
suggestions, the meta recommender takes all of the intermediate results and
determines the actual tags. Therefor it combines a candidate's frequency of
appearance in all sources k with the score si (1 i k) provided by the source
recommenders. si is already in uenced by a weighting factor for the individual
sources. How this factor is determined will be described in section 1.4. The nal
score s for a tag candidate can be determined by the formula
s = k (s1 + s2 + ::: + sk):
(1)</p>
        <p>Note that s can be a value greater than 1. When the meta recommender has
calculated these aggregated scores for all candidates, they are ordered by this new
information. The ve tags with the highest scores are selected as recommended
tags. In order to prohibit recommendation of tags with a very low score, an
optional lter can be set. This helps to enhance the precision.
2.3</p>
      </sec>
      <sec id="sec-2-3">
        <title>Calculating the Individual Scores</title>
        <p>As already stated the calculation of the single scores di ers depending on the
source. The reason for this is the additional information, which is provided
besides the tags. Every source o ers a di erent kind of information. To calculate
a del.icio.us score sdelicious for a tag, the number of posts n which assigned
it to the resource is devided by the total number p of posts for the resource at
del.icio.us. After that the result is multiplied by the weight wdelicious constituted
for del.icio.us.</p>
        <p>sdelicious = wdelicious
n
p</p>
        <p>The scores for tagthe.net and citeulike candidates are determined as
follows: For each of the provided categories a weighting factor ccategorie is de ned:
ctag = 1:0, cauthor = 0:3, cpublisher=0:2, clocation = 0:2, cperson = 0:1. The score
stagthe=cite of each tag candidate is the result of the product of the source's
weight wtagthe=cite and the categorie's weight ccategorie. In one case there is an
exception from this practice. As can be detected in the training data the tag
"juergen" appears pretty often. So this special tag is always handled with a
score of 1:0. Otherwise all scores are determined by
stagthe=cite = wtagthe=cite ccategorie:
(2)
(3)</p>
        <p>The tag candidates generated from the bibtex entries are treated in a similar
way. For some elds a weighting factor cfield is set: ctitle = 0:55, cjournal = 0:25,
cdescription = 0:2. Only tags from the title eld are used as a suggestion for the
meta recommender. These candidates get a higher score if they also appear in
the journal or description eld. The individual scores for these elds are added
to the initial title-score of 0:55.</p>
        <p>The scoring process for the crawled content of a website is a little more complex.
It is based on the information about the location of appearance as described in
section 1.1. Three steps are passed until the nal score is determined. In the
rst step a tag's frequency of appearance nlocation in the individual locations
is counted. This value is weighted by a factor clocation which is related to the
location's importance. E.g. a word from the title or the keyword-list is rather a
good tag than one from the body. All counts are multiplied by their weighting
factors get summed up to a kind of weighted frequency nwfreq for the whole
resource
nwfreq = ntitle ctitle + ndescription cdescription +
nkeywords ckeywords + nbody cbody:
(4)</p>
        <p>If a tag appears at di erent locations, its score is raised in step two. This
takes into account the fact that such a tag is a good tag in most cases. To raise
the score, nwfreq is multiplied by a factor fcat which is related to the number of
categories the tag appeared in.</p>
        <p>The modi ed weighted frequency nwfreq is determined by
fcat =
8 1; 1 categorie
&gt;
&gt;&lt; 1:5; 2 categories</p>
        <p>3; 3 categories
&gt;
&gt;: 5; 4 categories
nwfreq = nwfreq fcat:
(5)
(6)</p>
        <p>In the last step, nwfreq is normalized to the common score interval [0; 1].
Therefor every nwfreq is divided by the highest nwfreq which is reached for the
crawled resource. The result is a score scc for a tag from the crawled content.
The number of tags which are returned to the meta recommender is limited.
Only the 20 tags with the highest scores are chosen.
2.4</p>
      </sec>
      <sec id="sec-2-4">
        <title>Calculating the Source Weights</title>
        <p>A key point in this two-stage recommendation approach is the calculation of
suitable weighting factors for the di erent sources. Their tag-quality varies in
a wide range. Thus handling the candidates of the individual recommenders
as if they were of similar quality leads to bad results. As a foundation for the
weighting factor calculation the postcore-2 of the training dataset was used. For
every source the set of tag assignments it can supply was determined and an
according tas le was generated. E.g. to calculate a factor for citeulike, a tas
le with all content ids, which are associated with bibtex entries containing a
citeulike-id, was generated. Corresponding to the tas les a result le was created
for every source. This result was independent of the meta recommender and all
the other sources. The result les were measured with f1-score against the tas
les created before. The rst attempt was to use the f1 score a result le achieved
as the weighting factor for the according source recommender.</p>
        <p>However when testing the complete recommendation process against postcore-2,
experiments with various weighting factors pointed out a better choice. Using
the precision value which has been computed for the various result les to get
the f1-score, provides better overall results. Thus all the weighting factors for
the particular sources assumed in the meta recommender match the precision
the respective source recommender reaches for the subset of tag assignments it
can supply.
2.5</p>
      </sec>
      <sec id="sec-2-5">
        <title>Results</title>
        <p>Table 1 displays the results each recommender achieved for his own subset of
the postcore-2 training data. The intermediate results in the rst ve rows show
Test
del.icio.us
tagthe.net
bibtex
web content
meta
meta
with lter
Meta/Overall
with lter</p>
        <p>5285 0.446
/ 6372
38383 0.418
/ 40882</p>
        <p>40468 0.066
/ 51580</p>
        <p>22341 0.155
/ 22341</p>
        <p>33193 0.150
/ 40882</p>
        <p>63107 0.350
/ 64120</p>
        <p>62104 0.344
/ 64120</p>
        <p>30844 0.132
/ 43002
0.134
0.353
0.053
0.127
0.123
0.254
0.269
0.103
0.206
0.383
0.059
0.139
0.135
0.294
0.302
0.116
recall, precision and f1-score for every source recommender. The weighting factor
later used for the meta recommender is the particular precision as described in
section 1.4.</p>
        <p>Row ve and six present the results of the meta recommender for the training
data. The rst is generated without a lter whereas the second one uses a lter
to avoid tag recommendations with very low scores as described in section 1.2.</p>
        <p>The table shows that del.icio.us and citeulike produce good tag candidates.
Recommendations made by other sources have a much lower quality. The meta
recommenders result for the test dataset is far below the result of the training
data set.
2.6</p>
      </sec>
      <sec id="sec-2-6">
        <title>Conclusion</title>
        <p>The comparison of the meta recommender's results for training and test data
leads to the conclusion that the choice of sources was suboptimal. As the training
results were computed on the postcore-2 the web services provided tags with a
high quality. Every resource was posted in bibsonomy at least two times. So the
probability that it is contained in the web services' databases as well, was pretty
good. Thus the decrease in score might be explainable by the unpopularity of
the resources in the test dataset.</p>
        <p>In conclusion the two stage approach is a good attempt for popular resources.
To raise the result quality for unpopular resources too, further sources with
previously assigned tags have to be added.
2.7</p>
      </sec>
      <sec id="sec-2-7">
        <title>Group 2: Variations for Task 1</title>
        <p>Harvesting Tags: In addition to citeulike and del.icio.us, group 2 implemented
a tag recommender using Google Scholar for bibtex entries. Using the title data
only, links to the resource itself or similar resources were harvested. From the
rst ten entries three were selected by counting identic words in the titles of
the referenced documents (ignoring certain stop words). In the competition,
the Google scholar tag recommender harvested roughly 60.000 links for 24.000
bibtex data entries. Subsequently, the Web content crawler of group 2 was used
to obtain candidate tags from the three selected resources.</p>
        <p>Furthermore, the web content crawler was able (to a certain extend) to parse
PDF documents in addition to HTML documents. Also the implementation of
the citeulike tag recommender di ered: a recent database dump of the citeulike
data was used which made it possible to search for DOI-ids or URLs directly and
to determine the overall frequency of tags in the citeulike data set. TagTheNet
was not used. A straighforward tag recommender (called Data set recommender
below) that analysed the relevant elds of the data entries complemented the
set of recommenders.</p>
        <p>Selecting Tags: No additional ltering for small scores was used. If less than 5
tags were harvested, the remaining slots for tags were lled by randomly drawing
tags from the 30 tags most often used in bibsonomy4. The weights for a
recommender could be di erent for bibtex and bookmark context. The weights for the
4 Not a very successful idea { rst evaluations show that this was rather
counterproductive.
recommenders were chosen manually, based on experiences while experimenting
with the training data. Tags recommended by more than one recommender were
rated signi cantly higher by multiplying their score with a factor which depends
exponentially on the number of recommendations.5
Results: The results of group 2 for task 1 are presented in table 2. Additional
quantitative data are shown in table 3.</p>
        <p>Please note that due to a persistent problem with the CiteULike tag
recommender, it is not listed in the training data and haven't been used in the actual
competition. Therefore, the relevant F1-score for the ranking in the competition
is 0.180 (second place). Some more details on the impact of the di erent tag
recommenders will be presented at the workshop.
The graph-based approach adapts the same hierarchical recommender structure
the content-based approach is based on6. This means that there is also one meta
recommender which combines the results of subordinated recommenders.
Instead of using external sources, these recommenders implement di erent
algorithms. Every algorithm processes bibsonomy's graph structure and the
contained user information in a di erent way to generate a set of tag
recommendations.</p>
        <p>Like for the content-based approach every tag is valuated with a suitable
weighting factor. This weighting factor also depends on the quality of the subordinated
recommender. It is determined in the same way as for the source recommenders
in Task 1 (cmp. section 1.4). The used benchmark data set contains the whole
postcore-2. For processing the nal test data set, three di erent algorithms were
used to supply the meta recommender. All of these algorithms have a di erent
focus on evaluating the graph-structure. The speci c operation methods are
presented in sections 2.1 to 2.3.
5 value = value * Math.pow((1.0 + 3*(count - 1)/10), count);
6 The following section documents the work of group 1 only
The rst algorithm selects tags based on the resource information. Therefor all
tags which have already been assigned to a queried resource are determined.
Additionally each tag's frequency of appearance nlocal in combination with the
resource is counted. To calculate the score str, this value is taken and divided
by the number npost the resource was posted. The result is multiplied by the
weighting factor wtr for this recommender. Like in section 1, the recommenders
precision for postcore-2 is used for this purpose.</p>
        <p>str = wtr
nlocal
npost
(7)
(8)</p>
        <p>If two tags reach the same score, the one with the higher frequency of
appearance in the entire postcore is preferred.
3.2</p>
      </sec>
      <sec id="sec-2-8">
        <title>Tag by User</title>
        <p>The second algorithm recommends and scores tags with a special relation to the
user. Algorithmically this approach is very similar to the one described in section
2.1. At rst the set of tags is identi ed which the user who makes a post has
assigned to other posts previously. The frequency of usage for the tags nlocal is
also determined. From this set of tags the nlocal-value for the most used one is
taken as a reference value nmax. The score stu this recommender calculates for
a tag is as follows
stu = wtu
nlocal
nmax</p>
        <p>The weighting factor wtu is determined as usual (cmp. Section 2.1).
3.3</p>
      </sec>
      <sec id="sec-2-9">
        <title>Tag by User Similarity</title>
        <p>The last algorithm determines tags, which have been used by similar users. The
similiarity between two users is de ned over the number of equal resources posted
by them.</p>
        <p>Therefor the rst task is the determination of similarities between all users.
These are calculated by the Tanimoto-Score under consideration of the posted
resources. These resources are used as comparison criterion for di erent users.
In the recommending process the top ve of the similar users are identi ed. The
tags they assigned to the posted resource are accumulated and sorted by their
frequency of appearance ncount in the posts of the top ve users group. As a
reference value the maximum frequency of appearance nmax of a tag in the same
group is utilized. The nal score sus for a tag generated by this recommender is
calculated with the following formula:
sus = wus
ncount
nmax
(9)
The results for Task two are arranged the same way as the results for the
contentbased approach in task one. The recall, precision and f1-score values for the
training datatset are shown individually for every algorithm. Furthermore
results are visualized for meta recommenders which use di erent combinations of
the subordinated recommenders. A lter like the one described in 1.2 is tested
too.</p>
        <p>The recommendations for the test dataset have been made by a meta
recommender which uses all of the three subordinated recommenders and a lter. They
are a good deal worse than the results for the training dataset.
Like for task 1 there is a dramatic decrease of the f1-score in the test. In this
case the explanation can be found in the composition of the training data les.
When evaluating the methods for task 2 with the training data, the post, a
recommendation is made for, is not excluded from the dataset. Thus the tags of the
post are de nitely in the training dataset. This increases the probability that
the expected tags are recommended. As a consequence, the scores are higher as
for the test data which are not included in post-core 2.</p>
        <p>In conclusion, the evaluation of the training data is not valid, but the algorithm
works correct for the test data.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Final remarks</title>
      <p>The competition o ered an excellent test bed for our approach to tag
recommendation. Though we are pretty happy with the results obtained, much work
remains to be done:
{ setting the weights for the di erent recommenders could be improved
(iterative algorithmic optimisation; cleaning/choosing training data),
{ relations to similar resources in the bibsonomy data set (identi ed for
example with the help of the recommended tags or based on Web crawling /
Google scholar results) could be explored: if, for a given resource x, a similar
resource y in the bibsonomy data sets is identi ed, tag candidates can be
drawn from resource y (either directly or via further graph-based or recursive
similarity-based analysis),
{ ne-tuning of the individual tag recommenders, for example by coupling the</p>
      <p>Web crawler to the tag database,
to name just a few open topics. We want to thank the organizers7 and to express
our hope that this stimulating and exciting competition will be continued in the
future and that some of our results and suggestions may help others to develop
further improved tag recommenders.
7 and the anonymous reviewers and Wolfram Conen for their valuable comments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Jaschke</given-names>
            , R.,
            <surname>Marinho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.B.</given-names>
            ,
            <surname>Hotho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Schmidt-Thieme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Stumme</surname>
          </string-name>
          , G.:
          <article-title>Tag Recommendations in Folksonomies</article-title>
          .
          <source>In: PKDD 2007</source>
          , Springer (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Sen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vig</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rield</surname>
          </string-name>
          , J.:
          <article-title>Learning to Recognize Valuable Tags</article-title>
          .
          <source>In: IUT'09</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farooq</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carroll</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          :
          <article-title>Enhancing Information Scent: Identifying and Recommending Quality Tags</article-title>
          . In: GROUP'09,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>