=Paper= {{Paper |id=None |storemode=property |title=Associating Semantics to Multilingual Tags in Folksonomies |pdfUrl=https://ceur-ws.org/Vol-674/Paper121.pdf |volume=Vol-674 |dblpUrl=https://dblp.org/rec/conf/ekaw/Garcia-SilvaCG10 }} ==Associating Semantics to Multilingual Tags in Folksonomies== https://ceur-ws.org/Vol-674/Paper121.pdf
Associating Semantics to Multilingual Tags in Folksonomies (Poster)

Andrés García-Silva (hgarcia@fi.upm.es)
Jorge Gracia (jgracia@fi.upm.es)
Oscar Corcho (ocorcho@fi.upm.es)
Ontology Engineering Group, Universidad Politécnica de Madrid

ABSTRACT
Tagging systems are nowadays a common feature in web sites where
user-generated content plays an important role. However, the lack of semantics
and multilinguality hamper the information retrieval process based on
folksonomies. In this paper we propose an approach to bring semantics to
multilingual folksonomies. This approach includes a sense disambiguation
activity and takes advantage of knowledge generated by the masses in the form
of articles, redirection and disambiguation links, and translations in
Wikipedia. We use DBpedia [2] as the semantic resource to define the tag
meanings.

1. INTRODUCTION
The term folksonomy is normally used to refer to the classification schemes
that emerge from the tagging activity of a user community. Hence folksonomies
represent consensual knowledge, but they are still affected by the lack of
semantics. Tagging systems are not aware of: 1) possibly related tags due to
relations such as synonyms, broader-than, narrower-than, and spelling
variation, or 2) the use of ambiguous tags.

Although tagging systems are web applications with a worldwide scope, and thus
reach users of multiple languages, the semantics of multilingual tags has not
been researched. We propose a novel solution for the association of semantics
to multilingual tags. Our contribution is twofold: 1) a multilingual sense
repository, initially for the English and Spanish languages, and 2) Sem4Tags,
a process for the association of semantics to multilingual tags.

2. RELATED WORK
The problem of identifying tag semantics in folksonomies has been addressed by
researchers in two complementary ways: 1) by identifying groups of related
tags using clustering techniques, in the hope that such groupings expose the
meaning of the tags [6], and 2) by relating folksonomies with ontologies [1].
In addition, the semantics of relatedness measures among tags has been studied
in [3]. However, we have not found research works addressing multilingual
tags.

3. MULTILINGUAL SENSE REPOSITORY
Inspired by the Tagora sense repository¹, we designed MSR, a multilingual
sense repository for English and Spanish based on Wikipedia and DBpedia
information. MSR uses: 1) article URLs as sense identifiers, and article words
along with their frequency as keywords associated with the sense, 2) articles
listed in disambiguation pages as possible senses for ambiguous words, 3) the
explicit translations among articles to link senses in languages other than
English to English senses, and 4) DBpedia resources² to formally define each
sense. For each tag to be analyzed, the following population process is
carried out:

Create disambiguation list: First, the list of candidate senses is created. We
look for a disambiguation page related to the tag. If this page exists, we
extract the possible meanings. Otherwise, we look for a content page related
to the tag.

Extract sense information: Then, for each candidate sense we extract the
keywords and their frequency from the corresponding article.

Get translations: In addition, for tags in languages other than English, we
look for English translations in Wikipedia and by using the LabelTranslator
tool³.

Associate semantic entities: Finally, we extract from DBpedia the resources
related to the candidate senses. English and Spanish Wikipedia articles are
linked to DBpedia resources by means of the page⁴ and the wikipage-es⁵
relations. If the wikipage-es relation does not exist for a Spanish article,
we take the translation found in the previous activity and use the page
relation.

¹ http://tagora.ecs.soton.ac.uk
² http://dbpedia.org/
³ http://neon-toolkit.org/wiki/LabelTranslator
⁴ http://xmlns.com/foaf/0.1/page
⁵ http://dbpedia.org/property/wikipage-es

4. SEM4TAGS: A PROCESS FOR THE ASSOCIATION OF SEMANTICS TO MULTILINGUAL TAGS
We designed Sem4Tags, a process that associates tags with semantic resources
by relying on MSR. The input is a tag, its context, and optionally the
language of the tag. As context we use the set of user tags co-occurring when
annotating a resource. The output is a DBpedia resource representing the
intended meaning of the input tag. The Sem4Tags process includes the following
activities:

Preprocessing: The tag is preprocessed to find a normalized representation
based on Wikipedia article titles. We benefit from Wikipedia redirection pages
when the tag has been considered as an alternative to an article title. In
addition, we morphologically modify the tag according to the article title
notation. Finally, if after those modifications we have not found a Wikipedia
article, we use the Yahoo! spelling service⁶ to find an alternative
representation.

Active Context Selection: The context is filtered to get rid of tags that can
affect the disambiguation activity. The active context contains the set of
tags most highly semantically related to the input tag according to a
web-based relatedness measure [5].

Sense Retrieval: We select from MSR the set of candidate senses for the tag.
We query MSR using the normalized version of the tag. If the tag is ambiguous,
the output of this activity is a set of senses. Otherwise, the output is a
single sense.

Disambiguation activity: This activity selects the most probable sense for a
tag from a set of senses. The idea is that the tag and its context can be
compared against each one of the senses by measuring the overlap of the terms
in the context with the terms in the Wikipedia pages related to the senses. We
use the vector space model to represent the senses and the tag context [4].
The vector components are the set of most frequent terms appearing in the
Wikipedia pages related to the candidate senses. For the sense vectors, the
values of these components are calculated using TF-IDF. For the tag context
vector, the value of each component is 1 if the corresponding term appears in
the tag context and 0 otherwise. Then we compare the tag context vector
against each sense vector using the cosine. Finally, we choose the sense whose
vector is most similar to the tag context vector as the one representing the
intended meaning of the tag.

⁶ http://developer.yahoo.com/search/boss/

5. EXPERIMENT
To evaluate Sem4Tags we carried out an experiment using data extracted from
Flickr. We gathered 759 photos tagged with tourist cities in Spain (e.g.,
Barcelona, Ibiza). On average those photos were annotated with 12.4 tags.

Our baseline attempts to associate tags directly with DBpedia resources. To do
this we create a URI of the form http://en.wikipedia.org/wiki/tag for English
tags and of the form http://es.wikipedia.org/wiki/tag for Spanish tags. Then
we query DBpedia for the resource directly related to that URI. For each one
of the 2318 tagging activities (i.e., triples of the form ⟨user, tag, photo⟩)
we ran the baseline, Sem4Tags without selecting the active context, and
Sem4Tags selecting the active context. The semantic associations between tags
and DBpedia resources were evaluated by 14 users. For 15% of the tagging
activities the evaluators were not able to identify the meaning. For the rest
of the tagging activities the results are shown in Table 1.

Table 1: Coverage and accuracy of the analyzed approaches

Coverage
Approach                     English   Spanish
Baseline                     51%       32%
Sem4Tags                     83%       89%

Accuracy
Approach                     English   Spanish
Baseline                     79%       79%
Sem4Tags                     81%       80%
Sem4Tags & Active Context    86%       85%

6. CONCLUSIONS
In terms of coverage Sem4Tags is clearly superior to the baseline. This high
coverage is due to: 1) the preprocessing activity where tags are normalized,
and 2) the amount of information in MSR, specifically the information about
the possible meanings of tags. In contrast, the baseline has low coverage
because tags are mapped directly to Wikipedia content pages, so ambiguous tags
that lack a default article in Wikipedia are not processed. With respect to
accuracy, Sem4Tags with Active Context presents the highest value. The use of
the active context increases the accuracy in both languages with respect to
Sem4Tags without it. On the other hand, the accuracy of the baseline is very
similar to that achieved by Sem4Tags. This suggests that most of the tags are
used in the most frequent meaning presented in Wikipedia.

7. ACKNOWLEDGMENTS
This work is supported by the GeoBuddies (TSI2007-65677C02) and España Virtual
(ALT0317) projects, and the FPI grant (BES-2008-007622).

8. REFERENCES
[1] Angeletou, S., Sabou, M., Motta, E.: Semantically Enriching Folksonomies
    with FLOR. In 1st International Workshop on Collective Semantics:
    Collective Intelligence & the Semantic Web (CISWeb 2008). Tenerife, Spain
    (2008)
[2] Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R.,
    Hellmann, S.: DBpedia - A Crystallization Point for the Web of Data.
    Journal of Web Semantics: Science, Services and Agents on the World Wide
    Web, 7, 154-165 (2009)
[3] Cattuto, C., Benz, D., Hotho, A., Stumme, G.: Semantic Grounding of Tag
    Relatedness in Social Bookmarking Systems. In Proceedings of the 7th
    International Conference on the Semantic Web, Karlsruhe, Germany (2008)
[4] García-Silva, A., Szomszor, M., Alani, H., Corcho, O.: Preliminary Results
    in Tag Disambiguation using DBpedia. In 1st International Workshop in
    Collective Knowledge Capturing and Representation (CKCaR09). California,
    USA (2009)
[5] Gracia, J., Mena, E.: Web-based measure of semantic relatedness. In Proc.
    of 9th International Conference on Web Information Systems Engineering
    (WISE 08), Auckland, New Zealand (2008)
[6] Mika, P.: Ontologies are us: A unified model of social networks and
    semantics. Journal of Web Semantics 5(1), 5–15 (2007)
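
ILLUSTRATIVE SKETCH: DISAMBIGUATION ACTIVITY
The disambiguation activity of Section 4 (TF-IDF sense vectors, a binary tag
context vector, and cosine comparison) can be illustrated with the minimal
Python sketch below. It is not the authors' implementation. The function
names, the smoothed IDF formula, the vocab_size parameter, and the toy token
lists are all assumptions made for illustration; the paper does not specify
the TF-IDF variant, the tokenization, or the number of terms kept.

```python
import math
from collections import Counter


def build_sense_vectors(sense_docs, vocab_size=50):
    """Build TF-IDF vectors for candidate senses.

    sense_docs maps a sense identifier (article URL) to the list of tokens
    of its Wikipedia page. Tokenization is assumed to happen elsewhere.
    """
    term_freqs = {sense: Counter(tokens) for sense, tokens in sense_docs.items()}

    # Vector components: the most frequent terms over all candidate pages.
    totals = Counter()
    for counts in term_freqs.values():
        totals.update(counts)
    vocab = [term for term, _ in totals.most_common(vocab_size)]

    # Smoothed IDF (an assumed variant): terms shared by all senses keep a
    # small positive weight instead of dropping to zero.
    n_senses = len(sense_docs)
    idf = {}
    for term in vocab:
        doc_freq = sum(1 for counts in term_freqs.values() if term in counts)
        idf[term] = math.log((1 + n_senses) / (1 + doc_freq)) + 1.0

    vectors = {
        sense: [counts[term] * idf[term] for term in vocab]
        for sense, counts in term_freqs.items()
    }
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context_tags, sense_docs, vocab_size=50):
    """Return the sense most similar to the tag context, plus all scores."""
    vocab, vectors = build_sense_vectors(sense_docs, vocab_size)
    context = set(context_tags)
    # Binary context vector: 1 if the term occurs in the tag context, else 0.
    context_vector = [1.0 if term in context else 0.0 for term in vocab]
    scores = {s: cosine(context_vector, vec) for s, vec in vectors.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical token lists, not real article content).
candidate_senses = {
    "http://en.wikipedia.org/wiki/Ibiza":
        ["island", "mediterranean", "beach", "spain", "island", "tourism"],
    "http://en.wikipedia.org/wiki/SEAT_Ibiza":
        ["car", "seat", "engine", "spain", "car", "model"],
}
best_sense, sense_scores = disambiguate(["beach", "spain", "holiday"],
                                        candidate_senses)
print(best_sense)
```

In this toy run the island article wins, because both "beach" and "spain" from
the context occur in its token list while only "spain" occurs in the car
article. In the process described in Section 4, the context passed in would be
the active context rather than the full set of co-occurring tags.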