=Paper=
{{Paper
|id=None
|storemode=property
|title=Associating Semantics to Multilingual Tags in Folksonomies
|pdfUrl=https://ceur-ws.org/Vol-674/Paper121.pdf
|volume=Vol-674
|dblpUrl=https://dblp.org/rec/conf/ekaw/Garcia-SilvaCG10
}}
==Associating Semantics to Multilingual Tags in Folksonomies==
Associating Semantics to Multilingual Tags in Folksonomies (Poster)

Andrés García-Silva, Jorge Gracia, Oscar Corcho
Ontology Engineering Group, Universidad Politécnica de Madrid
hgarcia@fi.upm.es, jgracia@fi.upm.es, ocorcho@fi.upm.es

ABSTRACT
Tagging systems are nowadays a common feature in web sites where user-generated content plays an important role. However, the lack of semantics and multilinguality hampers information retrieval processes based on folksonomies. In this paper we propose an approach to bring semantics to multilingual folksonomies. This approach includes a sense disambiguation activity and takes advantage of knowledge generated by the masses in the form of articles, redirection and disambiguation links, and translations in Wikipedia. We use DBpedia [2] as the semantic resource to define the tag meanings.

1. INTRODUCTION
The term folksonomy is normally used to refer to the classification schemes that emerge from the tagging activity of a user community. Hence folksonomies represent consensual knowledge, but they are still affected by the lack of semantics. Tagging systems are not aware of: 1) possibly related tags due to relations such as synonyms, broader-than, narrower-than, and spelling variation, or 2) the use of ambiguous tags.

Despite the fact that tagging systems are web applications with a worldwide scope, thus reaching users of multiple languages, the semantics of multilingual tags has not been researched. We propose a novel solution for the association of semantics to multilingual tags. Our contribution is twofold: 1) a multilingual sense repository, initially for English and Spanish, and 2) Sem4Tags, a process for the association of semantics to multilingual tags.

2. RELATED WORK
The problem of identifying tag semantics in folksonomies has been addressed by researchers in two complementary ways: 1) by identifying groups of related tags using clustering techniques, in the hope that such groupings expose the meaning of the tags [6], and 2) by relating folksonomies with ontologies [1]. In addition, the semantics of relatedness measures among tags has been studied in [3]. However, we have not found research works addressing multilingual tags.

3. MULTILINGUAL SENSE REPOSITORY
Inspired by the Tagora sense repository¹, we designed MSR, a multilingual sense repository for English and Spanish based on Wikipedia and DBpedia information. MSR uses: 1) article URLs as sense identifiers, and article words along with their frequency as keywords associated with the sense, 2) articles listed in disambiguation pages as possible senses for ambiguous words, 3) the explicit translations among articles to link senses in languages other than English to English senses, and 4) DBpedia resources² to formally define each sense. For each tag to be analyzed, the following population process is carried out:

Create disambiguation list: First, the list of candidate senses is created. We look for a disambiguation page related to the tag. If this page exists, then we extract the possible meanings. Otherwise, we look for a content page related to the tag.

Extract sense information: Then, for each candidate sense we extract the keywords and their frequency from the corresponding article.

Get translations: In addition, for tags in languages other than English, we look for English translations in Wikipedia and using the LabelTranslator tool³.

Associate semantic entities: Finally, we extract from DBpedia the resources related to the candidate senses. English and Spanish Wikipedia articles are linked to DBpedia resources by means of the page⁴ and the wikipage-es⁵ relations. If the wikipage-es relation does not exist for a Spanish article, we use the translation found in the previous activity and use the page relation.

4. SEM4TAGS: A PROCESS FOR THE ASSOCIATION OF SEMANTICS TO MULTILINGUAL TAGS
We designed Sem4Tags, a process aiming at associating tags with semantic resources relying on MSR. The input is a tag, its context, and optionally the language of the tag.
As context we use the set of user tags co-occurring when annotating a resource. The output is a DBpedia resource representing the intended meaning of the input tag. The Sem4Tags process includes the following activities:

Preprocessing: The tag is preprocessed to find a normalized representation based on Wikipedia article titles. We benefit from Wikipedia redirection pages when the tag has been considered as an alternative to an article title. In addition, we morphologically modify the tag according to the article title notation. Finally, if after those modifications we have not found a Wikipedia article, we use the Yahoo! spelling service⁶ to find an alternative representation.

Active Context Selection: The context is filtered to get rid of tags that can affect the disambiguation activity. The active context contains the set of tags most highly semantically related to the input tag according to a web-based relatedness measure [5].

Sense Retrieval: We select from MSR the set of candidate senses for the tag. We query MSR using the normalized version of the tag. If the tag is ambiguous, the output of this activity is a set of senses. Otherwise, the output is a unique sense.

Disambiguation activity: This activity selects the most probable sense for a tag from a set of senses. The idea is that the tag and its context can be compared against each one of the senses by measuring the overlap of the terms in the context with the terms in the Wikipedia pages related to the senses. We use the vector space model to represent the senses and the tag context [4]. The vector components are the set of most frequent terms appearing in the Wikipedia pages related to the candidate senses. In the case of the sense vectors, the values of these components are calculated using TF-IDF. In the case of the tag context vector, the values of these components are 1 or 0 depending on whether the corresponding term appears in the tag context or not. Then we compare the tag context vector against each sense vector using the cosine. Finally, we choose the sense vector most similar to the tag context vector as the one representing the intended meaning of the tag.

5. EXPERIMENT
To evaluate Sem4Tags we carried out an experiment using data extracted from Flickr. We gathered 759 photos tagged with tourist cities in Spain (e.g., Barcelona, Ibiza). On average, those photos were annotated with 12.4 tags.

Our baseline attempts to directly associate tags with DBpedia resources. To do this, we create a URI of the form http://en.wikipedia.org/wiki/tag for English tags and of the form http://es.wikipedia.org/wiki/tag for Spanish tags. Then we query DBpedia for the resource directly related to that URI. For each one of the 2318 tagging activities (i.e., triples of the form ⟨user, tag, photo⟩) we run the baseline, Sem4Tags without selecting the active context, and Sem4Tags selecting the active context. The semantic associations between tags and DBpedia resources were evaluated by 14 users. For 15% of the tagging activities the evaluators were not able to identify the meaning. For the rest of the tagging activities the results are shown in Table 1.

{| class="wikitable"
|+ Table 1: Coverage and accuracy of the analyzed approaches
! Measure !! Approach !! English !! Spanish
|-
| Coverage || Baseline || 51% || 32%
|-
| Coverage || Sem4Tags || 83% || 89%
|-
| Accuracy || Baseline || 79% || 79%
|-
| Accuracy || Sem4Tags || 81% || 80%
|-
| Accuracy || Sem4Tags & Active Context || 86% || 85%
|}

6. CONCLUSIONS
In terms of coverage, Sem4Tags is clearly superior to the baseline. This high coverage is due to: 1) the preprocessing activity where tags are normalized, and 2) the amount of information in MSR, specifically the information about the possible meanings of tags. In contrast, the baseline approach has low coverage because tags are directly related to Wikipedia content pages, and therefore ambiguous tags, lacking a default article in Wikipedia, are not processed. With respect to accuracy, Sem4Tags with Active Context presents the highest value. The use of the active context allows us to increase the accuracy in both languages with respect to Sem4Tags. On the other hand, the accuracy of the baseline is very similar to that achieved by Sem4Tags. This suggests that most of the tags are used in the most frequent meaning presented in Wikipedia.

7. ACKNOWLEDGMENTS
This work is supported by the GeoBuddies (TSI2007-65677C02) and España Virtual (ALT0317) projects, and the FPI grant (BES-2008-007622).

8. REFERENCES
[1] Angeletou, S., Sabou, M., Motta, E.: Semantically Enriching Folksonomies with FLOR. In 1st International Workshop on Collective Semantics: Collective Intelligence & the Semantic Web (CISWeb 2008). Tenerife, Spain (2008)
[2] Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - A Crystallization Point for the Web of Data. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 7, 154-165 (2009)
[3] Cattuto, C., Benz, D., Hotho, A., Stumme, G.: Semantic Grounding of Tag Relatedness in Social Bookmarking Systems. In Proceedings of the 7th International Conference on the Semantic Web, Karlsruhe, Germany (2008)
[4] García-Silva, A., Szomszor, M., Alani, H., Corcho, O.: Preliminary Results in Tag Disambiguation using DBpedia. In 1st International Workshop in Collective Knowledge Capturing and Representation (CKCaR09). California, USA (2009)
[5] Gracia, J., Mena, E.: Web-based measure of semantic relatedness. In Proc. of 9th International Conference on Web Information Systems Engineering (WISE 08), Auckland, New Zealand (2008)
[6] Mika, P.: Ontologies are us: A unified model of social networks and semantics. Journal of Web Semantics 5(1), 5–15 (2007)

¹ http://tagora.ecs.soton.ac.uk
² http://dbpedia.org/
³ http://neon-toolkit.org/wiki/LabelTranslator
⁴ http://xmlns.com/foaf/0.1/page
⁵ http://dbpedia.org/property/wikipage-es
⁶ http://developer.yahoo.com/search/boss/
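
The disambiguation activity described in Section 4 (TF-IDF sense vectors compared against a binary tag context vector using the cosine) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the function names, the `top_n` cut-off for the most frequent terms, the smoothed IDF variant, and the toy term counts. The two DBpedia URIs serve only as sample sense identifiers.

```python
import math
from collections import Counter


def build_vocabulary(sense_terms, top_n=50):
    """Most frequent terms across the pages of all candidate senses."""
    totals = Counter()
    for counts in sense_terms.values():
        totals.update(counts)
    return [term for term, _ in totals.most_common(top_n)]


def tfidf_vector(counts, vocabulary, sense_terms):
    """TF-IDF weights of one sense over the shared vocabulary."""
    n_senses = len(sense_terms)
    total = sum(counts.values()) or 1
    vector = []
    for term in vocabulary:
        tf = counts.get(term, 0) / total
        df = sum(1 for c in sense_terms.values() if term in c)
        # Smoothed IDF (an assumption): terms shared by all senses stay non-zero.
        idf = math.log((1 + n_senses) / (1 + df)) + 1
        vector.append(tf * idf)
    return vector


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(context_tags, sense_terms, top_n=50):
    """Return the sense whose TF-IDF vector is closest to the tag context."""
    vocabulary = build_vocabulary(sense_terms, top_n)
    context = {tag.lower() for tag in context_tags}
    # Binary context vector: 1 if the term occurs among the context tags, else 0.
    context_vector = [1.0 if term in context else 0.0 for term in vocabulary]
    scores = {
        sense: cosine(context_vector, tfidf_vector(counts, vocabulary, sense_terms))
        for sense, counts in sense_terms.items()
    }
    return max(scores, key=scores.get), scores


# Toy data (hypothetical term counts) for the ambiguous Spanish tag "granada".
sense_terms = {
    "http://dbpedia.org/resource/Granada": Counter(
        {"spain": 5, "alhambra": 4, "city": 3, "andalusia": 2}
    ),
    "http://dbpedia.org/resource/Pomegranate": Counter(
        {"fruit": 6, "tree": 3, "seed": 2, "juice": 2}
    ),
}
best, scores = disambiguate(["alhambra", "spain", "travel"], sense_terms)
print(best)
```

In this toy run the context tags overlap only with the terms of the city sense, so that sense is selected. Context tags that match no vocabulary term (here "travel") simply contribute nothing to the comparison.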