=Paper=
{{Paper
|id=Vol-292/paper-11
|storemode=property
|title=Understanding the Semantics of Ambiguous Tags in Folksonomies
|pdfUrl=https://ceur-ws.org/Vol-292/paper11.pdf
|volume=Vol-292
|authors=Ching-man Au Yeung,Nicholas Gibbins,and Nigel Shadbolt,pages 108-121
|dblpUrl=https://dblp.org/rec/conf/semweb/YeungGS07a
}}
==Understanding the Semantics of Ambiguous Tags in Folksonomies==
<pdf width="1500px">https://ceur-ws.org/Vol-292/paper11.pdf</pdf>
<pre>
           Understanding the Semantics of Ambiguous
                     Tags in Folksonomies

                 Ching-man Au Yeung, Nicholas Gibbins, and Nigel Shadbolt

                        Intelligence, Agents and Multimedia Group (IAM),
                           School of Electronics and Computer Science,
                                     University of Southampton,
                                    Southampton SO17 1BJ, UK
                               {cmay06r,nmg,nrs}@ecs.soton.ac.uk


              Abstract. The use of tags to describe Web resources in a collaborative
              manner has experienced rising popularity among Web users in recent
              years. The product of such activity is given the name folksonomy, which
              can be considered as a scheme of organizing information in the users’ own
              way. In this paper, we present a possible way to analyze the tripartite
              graphs – graphs involving users, tags and resources – of folksonomies and
              discuss how these elements acquire their meanings through their associ-
              ations with other elements, a process we call mutual contextualization.
              In particular, we demonstrate how diﬀerent meanings of ambiguous tags
              can be discovered through such analysis of the tripartite graph by study-
              ing the tag sf. We also discuss how the result can be used as a basis to
              better understand the nature of folksonomies.


      1     Introduction

      The use of freely-chosen words or phrases called tags to classify Web resources
      has experienced rising popularity among Web users in recent years. Through the
      use of tags, Web users come to share and organize their favourite Web resources
      in different social tagging systems, such as del.icio.us [footnote 1] and Flickr
      [footnote 2]. The result of this collaborative and social tagging activity is given
      the name folksonomy, which refers to the classification system evolved from the
      individual contributions of tags from the users [1].
          Collaborative tagging possesses a number of advantages which account for
      its popularity. These include its simplicity as well as the freedom enjoyed by the
      users to choose their own tags. However, some limitations and shortcomings, such
      as the problem of ambiguous meanings of tags and the existence of synonyms,
      also limit its effectiveness in organizing resources on the Web. As collaborative
      tagging attracts the attention of researchers, methods for discovering useful
      information from the seemingly chaotic folksonomies have been developed.
      In particular, some focus on discovering similar documents or communities of
      [footnote 1] http://del.icio.us/
      [footnote 2] http://www.flickr.com/


108          International Workshop on Emergent Semantics and Ontology Evolution
shared interests [17, 13], while some perform analysis on the aﬃliation between
entities to ﬁnd out diﬀerent relations between tags [10, 14].
    In this paper we focus on analysis of tripartite graphs of folksonomies, graphs
which involve the three basic elements of collaborative tagging, namely users,
tags and resources. We present how these elements come to acquire their own
semantics through their connections with other elements in the graphs, a process
which we call mutual contextualization. In particular, we carry out a preliminary
study on tripartite graphs with data obtained from del.icio.us, and demonstrate
how we can understand the semantics of ambiguous tags by examining the struc-
tures of these graphs. We also discuss how the result can be used as a basis to
acquire a better understanding of the nature of folksonomies.
    The rest of this paper is structured as follows. Section 2 gives some back-
ground information on collaborative tagging systems and folksonomies. We de-
scribe the process of mutual contextualization between the three basic elements
in Section 3. We detail the preliminary study on tripartite graphs of folksonomies
in Section 4, followed by discussions in Section 5. Finally we present our conclu-
sions and discuss possible future research directions in Section 6.


2     Background
2.1   Collaborative Tagging Systems
Tagging originates from the idea of using keywords to describe and classify
resources. These keywords are descriptive terms which indicate the topics
addressed by the resources. Collaborative tagging systems that have emerged in
recent years have taken this idea further by allowing general users to assign tags,
which are freely-chosen keywords, to resources on the Web. For example, one can
store a bookmark of the page “http://www.google.com/” on a collaborative tagging
system, and assign to it the tags google, search and useful. As the tags of different
users are aggregated, the tags form a kind of signature of the document, which
can be used for future retrieval or as an indication of the nature of the page.
    Collaborative tagging systems have started to thrive and grow in number
since late 2003 and early 2004 [6]. As one of the earliest initiatives in collaborative
tagging, del.icio.us provides a social bookmarking service, which allows
users to store their bookmarks on the Web, and use tags to describe them.
Other services focusing on different forms of Web resources appeared shortly
afterwards. For example, Flickr allows users to tag digital photos that they have
uploaded.
    Collaborative tagging is generally considered to have a number of advantages
over traditional methods of organizing information, as shown by its popularity
among general Web users and its application to a wide range of Web resources.
Its success and popularity are generally attributed to the following features
[1, 15, 18].

Low cognitive cost and entry barriers The simplicity of tagging allows any Web
user to classify their favourite Web resources by using keywords that are not
constrained by predeﬁned vocabularies.


                     ESOE, Busan - Korea, November 2007                                  109
      Immediate feedback and communication Tag suggestions in collaborative tagging
      systems provide a mechanism for users to communicate implicitly with each other
      when describing resources on the Web.

      Quick Adaptation to Changes in Vocabulary The freedom provided by tagging
      allows fast response to changes in the use of language and the emergence of new
      words. Terms like AJAX, Web2.0, ontologies and social network can be used
      readily by the users without the need to modify any pre-defined schemes.

      Individual needs and formation of organization Tagging systems provide a con-
      venient means for Web users to organize their favourite Web resources. Besides,
      as the systems develop, users are able to discover other people who are also in-
      terested in similar items.

         On the other hand, limitations and problems of existing collaborative tagging
      systems have also been identiﬁed [1, 13, 18]. These issues hinder the growth or
      aﬀect the usefulness of the systems.

      Tag Ambiguity Since vocabulary is uncontrolled in collaborative tagging systems,
      there is no way to make sure that a tag corresponds to a single, well-defined
      concept. For example, items tagged with the term sf may be related either to
      science fiction or to the city of San Francisco.

      The use of multiple words and spaces Some systems allow users to input tags
      separated by spaces. Problems arise when users would like to use phrases with
      multiple words to describe the Web resources.

      The problem of synonyms Different tags can be used to refer to the same concept
      in a tagging system. For example, “mac,” “macintosh,” and “apple” can all be
      used to describe Web resources related to Apple Macintosh computers [1]. The
      use of different word forms, such as plurals and different parts of speech, also
      exacerbates the problem.

      Lack of semantics A tag provides limited information about the documents
      being tagged. For example, when tagging a URL with the tag “podcast,” one
      can mean that the website provides podcasts, describes the use of podcasts, or
      provides details on the history of podcasting.

      2.2   Folksonomies
      As more tags are contributed to a collaborative tagging system by the users,
      a form of classification scheme takes shape. Such a scheme emerges from the
      collective efforts of the participating users, reflecting their own viewpoints on
      how the shared resources on the Web should be described using various tags.
      This product of collaborative tagging is now commonly referred to as a folksonomy
      [16]. A folksonomy is generally agreed to consist of at least the following
      three sets of entities [9, 10, 18].


Users Users are the ones who assign tags to Web resources in social tagging
systems. They are also referred to as actors, as in social network analysis.

Tags Tags are keywords chosen by users to describe and categorize resources.
Depending on the system, a tag can be a single word, a phrase or a combination of
symbols and letters. Tags are referred to as concepts in some works [10].

Resources Resources refer to the objects that are being tagged by the users in
the social tagging systems. Depending on the system, resources can be used to
refer to Web pages (bookmarks) as in del.icio.us or photos as in Flickr. Resources
are also referred to as instances, objects or documents, depending on the context.

    Quite a number of research works perform analysis on social tagging systems.
However, even though most works adopt a model involving the above three
entities, with a few mentioning extra dimensions such as the time of tagging,
there is actually no consensus on the formal definition of a folksonomy.
Below we summarize the attempts in this respect.
    Mika [10] represents a social tagging system as a tripartite graph, in which
the set of vertices can be partitioned into three disjoint sets A, C and I, corre-
sponding to the set of actors, the set of concepts and the set of objects being
tagged. A folksonomy is then deﬁned by a set of annotations T ⊆ A × C × I,
an element of which is a triple representing an actor assigning a concept to an
object being tagged.
    Gruber [5] proposes a “tag ontology” which formalizes the activity of tagging
through the use of an ontology. He suggests that tagging can be defined
using a five-place relation: Tagging(object, tag, tagger, source, [+/−]), with
object being the Web resources being tagged, tagger being the user who assigns
tags, source being the system from which this annotation originates, and [+/−]
representing either a positive or negative vote placed on this annotation by the
tagger. Newman [12] also developed a similar ontology for tagging. The act of
tagging is modelled as a relation T(Resource, Tagging(Tag, Agent, Time)).
    Hotho et al. [7] define a folksonomy as a tuple F := (U, T, R, Y, ≺). The
finite sets U, T and R correspond to the sets of users, tags and resources
respectively. Y refers to the tag assignments, which form a ternary relation between
the above three sets: Y ⊆ U × T × R. ≺ is a user-specific relation which defines the
sub/superordinate relations between tags. By dropping ≺, the folksonomy can
be reduced to a tripartite graph, which is equivalent to Mika’s model.


3   Mutual Contextualization in Folksonomies

The power of folksonomies lies in the interrelations between the three elements.
A tag is only a symbol if it is not assigned to some Web resources. A tag is also
ambiguous without a user’s own interpretation of its meaning. Similarly, a user,
though identiﬁed by its username, is characterized by the tags it uses and the
resources it tags. Finally, a document is given semantics because tags act as a


      form of metadata annotation. Hence, each of these elements in a folksonomy
      would be meaningless, or at least ambiguous in meaning, if it were
      considered independently. In other words, the semantics of one element depends
      on the context given by the other two, or all, elements that are related to it.
          To further understand this kind of mutual contextualization, we examine each
      of the three elements in a folksonomy in detail. For more speciﬁc discussions, we
      assume that the Web resources involved are all Web documents. In addition, we
      deﬁne the data in a social tagging system, a folksonomy, as follows.
      Deﬁnition 1. A folksonomy F is a tuple F = (U, T, D, A), where U is a set of
      users, T is a set of tags, D is a set of Web documents, and A ⊆ U × T × D is
      a set of annotations.
      By adopting this deﬁnition, we are actually using the model described by Mika
      [10]. Since we are mainly focusing on the associations between the three elements
      and are obtaining data from a single social bookmarking site, information such
      as the time stamps and sources of tagging is irrelevant here. Thus, this simple
      definition is sufficient for the work presented in this paper.
          As we have mentioned, the three elements forming the tripartite graph of a
      social tagging system are users, tags and documents (resources). The tripartite
      graph can be reduced into a bipartite graph if, for example, we focus on a
      particular tag and extract only the users and documents associated with it.
      Since there are three types of elements, there can be three diﬀerent types of
      bipartite graphs. This step is similar to the method introduced by Mika [10].
      However, we distinguish our method from that presented by Mika by focusing
      on only one instance of a type (e.g. tags), instead of all the items of the same
      type, allowing us to acquire more speciﬁc understanding of the semantics of the
      instance.
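
As a minimal sketch of Definition 1 and of the reduction described above, a
folksonomy can be held as a set of (user, tag, document) triples, and fixing one
tag leaves the user-document edges of a bipartite graph. The names Annotation,
folksonomy and edges_for_tag, as well as the toy data, are assumptions made for
illustration and are not taken from the study.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One element of A: a user assigning a tag to a document."""
    user: str
    tag: str
    doc: str


# Hypothetical toy annotations (not data from the paper).
A = {
    Annotation("alice", "sf", "http://example.org/dune"),
    Annotation("alice", "books", "http://example.org/dune"),
    Annotation("bob", "sf", "http://example.org/goldengate"),
    Annotation("bob", "travel", "http://example.org/goldengate"),
    Annotation("carol", "sf", "http://example.org/dune"),
}


def folksonomy(annotations):
    """Return the tuple (U, T, D, A) of Definition 1."""
    U = {a.user for a in annotations}
    T = {a.tag for a in annotations}
    D = {a.doc for a in annotations}
    return U, T, D, set(annotations)


def edges_for_tag(annotations, tag):
    """User-document edges left when the tripartite graph is restricted to one tag."""
    return {(a.user, a.doc) for a in annotations if a.tag == tag}


U, T, D, _ = folksonomy(A)
sf_edges = edges_for_tag(A, "sf")
```

Fixing a user or a document instead of a tag works the same way, with the
filter applied to the corresponding field of the triple.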

      3.1   Users
      By focusing on a single user u, we obtain a bipartite graph $TD_u$ defined as
      follows:
                    $TD_u = \langle T \cup D, E_{td} \rangle, \quad E_{td} = \{\{t, d\} \mid (u, t, d) \in A\}$
      In other words, an edge exists between a tag and a document if the user has
      assigned the tag to the document. The graph can be represented in matrix form,
      which we denote as $X = \{x_{ij}\}$, $x_{ij} = 1$ if there is an edge connecting $t_i$ and $d_j$.
      The bipartite graph represented by the matrix can be folded into two one-mode
      networks [10]. We denote one of them as $P = XX^{\top}$, and another as $R = X^{\top}X$.
          P represents a kind of semantic network which shows the associations
      between different tags. It should be noted that this is unlike the lightweight ontology
      mentioned in [10], as it only involves tags used by a single user. In other words,
      this is the personal vocabulary, a personomy [7], of a particular user.
          The matrix R represents the personal repository of the user. Links between
      documents are weighted by the number of tags that have been assigned to both
      documents. Thus, documents having higher weights on the links between them
      are those that are considered by the particular user as more related.
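
The folding step can be illustrated with a small sketch in plain Python. It
assumes that rows of X are tags and columns are documents, as in the definition
of the entries of X above; the helper names incidence, transpose and matmul and
the three-tag example are hypothetical.

```python
def incidence(edges, rows, cols):
    """Binary matrix X with X[i][j] = 1 iff (rows[i], cols[j]) is an edge."""
    r = {name: i for i, name in enumerate(rows)}
    c = {name: j for j, name in enumerate(cols)}
    X = [[0] * len(cols) for _ in rows]
    for a, b in edges:
        X[r[a]][c[b]] = 1
    return X


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Ordinary matrix product of two list-of-lists matrices."""
    Bt = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in Bt] for row in A]


# Hypothetical (tag, document) pairs for one user u.
tags = ["sf", "books", "film"]
docs = ["d1", "d2", "d3"]
edges = [("sf", "d1"), ("books", "d1"), ("sf", "d2"), ("film", "d2"), ("film", "d3")]

X = incidence(edges, tags, docs)
P = matmul(X, transpose(X))  # tag-by-tag: P[i][j] = documents carrying both tags
R = matmul(transpose(X), X)  # document-by-document: R[i][j] = tags shared by both documents
```

The diagonal of P counts the documents on which each tag was used, and the
diagonal of R counts the tags on each document; only the off-diagonal entries
act as edge weights in the one-mode networks.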


3.2   Tags
By using a method similar to that described above, we can obtain a bipartite graph
$UD_t$ for a particular tag t:

                $UD_t = \langle U \cup D, E_{ud} \rangle, \quad E_{ud} = \{\{u, d\} \mid (u, t, d) \in A\}$

In other words, an edge exists between a user and a document if the user has assigned
the tag t to the document. The graph can once again be represented in matrix
form, which we denote as $Y = \{y_{ij}\}$, $y_{ij} = 1$ if there is an edge connecting $u_i$
and $d_j$. This bipartite graph can be folded into two one-mode networks, which
we denote as $S = YY^{\top}$ and $C = Y^{\top}Y$.
    The matrix S shows the aﬃliation between the users who have used the tag
t, weighted by the number of documents to which they have both assigned the
tag. Since a tag can be used to represent diﬀerent concepts (such as sf for San
Francisco or Science Fiction), and a document provides the necessary content
to identify the contextual meaning of the tag, this network is likely to connect
users who use the tag for the same meaning.
    C offers another angle from which to view the issue of polysemous
or homonymous tags. With the edges weighted by the number of users
who have assigned tag t to both documents, this network is likely to connect
documents which are related to the same sense of the given tag.
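
When the number of users is large, the off-diagonal weights of C can also be
counted directly from the triples without building a dense matrix. The sketch
below is one such alternative, not the procedure used in the paper; the
function name document_network and the sample triples are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def document_network(triples, tag):
    """Edge weights of C for one tag: users who put `tag` on both documents."""
    docs_by_user = defaultdict(set)
    for user, t, doc in triples:
        if t == tag:
            docs_by_user[user].add(doc)
    weights = Counter()
    for docs in docs_by_user.values():
        # Each user contributes 1 to every pair of documents they tagged.
        for d1, d2 in combinations(sorted(docs), 2):
            weights[(d1, d2)] += 1
    return weights


# Hypothetical triples (user, tag, document).
triples = [
    ("alice", "sf", "dune"), ("alice", "sf", "foundation"),
    ("carol", "sf", "dune"), ("carol", "sf", "foundation"),
    ("bob", "sf", "goldengate"), ("bob", "sf", "cablecar"),
    ("bob", "travel", "goldengate"),
    ("dave", "sf", "dune"),
]
C_weights = document_network(triples, "sf")
```

Swapping the roles of users and documents in the same loop gives the weights of
S, the network of users.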

3.3   Documents
Finally, a bipartite graph $UT_d$ can also be obtained by considering a particular
document d. The graph is defined as follows:

                 $UT_d = \langle U \cup T, E_{ut} \rangle, \quad E_{ut} = \{\{u, t\} \mid (u, t, d) \in A\}$

In other words, an edge exists between a user and a tag if the user has assigned the
tag to the document d. The graph can be represented in matrix form, which we
denote as $Z = \{z_{ij}\}$, $z_{ij} = 1$ if there is an edge connecting $u_i$ and $t_j$. As in the
cases of a single user and a single tag, this bipartite graph can be folded into
two one-mode networks, which we denote as $M = ZZ^{\top}$ and $V = Z^{\top}Z$.
    The matrix M represents a network in which users are connected based on
the tags that they have both assigned to the document. Since a document may provide more
than one kind of information, and users do not interpret the content from a single
perspective, the tags assigned by different users will be different, although tags
related to the main theme of the document are likely to be used by most users.
Hence, users linked to each other by edges of higher weights in this network
are more likely to share a common perspective, or are more likely to be concerned with a
particular piece of information provided by the document.
    On the other hand, the matrix V represents a network in which tags are
connected and weighted by the number of users who have assigned both of them to the
document. Hence, the network is likely to reveal the different perspectives
from which users interpret the content of the document.
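
As a small worked example (hypothetical values, not data from the study),
suppose three users u1, u2, u3 annotate the document d with three tags t1, t2,
t3, where u1 and u3 use t1 and t2, and u2 uses t1 and t3. With rows of Z indexed
by users and columns by tags:

```latex
Z = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix},
\qquad
M = ZZ^{\top} = \begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 2 & 1 & 2 \end{pmatrix},
\qquad
V = Z^{\top}Z = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 2 & 0 \\ 1 & 0 & 1 \end{pmatrix}
```

Here the entry of M for u1 and u3 is 2 because they share both of their tags,
while the entry of V for t1 and t2 is 2 because two users assigned both tags to
d. The zero entry of V for t2 and t3 shows that no user combined those two tags.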


          We can see that diﬀerent relations between the users, the tags and the docu-
      ments in a folksonomy will aﬀect how a single user, tag or document is interpreted
      in the system. Each of these elements provides an appropriate context such that
      the semantics of the elements can be understood without ambiguity.


      4     Semantics of Ambiguous Tags

      One problem in existing collaborative tagging systems is the existence of
      ambiguous tags. By “ambiguous tags,” we refer to tags that users employ to
      represent different concepts. For example, in del.icio.us the tag sf
      has been used to describe documents which are related to science fiction and
      San Francisco. Another example is the tag opera, which is used for describing
      content related to opera as a kind of musical performance as well as content
      related to the WWW browser named “Opera” [footnote 3].
          As we have discussed, the semantics of a tag depends on the context given by
      the users who have used it as well as the documents being tagged. By studying
      the associations between the tag, the users and the documents, we may determine
      the diﬀerent meanings of a tag by placing it in the right context. As an illustrative
      example, we present an analysis of the bipartite graphs obtained from a single
      tag, which we have chosen for its common occurrence and multiple equally-
      frequent meanings in order to preserve the clarity of the example. In particular,
      we would like to ﬁnd out if it is possible to disambiguate a tag by studying its
      association with diﬀerent users and documents.


      4.1    Understanding a Single Tag

      In the experiment described below, we examine the networks of users and
      documents associated with the tag sf, and attempt to understand how different
      interpretations of the tag can be discovered from the analysis of the networks.
          The reasons for choosing the tag sf as an illustrative example are twofold.
      Firstly, sf is a tag used very frequently by users in del.icio.us. Although the exact
      number of times that the tag has been used cannot be known from the system,
      we were able to collect over 5000 triples which involve the tag sf. Secondly, by
      observation, the tag sf has been used by users to refer to two very distinct
      concepts, namely “science fiction” and “San Francisco.” We expect that users
      using the tag to refer to one of the two concepts do not use it to refer to the
      other one. Hence, the tag sf is well suited to examination, and we expect
      that experiments on the tag can produce clear results for analysis.
          In March 2007, data was collected from the del.icio.us website by using a
      crawler program written in Python. The program retrieved pages listing all book-
      marks that have been tagged with sf, and subsequently retrieved the published
      RSS ﬁle of each bookmark to obtain the corresponding users and tags associated
      with it. In other words, the crawler retrieved bookmarks in del.icio.us which have
      [footnote 3] http://www.opera.com/


                   Fig. 1. A network of documents tagged by sf.


been tagged with sf, along with the users who tagged the page, and the tags,
including sf, they used. In total, 238,117 triples were obtained, each involving a
user, a URL of the bookmark, and a tag. A total of 427 distinct URLs and
19,979 users are involved. Out of these triples, 5,852 involve the tag sf.
    We extract all those triples that involve the tag sf, and construct the matrix
Y, representing the associations between users and bookmarks (documents). We
then construct the matrices $S = YY^{\top}$, corresponding to the network of users,
and $C = Y^{\top}Y$, corresponding to the network of documents.
    The matrices S and C are fed into the network analysis package Pajek
[3], and visualized as networks. Since some users have no associations
with other users, and some documents none with other documents, isolated nodes
are removed from the networks. The results are shown in Fig. 1 and Fig. 2. In Fig. 1,
nodes represent documents, and two nodes are connected by an edge if a user has
tagged both documents with the tag sf. Edges are weighted by the number of such
users; the weights are not shown in the figure. In Fig. 2, nodes represent users, and
two nodes are connected by an edge if both users have tagged a document with the tag
sf. Edges are weighted by the number of such documents. The networks are
visualized using the Kamada-Kawai layout algorithm [8] implemented in Pajek.
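
One way to hand such a weighted network to Pajek is to write it in Pajek's
plain-text .net layout (a vertex list followed by an edge list). The sketch
below is an assumption about how this step could be done, not the authors'
script; to_pajek is a hypothetical helper, and the exact syntax should be
checked against the Pajek documentation. Because only nodes that occur in some
edge are listed, isolated nodes are dropped as a side effect.

```python
def to_pajek(weights):
    """Render a weighted undirected network as Pajek .net text.

    `weights` maps (node_a, node_b) pairs to positive edge weights.
    Nodes that appear in no edge are never listed, so isolated nodes are dropped.
    """
    nodes = sorted({n for pair in weights for n in pair})
    index = {n: i for i, n in enumerate(nodes, start=1)}  # vertex ids start at 1
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{index[n]} "{n}"' for n in nodes]
    lines.append("*Edges")
    for (a, b), w in sorted(weights.items()):
        lines.append(f"{index[a]} {index[b]} {w}")
    return "\n".join(lines) + "\n"


# Hypothetical document network with two weighted edges.
net = to_pajek({("dune", "foundation"): 2, ("cablecar", "goldengate"): 1})
```

The resulting text can be saved to a file and opened in Pajek, where a layout
such as Kamada-Kawai is then applied interactively.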
    Two large clusters of nodes can be observed in both of the networks in Fig. 1
and Fig. 2. However, as shown in the two figures, there are more connections
between the two clusters in the network of documents than in that of users. One
hypothesis that can explain the existence of clusters in the network
of documents is that they correspond to groups of documents related to the
different senses of the tag sf. A similar hypothesis that can be applied to the
network of users is that the different clusters correspond to groups of users who
have used the tag sf to represent different concepts.
    Since documents are connected if a user tagged them with the tag sf,
connected documents are considered by that user to be related to
a certain concept represented by the tag sf. In addition, if we assume that a user


                       Fig. 2. The network of users who used the tag sf.


      would be consistent in using the same tag for the same concept, it is reasonable
      to suggest that documents in different clusters address different concepts
      represented by the tag sf. As we understand through observation that two major
      concepts, “science fiction” and “San Francisco,” are associated with the tag sf,
      we can further suggest that the two major clusters in the network correspond
      to documents on science fiction and San Francisco respectively. To test this
      hypothesis, we perform further analysis on the tagging data.
          Firstly, we manually examine all the 357 websites represented by the nodes in
      the network of documents. We classify each website as related either to science
      fiction or to San Francisco, based on the content of the website as well as other tags
      used by the users. We mark a website as unclassified if not enough information
      or evidence is available. After that,
      we combine the information with the original network, and use Pajek to draw a
      new network, as shown in Fig. 3.
          In the figure, circular nodes represent documents related to science fiction,
      and triangular nodes represent documents related to San Francisco. Documents
      that cannot be classified are represented by rectangular nodes. We can see that
      the circular and triangular nodes are clearly grouped into two clusters. The result
      shows that the two clusters indeed correspond to two sets of documents related to two
      distinct meanings of the tag sf.
          However, it is interesting to note that there are actually many edges
      connecting nodes from different clusters. Since nodes are connected if a user tagged
      them with the tag sf, these connections imply that some users actually used the
      same tag to represent two distinct concepts. This also explains why the two
      clusters in the network of users are connected by a few edges. The documents
      connected by edges between clusters in the network of documents are
      responsible for the edges connecting the users from different clusters in the network
      of users. However, since it would be very difficult to judge accurately whether


       Fig. 3. The network of documents tagged by sf with classiﬁed nodes.


a user always uses the tag sf to refer to science fiction or San Francisco, we
refrain from performing a similar classification of the users.

    To further investigate whether there are many users who actually used the tag
to refer to more than one concept, we construct one more network of documents.
Based on the data which generates Fig. 3, we remove edges which have a weight
less than 2. By doing that we effectively ignore all the edges which correspond
to cases in which only one user has used the tag sf on both of the documents
connected by an edge. We also remove nodes that are not connected to any other
nodes afterwards. The result is shown in Fig. 4. It can be seen that there remains
only one edge which connects nodes across the two clusters.
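
The pruning step and the check for edges that cross the two clusters can be
sketched as follows. The edge weights and the document labels are invented for
illustration; in the study the labels come from the manual classification, and
prune and cross_cluster_edges are hypothetical helper names.

```python
def prune(weights, min_weight=2):
    """Keep edges whose weight is at least `min_weight`.

    Nodes left without any edge simply no longer appear in the result.
    """
    return {pair: w for pair, w in weights.items() if w >= min_weight}


def cross_cluster_edges(weights, label):
    """Edges whose two endpoints carry different, known labels."""
    return [
        (a, b) for (a, b) in weights
        if label.get(a) and label.get(b) and label[a] != label[b]
    ]


# Hypothetical document network and manual classification.
weights = {
    ("dune", "foundation"): 3,
    ("dune", "goldengate"): 1,
    ("cablecar", "goldengate"): 2,
    ("foundation", "cablecar"): 2,
}
label = {
    "dune": "science fiction",
    "foundation": "science fiction",
    "goldengate": "San Francisco",
    "cablecar": "San Francisco",
}

kept = prune(weights)
before = cross_cluster_edges(weights, label)
after = cross_cluster_edges(kept, label)
```

Comparing the number of cross-cluster edges before and after pruning indicates
how much of the overlap between the clusters is due to single users.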

    Finally, we examine how different tags are associated with each other given
this set of documents and users. Since the documents are all tagged by the tag sf,
all the other tags can be considered to be related to it. Given the two distinct
concepts represented by the tag, it is reasonable to hypothesize that the tags
related to it can also be divided into two groups, one being related to science
fiction, and another to San Francisco. We construct a matrix $T = \{t_{ij}\}$ to
represent the associations between the tags, where $t_{ij}$ is the number of times
$tag_i$ and $tag_j$ have been used on the same document. Since there are over 8000
unique tags in the data, and many of them have been used on only a few documents,
we concentrate on the 35 tags which are used most frequently along with sf.
The associations between the tags are visualized in Fig. 5. We can see that
tags which are related to San Francisco are grouped in one cluster while tags
related to science fiction are grouped in another cluster. This suggests that we
can examine the related tags in order to obtain the different meanings of an
ambiguous tag.
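
A sketch of how such a tag association matrix might be computed is given below.
It rests on two assumptions that the text leaves open: the frequency of a tag
is taken to be the number of triples in which it occurs, and an entry of T
counts the documents on which both tags appear. The function name
tag_cooccurrence and the sample triples are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tag_cooccurrence(triples, focus, k):
    """Co-occurrence counts among the k tags used most often alongside `focus`."""
    tags_by_doc = defaultdict(set)
    usage = Counter()
    for _user, tag, doc in triples:
        tags_by_doc[doc].add(tag)
        if tag != focus:
            usage[tag] += 1  # assumption: frequency = number of triples
    top = {tag for tag, _ in usage.most_common(k)}
    T = Counter()
    for tags in tags_by_doc.values():
        # assumption: count each document once per pair of tags it carries
        for a, b in combinations(sorted(tags & top), 2):
            T[(a, b)] += 1
    return T


# Hypothetical triples; every document also carries the tag sf.
triples = [
    ("alice", "sf", "dune"), ("alice", "scifi", "dune"), ("alice", "books", "dune"),
    ("carol", "sf", "foundation"), ("carol", "scifi", "foundation"),
    ("carol", "books", "foundation"),
    ("bob", "sf", "goldengate"), ("bob", "california", "goldengate"),
    ("bob", "travel", "goldengate"),
    ("erin", "sf", "cablecar"), ("erin", "california", "cablecar"),
    ("erin", "travel", "cablecar"), ("erin", "tram", "cablecar"),
]
T = tag_cooccurrence(triples, focus="sf", k=4)
```

In this toy data the tags fall into two groups that never co-occur, which
mirrors the two clusters of related tags described above.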


      Fig. 4. The network of documents tagged by sf after removal of edges with weights
      less than two and unconnected nodes.


            Fig. 5. The network of 35 tags which are most frequently used along with sf.


      5     Discussions
      The experimental results show that by analyzing the tripartite graph of a folksonomy
      and the relations between tags, users and documents, we can discover how tags
      are being used, and better understand the meanings of tags which carry
      multiple meanings. Hence, although the same tag can be used to represent
      different concepts, the documents and the users still provide the context for
      understanding specific meanings of the tag. Given the above results, we come to
      understand more about the characteristics of folksonomies.

      5.1     Ambiguous Tags from Users’ Point of View
      Based on the facts that documents of similar topics are clustered together, and
      that documents are connected by users who have applied the tag sf, we see that


the majority of users use the tag to refer to one concept only. This is because if
users used the tag arbitrarily to refer to either of the two concepts, we would not be
able to observe two clusters in the network. Hence, although a tag can possess
several distinct meanings, users tend to be consistent in referring to the same
meaning when they use the tag. One may also suggest that users interested in one
concept represented by the tag are not interested in the other, thus producing
the two clusters of documents. However, given that the different senses of the
tag we examined do not actually conflict with each other, and that the
experiment involves quite a large number of users, it is more reasonable
to suggest that consistency in usage is the reason for the clear distinction that we
have observed. Hence, this shows that it is possible to understand whether a tag
has multiple senses by examining the associations between users and documents.

5.2   Existence of Sub-communities
In the experiment, in addition to the two large clusters of nodes, we can also
observe within the clusters that there are some nodes which tend to be grouped
with each other to form smaller clusters. For example, in Fig 3 on the left and
right ends of the clusters of triangular nodes, we can observe that some nodes are
more connected with each other than with the rest of the nodes. This is probably
because even if we consider only documents that are related to “San Francisco,”
there is still a wide range of documents related to different aspects of
“San Francisco.” If we look at the network of tags, we can see that tags related
to “San Francisco” include food, travel and culture. Thus, these smaller clusters
probably correspond to documents with more speciﬁc topics. More analysis will
be performed in the future to verify this hypothesis.

5.3   Identifying the Topics of Documents
There are some documents (rectangular nodes in the network) which we cannot
classify into either the category of “science fiction” or “San Francisco.”
This is because either the documents are only very loosely related to one of these
topics, or the tags associated with them are not indicative enough. However, as
these rectangular nodes are located in one of the clusters we have observed, it
becomes possible to judge, with high probability, the topics of these documents.
Also, folksonomies reflect the classification scheme evolving from the
collaborative effort of users. Hence, this judgement is not necessarily aligned
with the intention of the author of the document. Rather, by saying that a
document is related to a certain topic as judged by its location in the network,
we are reflecting the opinions of the users. Thus, by constructing and examining
the networks of documents, we are able to place the documents into the
appropriate context, allowing us to understand what they are about from the
viewpoint of users.
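
One simple way to make this judgement concrete is sketched below. Taking a
majority vote among the already classified neighbours of a document is an
assumption made for illustration; the document identifiers, labels and the
function name infer_topic are hypothetical.

```python
from collections import Counter


def infer_topic(doc, neighbours, known_topics):
    """Return the most common known topic among the neighbours of `doc`.

    `neighbours` maps a document to the documents it is linked to in the
    network; `known_topics` maps classified documents to their topic.
    Returns None when no neighbour has a known topic.
    """
    votes = Counter(
        known_topics[other]
        for other in neighbours.get(doc, ())
        if other in known_topics
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical example: d9 could not be classified from its own tags.
NEIGHBOURS = {"d9": ["d1", "d2", "d4"], "d10": ["d9"]}
KNOWN_TOPICS = {
    "d1": "science fiction",
    "d2": "science fiction",
    "d4": "San Francisco",
}

print(infer_topic("d9", NEIGHBOURS, KNOWN_TOPICS))
```

A document whose neighbours are all unclassified, such as d10 here, is left
without a topic rather than being assigned one arbitrarily.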

5.4   Related Work
Research on folksonomies mainly focuses on relations between tags instead of
the semantics of individual tags. For example, Begelman et al. [2] propose an
      automatic tag clustering algorithm to tackle the problem of synonyms. A more
      comprehensive method proposed by Niwa et al. [14] is able to discover four
      different kinds of relations – relevant, conflicting, synonymous and unrelated –
      between tags. Mika [10] proposes to generate lightweight ontologies which are
      more meaningful by examining tag relations in the social context instead of
      studying their co-occurrences in documents. One piece of work which is closely
      related to the topic presented here is that by Wu et al. [18], in which the
      authors investigate how emergent semantics can be derived from folksonomies.
      They employ statistical analysis on folksonomies, and study the conditional
      probabilities of tags in different conceptual dimensions. Tags with multiple
      meanings will then score high in more than one dimension in the conceptual
      space. However, one limitation of their method is that the number of dimensions
      must be determined beforehand.


      6   Conclusions and Future Work

      Our study shows that mutual contextualization does occur among the three basic
      elements in a folksonomy, and that it is possible to acquire a better understanding
      of the semantics of ambiguous tags by constructing and studying the networks
      of documents and users associated with the tag.
           Currently, much research focuses on how tagging data in folksonomies
      can be utilized to provide other services, such as identifying user interests,
      recommending relevant documents or constructing lightweight ontologies. However,
      all these applications require a better understanding of the semantics of tags
      in order to provide accurate and useful results. For example, it would not be
      wise to match users based on the tags they used without knowing that tags may
      possess different meanings. Hence, the work presented here can be considered as
      a first step towards a better understanding of folksonomies.
           However, a challenge remains: while we can identify different groups of
      users and documents which correspond to different usages of an ambiguous tag,
      we still need other methods to integrate these different pieces of information
      to acquire the full picture. For example, how can we know, without examining
      every document, which groups of users and documents are associated with a
      particular sense of a tag? This will be further investigated in our future work.
           Specifically, in the future we will apply our method to other ambiguous tags
      to observe its performance. We hope to gain more insight into how to devise
      automatic algorithms to perform tag meaning disambiguation. We will also study
      different methods of hierarchical clustering or community-discovering algorithms
      [4, 11], and investigate how these techniques can be applied to discover clusters
      of documents and users. It is hoped that, by further examining the tags associated
      with different clusters, in particular the tags used most frequently in each
      cluster, we can discover the different senses of a tag. Finally, we will
      extend our study to users as well as documents, and investigate how analysis
      of tripartite graphs can help discover useful information such as communities of
      users or clusters of documents with similar topics, which will be very useful in
      applications such as Web page recommendation or social network analysis.
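
      The idea of reading the sense of a tag from the tags used most frequently in
      a cluster could be sketched as follows. This is only an illustration of the
      future direction described above, not an implemented part of this work; the
      function name top_tags, the toy triples and the cluster are hypothetical.

```python
from collections import Counter


def top_tags(triples, cluster, exclude=(), k=3):
    """Return the `k` tags applied most often to the documents in `cluster`.

    `triples` is an iterable of (user, tag, document); tags listed in
    `exclude` (for example the ambiguous tag itself) are not counted.
    """
    counts = Counter(
        tag
        for _user, tag, doc in triples
        if doc in cluster and tag not in exclude
    )
    return [tag for tag, _count in counts.most_common(k)]


# Hypothetical toy data for one cluster of documents tagged with sf.
TOY_TRIPLES = [
    ("u3", "sf", "d4"), ("u3", "travel", "d4"),
    ("u4", "sf", "d4"), ("u4", "travel", "d4"),
    ("u3", "sf", "d5"), ("u4", "travel", "d5"),
    ("u4", "food", "d5"),
    ("u1", "sf", "d1"), ("u1", "fiction", "d1"),  # belongs to another cluster
]

print(top_tags(TOY_TRIPLES, {"d4", "d5"}, exclude={"sf"}, k=2))
```

      Excluding the ambiguous tag itself keeps it from dominating every cluster,
      so that the remaining frequent tags can hint at which sense the cluster
      corresponds to.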


120         International Workshop on Emergent Semantics and Ontology Evolution
References
 1. Adam Mathes. Folksonomies - cooperative classification and communication
    through shared metadata. http://www.adammathes.com/academic/computer-
    mediated-communication/folksonomies.html, 2004.
 2. Grigory Begelman, Philipp Keller, and Frank Smadja. Automated tag clustering:
    Improving search and exploration in the tag space. In Collaborative Web Tagging
    Workshop at WWW2006, Edinburgh, Scotland, 2006.
 3. Wouter de Nooy, Andrej Mrvar, and Vladimir Batagelj. Exploratory Social Net-
    work Analysis with Pajek (Structural Analysis in the Social Sciences). Cambridge
    University Press, January 2005.
 4. Michelle Girvan and M. E. J. Newman. Community structure in social and bio-
    logical networks. Proc. Natl. Acad. Sci. USA, 99:7821, 2002.
 5. Thomas Gruber. Ontology of folksonomy: A mash-up of apples and oranges.
    http://tomgruber.org/writing/mtsr05-ontology-of-folksonomy.htm, 2005.
 6. T. Hammond, T. Hannay, B. Lund, and J. Scott. Social bookmarking tools (i): A
    general review. D-Lib Magazine, 11(4), April 2005.
 7. Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme. Infor-
    mation retrieval in folksonomies: Search and ranking. In York Sure and John
    Domingue, editors, The Semantic Web: Research and Applications, volume 4011
    of Lecture Notes in Computer Science, pages 411–426. Springer, June 2006.
 8. T. Kamada and S. Kawai. An algorithm for drawing general undirected graphs.
    Inf. Process. Lett., 31(1):7–15, 1989.
 9. Cameron Marlow, Mor Naaman, Danah Boyd, and Marc Davis. HT06, tagging
    paper, taxonomy, Flickr, academic article, to read. In HYPERTEXT ’06: Proceed-
    ings of the seventeenth conference on Hypertext and hypermedia, pages 31–40, New
    York, NY, USA, 2006.
10. Peter Mika. Ontologies are us: A unified model of social networks and semantics.
    In International Semantic Web Conference, pages 522–536, 2005.
11. M.E.J. Newman. Analysis of weighted networks. Physical Review E, 70:056131,
    2004.
12. Richard Newman. Tag ontology design. http://www.holygoat.co.uk/projects/tags/,
    2004.
13. S. Niwa, Takuo Doi, and S. Honiden. Web page recommender system based on
    folksonomy mining for itng’06 submissions. In ITNG 2006. Third International
    Conference on Information Technology: New Generations, pages 388–393, 2006.
14. Satoshi Niwa, Takuo Doi, and Shinichi Honiden. Folksonomy tag organization
    method based on the tripartite graph analysis. In IJCAI Workshop on Semantic
    Web for Collaborative Knowledge Acquisition, January 2007.
15. Emanuele Quintarelli. Folksonomies: power to the people. ISKO Italy-UniMIB
    meeting, June 2005.
16. G. Smith. Atomiq: Folksonomy: Social classification.
    http://atomiq.org/archives/2004/08/folksonomy_social_classification.html, 2004.
17. Harris Wu, Mohammad Zubair, and Kurt Maly. Harvesting social knowledge from
    folksonomies. In HYPERTEXT ’06: Proceedings of the seventeenth conference on
    Hypertext and hypermedia, pages 111–114, New York, NY, USA, 2006. ACM Press.
18. Xian Wu, Lei Zhang, and Yong Yu. Exploring social annotations for the semantic
    web. In WWW ’06: Proceedings of the 15th international conference on World
    Wide Web, pages 417–426, New York, NY, USA, 2006. ACM Press.


                     ESOE, Busan - Korea, November 2007                                  121

</pre>