<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>S. and B. A. Huberman. Usage Patterns of
Collaborative Tagging Systems. Journal of Information
Science</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Semantic Cloud: An Enhanced Browsing Interface for Exploring Resources in Folksonomy Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hidir Aras</string-name>
          <email>aras@tzi.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sandra Siegel</string-name>
          <email>siegel@tzi.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rainer Malaka</string-name>
          <email>malaka@tzi.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Research Group Digital Media, TZI, University of Bremen</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>[10] Manning</institution>
          ,
          <addr-line>C.D., Raghavan</addr-line>
          ,
          <institution>P. and H. Schütze. Introduction to Information Retrieval. Cambridge University Press</institution>
          ,
          <addr-line>Cambridge, 2008</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>[11] Mathes, A. (2004). Folksonomies - Cooperative Classification and Communication through Shared Metadata, Technical report, University of Illinois</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <volume>12</volume>
      <fpage>4</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>Popular Web folksonomy systems such as Delicious or Flickr allow users to create web contents and annotate them with a set of freely chosen keywords (tags) in order to organize them for later retrieval. Unfortunately, existing user interfaces of folksonomy systems have limited browsing capabilities and do not exploit tag semantics sufficiently for browsing linked data. In this paper, we present Semantic Cloud, an approach for exploring data in folksonomy systems based on a hierarchical semantic representation of the tag-space, which is obtained by analyzing folksonomy data.</p>
      </abstract>
      <kwd-group>
        <kwd>Tags</kwd>
        <kwd>Folksonomies</kwd>
        <kwd>Tag Clouds</kwd>
        <kwd>Browsing Interface</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>In recent years, the development of ‘Web 2.0’ or ‘Social
Web’ applications has led to an increase in user
participation on the World Wide Web as users themselves
are now able to easily create and share contents. Services
such as Flickr and YouTube enjoy great popularity for
uploading and presenting photos or videos and social
bookmarking tools like Delicious have facilitated saving
and sharing of website references online. Along with the
increasing amount of user-generated content on the Web, a
new form of manual content classification – social tagging
– has been established, which is directly performed by
users in order to organize their contents for later retrieval
by either themselves or others. Tags can be freely chosen
from users’ own vocabulary and thus, in contrast to
predefined taxonomies or ontologies, the folksonomy, i.e.
the classification vocabulary in folksonomy systems,
emerges automatically in the process of annotating and
classifying [17].</p>
      <p>Although many users, especially in social bookmarking
systems, create and annotate content mainly for their own
purposes, they still produce collective value by helping
other users to retrieve and browse user-generated content.
For searching and exploring content, users can enter search
tags directly into traditional search interfaces as well as
select them from dedicated interface elements such as tag
clouds. In tag clouds visual features such as font size, font
weight and intensity are utilized in order to visualize the tag
space. Usually a small selection of the most often used tags
within the system is displayed. This rough overview is
particularly helpful for unspecific retrieval tasks and serves
as a starting point for browsing, when users have no initial
appropriate tag to start searching or browsing. Using the tag
cloud implies less cognitive and physical workload than
thinking of a search tag that defines the thematic field one
likes to explore and entering it into the search field [15].
After having found an initial tag and associated resources
users can start browsing using the interlinked structure of
resources and tags or make use of related tag lists offered
by most folksonomy systems. In connection with these
browsing structures, the term ‘serendipity’ is often used [11],
referring to unexpected findings made while browsing
tags.</p>
      <p>However, determined and structured ways of exploration
are hardly provided and user interfaces of folksonomy
systems often fail to sufficiently support users in finding
appropriate search tags and creating efficient queries for
discovering interesting contents. For users, it is difficult to
gain a full impression of tags used in the overall system or
within their field of interest as tag clouds as well as related
tags cover only a very small subset of popular tags.
Furthermore, users are often confronted with general
semantic problems of folksonomies [5, 7], e.g. different
spellings or lexical forms, homonymous or polysemous tags,
or the “basic level problem”, which lead to incomplete or
unexpected results. Given the uncontrolled nature of tags,
it might seem difficult to solve these problems. However,
folksonomies hold inherent semantic structures which can
be extracted and used by means of tag co-occurrence
analysis and clustering. Various approaches of this kind
have already been researched and presented, but
only a few works apply them to concrete user
interfaces for folksonomy systems.</p>
      <p>In this paper, we present a similarity-based browsing
interface for enhanced exploration in folksonomy systems.
We enhance tag-cloud based user interfaces by
semantically arranging related tags that are determined by
co-occurrence analysis of folksonomy data and applying
hierarchical clustering for multiple topic cloud exploration.
Our prototype was evaluated in a short-term user study and
yielded promising results.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>Besides the usual alphabetic and random arrangements of
tags in tag clouds, semantic arrangements have recently been studied
by researchers. Schrammel et al. [14] describe a
series of experiments of clustered presentation approaches
indicating that semantically clustered tag clouds can
provide improvements over other layouts in specific search
tasks. Task-related performance for visual exploration and
tag cloud perception was also assessed by Lohmann et al.
[13]. The results showed differences in task performance
for different layouts, leading to the conclusion that
interface designers should carefully select the appropriate
tag cloud layout according to the expected user goals.
Hassan-Montero and Herrero-Solana [8] choose a layout
similar to common tag clouds but each row of the tag cloud
includes tags of a different main topic field. Fujimura et al.
[3] present an approach for creating overviews of large-scale
tag sets by mapping them onto a scrollable topographic
image, where central tags are located in the ‘highest’ regions
and related, more specific tags are placed around them in ‘lower’
regions. Hoare and Sorensen [9] describe an information
foraging tool based on 2-dimensional proximity-based
visualizations. Their layout technique is based on a
graph-theoretic force-directed visualization.</p>
      <p>Approaches that extract relatedness of tags directly from
folksonomy data are usually based on the assumption that
tags are related when they occur in a similar context. Tag
relatedness has to be determined statistically on this broad
basis of data by considering the whole set of annotations.
Mika [12] describes how the original graph representing the
complete annotation structure can be transformed in order
to obtain a tag co-occurrence graph including the
co-occurrence counts for each pair of tags. Several approaches
have been proposed that adapt or extend absolute
co-occurrence in order to obtain more balanced results of tag
similarity by calculating relative co-occurrence and using
different types of metrics [1, 2, 8].</p>
      <p>Grahl et al. [6] as well as Gemmell et al. [4] present
algorithms for establishing hierarchical structures from
folksonomies that can provide a basis for more structured
browsing or personalized navigation, respectively. Specia
and Motta [16] apply a non-exclusive agglomerative
clustering method in order to map groups of tags onto
ontological concepts. Further work makes use of a divisive
k-means algorithm [8] in order to provide a semantically
ordered tag cloud, or suggests clustering the tag space using
graph-based clustering, splitting the co-occurrence graph
where the edges are weakest [1].</p>
      <p>In our work, we avoid typical limitations and problems in
exploring tagged data by enhancing and integrating existing
interface concepts and applying proposed approaches for
extracting semantics to a concrete browsing interface. A
hierarchical semantic arrangement of tags is used for
supporting the creation of complex queries, i.e. queries that
can be built from tags at different levels of the tag
hierarchy. Finally, our interface integrates query
refinement and result display in a single view, compensating
for the missing interactivity support in existing
systems.</p>
    </sec>
    <sec id="sec-3">
      <title>3. SEMANTIC CLOUD</title>
    </sec>
    <sec id="sec-4">
      <title>3.1 Overview</title>
      <p>Semantic Cloud (SC) integrates and enhances main
concepts of current interfaces in one single interaction
structure: tag clouds as a means of initial orientation within
the overall tag space, related tags as a way of browsing
related items and refining queries, and manual tag search for
specific information needs. To avoid the typical
problems and limitations of classic folksonomy retrieval
interfaces, SC is first of all based on a semantic arrangement of tags,
i.e. similar tags are physically located near each other, in
contrast to the alphabetic or random arrangement of classic
tag clouds. Once a user has found a tag which seems
interesting to her, she can easily find other potentially
interesting tags by scanning neighboring tags. Secondly,
instead of one single limited tag cloud, an extensive
structure of multiple topic clouds is proposed which can be
explored hierarchically. This way, users can get a fast
overview of topics in a small representative overview tag
cloud but retrieve more focused tag clouds with a higher
semantic density on demand for creating more specific
queries. The classic ‘related tags’ list is therefore directly
integrated into the tag cloud. Finally, the interface showing
tag cloud and results at the same time allows for
simultaneously composing queries from the tag cloud,
consulting results and refining the query afterwards.
Additionally, query tags can be added by manual input.</p>
    </sec>
    <sec id="sec-5">
      <title>3.2 Exploiting Tag Semantics</title>
      <p>SC is internally based on a semantic representation of a
folksonomy tag space that contains information about the
semantic relatedness of tags needed for a semantic
arrangement of tags within the tag cloud, as well as the
hierarchical structure of thematic groups of tags. It can be
acquired by analyzing a representative set of annotation
data, i.e. calculating tag similarity and clustering tags.</p>
      <sec id="sec-5-1">
        <title>3.2.1. Data Sample</title>
        <p>For our user study we extracted a sample set of Delicious
bookmarking data consisting of 870,500 annotation triples
on 119,817 distinct URLs with 42,373 distinct tags. In
order to obtain characteristic structures and relations, the
data set was initially filtered by deleting rarely used tags
and non-representative annotations. Potentially
representative annotations can be identified easily in a
collaborative tagging system such as Delicious by
checking how often a specific tag was
used for annotating a specific resource. Rarely used tags
may be meaningful only to some users, while often used
tags can be assumed to be commonly agreed on as a fitting
description for that resource.</p>
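        <p>The filtering step described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure used for the study: the function name filter_annotations, the (user, resource, tag) tuple layout and both thresholds are assumptions made for this example, since the actual cut-off values are not stated here.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def filter_annotations(triples, min_tag_uses=2, min_tag_resource_uses=2):
    """Drop rarely used tags and tag-resource pairs that few users agree on.

    triples: iterable of (user, resource, tag) annotations.
    Both thresholds are illustrative assumptions, not values from the study.
    """
    triples = list(triples)
    # How often each tag is used anywhere in the sample.
    tag_uses = Counter(tag for _, _, tag in triples)
    # How often a specific tag was assigned to a specific resource.
    pair_uses = Counter((resource, tag) for _, resource, tag in triples)
    return [
        (user, resource, tag)
        for user, resource, tag in triples
        if tag_uses[tag] >= min_tag_uses
        and pair_uses[(resource, tag)] >= min_tag_resource_uses
    ]


# Toy annotations (hypothetical data).
sample = [
    ("u1", "r1", "recipe"), ("u2", "r1", "recipe"), ("u3", "r1", "toread"),
    ("u1", "r2", "recipe"), ("u2", "r2", "baking"), ("u3", "r2", "baking"),
]
kept = filter_annotations(sample)
```
]]></preformat>
        <p>In this toy sample, the tag ‘toread’ is dropped because it is used only once overall, and the single ‘recipe’ annotation on r2 is dropped because only one user assigned that tag to that resource.</p>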
      </sec>
      <sec id="sec-5-2">
        <title>3.2.2 Tag Similarity Analysis</title>
        <p>In order to determine the semantic relatedness of tags, we
calculate the normalized co-occurrence for each pair of tags
using the cosine similarity metric applied to tag vectors,
which are defined as follows: tag t<sub>i</sub> is an n-dimensional vector,
with n being the total number of distinct tags in the data set
and t<sub>i</sub>[k] being the absolute co-occurrence count of t<sub>i</sub> and t<sub>k</sub>,
i.e. the number of resources tagged with both tags. The
relatedness of two tags t<sub>i</sub> and t<sub>j</sub> can then be computed as the cosine of the angle between their vectors:
<disp-formula><tex-math><![CDATA[\mathrm{rel}(t_i, t_j) = \frac{t_i \cdot t_j}{\lVert t_i \rVert \, \lVert t_j \rVert} = \frac{\sum_{k=1}^{n} t_i[k]\, t_j[k]}{\sqrt{\sum_{k=1}^{n} t_i[k]^2}\, \sqrt{\sum_{k=1}^{n} t_j[k]^2}}]]></tex-math></disp-formula></p>
        <p>In an assessment of different similarity metrics, comprising
absolute co-occurrence (1), the Jaccard coefficient (2), a cosine
metric applied to tag vectors built from the tags' occurrences
on resources (3) and the chosen metric (4), the chosen metric
turned out to produce the most appropriate results, i.e. it most
effectively identifies tags as related. This holds especially
for tags with a small number of occurrences in the data set,
because the metric considers not only
directly co-occurring tags but also their context [16]. It can
therefore also identify relations between tags which do not
occur together on a document of the (not necessarily
complete) data set used for analysis.</p>
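        <p>As an illustration of this metric, the following Python sketch builds co-occurrence vectors from a toy set of tagged resources and compares them with cosine similarity. The helper names, the sparse dictionary representation and the choice to leave a tag's co-occurrence with itself at zero are assumptions of the sketch, not details of the prototype.</p>
        <preformat><![CDATA[
```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_vectors(resource_tags):
    """Map each tag to a sparse vector of co-occurrence counts.

    resource_tags: dict mapping a resource id to the set of tags on it.
    vectors[ti][tk] is the number of resources tagged with both ti and tk.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for tags in resource_tags.values():
        for a, b in combinations(sorted(tags), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy data (hypothetical): "cake" and "dessert" never share a resource.
resources = {
    "r1": {"recipe", "cake"},
    "r2": {"recipe", "dessert"},
    "r3": {"python", "code"},
}
vectors = cooccurrence_vectors(resources)
indirect = cosine(vectors["cake"], vectors["dessert"])
unrelated = cosine(vectors["cake"], vectors["python"])
```
]]></preformat>
        <p>In this toy example, ‘cake’ and ‘dessert’ never appear on the same resource, yet both co-occur only with ‘recipe’, so their vectors point in the same direction and the metric still reports them as related, whereas ‘cake’ and ‘python’ share no context at all.</p>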
        <table-wrap>
          <table>
            <thead>
              <tr><th>Metric</th><th>Related tags found</th><th>Rating</th></tr>
            </thead>
            <tbody>
              <tr><td>Metric 1</td><td>food, cooking, recipe, baking, blog, dessert, reference, vegetarian, howto, blogs, foodblog, health, chocolate, nutrition, vegan, bread, diet, cake, diy, tips, cookies, vegetables, soup, fun, breakfast</td><td>18</td></tr>
              <tr><td>Metric 2</td><td>recipe, cooking, food, baking, dessert, vegetarian, foodblog, chocolate, vegan, bread, nutrition, health, diet, cake, cookies, vegetables, soup, breakfast, desserts, pumpkin, foodblogs, cheese, blogs, chicken, dinner</td><td>24</td></tr>
              <tr><td>Metric 3</td><td>recipe, cooking, food, gourmet, foodblog, cookbooks, drink, recipies, drinks, alcohol, vegetarian, kitchen, vegan, baking, useful, meals, dessert, bread, cook, healthy, reference, veg, chocolate, cupcakes, blog</td><td>24</td></tr>
              <tr><td>Metric 4</td><td>recipe, cooking, food, baking, breakfast, dessert, appetizer, vegetarian, casserole, cheese, bread, pie, pasta, sauce, dinner, soup, foodblog, salad, dough, mexican, beef, tofu, crockpot, desserts, beans</td><td>26</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
The evaluation to determine the most appropriate
metric for the analysis of the data set (Tables 1 and 2) was
based on a manual inspection of fifteen random sample
tags, which stem from different thematic fields and have
different frequencies of use in the overall annotation set.
The rating was based on whether the found tags were
semantically or lexically similar or strongly related (2
points), whether they were related (1 point), or whether the
indicated relationship was only accidental or very general
(0 points).</p>
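        <p>The point scheme amounts to a simple sum over the manually judged tags. The following Python sketch shows the arithmetic; the label names and the three example judgements are hypothetical and serve only to illustrate the scoring.</p>
        <preformat><![CDATA[
```python
# Points per judgement category, as described in the text above.
POINTS = {"strong": 2, "related": 1, "accidental": 0}


def rate_metric(judgements):
    """Sum the points of the manually judged tags returned by one metric."""
    return sum(POINTS[label] for label in judgements.values())


# Hypothetical judgements for three tags returned for one sample tag.
judged = {"cooking": "strong", "baking": "related", "fun": "accidental"}
score = rate_metric(judged)  # 2 + 1 + 0
```
]]></preformat>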
        <p>[Table: columns Metric 1, Metric 2, Metric 3, Metric 4]</p>
      </sec>
      <sec id="sec-5-3">
        <title>3.2.3. Clustering</title>
        <p>Given the similarity values for all pairs of tags, the tag
space can be clustered into thematic groups, i.e. groups of
highly related tags. For this task, we use an agglomerative
hierarchical clustering algorithm, similar to the approach
used by Gemmell et al. [4]: Starting with each tag being a
single cluster, in each iteration, the most similar clusters are
joined until there is only one cluster left. Inter-cluster
similarity is calculated with the centroid method, which
computes the average similarity between every tag in the
first cluster with every tag in the second cluster [10].
When the clustering process is represented as a binary tree, any
structure of clusters and sub-clusters can be obtained, as the
tree can be cut according to a minimum threshold of cluster
similarity or a maximum number of clusters. Figure 1
depicts an example of a hierarchical structure, which could
be reasonably split into 4 top-level clusters below the grey
line. The resulting clusters can again be decomposed into a
set of sub-clusters.</p>
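        <p>The agglomerative procedure can be sketched in a few lines of Python. This is a naive illustration under stated assumptions, not the implementation used for the prototype: clusters are plain tuples of tags, inter-cluster similarity is the average pairwise similarity described above, and the toy similarity values are invented for the example.</p>
        <preformat><![CDATA[
```python
from itertools import combinations


def average_similarity(cluster_a, cluster_b, sim):
    """Mean pairwise similarity between the tags of two clusters."""
    total = sum(sim(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(tags, sim):
    """Merge the two most similar clusters until only one remains.

    Returns the merge history as (left, right, similarity) tuples, where
    left and right are tuples of tags; read in order, it is the binary tree.
    Assumes that all tags are distinct.
    """
    clusters = [(tag,) for tag in tags]
    merges = []
    while len(clusters) > 1:
        left, right = max(
            combinations(clusters, 2),
            key=lambda pair: average_similarity(pair[0], pair[1], sim),
        )
        merges.append((left, right, average_similarity(left, right, sim)))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges


# Toy similarity values (hypothetical).
TOY_SIM = {
    frozenset(("recipe", "baking")): 0.9,
    frozenset(("jazz", "guitar")): 0.8,
}


def toy_sim(a, b):
    return TOY_SIM.get(frozenset((a, b)), 0.1)


history = agglomerate(["recipe", "jazz", "baking", "guitar"], toy_sim)
```
]]></preformat>
        <p>Here the two cooking tags are merged first, then the two music tags, and the final merge joins both topics at a low similarity. The sketch recomputes all pairwise averages in every iteration, which is acceptable for a handful of tags but would need caching for a full tag space.</p>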
        <p>This algorithm was chosen as it has several advantages
over other known clustering algorithms. On the one hand, it
works unsupervised and without any prerequisites, i.e. it is
neither necessary to give a selection of desired topics
(initial cluster-centers) nor to define a final number of
clusters. On the other hand, the result is very flexible and
leads to the hierarchical structure of clusters and
sub-clusters needed for the interface: a first level of clusters
forms the high-level topics, each represented by its most
popular tags in the representative overview tag cloud of the
interface; their sub-clusters form the lower levels of the
cloud, which can be consolidated hierarchically. In our
work, the selection of clusters and sub-clusters from the
clustered tree, i.e. the determination of cutting thresholds,
was performed manually to extract the most reasonable
semantic groups for evaluating the interface concept. An
automatic approach is yet to be developed.</p>
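        <p>Cutting the tree at a similarity threshold and picking representative tags for the overview cloud could look roughly like the Python sketch below. The merge-history format (a list of (left, right, similarity) tuples), the threshold of 0.5 and the popularity counts are hypothetical; in our work the thresholds were chosen manually.</p>
        <preformat><![CDATA[
```python
def cut_tree(merges, min_similarity):
    """Keep only the merges made before similarity drops below the threshold.

    merges: list of (left, right, similarity) tuples in merge order, where
    left and right are tuples of tags (a hypothetical format for this sketch).
    """
    clusters = set()
    for left, right, _ in merges:
        clusters.update((tag,) for tag in left + right)
    for left, right, similarity in merges:
        if similarity < min_similarity:
            break  # simplification: stop at the first merge that is too weak
        clusters.discard(left)
        clusters.discard(right)
        clusters.add(left + right)
    return sorted(clusters)


def representative_tags(cluster, popularity, k=2):
    """Pick the k most frequently used tags of a cluster for the overview."""
    return sorted(cluster, key=lambda tag: popularity[tag], reverse=True)[:k]


# Hypothetical merge history and usage counts.
merges = [
    (("recipe",), ("baking",), 0.9),
    (("jazz",), ("guitar",), 0.8),
    (("recipe", "baking"), ("jazz", "guitar"), 0.1),
]
topics = cut_tree(merges, min_similarity=0.5)
top = representative_tags(("recipe", "baking"), {"recipe": 120, "baking": 45}, k=1)
```
]]></preformat>
        <p>With the threshold of 0.5, the last merge is undone and two top-level topics remain; lowering the threshold below 0.1 would leave a single cluster, while raising it above 0.9 would leave every tag on its own.</p>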
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3.3 The Semantic Cloud User Interface</title>
      <p>The User Interface of SC (Figure 2) is logically divided
into three areas, which stay visible all the time: tag cloud,
results and tools (adjustment, reset/back buttons etc.). The
tag cloud area initially displays a first overview of most
popular tags. Due to each topic, i.e. top level cluster, being
represented in equal measure, this overview is more
balanced than in traditional tag clouds, where tags are
chosen by absolute popularity. Topics are divided spatially
and by color into different semantic regions. As usual,
variation in font size indicates the popularity of the tags. An
internal semantic arrangement of tags is achieved using
graph visualization with force directed layout based on the
described similarity metric. Zooming into topics, i.e.
viewing sub-clusters, can be carried out by clicking on a
magnifier icon which is placed in the center of each
semantic region if this region contains further
sub-regions. Thus, once a primary field of interest has been found via
the most general tags, a specific thematic field can be
brought into focus by obtaining a new semantic tag cloud
with a higher semantic density and more specific tags. The
hierarchy of semantic tag clouds can include several levels
(Figure 3). Tags can be selected either by clicking on them
within the respective semantic region or by using the
manual tag input field for further refinement or a new
search.</p>
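      <p>The semantic arrangement inside a region can be approximated with a simple spring model. The Python sketch below is not the layout algorithm of the prototype; it is a simplified relaxation standing in for a force-directed layout, in which every pair of tags is connected by a spring whose rest length is assumed to be 1 minus their similarity. The function name, the iteration count, the step size and the toy similarity values are all assumptions.</p>
      <preformat><![CDATA[
```python
import math
from itertools import combinations


def spring_layout(tags, sim, iterations=200, step=0.1):
    """Place tags in 2D so that similar tags end up close together.

    Each pair is connected by a spring whose rest length is 1 - similarity.
    Returns a dict mapping each tag to an (x, y) position.
    """
    # Deterministic start: spread the tags evenly on a unit circle.
    pos = {
        tag: [math.cos(2 * math.pi * i / len(tags)),
              math.sin(2 * math.pi * i / len(tags))]
        for i, tag in enumerate(tags)
    }
    for _ in range(iterations):
        for a, b in combinations(tags, 2):
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            # Positive when the pair is too far apart, negative when too close.
            error = dist - (1.0 - sim(a, b))
            shift_x = step * error * dx / dist / 2
            shift_y = step * error * dy / dist / 2
            # Move both tags along their connecting line, half the way each.
            pos[a][0] += shift_x
            pos[a][1] += shift_y
            pos[b][0] -= shift_x
            pos[b][1] -= shift_y
    return {tag: tuple(xy) for tag, xy in pos.items()}


def toy_sim(a, b):
    """Hypothetical similarities: only recipe and baking are closely related."""
    return 0.9 if {a, b} == {"recipe", "baking"} else 0.1


layout = spring_layout(["recipe", "baking", "jazz"], toy_sim)
```
]]></preformat>
      <p>After relaxation, ‘recipe’ and ‘baking’ lie much closer to each other than either does to ‘jazz’, which is the property a user relies on when scanning neighboring tags.</p>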
      <p>Furthermore, tags can be selected from the results list,
which displays a set of popular tags with each found
resource. All selected tags are highlighted in the tag cloud
(if available in the currently displayed cloud) and
furthermore appear in a compact list to the right of the
cloud. They can be deselected directly within the cloud as
well as in the list. An additional option provided in the list
is (re)locating a specific query tag within the cloud by
using the magnifying glass beside each tag. This can be
particularly helpful when users have entered a query tag
themselves and want to consult and select related
tags directly without browsing the tag cloud hierarchy manually.
Queries are composed from the selected tags by
applying the AND operator. Whenever the query selection
is changed, the result list is updated. Hence, users are able
to dynamically remove or replace tags while getting
immediate feedback on their actions. They can consult
results immediately and adjust their query if the results are not
yet appropriate. They can change the focus of their search at any
time by replacing tags with related tags.</p>
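      <p>Composing a query with the AND operator means that a resource is returned only if it carries every selected tag. A minimal Python sketch of this behavior is given below; the function name search, the in-memory index and the toy resources are assumptions for illustration only.</p>
      <preformat><![CDATA[
```python
def search(resource_tags, selected_tags):
    """Return the resources annotated with every selected tag (AND query)."""
    query = set(selected_tags)
    return sorted(
        resource
        for resource, tags in resource_tags.items()
        if query <= set(tags)  # the query must be a subset of the tags
    )


# Toy index (hypothetical data).
index = {
    "r1": {"recipe", "vegan"},
    "r2": {"recipe", "cake"},
    "r3": {"vegan", "travel"},
}
broad = search(index, ["recipe"])
narrow = search(index, ["recipe", "vegan"])
```
]]></preformat>
      <p>Adding a tag can only narrow the result list and removing one can only widen it, which is why re-running the query after every selection change gives users immediate feedback. Note that in this sketch an empty selection matches every resource.</p>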
    </sec>
    <sec id="sec-7">
      <title>4. EVALUATION</title>
    </sec>
    <sec id="sec-8">
      <title>4.1 Test design and user study</title>
      <p>In order to evaluate the concepts behind SC, we conducted a
user study (based on the previously described data set)
comprising 9 participants (2 female, 7 male) aged between
22 and 33. Having a computer science background, all
participants were proficient in using a computer and Web
browser. While the traditional tag cloud concept was
well known to all of them, none of them regularly used
browsing structures of folksonomy systems for finding
content. We used the Delicious user interface as a baseline
for the evaluation, i.e. we tested Delicious vs. SC. A set of
three tasks was assigned to the participants, which had to
be solved first using the Delicious interface and then using the
alternative approach, SC. This setup (“within-subjects
testing”) was chosen in order to first create a basic common
understanding of current browsing interfaces and
afterwards let users judge both interfaces in
comparison, based on their impressions from the tests. The
tasks were chosen to simulate an undetermined browsing
scenario: users were first asked to look for any website
they would find interesting and afterwards, more
specifically, for a website presenting any interesting
‘cooking recipe’ and any website dealing with ‘music’,
respectively<sup>1</sup>. Afterwards, users were asked to assess both
interfaces regarding three usability criteria on a five-point
Likert scale<sup>2</sup>: whether it was easy to understand the
interface (Q1), whether the system was supportive in
solving the test tasks (Q2) and whether it was pleasant to
use the system (Q3). In a fourth question (Q4), participants
were asked to state whether they would use the interfaces in
real-life scenarios, i.e. whether they considered them
useful. These ratings were statistically analyzed and finally
used to conclude whether the SC user interface concept
is a significant enhancement compared to the standard user
interface structures of folksonomy systems. In order to
understand possible interface order effects, users were
asked to ‘think aloud’ and were also interviewed
about the reasons for their ratings.</p>
    </sec>
    <sec id="sec-9">
      <title>4.2 Results</title>
      <p>For analyzing the answers of the final questionnaire, we
calculated the mean (μ) and standard deviation (σ) for each
question and system. Moreover, the paired Student’s t-test
was applied in order to test the statistical significance
of the differences between Delicious and SC: for each question, the null
hypothesis stated that the mean ratings for both
interfaces were equal and that differences were only due to chance. It
was rejected for a probability lower than 0.05, which was
the case for questions 2, 3 and 4. Only in the case of question 1
was the null hypothesis not rejected; thus, the differences there
are not significant. All in all, the empirical results (overall
average scores) indicate enhanced support and user
experience for the new interface (μ=4.16, σ=0.825; for
Delicious: μ=2.94, σ=0.94). More detailed explanations of
why the systems were rated in a particular way could be
inferred from the comments of participants. Basically, both
interfaces were assessed as easy to understand and no major
problems occurred during testing. However, the think-aloud
protocol revealed the expected limitations of classic interfaces.
Users criticized the limited number of related
tags, which forced them to enter tags manually in order to
refine their queries. Regarding the SC interface, the users
stated that it was more supportive since it
provided more tags and respective related tags to select
from. The breakdown of topics was also considered useful,
as was the possibility to edit queries at any time. For
Q3, users stated that SC was visually more attractive and
transparent due to its use of color and spatial semantic
arrangement.</p>
      <p><sup>1</sup> Subjects are chosen such that they are comparably present in the tag
clouds of both user interfaces to ensure an adequate starting point.</p>
      <p><sup>2</sup> http://en.wikipedia.org/wiki/Likert_scale</p>
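      <p>For readers who want to retrace the significance test, the paired t statistic can be computed as in the Python sketch below. The two rating lists are made up for illustration and are not the data collected in the study; the function name paired_t is likewise an assumption. The critical value 2.306 is the two-sided 5% threshold of Student's t distribution for 8 degrees of freedom, which corresponds to nine paired ratings.</p>
      <preformat><![CDATA[
```python
import math
from statistics import mean, stdev


def paired_t(ratings_a, ratings_b):
    """Return the paired t statistic and the degrees of freedom.

    Assumes the differences are not all identical (non-zero deviation).
    """
    diffs = [b - a for a, b in zip(ratings_a, ratings_b)]
    n = len(diffs)
    # Standard error of the mean difference (sample standard deviation).
    std_err = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / std_err, n - 1


# Made-up ratings of nine participants for one question; NOT the study's data.
baseline = [3, 2, 3, 4, 3, 2, 3, 3, 4]
semantic_cloud = [4, 4, 4, 5, 4, 3, 5, 4, 4]

t_value, dof = paired_t(baseline, semantic_cloud)
# Two-sided critical value of Student's t for 8 degrees of freedom, p = 0.05.
significant = abs(t_value) > 2.306
```
]]></preformat>
      <p>For these invented ratings the null hypothesis of equal means would be rejected; with real data the same computation is repeated once per question.</p>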
    </sec>
    <sec id="sec-10">
      <title>4.3 Discussion</title>
      <p>For a full practical deployment of the concept, there are
still some problems that need to be resolved. This primarily
concerns the clustering method, which has to be enhanced
in order to be executed fully automatically. A manual
selection of clusters became necessary to achieve a
satisfying result for evaluating the user interface concept.
Furthermore, cross-topic exploration
needs to be enhanced in future research. The current concept is limited in this
regard, as a query covering two topics (e.g. travel and
photography) has to be set up either by exploring two
semantic clouds one after the other or by entering tags
manually. Here, it would be beneficial either to have very
general tags displayed within every cluster using a
non-exclusive clustering method or to develop an approach for
simultaneously exploring multiple different topics. A
non-exclusive clustering approach would also be beneficial in
case of fuzzy cluster borders, where tags relate to different
topics in the cloud. Moreover, users that participated in the
evaluation suggested several ideas for improvement,
ranging from small extensions, e.g. additional information
on results, towards larger challenges like including a more
extensive set of tags ‘behind the scenes’.</p>
    </sec>
    <sec id="sec-11">
      <title>5. CONCLUSIONS</title>
      <p>In this paper, a new user interface approach called Semantic
Cloud<sup>3</sup> was presented. It allows users to explore the tag
space of a folksonomy system within a hierarchical
structure of semantically arranged tag clouds representing
different topics and their subtopics. This way, users are
able to gain a fast impression of general topics as well as
detailed insights into the tag space of their special field of
interest, and thus to find appropriate search tags. Observations,
user comments and questionnaire answers in a user study
indicate that users were more satisfied using Semantic
Cloud than existing widespread user interface concepts
for folksonomy systems.</p>
    </sec>
    <sec id="sec-12">
      <title>REFERENCES</title>
      <p>[1] Begelman, G., Keller, P. and Smadja, F. Automated Tag Clustering: Improving search and exploration in the tag space. In: Proceedings of the Collaborative Web Tagging Workshop at the WWW 2006.</p>
      <p>[9] Hoare, C. and Sorensen, H. Information Foraging with a Proximity-Based Browsing Tool. Artif. Intell. Rev. 24, Nov 2005.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>