<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Piloted Search and Recommendation with Social Tag Cloud-Based Navigation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cédric Mesnage</string-name>
          <email>cedric.mesnage@usi.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark Carman</string-name>
          <email>mark.carman@usi.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Informatics, University of Lugano</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We investigate the generation of tag clouds using Bayesian models and test the hypothesis that social network information is better than overall popularity for ranking new and relevant information. We propose three tag cloud generation models based on popularity, topics and social structure. We conducted two user evaluations to compare the models for search and recommendation of music with social network data gathered from "Last.fm". Our survey shows that search with tag clouds is not practical whereas recommendation is promising. We report statistical results and compare the performance of the models in generating tag clouds that lead users to discover songs that they liked and were new to them. We find statistically significant evidence at the 5% confidence level that the topic and social models outperform the popular model.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>We investigate mechanisms to explore social network
information. Our current focus is to use contextual tag clouds
as a means to navigate through the data and control a
recommendation system.</p>
      <p>Figure 1 shows the screen of the Web application we
developed to evaluate our models. The goal is to find the
displayed track using the tag cloud. The tag cloud is
generated according to a randomly selected model and the current
query. Participants in the evaluation can add terms to the
query by clicking on tags, which generates a new tag cloud
and changes the list of results. Once the track is found, the
user clicks on its title and goes to the next task.</p>
      <p>Figure 2 shows the principle of our controlled
recommendation experiment. The participant sees a tag cloud; by
clicking a tag she is recommended a song. Once the
song is rated, a new tag cloud is generated according to the
previously selected tags.</p>
      <p>This paper is structured as follows. We first discuss
related work in the area of tag cloud-based navigation. We
then detail models for generating context-aware tag clouds.</p>
      <p>WOMRAD 2010 Workshop on Music Recommendation and Discovery,
colocated with ACM RecSys 2010 (Barcelona, SPAIN).
Copyright ©. This is an open-access article distributed under the terms
of the Creative Commons Attribution License 3.0 Unported, which permits
unrestricted use, distribution, and reproduction in any medium, provided
the original author and source are credited.</p>
    </sec>
    <sec id="sec-2">
      <title>2.1 Social Tagging and its Motivations</title>
      <p>
        Research in social tagging is relatively recent, with the first
tagging applications appearing in the late nineties [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The
system called Webtagger relied on a proxy to enable users
to share bookmarks and assign tags to them. The approach
was novel compared to storing bookmarks in the browser's
folder in the sense that bookmarks were shared and belonged
to multiple categories (instead of being placed in a single
folder). The creators argued that hierarchical browsing was
tedious and frustrating when information is nested several
layers deep.
      </p>
      <p>
        By 2004, social tagging had reached a point where it was
becoming more and more popular, initially on bookmarking
sites like Delicious and then later on social media sharing
sites such as Flickr and YouTube. Research in social tagging
started with Hammond [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] who gave an overview of social
bookmarking tools and was continued by Golder et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
who provided the first analysis of tagging as a process using
tag data from Delicious. They showed that tag data
follows a power law distribution, gave a taxonomy of tagging
incentives, and looked at the convergence of tag
descriptions over time for resources on Delicious. The paper led
to the first workshop on tagging [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], where papers mainly
discussed tagging incentives, tagging applications (in
museums and enterprises), tag recommendation and knowledge
extraction. Following this workshop, research in tagging has
spread to various established areas, namely Web
search, social dynamics, the Semantic Web, information
retrieval, human-computer interaction and data mining.
      </p>
      <p>
        Sen et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] examine factors that influence the way
people choose tags and the degree to which community
members share a vocabulary. The three factors they focus on are
personal tendency, community influence and the tag
selection algorithm (used to recommend tags). Their study
focuses on the MovieLens system, which consists of user reviews
for movies. They categorize tags into three categories:
factual, subjective and personal. They then divided users of
the system into four groups, each with a different user
interface: the unshared group didn't see any community tags; the
shared group saw random tags from their group; the popular
group saw the most popular tags; and the recommendation
group used a recommendation algorithm (that selected tags
most commonly applied to the target movie and to
similar movies). They find that habit and investment influence
the users' tag applications, while the community influences
a user's personal vocabulary. The shared group produced
more subjective tags, while the popular and
recommendation groups produced more factual tags. The authors also
conducted a user survey in which they asked users whether
they thought tagging was useful for different tasks:
self-expression (50%), organizing (44%), learning (23%), finding
(27%), and decision support (21%).
      </p>
      <p>
        Marlow et al. [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ] define a taxonomy of design aspects
of tagging systems that influence the content and
usefulness of tags, namely tagging rights (who can tag), tagging
support (suggestion algorithms), aggregation model (bag or
set), resource type (web pages, images, etc.), source of
content (participants, Web, etc.), resource connectivity (linked
or not), and social connectivity (linked or not). They also
propose aspects of user incentives expressing the different
motivations for tagging: future retrieval, contribution and
sharing, attracting attention, playing and competition,
self-presentation, and opinion expression.
      </p>
      <p>
        Cattuto et al. [
        <xref ref-type="bibr" rid="ref1 ref2">2, 1</xref>
        ] perform an empirical study of tag
data from Delicious and find that the distribution of tags
over time follows a power law distribution. More
specifically, they find that the frequency of tags obeys Zipf's law,
which is characteristic of self-organized communication
systems and is commonly observed in natural language data.
They reproduced the phenomenon using a stochastic model,
leading to a model of user behavior in collaborative tagging
systems.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2.2 Browsing with Tags</title>
      <p>
        Fokker et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] present a tool to navigate Wikipedia
using tag clouds. Their approach enables the user to select
different views on the tag cloud, such as recent tags,
popular tags, personal tags or friends' tags. They display related
tags when the user "mouses over" a tag in the cloud. They do
not, however, generate new contextually relevant tag clouds
when the user clicks on a tag.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], Millen et al. investigate browsing behavior in
their Dogear social bookmarking application. The
application allows users to browse other people's bookmark
collections by clicking on their username. They find that most
browsing activity on the web site is done by
exploring people's bookmarks and then tags. They compare the
10 most browsed tags with the 10 most applied tags and
find that there is a strong correlation. While their
findings do not show that tagging improves social navigation in
general, they do show that browsing tags helps users to
navigate the bookmark collections of others. Following on from
this, Ishikawa et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] studied the navigation efficiency
when browsing other users' bookmarks. The idea is to
decide which user to browse first in order to discover
the desired information faster. While relevant to tag-based
navigation, this study does not deal with the problem of how
best to rank tags in order to improve cloud-based navigation
in general.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], Li et al. propose various algorithms to browse
social annotations in a more efficient way. They extract
hierarchies from clusters and propose to browse social
annotations in a hierarchical manner. They also propose a way
to browse tags based on time. As discussed by Keller et al.
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] a single taxonomy is not necessarily the best way to
navigate a corpus, however.
      </p>
      <p>
        A more comprehensive study was performed by Sinclair
et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] to examine the usefulness of tag clouds for
information seeking. They asked participants to perform
information seeking tasks on a folksonomy-like dataset, providing
them with an interface consisting of a tag cloud and a search
box. The folksonomy was created by the same participants,
who were asked to tag ten articles at the beginning of the
study, leading to a small-scale folksonomy. The tag cloud
displayed 70 terms in alphabetical order, each with a font
size proportional to the log of its frequency. The authors
give the following equation for the font size:
      </p>
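      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>\mathrm{TagSize} = 1 + C \, \frac{\log(f_i - f_{\min} + 1)}{\log(f_{\max} - f_{\min} + 1)}</tex-math>
        </disp-formula>
      </p>
      <p>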
where C corresponds to the maximum font size desired, <inline-formula><tex-math>f_i</tex-math></inline-formula> to
the frequency of the tag to be displayed, and <inline-formula><tex-math>f_{\min}</tex-math></inline-formula> and <inline-formula><tex-math>f_{\max}</tex-math></inline-formula>
to the minimum and maximum frequencies of the displayed
tags. Clicking on a tag in the cloud brings the user to a
new page listing articles annotated with that tag and a new
tag cloud of co-occurring tags. Clicking again on a tag
restricts the list to the articles tagged with both tags, and so
on. The search is based on a TF-IDF ranking. Participants
were asked 10 questions about the articles and then asked
whether they preferred using the search box or the tag cloud and
why. They found that the tag cloud performed better when
people were asked general questions, whereas for information-seeking
tasks people preferred to use the search box. They conclude that
the tag cloud is better for browsing, enhancing
serendipity. The participants commented that the search box allows
for more specific queries. While similar to our study on tag
cloud-based navigation, the work of Sinclair et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] differs
in a number of important ways: (i) Their aim was to
compare tag-based navigation directly with search, while ours is
to compare different tag cloud generation methods, based on
social network information and topic modeling techniques.
(ii) In their study the folksonomy was generated by the
participants and is quite small as a result, while we rely on an
external folksonomy for which scaling becomes an
important issue.
      </p>
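      <p>As an illustration, the log-scaled font-size rule reported for Sinclair et al. can be sketched in a few lines of Python. The function name, the toy frequencies and the handling of the degenerate case where all tags share one frequency are assumptions made for this example and are not taken from the cited study.</p>
      <preformat>
```python
import math


def tag_font_size(freq, f_min, f_max, max_size):
    """Font size of one tag: 1 + C * log(f - f_min + 1) / log(f_max - f_min + 1)."""
    if f_max == f_min:
        # Assumption: when every displayed tag has the same frequency,
        # fall back to the base size instead of dividing by log(1) = 0.
        return 1.0
    scale = math.log(freq - f_min + 1) / math.log(f_max - f_min + 1)
    return 1.0 + max_size * scale


# Illustrative tag frequencies (not from the paper).
frequencies = {"rock": 120, "indie": 45, "jazz": 3}
f_min = min(frequencies.values())
f_max = max(frequencies.values())
sizes = {tag: tag_font_size(f, f_min, f_max, 10) for tag, f in frequencies.items()}
```
      </preformat>
      <p>With these toy counts the least frequent tag gets the base size 1 and the most frequent tag gets 1 + C, with all other tags in between on a logarithmic scale.</p>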
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Hassan-Montero et al. propose an improvement
to tag clouds by ordering the tags according to similarity
rather than alphabetically. They use the Jaccard coefficient
to measure similarity between tags, which is the ratio
between the number of resources in which the two tags both
occur and the number in which either one occurs. If D(w)
denotes all resources (documents) annotated with tag (word)
w, then the similarity is given by:
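        <disp-formula>
          <label>(2)</label>
          <tex-math>RC(w_1, w_2) = \frac{\lvert D(w_1) \cap D(w_2) \rvert}{\lvert D(w_1) \cup D(w_2) \rvert}</tex-math>
        </disp-formula>
      </p>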
      <p>The authors then define an additional metric to select
which tags to display in each cloud (so as to maximize the
number of resources "covered by the cloud"). Their method
provided, however, little improvement in the coverage of the
selected tags. The tag cloud layout is based on the similarity
coefficient. The authors also do not provide a user evaluation
of the generated tag cloud.</p>
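      <p>For concreteness, the Jaccard coefficient between two tags can be computed from their resource sets as in the following sketch. The function name, the toy resource identifiers and the convention of returning 0 for two unused tags are assumptions of this example.</p>
      <preformat>
```python
def jaccard(docs_a, docs_b):
    """Jaccard coefficient between the resource sets D(w1) and D(w2) of two tags."""
    union = docs_a.union(docs_b)
    if not union:
        # Assumption: two tags that annotate nothing are treated as dissimilar.
        return 0.0
    return len(docs_a.intersection(docs_b)) / len(union)


# Hypothetical resource ids annotated with each tag.
tagged = {"rock": {1, 2, 3}, "indie": {2, 3, 4}}
similarity = jaccard(tagged["rock"], tagged["indie"])  # 2 shared out of 4 distinct
```
      </preformat>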
      <p>
        Kaser et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] propose a different algorithm for
displaying tag clouds. Their methods concern how to produce
HTML in various situations. They also give an algorithm
to display tags in nested tables. They do not provide an
evaluation regarding the usefulness of the new visual
representations.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], Sen et al. investigate the question of tag quality.
Tagging systems must often select a subset of available tags
to display to users due to limited screen space. Knowing
the quality of tags helps in writing a tag selection algorithm.
They conduct a study on the MovieLens movie review
system, adding to the interface different mechanisms for users
to rate the quality of tags. Not all tags can be rated, so
they look for ways of predicting tag quality based on
aggregate user behavior, on a user's own ratings and on
aggregate users' ratings. They find that tag selection methods
that normalize by user, such as the number of users who
applied a tag, perform the best.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Heymann et al. investigate the social tag
prediction problem, the purpose of which is to predict future tags
for a particular resource. The ability to predict tag
applications can lead to various enhancements, such as increased
recall, inter-user agreement, tag disambiguation,
bootstrapping and system suggestion. They collected tag data from
Delicious and fetched the web pages for each bookmark.
They analyze two methods. The first applies only when the
bookmarked items are web pages (and not images, songs,
videos, etc.). They develop an entropy-based metric which
measures how predictable a tag is. They then extract
association rules based on tag co-occurrence and give
measurements of their interest and confidence. They find that
many tags do not contribute substantial additional
information beyond page text, anchor text and surrounding hosts.
This extra information is therefore a good tag predictor. In
the case of using only tags, predictability is related to
generality in the sense that the more information is known about
a tag (i.e. the more popular it is), the more predictable it
is. They add that these measures could be used by system
designers to improve system suggestion or tag browsing.
      </p>
      <p>
        Ramage et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] compare two methods to cluster web
pages using tag data. Their goal is to see whether
tagging data can be used to improve web document
clustering. This work is based on the clustering hypothesis from
information retrieval, that "the associations between
documents convey information about the relevance of documents
to requests". The document clusters are used to solve the
problem of query ambiguity by including different clusters
in search results.
      </p>
      <p>All of the above-mentioned work differs from our current
study of tag cloud-based navigation in the following ways:
(i) Previous studies have investigated the usefulness of tag
clouds primarily from the basic visualization rather than
the navigation standpoint. (ii) Those studies explicitly
investigating tag cloud-based navigation have concentrated
on simple algorithms for generating tag clouds. (iii)
Previous studies investigating more sophisticated algorithms for
tag prediction have evaluated those algorithms by assessing
prediction accuracy on held-out data rather than "in situ"
evaluation with real users for a particular application (tag
cloud-based navigation).</p>
    </sec>
    <sec id="sec-4">
      <title>3. TAG CLOUD BASED NAVIGATION</title>
      <p>In this section we describe algorithms for generating
context-aware tag clouds and query result lists for tag cloud-based
navigation. Generating a tag cloud simply involves
selecting the one hundred tags which are the most probable (to
be clicked on by the user) given the current context (query).
Estimating which terms are most probable depends on the
model used, as we discuss below.</p>
    </sec>
    <sec id="sec-5">
      <title>3.1 Generating Context Aware Tag Clouds</title>
      <p>We now investigate three different models for generating
context-aware tag clouds. For each model we first describe
how an initial context-independent cloud is generated. We
then describe how the context-dependent cloud is generated
in such a way as to take the current query (context) tags
into account.</p>
      <p>3.1.1 Popularity based Cloud Generation Model</p>
      <p>The first and simplest tag cloud generation model is based
on the popularity of the tags across all documents in the
corpus. We first describe a query-independent tag cloud,
which can be used as the initial cloud for popularity-based
navigation.</p>
      <p>Ranking tags by popularity on the home page gives users
a global access point to the most prolific sections of the
portal. The most popular tags are reachable from the popular
tag cloud and displayed with a font size proportional to the
amount of activity on that tag. A measure of the popularity
of a tag across the corpus is given by:
        <disp-formula>
          <label>(3)</label>
          <tex-math>p(w) = \frac{\sum_{d \in D} N_{w,d}}{\sum_{d \in D} N_d}</tex-math>
        </disp-formula>
where <inline-formula><tex-math>N_{w,d}</tex-math></inline-formula> is the count of occurrences of tag w for resource
(document) d and <inline-formula><tex-math>N_d = \sum_{w \in V} N_{w,d}</tex-math></inline-formula> is the total count for
the document.</p>
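      <p>We can now compute a context-sensitive version of the
popular tag cloud quite simply as follows:
        <disp-formula>
          <label>(4)</label>
          <tex-math>p(w \mid Q) = \frac{\sum_{d \in D(Q)} N_{w,d}}{\sum_{d \in D(Q)} N_d}</tex-math>
        </disp-formula>
where <inline-formula><tex-math>D(Q) = \bigcup_{w \in Q} D(w)</tex-math></inline-formula> is the union of all resources that
have been tagged with words from the query Q.</p>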
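      <p>The popularity model can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in our system: the toy corpus, the function names and the use of in-memory dictionaries are hypothetical, and the cloud size defaults to the one hundred tags mentioned above.</p>
      <preformat>
```python
from collections import Counter
import heapq

# Hypothetical toy corpus: resource id -> tag counts N_{w,d}.
corpus = {
    "track1": Counter({"rock": 3, "indie": 1}),
    "track2": Counter({"rock": 1, "jazz": 2}),
    "track3": Counter({"jazz": 4}),
}


def popularity(docs):
    """p(w): tag counts summed over the resources, divided by the total count."""
    counts = Counter()
    for tag_counts in docs.values():
        counts.update(tag_counts)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def contextual_popularity(docs, query):
    """p(w|Q): the same ratio restricted to D(Q), the resources carrying any query tag."""
    matching = {d: c for d, c in docs.items() if any(q in c for q in query)}
    return popularity(matching)


def tag_cloud(probabilities, size=100):
    """Keep the `size` most probable tags, most probable first."""
    return heapq.nlargest(size, probabilities, key=probabilities.get)


entry_cloud = tag_cloud(popularity(corpus), size=2)
rock_context = contextual_popularity(corpus, {"rock"})
```
      </preformat>
      <p>In this toy corpus the query "rock" restricts the counts to the two tracks tagged with "rock", so "jazz" keeps only the two applications it received on those tracks.</p>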
      <p>3.1.2 Social Network Structure based Cloud Generation Model</p>
      <p>We are interested in taking advantage of additional
information contained in the social network of users (friendships)
in order to improve the quality of the tag cloud. We assume
that the friends of a user are likely to share similar interests
and thus we can use the tag description of a user's friends
to smooth the tag description of the user.</p>
      <p>We calculate an entry (context-independent) social tag
cloud as follows:
        <disp-formula>
          <label>(5)</label>
          <tex-math>p(w) = \sum_{u \in U} \sum_{u' \in f(u)} \frac{N_{w,u'}}{\sum_{w \in W} N_{w,u'}}</tex-math>
        </disp-formula>
where f(u) is the set of friends of user u and U denotes the
set of all users in the social network.</p>
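      <p>We apply a slightly different derivation to calculate the
context-dependent social tag cloud. We estimate the
probability p(w|w') given the context tag w'. These probabilities
are precomputed and combined depending on the query at
run time. We hypothesize that users who are friends on a
social tagging website are likely to have similar interests (likes
&amp; dislikes) and that we can use the social network structure
to improve contextual tag cloud generation. We can
leverage the social network (by marginalizing out the user u) as
follows, where the second step applies Bayes' rule and assumes that
w and w' are conditionally independent given u:
        <disp-formula>
          <label>(6)</label>
          <tex-math>p(w \mid w') = \sum_{u \in U} p(w, u \mid w')</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(7)</label>
          <tex-math>p(w \mid w') = \sum_{u \in U} \frac{p(w \mid u)\, p(w' \mid u)\, p(u)}{p(w')}</tex-math>
        </disp-formula>
Calculating p(w') and <inline-formula><tex-math>p(u) = N_u / \sum_{u' \in U} N_{u'}</tex-math></inline-formula> is
straightforward. We compute p(w|u) by summing over tag counts
<inline-formula><tex-math>N_{w,u'}</tex-math></inline-formula> for users in the social network of the user u:
        <disp-formula>
          <label>(8)</label>
          <tex-math>p(w \mid u) = \frac{\sum_{u' \in f(u)} N_{w,u'}}{\sum_{u' \in f(u)} N_{u'}}</tex-math>
        </disp-formula>
      </p>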
      <p>Note that since the summation in Equation 7 over all users
involves a very large computation, we perform the
summation only over the top 200 users as ranked according to the
frequency p(w|u).</p>
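      <p>The social model can be illustrated with the sketch below, which pools the tag counts of a user's friends to estimate p(w|u) and then marginalizes over users to obtain p(w|w'). The toy users, the function names and the choice to obtain p(w') as the normalizer of the same sum are assumptions of this example. The sketch sums over all users and leaves out the truncation to the top 200 users.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical toy data: per-user tag counts N_{w,u} and friend lists f(u).
user_tags = {
    "ann": Counter({"rock": 2, "indie": 2}),
    "bob": Counter({"rock": 1, "jazz": 3}),
    "cat": Counter({"jazz": 4}),
}
friends = {"ann": ["bob"], "bob": ["ann", "cat"], "cat": ["bob"]}


def p_tag_given_user(user):
    """p(w|u): tag counts pooled over the friends of u, then normalized."""
    pooled = Counter()
    for friend in friends[user]:
        pooled.update(user_tags[friend])
    total = sum(pooled.values())
    return {tag: n / total for tag, n in pooled.items()}


def p_user():
    """p(u): share of all tag applications made by user u."""
    sizes = {u: sum(c.values()) for u, c in user_tags.items()}
    total = sum(sizes.values())
    return {u: n / total for u, n in sizes.items()}


def p_tag_given_tag(context):
    """p(w|w'): sum over users of p(w|u) p(w'|u) p(u) / p(w')."""
    prior = p_user()
    per_user = {u: p_tag_given_user(u) for u in user_tags}
    weights = {u: per_user[u].get(context, 0.0) * prior[u] for u in user_tags}
    # Assumption: p(w') is the normalizer of the sum, so the result sums to one.
    p_context = sum(weights.values())
    if p_context == 0:
        return {}
    result = Counter()
    for u, weight in weights.items():
        for tag, p in per_user[u].items():
            result[tag] += p * weight / p_context
    return dict(result)


given_rock = p_tag_given_tag("rock")
```
      </preformat>
      <p>Because p(w|u) is built from the friends' tags rather than the user's own, a tag can receive probability mass from users who never applied it themselves, which is the smoothing effect the model aims for.</p>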
      <p>3.1.3 Topic Model based Cloud Generation Model</p>
      <p>
Another way to smooth the relative term frequency
estimates and thereby improve the quality of the tag clouds
generated is to rely on latent topic modeling techniques [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Using these techniques we can extract semantic topics
representing user tagging behavior (aka user interests) from a
matrix of relationships between tags and people. Topic
models are term probability distributions over documents (in this
case users) that are often used to represent text corpora. We
apply a commonly used topic modeling technique called
latent Dirichlet allocation (LDA) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to extract 100 topics by
considering people as documents (and tags as their content).
      </p>
      <p>The entry (context-independent) tag cloud based on topic
modeling is defined as follows:
        <disp-formula>
          <label>(9)</label>
          <tex-math>p(w) = \sum_{z \in Z} p(w \mid z)\, p(z)</tex-math>
        </disp-formula>
where p(w|z) denotes the probability that the tag w belongs
to (is generated by) topic z; its value is given as an
output of the LDA algorithm. p(z) is the relative frequency
of the topic z across all users in the corpus.</p>
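      <p>As a small worked example, the entry cloud of the topic model is a mixture of the per-topic tag distributions. The two topics and their probabilities below are invented for illustration and merely stand in for the output of an LDA run.</p>
      <preformat>
```python
# Hypothetical LDA output for two topics: p(w|z) and p(z).
p_tag_given_topic = {
    "z1": {"rock": 0.7, "indie": 0.3},
    "z2": {"jazz": 0.8, "rock": 0.2},
}
p_topic = {"z1": 0.6, "z2": 0.4}


def topic_entry_cloud(tag_given_topic, topic_prior):
    """p(w) = sum over topics z of p(w|z) p(z)."""
    p = {}
    for z, dist in tag_given_topic.items():
        for tag, prob in dist.items():
            p[tag] = p.get(tag, 0.0) + prob * topic_prior[z]
    return p


entry = topic_entry_cloud(p_tag_given_topic, p_topic)
```
      </preformat>
      <p>Here "rock" collects mass from both topics (0.7 x 0.6 + 0.2 x 0.4 = 0.5), while "indie" and "jazz" each draw on a single topic.</p>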
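      <p>To compute the context-aware tag cloud based on topic
modeling, we simply marginalize over topics (instead of users):
        <disp-formula>
          <label>(10)</label>
          <tex-math>p(w \mid w') = \sum_{z \in Z} p(w \mid z)\, p(z \mid w')</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(11)</label>
          <tex-math>p(w \mid w') = \sum_{z \in Z} \frac{p(w \mid z)\, p(w' \mid z)\, p(z)}{p(w')}</tex-math>
        </disp-formula>
      </p>
    </sec>
    <sec id="sec-6">
      <title>3.2 Ranking Resources</title>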
      <p>
        We follow a standard Language Modeling [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] approach to
ranking resources (documents) according to a query. Thus
we rank resources according to the likelihood that they would
be generated by the query, namely the probability p(d|Q),
where d is a resource and Q the query as a set of tags. We
give here the derivation of p(d|Q) by applying Bayes' rule:
        <disp-formula>
          <label>(12)</label>
          <tex-math>p(d \mid Q) = \frac{p(Q \mid d)\, p(d)}{p(Q)}</tex-math>
        </disp-formula>
For ranking we can drop the normalization by p(Q) as it is
the same for each resource d, which gives us:
      </p>
      <p>
        <disp-formula>
          <label>(13)</label>
          <tex-math>\mathrm{score}(d \mid Q) = p(Q \mid d)\, p(d)</tex-math>
        </disp-formula>
We apply the naive Bayes assumption and consider the words
in the query to be independent given the document d. Thus
p(Q|d) factorizes into the product of word probabilities p(w|d):
        <disp-formula>
          <label>(14)</label>
          <tex-math>\mathrm{score}(d \mid Q) = p(d) \prod_{w \in Q} p(w \mid d)</tex-math>
        </disp-formula>
This product is equivalent in terms of ranking to the sum of
the corresponding log probabilities. Thus we compute the
score for a particular resource as follows:
        <disp-formula>
          <label>(15)</label>
          <tex-math>\mathrm{score}(d \mid Q) \stackrel{\mathrm{rank}}{=} \log p(d) + \sum_{w \in Q} \log p(w \mid d)</tex-math>
        </disp-formula>
Computing p(d) is straightforward: we can either use the
length of the tag description of the resource d or the uniform
distribution p(d) = 1/D, where D is the count of documents
in the corpus.</p>
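      <p>For the browsing experiment, the log probabilities within
the summation are exponentially weighted so as to give
preference to the most recently clicked tags, as follows:
        <disp-formula>
          <label>(16)</label>
          <tex-math>\mathrm{browsing\ score}(d \mid Q) = \log p(d) + \sum_{i=1}^{\lvert Q \rvert} \alpha^{i-1} \log p(w_i \mid d)</tex-math>
        </disp-formula>
      </p>
      <p>Here <inline-formula><tex-math>w_i</tex-math></inline-formula> denotes the ith most recent term in the query Q,
and <inline-formula><tex-math>\alpha</tex-math></inline-formula> is a decay parameter set to 0.8 in our experiments.</p>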
3.3</p>
    </sec>
    <sec id="sec-7">
      <title>3.3 Precomputation</title>
      <p>For each model we precompute the values of p(w|w'),
which gives us three matrices of relations between tags. At
run time we rank the tags to generate a contextual tag cloud
according to a query of multiple tags as follows:
        <disp-formula>
          <label>(17)</label>
          <tex-math>p(w \mid Q) = \log p(w) + \sum_{w' \in Q} \log p(w \mid w')</tex-math>
        </disp-formula>
      </p>
      <p>In our experiments we set the parameter to 0.5.</p>
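      <p>A minimal sketch of the run-time combination is given below. It adds log p(w) to the sum of log p(w|w') over the query tags and keeps the best-scoring tags. The probability floor for tag pairs without a precomputed value, the exclusion of query tags from the cloud and all names and toy values are assumptions of this example. The sketch does not model the weighting parameter.</p>
      <preformat>
```python
import math

FLOOR = 1e-9  # assumption: floor for pairs missing from the precomputed table


def cloud_score(tag, query, p_tag, p_cond):
    """log p(w) plus the sum of log p(w|w') over the query tags w'."""
    total = math.log(max(p_tag.get(tag, 0.0), FLOOR))
    for context in query:
        total += math.log(max(p_cond.get(context, {}).get(tag, 0.0), FLOOR))
    return total


def contextual_cloud(query, p_tag, p_cond, size=100):
    """Rank all tags outside the query and keep the `size` best."""
    candidates = [t for t in p_tag if t not in query]
    ranked = sorted(
        candidates,
        key=lambda t: cloud_score(t, query, p_tag, p_cond),
        reverse=True,
    )
    return ranked[:size]


# Hypothetical precomputed values: p(w) and p(w|w') indexed by context tag w'.
p_tag = {"rock": 0.5, "indie": 0.2, "jazz": 0.3}
p_cond = {"rock": {"indie": 0.6, "jazz": 0.1}}
cloud = contextual_cloud(["rock"], p_tag, p_cond)
```
      </preformat>
      <p>In this toy example "jazz" is more popular overall than "indie", but "indie" ranks first once "rock" is in the query because it co-occurs with "rock" far more often.</p>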
    </sec>
    <sec id="sec-8">
      <title>4. EMPIRICAL SETUP</title>
      <p>We chose "Last.fm" as the source of our experimental dataset.
"Last.fm" is an online music-sharing social network which
exposes social network data and tagging data through
its application programming interface (API). To our
knowledge it is the only network which enables researchers to fetch
the friends of any user in the system. Fetching the social
network is essential for experiments with social tag clouds.</p>
      <p>We gather tag data by crawling users via their friend
relationships. Once a new user is fetched, we download her
own tags and then recursively fetch her friends and so on.
We start by fetching the network of the author. In order
to get a complete subset of the social network of "Last.fm",
we apply a breadth-first search by exploring recursively the
relations of each user. Once we have a substantial subset
of the social network and tags, we fetch the tracks assigned
to the tags. For each tag fetched, we get the 50 top tracks
annotated with this tag.</p>
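      <p>The crawl order can be sketched as a breadth-first traversal of the friendship graph. In this sketch `fetch_friends` is a hypothetical callable standing in for whatever API request returns a user's friends, and the in-memory graph and the `max_users` limit are assumptions added to keep the example self-contained. Our own crawler was written in Ruby.</p>
      <preformat>
```python
from collections import deque


def crawl(seed, fetch_friends, max_users):
    """Visit users breadth-first from `seed`, stopping after `max_users` users."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and max_users > len(order):
        user = queue.popleft()
        order.append(user)
        for friend in fetch_friends(user):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return order


# Hypothetical friendship graph used in place of real API calls.
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
visited = crawl("a", lambda user: graph.get(user, []), max_users=3)
```
      </preformat>
      <p>Breadth-first order visits all direct friends of the seed user before any friends of friends, so stopping early still yields a subset centred on the seed.</p>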
      <sec id="sec-8-1">
        <title>Table 1</title>
        <p>Size of each table in the dataset: People, Friends, Tags, Tracks, Usages and Tag applications.</p>
      </sec>
      <sec id="sec-8-2">
        <p>Once the data is fetched by the Ruby scripts via the "Last.fm"
Web API, we migrate it to a MySQL database for
processing. We precompute various tables to store data that will
be used multiple times in the calculations. For instance, we
compute the term frequency of each tag, the term frequency
for each tag and each user, and the frequency of the friends of
a user for a tag. From these tables we can then compute,
for each model, a similarity table holding the probability of one tag
given another, which corresponds to p(w|w'). We
do this only for the tags used by at least 5 people, which
accounts for about twenty thousand tags.</p>
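        <p>The restriction to tags used by at least 5 people can be sketched as below. The in-memory mapping from users to their tags and the function name are hypothetical. In our setup the equivalent counts are stored in database tables.</p>
        <preformat>
```python
from collections import defaultdict


def frequent_tags(user_tags, min_users=5):
    """Return the tags applied by at least `min_users` distinct users."""
    users_per_tag = defaultdict(set)
    for user, tags in user_tags.items():
        for tag in tags:
            users_per_tag[tag].add(user)
    return {tag for tag, users in users_per_tag.items() if len(users) >= min_users}


# Hypothetical data, with a lower threshold so that the toy example keeps a tag.
toy = {"ann": ["rock", "indie"], "bob": ["rock"], "cat": ["jazz", "rock"]}
kept = frequent_tags(toy, min_users=2)
```
        </preformat>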
      </sec>
    </sec>
    <sec id="sec-9">
      <title>5. EVALUATION</title>
      <p>We built a web application to evaluate our models in a
user study. We conducted a pilot study in which tag clouds were
used to search for tracks, a user survey, and a follow-up study
with the search task and a browsing task in which participants
used the tag cloud to pilot a recommendation system. We
find statistically significant evidence that the topic model
and the social model perform better than our baseline, the popular
model, at generating tag clouds that lead to recommendations of songs
that participants liked and did not already know.</p>
    </sec>
    <sec id="sec-10">
      <title>5.1 Pilot Study</title>
      <p>The pilot study took place at the University of Lugano. We
gathered 17 participants from our Bachelor, Master and PhD
programs. Participants registered on an online form before
the evaluation. They were asked to fill in an entry form and
an exit form to answer general questions. The participants
were asked to perform 20 tasks in which they had to find a
particular track. Tracks were selected randomly from a pool
of the 200 most popular tracks. The tag generation method
was also selected randomly for each task.</p>
      <p>The evaluation is designed as a within-subject study. Each
participant is her own control group, as a model is randomly
selected for each task and the participant is not directly
informed of which model is used. Each action of the
participants is stored in a log in the database.</p>
      <p>Most participants had fun during the experiment.
Listening to the music and discovering new music probably helps
with this fun aspect and keeps the participants motivated.
One participant noticed that he was soon selecting popular
tags and quickly browsing for the "red link" to stop the task.
This technique earned him second place; we
believe the participant who finished first used the same technique
and rejected tasks faster if he could not find the track with
popular tags. Among the comments given, one participant lists as
an advantage "you don't have to think about the search terms,
you can just pick one", and another adds "relief from typing".
This seems to be the major advantage of tag navigation: it is
hard for a person to come up with search terms from the
vocabulary he has in mind, whereas when presented with a
vocabulary, it is simple for him to choose which terms to use.
Multiple participants think it would be simpler for them to
type search keywords when they know beforehand which
terms they would use rather than browsing the tag cloud
to find the term they are looking for. Again, it seems tag
clouds are good for helping to remember terms and for cases where
the participant does not know which terms to use, but when
the participant knows what she is looking for it is
easier for her to type. One participant notes "if a tag is not in
the list, I can not use it. Free search would be better from
this point of view".</p>
      <p>Some participants mentioned "discovering
new music" as an advantage. The evaluation process itself probably makes
the participant discover new music by randomly selecting a
track from the 100 most popular tracks. People also
discovered new music by reading the list of tracks when they
clicked on tags. One participant mentioned that he would like
a tag cloud to navigate pages from his browsing history in his
web browser. A tag cloud would help him remember topics
he has seen in his browsing life.</p>
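      <sec id="sec-10-1">
        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>Model</th><th>Popular</th><th>Topic</th><th>Social</th></tr>
            </thead>
            <tbody>
              <tr><td>Started</td><td>132</td><td>131</td><td>158</td></tr>
              <tr><td>Completed</td><td>94</td><td>94</td><td>116</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-10-2">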
        <p>A total of 302 tasks were completed and 101 were rejected.
Each time a new task is given, the model used to generate
the tag cloud is selected randomly from the three models
available. 94 tasks were completed for the popular tag cloud
and 94 as well for the tag cloud based on topic models. The
tag cloud based on the social network led to 116 completed
tasks. Participants completed more tasks involving
the social tag cloud than tasks involving the two other tag clouds.
Table 2 summarises the number of started and completed
tasks and gives the relative frequency in percentage for each
model. The relative frequency of completed tasks with respect to
the number of started tasks is similar for each model.</p>
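        <p>The completion rates can be recomputed from the per-model counts of started and completed tasks reported above, as in this short sketch. The variable names are illustrative, and rounding to one decimal place is a choice made for this example.</p>
        <preformat>
```python
# Started and completed tasks per model, as reported for the pilot study.
started = {"popular": 132, "topic": 131, "social": 158}
completed = {"popular": 94, "topic": 94, "social": 116}

# Completion rate in percent, rounded to one decimal place.
rates = {model: round(100 * completed[model] / started[model], 1) for model in started}
# rates == {"popular": 71.2, "topic": 71.8, "social": 73.4}
```
        </preformat>
        <p>The three rates lie within about two percentage points of each other, which is why the larger number of completed social tasks mostly reflects the larger number of social tasks started.</p>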
        <p>Figures 4 and 5 give an overview of the results. Figure 4
represents the relative frequency, that is, the number of tasks
completed with a given number of tags clicked relative to the total
number of clicks for each model. We see that most of the
tasks were completed after the first click. The tracks to find
were selected from the top 100 popular tracks in our dataset.
These tracks have a high probability of containing a popular
tag.</p>
        <p>We have graphed the data to show di erences in the
distribution of click-counts (navigation path lengths) and time
to completion (time to nd a song). On average, the time
taken to complete a task is slightly shorter for topic-based
tag clouds than the popular one (390 seconds against 400
seconds) and a bit better for the social based tag cloud (320
seconds against 400 seconds). While the distributions do
vary slightly: the topic based model appears to have slightly
lower navigation path lengths, and time to success values,
the di erences are minimal and the results are not
considered conclusive nor statistically signi cant.
</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>5.2 User survey</title>
      <p>We conducted a short user survey together with the pilot
study. Table 3 lists the statements that participants were asked to
rate on a Likert scale. Figure 6 shows the answers of
the participants for each statement.</p>
      <p>The answers to statement 1 clearly show that our users are
heavy internet users, as one would expect when
conducting a survey in a computer science faculty. Eleven
participants mostly disagree with statement 4, and eight with
statement 3. Both statements concern the usage of tagging
systems, which shows that tagging is still not broadly used,
even in a computer science
department. Answers to statements 5 to 9 are inconclusive:
participants are mostly undecided. No participant strongly
disagrees with statement 8, but only five mostly agree. Finding
items by navigating a tag cloud is a hard task for a human,
which shows that improvements in searchability are
needed. Eight participants agree with statements 10 and 11,
and nine with statement 12. These three statements are about
using the tag cloud to navigate various resources.</p>
      <p>Most participants find it easy to navigate the tag cloud
and would use a tag cloud to navigate the Web or their
personal files. Eight of the 17 participants agree with
statement 13, and 13 mostly agree. This supports the claim that
tag-based navigation improves the discovery of new resources.
</p>
    </sec>
    <sec id="sec-12">
      <title>5.3 Follow-up study</title>
      <p>We conducted a second study, for which we adapted the
system based on the comments we received in the pilot study.
We improved the efficiency of the system by precomputing
the term relational matrices p(w|w'). For this evaluation we
had 20 participants. None of the participants finished the
evaluation, since the search task was harder than in the pilot
study: fewer results were returned per query, which forced people
to use more precise queries.</p>
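      <p>To illustrate what precomputing p(w|w') can look like, the following minimal sketch estimates the conditional probabilities from tag co-occurrence on tracks and stores them in a lookup table. The co-occurrence estimate is an assumption made for this example; the matrices used by each of the three models may be defined differently. The function name and the toy data are hypothetical.</p>

<preformat>
```python
from collections import defaultdict
from itertools import permutations


def precompute_conditionals(track_tags):
    """Estimate p(w | w') from tag co-occurrence on tracks.

    track_tags maps a track id to the set of tags attached to it.
    Returns a nested dict: table[w_prime][w] = p(w | w').
    """
    tag_count = defaultdict(int)                        # tracks carrying w'
    pair_count = defaultdict(lambda: defaultdict(int))  # tracks carrying both w' and w

    for tags in track_tags.values():
        for tag in tags:
            tag_count[tag] += 1
        for w_prime, w in permutations(tags, 2):
            pair_count[w_prime][w] += 1

    return {
        w_prime: {w: n / tag_count[w_prime] for w, n in row.items()}
        for w_prime, row in pair_count.items()
    }


# Toy data, invented purely for illustration.
toy_tracks = {
    "t1": {"rock", "indie"},
    "t2": {"rock", "90s"},
    "t3": {"rock", "indie", "90s"},
    "t4": {"jazz"},
}
table = precompute_conditionals(toy_tracks)
print(table["indie"]["rock"])  # at query time this is a plain dictionary lookup
```
</preformat>

      <p>The point of the sketch is the trade-off: the counting pass is done once, offline, so that generating a tag cloud for a query only requires table lookups.</p>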
      <p>The results in Table 4 show our social model slightly
outperforming the popular and topic models, but the differences are not
statistically significant.</p>
      <p>To complete the tasks, participants used multiple tags in
their queries: a total of 54 for the popular model, 66 for the
topic model and 68 for the social model. This suggests that
the social model proposes tags that are more closely related
to each other and therefore enables the user to make longer
queries.
</p>
    </sec>
    <sec id="sec-13">
      <title>5.4 Experimenting with recommendation</title>
      <p>The recommendation experiment consisted of tasks in which
participants had to select a tag from the tag cloud and then
listen to a song recommended for the current query (the
query being composed of the tags selected so far).
Participants would then rate the song (whether they liked it or not) and
go back to a new tag cloud generated according to
the query and the model.</p>
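      <p>The interaction loop described above can be sketched as follows. This is only an illustration of the select, listen, rate and regenerate cycle; all four callables and the toy stand-ins are hypothetical placeholders for the system components and the participant, not the code used in the study.</p>

<preformat>
```python
import random


def run_session(generate_cloud, recommend, choose_tag, rate_song, steps=3):
    """Simulate one recommendation session.

    generate_cloud(query) returns the list of tags shown to the participant.
    recommend(query) returns a song for the current query.
    choose_tag(cloud) returns the tag the participant clicks.
    rate_song(song) returns True if the participant likes the song.
    All four callables are hypothetical stand-ins for system and user.
    """
    query, log = [], []
    for _ in range(steps):
        cloud = generate_cloud(query)
        if not cloud:
            break
        query.append(choose_tag(cloud))  # the query is the tags selected so far
        song = recommend(query)
        log.append((tuple(query), song, rate_song(song)))
    return log


# Toy stand-ins, invented for illustration only.
TAGS = ["rock", "indie", "90s", "jazz"]
rng = random.Random(0)

log = run_session(
    generate_cloud=lambda query: [t for t in TAGS if t not in query],
    recommend=lambda query: "song-for-" + "+".join(query),
    choose_tag=lambda cloud: rng.choice(cloud),
    rate_song=lambda song: "rock" in song,
)
for entry in log:
    print(entry)
```
</preformat>

      <p>Each log entry records the query at that step, the recommended song and the rating, which is the information needed to compare the models afterwards.</p>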
      <sec id="sec-13-1">
        <title>Model Popular Topic Social</title>
        <p>If we look at the relative frequency of songs that were
new to the participants among the songs that they liked,
we find that the popular model is the least efficient.
Intuitively, popular items are liked and already known; they are
popular precisely because so many people know them.
Table 6 shows that the topic model is the best model,
followed closely by the social model, and that both outperform
the popular model quite significantly. These results support
our thesis that using social relationships enhances the
recommendation of new and relevant information. Although the topic
model performs better than the social model, we believe that
once the social model is personalized, i.e. uses the actual
social network of the participant instead of an overall
probability from a social network, it will perform
even better.</p>
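        <p>The measure discussed here, the share of liked songs that were also new to the participant, can be written down in a few lines. The sketch below assumes each rating is stored as a pair of booleans (liked, new); this representation and the toy ratings are hypothetical and are not the study's data.</p>

<preformat>
```python
def new_among_liked(ratings):
    """Fraction of liked songs that were also new to the participant.

    ratings is a list of (liked, new) boolean pairs, one per rated song.
    Returns None when no song was liked.
    """
    liked = [new for liked_flag, new in ratings if liked_flag]
    if not liked:
        return None
    return sum(liked) / len(liked)


# Hypothetical ratings, invented for illustration; not the study's data.
toy_ratings = {
    "popular": [(True, False), (True, False), (True, True), (False, True)],
    "topic": [(True, True), (True, True), (False, False), (True, False)],
}
for model, ratings in toy_ratings.items():
    print(model, new_among_liked(ratings))
```
</preformat>

        <p>Conditioning on liked songs matters: a model that only surfaces unknown songs nobody likes would score well on novelty alone but poorly on this measure.</p>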
      </sec>
    </sec>
    <sec id="sec-14">
      <title>6. CONCLUSION AND FUTURE WORK</title>
      <p>Our work has some limitations. The number of participants
in the pilot study and the follow-up study is relatively small (17
and 20 participants), which does not allow us to draw strong
conclusions. We focused on a single dataset of online music data
from "Last.fm", so the conclusions
cannot be generalised to tag cloud-based navigation of other
corpora.</p>
      <p>Our survey shows that search is not practical with tag
clouds, whereas recommendation and discovery of new
information are. Our follow-up study shows that, for the
recommendation of items that people liked and that were new to
them, the topic and social models perform much better than
the popularity model.
</p>
    </sec>
    <sec id="sec-15">
      <title>6.1 Future Work</title>
      <p>We are working on a new evaluation methodology that
leverages the social model with social network data from the
participants. The rest of the evaluation works like the one
described in this paper. We believe that this personalized
social model will outperform the topic model.</p>
    </sec>
  </body>
  <back>
  </back>
</article>