
Piloted Search and Recommendation with Social Tag Cloud-Based Navigation

Cédric Mesnage

cedric.mesnage@usi.ch

Mark Carman

mark.carman@usi.ch

Faculty of Informatics, University of Lugano

We investigate the generation of tag clouds using Bayesian models and test the hypothesis that social network information is better than overall popularity for ranking new and relevant information. We propose three tag cloud generation models based on popularity, topics and social structure. We conducted two user evaluations to compare the models for search and recommendation of music with social network data gathered from "Last.fm". Our survey shows that search with tag clouds is not practical whereas recommendation is promising. We report statistical results and compare the performance of the models in generating tag clouds that lead users to discover songs that they liked and that were new to them. We find statistically significant evidence at the 5% significance level that the topic and social models outperform the popular model.

1. INTRODUCTION

We investigate mechanisms to explore social network information. Our current focus is to use contextual tag clouds as a means to navigate through the data and control a recommendation system.

Figure 1 shows the screen of the Web application we developed to evaluate our models. The goal is to find the displayed track using the tag cloud. The tag cloud is generated according to a randomly selected model and the current query. Participants in the evaluation can add terms to the query by clicking on tags, which generates a new tag cloud and changes the list of results. Once the track is found, the user clicks on its title and goes to the next task.

Figure 2 shows the principle of our controlled recommendation experiment. The participant sees a tag cloud; by clicking a tag she is recommended a song. Once the song is rated, a new tag cloud is generated according to the previously selected tags.

This paper is structured as follows. We first discuss related work in the area of tag cloud-based navigation. We then detail models for generating context-aware tag clouds.

WOMRAD 2010 Workshop on Music Recommendation and Discovery, colocated with ACM RecSys 2010 (Barcelona, SPAIN). Copyright ©. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

2. RELATED WORK

2.1 Social Tagging and its Motivations

Research in social tagging is relatively recent, with the first tagging applications appearing in the late nineties [12]. The system, called Webtagger, relied on a proxy to enable users to share bookmarks and assign tags to them. The approach was novel compared to storing bookmarks in the browser's folders in the sense that bookmarks were shared and belonged to multiple categories (instead of being placed in a single folder). The creators argued that hierarchical browsing was tedious and frustrating when information is nested several layers deep.

By 2004, social tagging was becoming more and more popular, initially on bookmarking sites like Delicious and later on social media sharing sites such as Flickr and YouTube. Research in social tagging started with Hammond [7], who gave an overview of social bookmarking tools, and was continued by Golder et al. [5], who provided the first analysis of tagging as a process using tag data from Delicious. They showed that tag data follows a power law distribution, gave a taxonomy of tagging incentives, and looked at the convergence of tag descriptions over time for resources on Delicious. The paper led to the first workshop on tagging [21], where papers mainly discussed tagging incentives, tagging applications (in museums and enterprises), tag recommendation and knowledge extraction. Following this workshop, research in tagging has spread to various already established areas, namely Web search, social dynamics, the Semantic Web, information retrieval, human-computer interaction and data mining.

Sen et al. [19] examine factors that influence the way people choose tags and the degree to which community members share a vocabulary. The three factors they focus on are personal tendency, community influence and the tag selection algorithm (used to recommend tags). Their study focuses on the MovieLens system, which consists of user reviews for movies. They categorize tags into three categories: factual, subjective and personal. They then divided users of the system into four groups, each with a different user interface: the unshared group didn't see any community tags; the shared group saw random tags from their group; the popular group saw the most popular tags; and the recommendation group used a recommendation algorithm (that selected tags most commonly applied to the target movie and to similar movies). They find that habit and investment influence the users' tag applications, while the community influences a user's personal vocabulary. The shared group produced more subjective tags, while the popular and recommendation groups produced more factual tags. The authors also conducted a user survey in which they asked users whether they thought tagging was useful for different tasks: self-expression (50%), organizing (44%), learning (23%), finding (27%), and decision support (21%).

Marlow et al. [14, 15] define a taxonomy of design aspects of tagging systems that influence the content and usefulness of tags, namely tagging rights (who can tag), tagging support (suggestion algorithms), aggregation model (bag or set), resource type (web pages, images, etc.), source of content (participants, Web, etc.), resource connectivity (linked or not), and social connectivity (linked or not). They also propose aspects of user incentives expressing the different motivations for tagging: future retrieval, contribution and sharing, attracting attention, playing and competition, self-presentation, and opinion expression.

Cattuto et al. [2, 1] perform an empirical study of tag data from Delicious and find that the distribution of tags over time follows a power law distribution. More specifically, they find that the frequency of tags obeys Zipf's law, which is characteristic of self-organized communication systems and is commonly observed in natural language data. They reproduced the phenomenon using a stochastic model, leading to a model of user behavior in collaborative tagging systems.

2.2 Browsing with Tags

Fokker et al. [4] present a tool to navigate Wikipedia using tag clouds. Their approach enables the user to select different views on the tag cloud, such as recent tags, popular tags, personal tags or friends' tags. They display related tags when the user "mouses over" a tag in the cloud. They do not, however, generate new contextually relevant tag clouds when the user clicks on a tag.

In [16], Millen et al. investigate browsing behavior in their Dogear social bookmarking application. The application allows users to browse other people's bookmark collections by clicking on their username. They find that most browsing activity on the web site is done by exploring people's bookmarks and then tags. They compare the 10 most browsed tags with the 10 most used tags and find that there is a strong correlation. While their findings do not show that tagging improves social navigation in general, they do show that browsing tags helps users to navigate the bookmark collections of others. Following on from this, Ishikawa et al. [10] studied navigation efficiency when browsing other users' bookmarks. The idea is to decide which user to browse first in order to discover the desired information faster. While relevant to tag-based navigation, this study does not deal with the problem of how best to rank tags in order to improve cloud-based navigation in general.

In [13], Li et al. propose various algorithms to browse social annotations in a more efficient way. They extract hierarchies from clusters and propose to browse social annotations in a hierarchical manner. They also propose a way to browse tags based on time. As discussed by Keller et al. [12], however, a single taxonomy is not necessarily the best way to navigate a corpus.

A more comprehensive study was performed by Sinclair et al. [20] to examine the usefulness of tag clouds for information seeking. They asked participants to perform information-seeking tasks on a folksonomy-like dataset, providing them with an interface consisting of a tag cloud and a search box. The folksonomy was created by the same participants, who were asked to tag ten articles at the beginning of the study, leading to a small-scale folksonomy. The tag cloud displayed 70 terms in alphabetical order, with a font size for each term proportional to the log of its frequency. The authors give the following equation for the font size:

$$\mathrm{TagSize} = 1 + C \, \frac{\log(f_i - f_{\min} + 1)}{\log(f_{\max} - f_{\min} + 1)} \tag{1}$$

where $C$ corresponds to the maximum font desired, $f_i$ to the frequency of the tag to be displayed, and $f_{\min}$ and $f_{\max}$ to the minimum and maximum frequencies of the displayed tags. Clicking on a tag in the cloud brings the user to a new page listing articles annotated with that tag and a new tag cloud of co-occurring tags. Clicking again on a tag restricts the list to the articles tagged with both tags, and so on. The search is based on a TF-IDF ranking. Participants were asked 10 questions about the articles and then asked whether they preferred using the search box or the tag cloud, and why. They found that the tag cloud performed better when people were asked general questions, whereas for information seeking people preferred to use the search box. They conclude that the tag cloud is better for browsing, enhancing serendipity. The participants commented that the search box allows for more specific queries.

While similar to our study on tag cloud-based navigation, the work of Sinclair et al. [20] differs in a number of important ways: (i) Their aim was to compare tag-based navigation directly with search, while ours is to compare different tag cloud generation methods, based on social network information and topic modeling techniques. (ii) In their study the folksonomy was generated by the participants and is quite small as a result, while we rely on an external folksonomy for which scaling becomes an important issue.
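As an illustration of Equation (1), the following is a minimal Python sketch of the font-size computation. The tag frequencies, the function name `tag_size` and the fallback used when all displayed tags share one frequency are assumptions made for the example; they are not taken from Sinclair et al.

```python
import math

def tag_size(freq, f_min, f_max, max_size):
    """Font size following Equation (1): 1 + C * log(f_i - f_min + 1) / log(f_max - f_min + 1)."""
    if f_max == f_min:
        # Assumption: when every displayed tag has the same frequency the
        # ratio is 0/0, so we fall back to the smallest size.
        return 1.0
    return 1.0 + max_size * math.log(freq - f_min + 1) / math.log(f_max - f_min + 1)

# Hypothetical tag frequencies for three displayed tags.
frequencies = {"rock": 120, "jazz": 35, "indie": 8}
f_min, f_max = min(frequencies.values()), max(frequencies.values())
sizes = {tag: tag_size(f, f_min, f_max, max_size=10) for tag, f in frequencies.items()}
```

With this scaling the least frequent tag gets size 1 and the most frequent tag gets size 1 + C, with the remaining tags spread logarithmically in between.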

In [8], Hassan-Montero et al. propose an improvement to tag clouds by ordering the tags according to similarity rather than alphabetically. They use the Jaccard coefficient to measure similarity between tags, which is the ratio between the number of resources in which the two tags both occur and the number in which either one occurs. If $D(w)$ denotes all resources (documents) annotated with tag (word) $w$, then the similarity is given by:

$$RC(w_1, w_2) = \frac{|D(w_1) \cap D(w_2)|}{|D(w_1) \cup D(w_2)|} \tag{2}$$

The authors then define an additional metric to select which tags to display in each cloud (so as to maximize the number of resources "covered by the cloud"). Their method provided, however, little improvement in the coverage of the selected tags. The tag cloud layout is based on the similarity coefficient. The authors also do not provide a user evaluation of the generated tag clouds.
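The Jaccard coefficient between two tags, as described above, can be sketched in a few lines of Python. The mapping `tagged` from tags to resource identifiers is hypothetical data for illustration, and returning 0 for two tags with no resources at all is our assumption.

```python
def jaccard(docs_a, docs_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| between two sets of resources."""
    union = docs_a | docs_b
    if not union:
        # Assumption: two tags with no resources are treated as dissimilar.
        return 0.0
    return len(docs_a & docs_b) / len(union)

# Hypothetical D(w): the set of resources annotated with each tag.
tagged = {
    "rock": {"d1", "d2", "d3"},
    "indie": {"d2", "d3", "d4"},
    "jazz": {"d5"},
}
rock_indie = jaccard(tagged["rock"], tagged["indie"])  # 2 shared out of 4 distinct
rock_jazz = jaccard(tagged["rock"], tagged["jazz"])    # no shared resources
```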

Kaser et al. [11] propose a different algorithm for displaying tag clouds. Their methods concern how to produce HTML in various situations. They also give an algorithm to display tags in nested tables. They do not provide an evaluation regarding the usefulness of the new visual representations.

In [18], Sen et al. investigate the question of tag quality. Tagging systems must often select a subset of available tags to display to users due to limited screen space. Knowing the quality of tags helps in writing a tag selection algorithm. They conduct a study on the MovieLens movie review system, adding to the interface different mechanisms for users to rate the quality of tags. Not all tags can be rated; they therefore look for ways of predicting tag quality based on aggregate user behavior, on a user's own ratings and on aggregate users' ratings. They find that tag selection methods that normalize by user, such as the number of users who applied a tag, perform the best.

In [9], Heymann et al. investigate the social tag prediction problem, the purpose of which is to predict future tags for a particular resource. The ability to predict tag applications can lead to various enhancements, such as increased recall, inter-user agreement, tag disambiguation, bootstrapping and system suggestion. They collected tag data from Delicious and fetched the web pages for each bookmark. They analyze two methods: the first applies only when the bookmarked items are web pages (and not images, songs, videos, etc.). They develop an entropy-based metric which measures how predictable a tag is. They then extract association rules based on tag co-occurrence and give measurements of their interest and confidence. They find that many tags do not contribute substantial additional information beyond page text, anchor text and surrounding hosts. These sources of information are therefore good tag predictors. In the case of using only tags, predictability is related to generality in the sense that the more information is known about a tag (i.e. the more popular it is), the more predictable it is. They add that these measures could be used by system designers to improve system suggestion or tag browsing.

Ramage et al. [17] compare two methods to cluster web pages using tag data. Their goal is to see whether tagging data can be used to improve web document clustering. This work is based on the clustering hypothesis from information retrieval, that "the associations between documents convey information about the relevance of documents to requests". The document clusters are used to solve the problem of query ambiguity by including different clusters in search results.

All of the above-mentioned work differs from our current study of tag cloud-based navigation in the following ways: (i) Previous studies have investigated the usefulness of tag clouds primarily from the basic visualization standpoint rather than the navigation standpoint. (ii) Those studies explicitly investigating tag cloud based navigation have concentrated on simple algorithms for generating tag clouds. (iii) Previous studies investigating more sophisticated algorithms for tag prediction have evaluated those algorithms by assessing prediction accuracy on held-out data rather than by "in situ" evaluation with real users for a particular application (tag cloud based navigation).

3. TAG CLOUD BASED NAVIGATION

In this section we describe algorithms for generating context-aware tag clouds and query result lists for tag cloud based navigation. Generating a tag cloud simply involves selecting the one hundred tags which are the most probable (to be clicked on by the user) given the current context (query). Estimating which terms are most probable depends on the model used, as we discuss below.

3.1 Generating Context Aware Tag Clouds

We now investigate three different models for generating context-aware tag clouds. For each model we first describe how an initial context-independent cloud is generated. We then describe how the context-dependent cloud is generated in such a way as to take the current query (context) tags into account.

3.1.1 Popularity based Cloud Generation Model

The first and simplest tag cloud generation model is based on the popularity of the tags across all documents in the corpus. We first describe a query-independent tag cloud, which can be used as the initial cloud for popularity-based navigation.

Ranking tags by popularity on the home page gives users a global access point to the most prolific sections of the portal. The most popular tags are reachable from the popular tag cloud and displayed with a font size proportional to the amount of activity on that tag. A measure of the popularity of a tag across the corpus is given by:

$$p(w) = \frac{\sum_{d \in D} N_{w,d}}{\sum_{d \in D} N_d} \tag{3}$$

where $N_{w,d}$ is the count of occurrences of tag $w$ for resource (document) $d$ and $N_d = \sum_{w \in V} N_{w,d}$ is the total count for the document.

We can now compute a context-sensitive version of the popular tag cloud quite simply as follows:

$$p(w \mid Q) = \frac{\sum_{d \in D(Q)} N_{w,d}}{\sum_{d \in D(Q)} N_d} \tag{4}$$

where $D(Q) = \bigcup_{w \in Q} D(w)$ is the union of all resources that have been tagged with words from the query $Q$.
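The two popularity estimates above, $p(w)$ over the whole corpus and $p(w \mid Q)$ over the resources matching the query, can be sketched with one Python function. This is only an illustrative sketch: the function name `popularity`, the in-memory `counts` structure and the example tracks are our assumptions, not the implementation used in the experiments.

```python
from collections import Counter

def popularity(counts, query=None):
    """Return p(w) over all resources, or p(w|Q) when a query is given.

    counts maps each resource d to a Counter of tag counts N_{w,d}.
    """
    if query:
        # D(Q): resources tagged with at least one of the query tags.
        docs = [d for d, tags in counts.items()
                if any(tags.get(w, 0) > 0 for w in query)]
    else:
        docs = list(counts)
    tag_totals = Counter()
    for d in docs:
        tag_totals.update(counts[d])
    grand_total = sum(tag_totals.values())  # sum of N_d over the selected resources
    if grand_total == 0:
        return {}
    return {w: n / grand_total for w, n in tag_totals.items()}

# Hypothetical tag counts for three tracks.
counts = {
    "track1": Counter(rock=3, indie=1),
    "track2": Counter(rock=1, pop=3),
    "track3": Counter(jazz=2),
}
entry_cloud = popularity(counts)                   # context-independent p(w)
indie_cloud = popularity(counts, query={"indie"})  # p(w | Q = {indie})
```

In this toy corpus only `track1` carries the tag `indie`, so the context-sensitive cloud is computed from that single track.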

3.1.2 Social Network Structure based Cloud Generation Model

We are interested in taking advantage of additional information contained in the social network of users (friendships) in order to improve the quality of the tag cloud. We assume that the friends of a user are likely to share similar interests and thus we can use the tag description of a user's friends to smooth the tag description of the user.

We calculate an entry (context-independent) social tag cloud as follows:

$$p(w) = \sum_{u \in U} \sum_{u' \in f(u)} \frac{N_{w,u'}}{\sum_{w \in W} N_{w,u'}} \tag{5}$$

where $f(u)$ is the set of friends of user $u$ and $U$ denotes the set of all users in the social network.

We apply a slightly different derivation to calculate the context-dependent social tag cloud. We estimate the probability $p(w \mid w')$ given the context tag $w'$. These probabilities are precomputed and combined depending on the query at run time. We hypothesize that users who are friends on a social tagging website are likely to have similar interests (likes and dislikes) and that we can use the social network structure to improve contextual tag cloud generation. We can leverage the social network (by marginalizing out the user $u$) as follows:

$$p(w \mid w') = \sum_{u \in U} p(w, u \mid w') \tag{6}$$

$$p(w \mid w') = \sum_{u \in U} \frac{p(w \mid u)\, p(w' \mid u)\, p(u)}{p(w')} \tag{7}$$

The second line applies Bayes' rule to $p(u \mid w')$ and assumes that $w$ is independent of $w'$ given the user $u$. Calculating $p(w')$ and $p(u) = N_u / \sum_{u' \in U} N_{u'}$ is straightforward. We compute $p(w \mid u)$ by summing over tag counts $N_{w,u'}$ for users in the social network of the user $u$:

$$p(w \mid u) = \frac{\sum_{u' \in f(u)} N_{w,u'}}{\sum_{u' \in f(u)} N_{u'}} \tag{8}$$

Note that since the summation in Equation 7 over all users involves a very large computation, we perform the summation only over the top 200 users as ranked according to the frequency $p(w \mid u)$.
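The marginalization over users in Equation 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the toy users and their friend lists are hypothetical; $p(w')$ is computed as $\sum_u p(w' \mid u)\, p(u)$, which is one straightforward choice that makes the result sum to one; and the truncation to the top 200 users is omitted for brevity.

```python
from collections import Counter, defaultdict

def friend_tag_dist(user, friends, counts):
    """p(w|u): relative tag frequencies pooled over the friends f(u) of a user."""
    pooled = Counter()
    for friend in friends.get(user, ()):
        pooled.update(counts.get(friend, {}))
    total = sum(pooled.values())
    return {w: n / total for w, n in pooled.items()} if total else {}

def social_conditional(context_tag, friends, counts):
    """p(w|w') = sum_u p(w|u) p(w'|u) p(u) / p(w'), summing over all users."""
    n_u = {u: sum(c.values()) for u, c in counts.items()}
    n_all = sum(n_u.values())
    joint = defaultdict(float)  # accumulates sum_u p(w|u) p(w'|u) p(u)
    p_context = 0.0             # assumption: p(w') = sum_u p(w'|u) p(u)
    for u in counts:
        dist = friend_tag_dist(u, friends, counts)
        weight = dist.get(context_tag, 0.0) * n_u[u] / n_all  # p(w'|u) p(u)
        if weight == 0.0:
            continue
        p_context += weight
        for w, p in dist.items():
            joint[w] += p * weight
    return {w: v / p_context for w, v in joint.items()} if p_context else {}

# Hypothetical tag counts N_{w,u} and friendships f(u).
counts = {
    "ann": {"rock": 4, "indie": 2},
    "bob": {"rock": 1, "jazz": 3},
    "cat": {"jazz": 2},
}
friends = {"ann": ["bob"], "bob": ["ann", "cat"], "cat": ["bob"]}
given_indie = social_conditional("indie", friends, counts)
```

In this toy network only the friends of `bob` have used `indie`, so the conditional cloud for that tag equals the pooled tag distribution of `bob`'s friends.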

3.1.3 Topic Model based Cloud Generation Model

Another way to smooth the relative term frequency estimates and thereby improve the quality of the generated tag clouds is to rely on latent topic modeling techniques [6]. Using these techniques we can extract semantic topics representing user tagging behavior (i.e. user interests) from a matrix of relationships between tags and people. Topic models are term probability distributions over documents (in this case users) that are often used to represent text corpora. We apply a commonly used topic modeling technique called latent Dirichlet allocation (LDA) [6] to extract 100 topics by considering people as documents (and tags as their content).

The entry (context-independent) tag cloud based on topic modeling is defined as follows:

$$p(w) = \sum_{z \in Z} p(w \mid z)\, p(z) \tag{9}$$

where $p(w \mid z)$ denotes the probability that the tag $w$ belongs to (is generated by) topic $z$; its value is given as an output of the LDA algorithm. $p(z)$ is the relative frequency of the topic $z$ across all users in the corpus.
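Given the topic-tag distributions produced by LDA, the entry cloud $p(w) = \sum_z p(w \mid z)\, p(z)$ is a weighted mixture of the topics. The sketch below assumes the LDA output is already available as plain Python dictionaries; the two toy topics and the name `topic_tag_prior` are hypothetical, and no particular LDA library is implied.

```python
def topic_tag_prior(p_w_given_z, p_z):
    """p(w) = sum_z p(w|z) p(z).

    p_w_given_z: one dict per topic mapping tag -> p(w|z).
    p_z: relative frequency of each topic, in the same order.
    """
    prior = {}
    for dist, pz in zip(p_w_given_z, p_z):
        for w, p in dist.items():
            prior[w] = prior.get(w, 0.0) + p * pz
    return prior

# Two hypothetical topics standing in for the 100 extracted by LDA.
p_w_given_z = [
    {"rock": 0.7, "indie": 0.3},
    {"jazz": 0.6, "rock": 0.4},
]
p_z = [0.5, 0.5]
prior = topic_tag_prior(p_w_given_z, p_z)
```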

To compute the context-aware tag cloud based on topic modeling, we simply marginalize over topics (instead of users):

$$p(w \mid w') = \sum_{z \in Z} p(w \mid z)\, p(z \mid w') \tag{10}$$

$$p(w \mid w') = \sum_{z \in Z} \frac{p(w \mid z)\, p(w' \mid z)\, p(z)}{p(w')} \tag{11}$$

3.2 Ranking Resources

We follow a standard language modeling [3] approach to ranking resources (documents) according to a query. Thus we rank resources according to the likelihood that they would be generated by the query, namely the probability $p(d \mid Q)$, where $d$ is a resource and $Q$ the query as a set of tags. We give here the derivation of $p(d \mid Q)$ by applying Bayes' rule:

$$p(d \mid Q) = \frac{p(Q \mid d)\, p(d)}{p(Q)} \tag{12}$$

For ranking we can drop the normalization by $p(Q)$ as it is the same for each resource $d$, which gives us:

$$\mathrm{score}(d \mid Q) = p(Q \mid d)\, p(d) \tag{13}$$

We apply the naive Bayes assumption and consider the words in the query to be independent given the document $d$. Thus $p(Q \mid d)$ factorizes into the product of word probabilities $p(w \mid d)$:

$$\mathrm{score}(d \mid Q) = p(Q \mid d)\, p(d) = p(d) \prod_{w \in Q} p(w \mid d) \tag{14}$$

This product is equivalent in terms of ranking to the sum of the corresponding log probabilities. Thus we compute the score for a particular resource as follows:

$$\mathrm{score}(d \mid Q) \stackrel{\mathrm{ranking}}{=} \log p(d) + \sum_{w \in Q} \log p(w \mid d) \tag{15}$$

Computing $p(d)$ is straightforward: we can either use the length of the tag description of the resource $d$ or the uniform distribution $p(d) = 1/|D|$, where $|D|$ is the count of documents in the corpus.
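The log-probability score for a resource can be sketched in Python as below. Two points are assumptions of the sketch rather than details given in the text: $p(w \mid d)$ is estimated from tag counts with additive smoothing (parameter `mu`, needed so that a missing tag does not produce $\log 0$), and $p(d)$ is taken to be uniform. The corpus and function names are hypothetical.

```python
import math

def score(doc_tags, query, n_docs, vocab_size, mu=0.01):
    """log p(d) + sum over query tags of log p(w|d), with uniform p(d)."""
    total = sum(doc_tags.values())
    s = math.log(1.0 / n_docs)  # uniform prior p(d) = 1/|D|
    for w in query:
        # Assumption: additive smoothing so unseen tags keep a small probability.
        p_w_d = (doc_tags.get(w, 0) + mu) / (total + mu * vocab_size)
        s += math.log(p_w_d)
    return s

def rank(corpus, query, mu=0.01):
    """Return resource ids ordered from highest to lowest score."""
    vocab = {w for tags in corpus.values() for w in tags}
    return sorted(
        corpus,
        key=lambda d: score(corpus[d], query, len(corpus), len(vocab), mu),
        reverse=True,
    )

# Hypothetical tag counts per track.
corpus = {
    "track1": {"rock": 3, "indie": 1},
    "track2": {"rock": 1, "pop": 3},
    "track3": {"jazz": 2},
}
ranking = rank(corpus, ["rock", "indie"])
```

Working in log space avoids numerical underflow when queries grow long, which is the practical reason for preferring the sum of logs over the product.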

For the browsing experiment, the log probabilities within the summation are exponentially weighted so as to give preference to the most recently clicked tags, as follows:

$$\mathrm{score}_{\mathrm{browsing}}(d \mid Q) = \log p(d) + \sum_{i=1}^{|Q|} \alpha^{i-1} \log p(w_i \mid d) \tag{16}$$

Here $w_i$ denotes the $i$th most recent term in the query $Q$, and $\alpha$ is a decay parameter set to 0.8 in our experiments.
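The exponentially weighted browsing score can be sketched as follows, with the decay defaulting to the value 0.8 reported above. As assumptions of the sketch, the clicked tags are passed in click order (oldest first), $p(w \mid d)$ uses additive smoothing with a hypothetical parameter `mu`, and $p(d)$ is uniform.

```python
import math

def browsing_score(doc_tags, clicked, n_docs, vocab_size, decay=0.8, mu=0.01):
    """log p(d) + sum_i decay**(i-1) * log p(w_i|d), w_1 being the latest click."""
    total = sum(doc_tags.values())
    s = math.log(1.0 / n_docs)  # uniform prior p(d)
    # enumerate starts at 0, so the most recent tag gets weight decay**0 = 1.
    for i, w in enumerate(reversed(clicked)):
        # Assumption: additive smoothing to avoid log(0) for unseen tags.
        p_w_d = (doc_tags.get(w, 0) + mu) / (total + mu * vocab_size)
        s += (decay ** i) * math.log(p_w_d)
    return s

doc = {"rock": 2}  # hypothetical track tagged only with "rock"
rock_last = browsing_score(doc, ["jazz", "rock"], n_docs=1, vocab_size=2)
jazz_last = browsing_score(doc, ["rock", "jazz"], n_docs=1, vocab_size=2)
```

The same track scores higher when its matching tag was clicked most recently, since the penalty for the older, non-matching tag is discounted.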

3.3 Precomputation

For each model we precompute the values of $p(w \mid w')$, which gives us three matrices of relations between tags. At run time we rank the tags to generate a contextual tag cloud according to a query of multiple tags as follows:

$$p(w \mid Q) = \log p(w) + \sum_{w' \in Q} \log p(w \mid w') \tag{17}$$
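The run-time combination in Equation (17) can be sketched as a lookup into the precomputed tables followed by a top-$k$ selection. The sketch is illustrative only: the nested-dictionary layout of `cond`, the `floor` probability used for missing entries and the toy values are assumptions, and the sketch applies no additional weighting to either term.

```python
import math

def tag_cloud(prior, cond, query, size=100, floor=1e-9):
    """Rank tags by log p(w) + sum over w' in Q of log p(w|w'); keep the top `size`.

    prior: dict tag -> p(w).  cond: dict w' -> dict w -> p(w|w').
    """
    scores = {}
    for w, p_w in prior.items():
        s = math.log(max(p_w, floor))
        for context_tag in query:
            # Assumption: pairs missing from the table get a small floor probability.
            p = cond.get(context_tag, {}).get(w, 0.0)
            s += math.log(max(p, floor))
        scores[w] = s
    return sorted(scores, key=scores.get, reverse=True)[:size]

# Hypothetical precomputed tables for three tags.
prior = {"rock": 0.5, "indie": 0.3, "jazz": 0.2}
cond = {"jazz": {"jazz": 0.6, "indie": 0.3, "rock": 0.1}}
entry = tag_cloud(prior, cond, [])
after_jazz = tag_cloud(prior, cond, ["jazz"])
```

With an empty query the cloud is ordered by the prior alone; clicking `jazz` reorders it according to the conditional table.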

In our experiments we set the parameter to 0.5.

4. EMPIRICAL SETUP

We chose "Last.fm" as the source of our experimental dataset. "Last.fm" is an online music-sharing social network whose application programming interface (API) provides social network data and tagging data. To our knowledge it is the only network which enables researchers to fetch the friends of any user in the system. Fetching the social network is essential for experiments with social tag clouds.

We gather tag data by crawling users via their friend relationships. Once a new user is fetched, we download her own tags and then recursively fetch her friends, and so on. We start by fetching the network of the author. In order to get a complete subset of the social network of "Last.fm", we apply a breadth-first search, exploring recursively the relations of each user. Once we have a substantial subset of the social network and tags, we fetch the tracks assigned to the tags. For each tag fetched, we get the 50 top tracks annotated with this tag.
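The breadth-first crawl described above can be sketched in Python as follows. The callable `fetch_friends` is a hypothetical stand-in for whatever API request returns a user's friends; here it is backed by an in-memory graph so that the sketch runs without network access. The `max_users` cap is also an assumption, added so that the crawl terminates on a large network.

```python
from collections import deque

def crawl(seed, fetch_friends, max_users):
    """Breadth-first crawl of a friendship graph; returns users in visit order."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        user = queue.popleft()
        order.append(user)
        for friend in fetch_friends(user):
            # Enqueue each user once, and stop discovering past the cap.
            if friend not in seen and len(seen) < max_users:
                seen.add(friend)
                queue.append(friend)
    return order

# Hypothetical friendship graph used in place of real API calls.
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": ["e"], "e": []}
visited = crawl("a", lambda u: graph.get(u, []), max_users=10)
capped = crawl("a", lambda u: graph.get(u, []), max_users=3)
```

Breadth-first order visits all direct friends of the seed before moving on to friends of friends, which keeps the crawled subset densely connected around the starting user.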

[Table: size of each table in the dataset (People, Friends, Tags, Tracks, Usages, Tag applications).]

Once the data is fetched by the Ruby scripts via the "Last.fm" Web API, we migrate it to a MySQL database for processing. We precompute various tables to store data that will be used multiple times in the calculations. For instance, we compute the term frequency of each tag, the term frequency of each tag for each user, and the frequency of each tag among the friends of a user. From these tables we then compute, for each model, a similarity table holding the probability of one tag given another, which corresponds to $p(w \mid w')$. We do this only for the tags used by at least 5 people, which accounts for about twenty thousand tags.

5. EVALUATION

We built a web application to evaluate our models in a user study. We conducted a pilot study in which tag clouds were used to search for tracks, a user survey, and a follow-up study with the search task and a browsing task in which participants used the tag cloud to pilot a recommendation system. We find statistically significant evidence that the topic model and the social model perform better than our baseline, the popular model, at generating tag clouds that lead to recommendations of songs that participants liked and did not already know.

5.1 Pilot Study

The pilot study took place at the University of Lugano. We gathered 17 participants from our Bachelor, Master and PhD programs. Participants registered on an online form before the evaluation. They were asked to fill in an entry form and an exit form to answer general questions. The participants were asked to perform 20 tasks in which they had to find a particular track. Tracks were selected randomly from a pool of the 200 most popular tracks. The tag generation method was also selected randomly for each task.

The evaluation is designed as a within-subject study. Each participant is her own control group, as a model is randomly selected for each task and the participant is not directly informed of which model is used. Each action of the participants is stored in a log in the database.

Most participants had fun during the experiment. Listening to the music and discovering new music probably contributes to this and keeps the participants motivated. One participant noticed that he was quickly selecting popular tags and quickly browsing for the "red link" to stop the task. This technique let him finish in second place; we believe the participant who finished first used the same technique and rejected tasks faster if he could not find the track with popular tags. Among the advantages mentioned in the comments, one participant wrote "you don't have to think about the search terms, you can just pick one", and another added "relief from typing". This seems to be the major advantage of tag navigation: it is hard for a person to come up with search terms from the vocabulary they have in mind, whereas when presented with a vocabulary, it is simple to choose which terms to use. Multiple participants thought it would be simpler to type search keywords when they know beforehand which terms they would use, rather than browsing the tag cloud to find the term they are looking for. Again, tag clouds seem to help participants remember terms and to help when they do not know which terms to use, but when a participant knows what they are looking for, it is easier to type. One participant noted: "if a tag is not in the list, I can not use it. Free search would be better from this point of view".

Some participants mentioned "discovering new music" as an advantage. The evaluation process itself probably makes participants discover new music by randomly selecting a track from the 100 most popular tracks. People also discovered new music by reading the list of tracks shown when they clicked on tags. One participant mentioned that he would like a tag cloud to navigate pages from the browsing history in his web browser. A tag cloud would help him remember topics he has seen in his browsing life.

Model     Popular   Topic   Social
Started   132       131     158

A total of 302 tasks were completed and 101 were rejected. Each time a new task is given, the model used to generate the tag cloud is selected randomly from the three available models. 94 tasks were completed with the popular tag cloud and 94 with the tag cloud based on topic models. The tag cloud based on the social network led to 116 completed tasks. Participants thus completed more tasks with the social tag cloud than with either of the other two. Table 2 summarises the number of started and completed tasks and gives the relative frequency in percentage for each model. The relative frequency of completed tasks with respect to the number of started tasks is similar for each model.

Figures 4 and 5 give an overview of the results. Figure 4 shows, for each model, the relative frequency of tasks completed with a given number of tags clicked, relative to the total number of clicks. We see that most of the tasks were completed after the first click. The tracks to find were selected from the top 100 popular tracks in our dataset. These tracks have a high probability of carrying a popular tag.

We have graphed the data to show differences in the distribution of click counts (navigation path lengths) and time to completion (time to find a song). On average, the time taken to complete a task is slightly shorter for topic-based tag clouds than for the popular one (390 seconds against 400 seconds) and shorter still for the social tag cloud (320 seconds against 400 seconds). While the distributions do vary slightly (the topic-based model appears to have slightly lower navigation path lengths and time-to-success values), the differences are minimal and the results are considered neither conclusive nor statistically significant.

5.2 User survey

We conducted a short user survey together with the pilot study. Table 3 gives the statements that participants were asked to rate on a Likert scale. Figure 6 represents the answers of the participants for each question.

The answers to question 1 clearly show that our users are heavy internet users, as one would expect when conducting a survey in a computer science faculty. Eleven participants mostly disagree with statement 4 and 8 with statement 3. Both are statements about the usage of tagging systems, which shows that tagging is still not a broadly used feature, even in a computer science department. Answers to statements 5 to 9 are inconclusive: participants are mostly undecided. No participant strongly disagrees with statement 8, but only 5 mostly agree; finding items by navigating a tag cloud is a hard task for a human, which shows that improvements regarding searchability are needed. Eight participants agree with statements 10 and 11, and 9 with statement 12. These three statements are about using the tag cloud to navigate various resources.

Most participants find it easy to navigate the tag cloud and would use a tag cloud to navigate the Web or their personal files. Eight participants out of 17 agree with the 13th statement, 13 mostly agree. This supports the claim that tag-based navigation improves discovery of new resources.

5.3 Follow-up study

We conducted a second study, for which we adapted the system based on the comments we received in the pilot study. We improved the efficiency of the system by precomputing the term relation matrices ($p(w \mid w')$). For this evaluation we had 20 participants. None of the participants finished the evaluation, since the search task was harder than in the pilot study. Fewer results were given per query, which forced people to use more precise queries.

Results in Table 4 show our social model slightly outperforming the popular and topic models. The results are not statistically significant.

To complete the tasks, participants used multiple tags in their queries: a total of 54 for the popular model, 66 for the topic model and 68 for the social model. This suggests that the social model proposes tags that are more closely related to each other and therefore enables the user to make longer queries.

5.4 Experimenting with recommendation

The recommendation experiment consisted of tasks in which participants had to select a tag from the tag cloud and then listen to a song recommended for the current query (the query being composed of the tags selected so far). Participants would rate the song (whether they liked it or not) and then go back to a new tag cloud generated according to the query and the model.

[Table: recommendation results for the Popular, Topic and Social models.]

If we look at the relative frequency of songs that were new to the participants among the songs that they liked, we find that the popular model is the least efficient. Intuitively, popular items are liked and already known; they are popular precisely because so many people know them. Table 6 shows that the topic model is the best model, followed closely by the social model, and that both outperform the popular model quite significantly. These results support our thesis that using social relationships enhances the recommendation of new and relevant information. The topic model performs better than the social model; we believe that once the social model is personalized, i.e. uses the actual social network of the participant instead of an overall probability from a social network, it will perform even better.

6. CONCLUSION AND FUTURE WORK

Our work has some limitations. The number of participants in the pilot study and the follow-up study is relatively small (17 and 20 participants), which does not allow us to draw strong conclusions. We focused our attention on only one dataset of online music data from "Last.fm", so the conclusions cannot be generalised to tag cloud-based navigation of other corpora.

Our survey shows that search is not practical with tag clouds whereas recommendation and discovery of new information are. Our follow-up study shows that, for recommending items that people liked and that were new to them, the topic and social models perform much better than the popularity model.

6.1 Future Work

We are working on a new evaluation methodology that feeds the social model with social network data from the participants themselves. The rest of the evaluation follows the procedure described in this paper. We believe that this personalized social model will outperform the topic model.
