<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pragmatic metadata matters: How data about the usage of data effects semantic user models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Claudia Wagner</string-name>
          <email>claudia.wagner@joanneum.at</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Strohmaier</string-name>
          <email>markus.strohmaier@tugraz.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yulan He</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Graz University of Technology and Know-Center, Inffeldgasse 21a</institution>
          ,
          <addr-line>8010 Graz</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>JOANNEUM RESEARCH, Institute for Information and Communication Technologies Steyrergasse 17</institution>
          ,
          <addr-line>8010 Graz</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The Open University</institution>
          ,
          <addr-line>KMi Walton Hall, Milton Keynes MK7 6AA</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Online social media such as wikis, blogs or message boards enable large groups of users to generate and socialize around content. With increasing adoption of such media, the number of users interacting with user-generated content grows and, as a result, so does the amount of pragmatic metadata, i.e. data about the usage of content. The aim of this work is to compare different methods for learning topical user profiles from Social Web data and to explore if and how pragmatic metadata has an effect on the quality of semantic user models. Since accurate topical user profiles are required by many applications such as recommender systems or expert search engines, learning such models by observing content and activities around content is an appealing idea. To the best of our knowledge, this is the first work that demonstrates an effect between pragmatic metadata on one hand, and the quality of semantic user models based on user-generated content on the other. Our results suggest that not all types of pragmatic metadata are equally useful for acquiring accurate semantic user models, and some types of pragmatic metadata can even have detrimental effects.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Analysis</kwd>
        <kwd>Social Web</kwd>
        <kwd>Topic Models</kwd>
        <kwd>User Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Online social media such as Twitter, wikis, blogs or message boards enable large
groups of users to create content and socialize around content. When a large
group of users interacts and socializes around content, pragmatic metadata is
produced as a by-product. While semantic metadata is often characterized as data
about the meaning of data, we define pragmatic metadata as data about the
usage of data. Pragmatic metadata thus captures how data/content is used
by individuals or groups of users, such as who authored a given message, who
replied to messages, who "liked" a message, etc. Although the amount of
pragmatic metadata is growing, we still know little about how these metadata can
be exploited for understanding the topics users engage with.</p>
      <p>Many applications, such as recommender systems or intelligent tutoring
systems, require good user models, where "good" means that the model accurately
reflects users' interests and behavior and is able to predict future content and
activities of users. In this work we explore to what extent and how pragmatic
metadata may contribute to semantic models of users and their content and
compare different methods for learning topical user profiles from Social Web
data.</p>
      <p>To this end, we use data from an online message board. We incorporate
different types of pragmatic metadata into different topic modeling algorithms and
use them to learn topics and to annotate users with topics. We evaluate the
quality of different semantic user models by comparing their predictive performance
on future posts of users. Our evaluation is based on the assumption that "better"
user models will be able to predict future content of users more accurately and
will need less time and training data.</p>
      <p>
        Generative probabilistic models are a state-of-the-art technique for
unsupervised learning. In such models, observed and latent variables are represented
as random variables and probability calculus is used to describe the connections
that are assumed to exist between these variables. Bayesian inference can be used
to answer questions about the data only if the assumptions made by the model are
correct. Generative probabilistic models have been successfully applied
to large document collections (see e.g. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). Since for many documents one can
also observe metadata, several generative probabilistic models have been
developed which allow exploiting special types of metadata (see e.g., the Author Topic
model [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the Author-Recipient Topic model [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the Group Topic model [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
or the Citation Influence Topic model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). However, previous research [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] has
also shown that incorporating metadata into the topic modeling process may
lead to model assumptions which are too strict and might overfit the data. This
means that incorporating metadata does not necessarily lead to "better" topic
models, where "better" means, for example, that the model is able to predict
future user-generated content more accurately and needs less training data to
be fitted.
      </p>
      <p>Our work aims to advance our understanding of the effects of
pragmatics on semantics emerging from user-generated content and specifically aims to
answer the following questions:</p>
      <list list-type="order">
        <list-item><p>Does incorporating pragmatic metadata into topic modeling algorithms lead
to more accurate models of users and their content and, if yes, what types of
pragmatic metadata are more useful?</p></list-item>
        <list-item><p>Does incorporating behavioral user similarities help in acquiring more accurate
models of users and their content and, if yes, which types of behavioral user
similarity are more useful?</p></list-item>
      </list>
      <p>The remainder of the paper is organized as follows: Section 2 gives an overview
of the related work, while Section 3 describes our experimental setup. In Section
4 we report our results, followed by a discussion of our findings in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        From a machine learning perspective, social web applications such as Boards.ie
provide a huge amount of unlabeled training data for which usually many types
of metadata can be observed. Several generative probabilistic models have been
developed which allow exploiting special types of metadata (such as the Author
Topic model [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the Author-Recipient Topic model [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the Group Topic model
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] or the Citation Influence Topic model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). In contrast to previous work where
researchers focused on creating new topic models for each type of metadata, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
presents a new family of topic models, Dirichlet-Multinomial Regression (DMR)
topic models, which allow incorporating arbitrary types of observed features.
Our work builds on the DMR topic model and aims to explore the extent to
which different types of pragmatic metadata contribute to learning topic models
from user-generated content.
      </p>
      <p>
        In addition to research on advancing topic modeling algorithms, the
usefulness of topic models has been studied in different contexts, including social
media. For example, [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] explored different schemes for fitting topic models to
Twitter data and compared these schemes by using the fitted topic model for
two classification tasks. As we do in our work, they also point out that models
trained with a "User" scheme (i.e., using post aggregations of users as
documents) perform better than models trained with a "Post" scheme. However,
in contrast to our work, they only explore relatively simple topic models and do
not take any pragmatic metadata (except authorship information) into account
when learning their models.
      </p>
      <p>
        In our own previous work, we have studied the relationship between
pragmatics and semantics in the context of social tagging systems. We have found
that, for example, the pragmatics of tagging (users' behavior and motivation in
social tagging systems [
        <xref ref-type="bibr" rid="ref11 ref4 ref6">11, 6, 4</xref>
        ]) exert an influence on the usefulness of emergent
semantic structures [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In social awareness streams, we have shown that different
types of Twitter stream aggregations can significantly influence the result of
semantic analysis of tweets [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In this paper, we extend this line of research
by (i) applying general topic models and (ii) using a dataset that offers rich
pragmatic metadata.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Experimental Setup</title>
      <p>The aim of our experiments is to explore to what extent and how pragmatic
metadata can be exploited when semantically analyzing user-generated content.</p>
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>The dataset used for our experiments and analysis was provided by Boards.ie,<sup>4</sup>
an Irish community message board that has been in existence since 1998. We
used all messages published during the first week of February 2006 (02/01/2006
- 02/07/2006) and the last week of February 2006 (02/21/2006 - 02/28/2006).
For each week, we only used messages authored by users who published more than 5 messages
and replied to more than 5 messages during that week. We performed our
experiments on both datasets and obtained similar results. Consequently, we focus on
reporting results obtained on the first dataset, which consists of 1401 users and
27525 posts which were authored by these users and got replies.</p>
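        <p>The user filter described above can be sketched as follows. This is a minimal illustration, not the preprocessing code used for the experiments: the record layout (pairs of author and a reply flag) and the function name <monospace>select_active_users</monospace> are assumptions, as is the choice to count replies among a user's published messages.</p>
        <preformatted><![CDATA[
```python
from collections import Counter


def select_active_users(messages, threshold=5):
    """Return users with more than `threshold` published messages and
    more than `threshold` replies.

    `messages` is an iterable of (author, is_reply) pairs; this record
    layout is a hypothetical stand-in for the real message board export.
    Replies are counted as published messages too (an assumption).
    """
    authored = Counter()
    replies = Counter()
    for author, is_reply in messages:
        authored[author] += 1
        if is_reply:
            replies[author] += 1
    return {
        user
        for user in authored
        if authored[user] > threshold and replies[user] > threshold
    }


# Toy data: "anna" wrote 12 messages (6 of them replies), "ben" wrote 6 (no replies).
toy_messages = [("anna", True)] * 6 + [("anna", False)] * 6 + [("ben", False)] * 6
active_users = select_active_users(toy_messages)
```
]]></preformatted>
        <p>In this toy example only "anna" passes both thresholds; "ben" is dropped because he never replied to a message.</p>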
        <p>To assess the predictive performance of different topic models we estimate
how well they are able to predict the content (i.e. the actual words) of future
posts. We generated a test corpus of 4007 held-out posts in the following way:
for each of the 1401 users in our training corpus we crawled 3 future posts which
were authored by them and to which at least one user of our training corpus has
replied. From here on, we refer to this data as hold-out data.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Methodology</title>
        <p>In this section we first introduce the topic modeling algorithms (LDA, AT model
and DMR topic model) on which our work is based and then proceed to describe
the topic models which we fitted to our training data, their model assumptions
and how we compared and evaluated them.</p>
        <p>
          Latent Dirichlet Allocation (LDA). The idea behind LDA is to model
documents as mixtures of topics and to force documents to favor few topics. Therefore,
each document exhibits different topic proportions and each topic is defined as
a distribution over a fixed vocabulary of terms. That means the generation of a
collection of documents is modeled as a three-step process: First, for each
document d a distribution over topics θ<sub>d</sub> is sampled from a Dirichlet distribution with parameter α.
Second, for each word w<sub>d</sub> in the document d, a single topic z is chosen according
to this distribution θ<sub>d</sub>. Finally, each word w<sub>d</sub> is sampled from a multinomial
distribution over words φ<sub>z</sub> which is specific to the sampled topic z.
        </p>
        <p>
          The Author Topic (AT) model. The Author Topic model [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] is an extension
of LDA which learns topics conditioned on the mixture of authors that
composed the documents. The assumption of the AT model is that each document is
generated from a topic distribution which is specific to the set of authors of the
document. The observed variables are the words per document (as
in LDA) and the authors per document. The latent variables, which are learned
by fitting the model, are the topic distribution per author (rather than the topic
distribution per document as in LDA) and the word distribution per topic.
        </p>
        <p><sup>4</sup> http://www.boards.ie/</p>
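        <p>The three-step generative process that LDA assumes can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the implementation used in the experiments: the helper names, the toy sizes and the Dirichlet parameters are hypothetical, and the Dirichlet draw is realized with normalized Gamma samples from the standard library.</p>
        <preformatted><![CDATA[
```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(num_words, alpha, topic_word_dists, rng):
    """Generate one document following the three steps described above."""
    # Step 1: sample the document's topic proportions theta_d.
    theta = sample_dirichlet(alpha, rng)
    topics, words = [], []
    for _ in range(num_words):
        # Step 2: choose a topic z for this word position according to theta_d.
        z = rng.choices(range(len(theta)), weights=theta, k=1)[0]
        # Step 3: sample the word from the word distribution phi_z of topic z.
        phi_z = topic_word_dists[z]
        w = rng.choices(range(len(phi_z)), weights=phi_z, k=1)[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words


rng = random.Random(42)
num_topics, vocab_size = 3, 8  # hypothetical toy sizes
# The word distributions phi_z are themselves drawn from a Dirichlet prior.
phi = [sample_dirichlet([0.5] * vocab_size, rng) for _ in range(num_topics)]
theta, topics, words = generate_document(20, [1.0] * num_topics, phi, rng)
```
]]></preformatted>
        <p>Fitting a topic model reverses this process: given only the observed words, inference estimates the latent topic proportions and word distributions.</p>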
        <p>We implemented the AT model based on Dirichlet-multinomial Regression
(DMR) models (explained below). While the original AT model
uses multinomial distributions (which are all drawn from the same Dirichlet) to
represent author-specific topic distributions, the DMR-based
implementation uses a "fresh" Dirichlet prior for each author, from which the
topic distribution is then drawn.</p>
        <p>
          Dirichlet-multinomial Regression (DMR) Models. Dirichlet-multinomial
regression (DMR) topic models [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] assume not only that documents are
generated by a latent mixture of topics but also that mixtures of topics are influenced
by an additional factor which is specific to each document. This factor is
materialized via observed features (in our case pragmatic metadata such as authorship
or reply user information) and induces some correlation across individual
documents in the same group. This means that, e.g., documents which have been
authored by the same user (i.e., they belong to one group) are more likely to
choose the same topics. Formally, the prior distribution over topics is a function
of observed document features, and is therefore specific to each distinct
combination of feature values. In addition to the observed features we add a default
feature to each document, to account for the mean value of each topic.
        </p>
        <p>
          Fitting Topic Models. In this section we describe the different topic models
which we fitted to our training datasets (see Tables 1 and 2). Each topic model
makes different assumptions on what a document is (see column 3), takes
different types of pragmatic metadata into account (see column 4) and makes different
assumptions on the document-specific topic distributions which generate each
document (see column 5).
        </p>
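        <p>To make the role of the observed features in the DMR prior more concrete, the following sketch computes a document-specific Dirichlet parameter vector as the exponential of a linear function of the document's features, which is our reading of the formulation in [9]. The function name, the feature encoding and all weight values are hypothetical and chosen only for illustration.</p>
        <preformatted><![CDATA[
```python
import math


def dmr_topic_prior(features, topic_weights):
    """Return the Dirichlet parameters alpha_d for one document.

    features: feature vector x_d (the leading 1.0 plays the role of the default feature)
    topic_weights: one weight vector lambda_t per topic
    Assumed form: alpha_{d,t} = exp(x_d . lambda_t).
    """
    return [
        math.exp(sum(x * w for x, w in zip(features, weights)))
        for weights in topic_weights
    ]


# Hypothetical setup: 2 topics, features = [default, authored_by_A, authored_by_B].
topic_weights = [
    [0.0, 1.5, -1.0],  # topic 0 is favored by author A
    [0.0, -1.0, 1.5],  # topic 1 is favored by author B
]
prior_doc1 = dmr_topic_prior([1.0, 1.0, 0.0], topic_weights)  # written by A
prior_doc2 = dmr_topic_prior([1.0, 1.0, 0.0], topic_weights)  # also written by A
prior_doc3 = dmr_topic_prior([1.0, 0.0, 1.0], topic_weights)  # written by B
```
]]></preformatted>
        <p>Documents with identical feature values share the same prior, which is how documents of the same group become more likely to choose the same topics.</p>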
        <p>
          For all models, we chose the standard hyperparameters which are optimized
during the fitting process: α = 50/T (prior of the topic distributions), β = 0.01
(prior of the word distributions) and σ<sup>2</sup> = 0.5 (variance of the prior on the
parameter values of the Dirichlet distribution α). For the default features σ<sup>2</sup> = 10.
Based on the empirical findings of [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], we decided to place an asymmetric
Dirichlet prior over the topic distributions and a symmetric prior over the distribution
of words. All models share the assumption that the total number of topics used
to describe all documents of our collection is limited and fixed (via
hyperparameter T) and that each topic must favor few words (as denoted by hyperparameter
β, which defines the Dirichlet distribution from which the word distributions are
drawn: the higher β, the less distinct the drawn word distributions).
        </p>
        <p>
          Following the model selection approach described in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], we selected the
optimal number of topics for our training corpus by evaluating the probability of held-out
data for various values of T (keeping β = 0.01 fixed). For both datasets (each
represents one week of Boards.ie data), a model trained on the "Post" scheme
(i.e., using each post as a document) gives on average (over 10 runs) the highest
probability to held-out documents if T = 240, and a model trained on the "User"
scheme (i.e., using all posts authored by one user as a document) gives on
average (over 10 runs) the highest probability to held-out documents if T = 120.
We kept T fixed for all our experiments.
        </p>
        <p>Evaluation of Topic Models. To compare different topic models we use
perplexity, which is a standard measure for estimating the performance of a
probabilistic model. Perplexity measures the ability of a model to predict words in
held-out documents. In our case a low perplexity score may indicate that a model
is able to accurately predict the content of future posts authored by a user. The
perplexity measure is defined as follows:
          <disp-formula id="eq-1">
            <label>(1)</label>
            <tex-math><![CDATA[\mathrm{perplexity}(d) = \exp\left[-\frac{\sum_{i=1}^{N_d} \ln P(w_i \mid \hat{\Phi}, \alpha)}{N_d}\right] ]]></tex-math>
          </disp-formula>
        </p>
        <p>In words, the perplexity of a held-out post d is defined as the exponential
of the negative normalized predictive likelihood of the words w<sub>i</sub> of the held-out
post d (where N<sub>d</sub> is the total number of words in d) conditioned on the fitted
model.</p>
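        <p>As a small worked example of this definition, the sketch below computes the perplexity of one held-out post from per-word log-likelihoods. How these log-likelihoods are obtained from a fitted model is not shown; the function name and the toy probabilities are assumptions made for illustration.</p>
        <preformatted><![CDATA[
```python
import math


def perplexity(word_log_probs):
    """Perplexity of one held-out post, given ln P(w_i | fitted model) for each word."""
    if not word_log_probs:
        raise ValueError("a held-out post needs at least one word")
    # Exponential of the negative, length-normalized log-likelihood.
    return math.exp(-sum(word_log_probs) / len(word_log_probs))


# Toy check: if the model assigns probability 1/100 to every word,
# the perplexity equals 100 (up to floating point error).
uniform_post = [math.log(1 / 100)] * 5
uniform_perplexity = perplexity(uniform_post)
```
]]></preformatted>
        <p>A model that assigns higher probability to the words of a held-out post yields a lower value, which is why lower perplexity indicates better predictive performance.</p>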
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>ID</th><th>Alg</th><th>Doc</th><th>Metadata</th><th>Model Assumption</th></tr>
            </thead>
            <tbody>
              <tr><td>M1</td><td>LDA</td><td>Post</td><td>-</td><td>A post is generated by a mixture of topics and has to favor few topics.</td></tr>
              <tr><td>M2</td><td>LDA</td><td>User</td><td>-</td><td>All posts of one user are generated by a mixture of topics and have to favor few topics.</td></tr>
              <tr><td>M3</td><td>DMR</td><td>Post</td><td>author</td><td>A post is generated by a user's authoring-specific mixture of topics and a user has to favor few topics he usually writes about.</td></tr>
              <tr><td>M4</td><td>DMR</td><td>User</td><td>author</td><td>All posts of one user are generated by a user's authoring-specific mixture of topics and a user has to favor few topics he usually writes about.</td></tr>
              <tr><td>M5</td><td>DMR</td><td>Post</td><td>user who replied</td><td>A post is generated by a user's replying-specific mixture of topics and a user has to favor few topics he usually replies to.</td></tr>
              <tr><td>M6</td><td>DMR</td><td>User</td><td>user who replied</td><td>All posts of one user are generated by a user's replying-specific mixture of topics and a user has to favor few topics he usually replies to.</td></tr>
              <tr><td>M7</td><td>DMR</td><td>Post</td><td>related user</td><td>A post is generated by a user's authoring- or replying-specific mixture of topics and a user has to favor few topics he usually replies to and he usually writes about.</td></tr>
              <tr><td>M8</td><td>DMR</td><td>User</td><td>related user</td><td /></tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-3-2-7">
          <title>Results</title>
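          <p>1. Does incorporating pragmatic metadata into topic modeling algorithms lead
to more accurate models of users and their content and, if yes, what types of
pragmatic metadata are more useful?</p>
          <p>To answer this question, we fit different models to our training corpus and
tested their predictive performance on future posts authored by our training
users.</p>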
          <p>Figure 1 shows that predictive performance is best for semantic user models
which are based either solely on the users to whom a user replied (M6; i.e., on
aggregations of those users' posts) or which additionally take the content
authored by the user into account (M8). Therefore, our results suggest
that it is beneficial to take users' reply behavior into account when learning
topical user profiles from user-generated content.</p>
          <p>We also noted that all models which use the "User" training scheme (M4, M6
and M8) perform better than the models which use the "Post" training scheme
(M3, M5 and M7). One possible explanation for this is the sparsity of posts,
which consist of only 66 tokens on average.</p>
          <p>
            Since we were interested in how the predictive performance of different models
changes depending on the amount of data and time used for training, we split
our training dataset randomly into smaller buckets and fitted the model on
different proportions of the whole training corpus. One would expect that as
the percentage of training data increases the predictive power of each model
would improve as it adapts to the dataset. Figure 1, however, shows that this
is only true for our baseline models M1 and M2, which ignore all metadata of
posts. The model M3, which corresponds to the Author Topic model, exhibits a
behavior that is similar to the behavior reported in [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]: When observing only
little training data, M3 makes more accurate predictions on held-out posts than
our baseline models. But the predictive performance of the model is limited by
the strong assumption that future posts of one author are about the same topics
as past posts of the same author. Like M3, M5 (and M7) also seem to overfit
the data by making the assumption that future posts of a user will be about
the same topics as posts he replied to in the past (and posts he authored in the
past).
          </p>
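          <p>The procedure of fitting models on growing proportions of the training corpus can be sketched as follows. The source only states that the corpus was split randomly into buckets; the particular percentages, the nesting of the subsets and the function name are assumptions of this illustration.</p>
          <preformatted><![CDATA[
```python
import random


def training_subsets(documents, percentages=(20, 40, 60, 80, 100), seed=0):
    """Shuffle the corpus once and return nested subsets of increasing size."""
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    subsets = {}
    for pct in percentages:
        cutoff = len(shuffled) * pct // 100
        # Each larger subset contains all documents of the smaller ones.
        subsets[pct] = shuffled[:cutoff]
    return subsets


corpus = [f"post-{i}" for i in range(50)]  # placeholder documents
subsets = training_subsets(corpus)
```
]]></preformatted>
          <p>Each subset would then be used to fit a model whose perplexity is measured on the same hold-out data, so that the resulting curves are comparable across training sizes.</p>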
          <p>To address these overfitting problems we decided to incorporate smoother
pragmatic metadata into the modeling process, which we obtain by exploiting
behavioral user similarities. The pragmatic metadata we used so far capture
information about the usage behavior of individuals (e.g., who authored a document),
while our smoother variants of pragmatic metadata capture information about
the usage behavior of groups of users who share some common characteristics
(e.g., what are the forums in which the author of this document is most active).
Our intuition behind incorporating these smoother pragmatic metadata, which
are based on user similarities, is that users who behave similarly tend to talk
about similar topics.</p>
          <p>2. Does incorporating behavioral user similarities help in acquiring more accurate
models of users and their content and, if yes, which types of behavioral user
similarity are more useful?</p>
          <p>From Figure 2 one can see that indeed all models which incorporate
behavioral user similarity exhibit lower perplexity than our baseline models, especially
if only few training samples are available. The model M12, which is based on the
assumption that users who talk to the same users talk about the same topics,
exhibits the lowest perplexity and outperforms our baseline models in terms of
predictive performance on held-out posts.</p>
          <p>[Figure 2: perplexity (y-axis) plotted against the percentage of training documents (x-axis, 20 to 100) for the models M9, M10, M11, M12, M1, M2, M3 and M4.]</p>
          <p>For the model M10, which assumes that users who tend to post to the same
forums talk about the same topics, we observe a lower perplexity than
our baseline models only when little training data is available, but it still
outperforms other state-of-the-art topic models such as the Author Topic model.</p>
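          <p>The two kinds of behavioral similarity discussed in this section, shared forums and shared communication partners, can be computed as simple set overlaps. The sketch below is only an illustration under assumed data structures; the user names, forum names and the function name <monospace>shared_count</monospace> are hypothetical, and the source does not specify how these counts enter the models.</p>
          <preformatted><![CDATA[
```python
def shared_count(items_a, items_b):
    """Number of distinct items (forums or communication partners) two users have in common."""
    return len(set(items_a).intersection(items_b))


# Hypothetical activity records for two users.
forums = {"u1": ["soccer", "politics", "films"], "u2": ["soccer", "films", "cars"]}
partners = {"u1": ["u3", "u4"], "u2": ["u4", "u5", "u6"]}

shared_forums = shared_count(forums["u1"], forums["u2"])
shared_partners = shared_count(partners["u1"], partners["u2"])
```
]]></preformatted>
          <p>Counts of this kind could serve as document features in a DMR topic model in place of the identity of a single author or replier.</p>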
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion of Results and Conclusion</title>
      <p>While it is intuitive to assume that incorporating metadata about the pragmatic
nature of content leads to better learning algorithms, our results show that not
all types of pragmatic metadata contribute in the same way. Our results confirm
previous research which showed that topic models which incorporate pragmatic
metadata, such as the Author Topic model, tend to overfit data. That means
incorporating metadata into a topic model can lead to model assumptions which
are too strict and which cause the model to perform worse.</p>
      <p>Summarizing, our results suggest that:</p>
      <list list-type="bullet">
        <list-item><p>Pragmatics of content influence its semantics: Integrating pragmatic
metadata information into semantic user models influences the quality of
resulting models.</p></list-item>
        <list-item><p>Communication behavior matters: Taking users' reply behavior into
account when learning topical user profiles is beneficial. Content of users to
whom a user replied seems to be even more relevant for learning topical user
profiles than content authored by a user.</p></list-item>
        <list-item><p>Behavioral user similarities improve user models: Smoother versions
of metadata-based topic models which take user similarity into account
always seem to improve the models.</p></list-item>
        <list-item><p>Communication behavior based similarities matter: Different types
of proxies for behavioral user similarity (e.g., number of forums they both
posted to, number of shared communication partners) lead to different
results. Users who have a similar communication behavior seem to be more
likely to talk about the same topics than users who post to similar forums.</p></list-item>
      </list>
      <p>Acknowledgments. The authors want to thank Boards.ie for providing the dataset
used in our experiments and Matthew Rowe for pre-processing the data. Furthermore
we want to thank David Mimno for answering questions about the DMR topic model
and Sofia Angelouta for fruitful discussions. Claudia Wagner is a recipient of a
DOCfForte fellowship of the Austrian Academy of Science.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>J. Mach. Learn. Res</source>
          .
          <volume>3</volume>
          ,
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dietz</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bickel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scheffer</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Unsupervised prediction of citation influences</article-title>
          .
          <source>In: International Conference on Machine Learning</source>
          . pp.
          <fpage>233</fpage>
          -
          <lpage>240</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Griffiths</surname>
            ,
            <given-names>T.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steyvers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Finding scientific topics</article-title>
          .
          <source>Proceedings of the National Academy of Sciences 101(Suppl. 1)</source>
          ,
          <fpage>5228</fpage>
          -
          <lpage>5235</lpage>
          (April
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Helic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trattner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strohmaier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andrews</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>On the navigability of social tagging systems</article-title>
          .
          <source>In: The 2nd IEEE International Conference on Social Computing (SocialCom 2010)</source>
          , Minneapolis, Minnesota, USA. pp.
          <fpage>161</fpage>
          -
          <lpage>168</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davison</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          :
          <article-title>Empirical study of topic modeling in twitter</article-title>
          .
          <source>In: Proceedings of the First Workshop on Social Media Analytics</source>
          . pp.
          <fpage>80</fpage>
          -
          <lpage>88</lpage>
          . SOMA '10, ACM, New York, NY, USA (
          <year>2010</year>
          ), http://doi.acm.org/10.1145/1964858.1964870
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Koerner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kern</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grahsl</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strohmaier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Of categorizers and describers: An evaluation of quantitative measures for tagging motivation</article-title>
          .
          <source>In: 21st ACM SIGWEB Conference on Hypertext and Hypermedia (HT 2010)</source>
          , Toronto, Canada. ACM, New York, NY, USA (June
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Koerner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benz</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strohmaier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hotho</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Stop thinking, start tagging - tag semantics emerge from collaborative verbosity</article-title>
          .
          <source>In: Proc. of the 19th International World Wide Web Conference (WWW 2010)</source>
          . ACM, Raleigh, NC, USA (Apr
          <year>2010</year>
          ), http://www.kde.cs.uni-kassel.de/benz/papers/2010/koerner2010thinking.pdf
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrada-Emmanuel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>The author-recipient-topic model for topic and role discovery in social networks: Experiments with Enron and academic email</article-title>
          .
          <source>Tech. rep.</source>
          , UMass CS (December
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mimno</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression</article-title>
          .
          <source>In: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI '08)</source>
          (
          <year>2008</year>
          ), http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.6925
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Rosen-Zvi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Griffiths</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steyvers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The author-topic model for authors and documents</article-title>
          .
          <source>In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence</source>
          . pp.
          <fpage>487</fpage>
          -
          <lpage>494</lpage>
          . UAI '04, AUAI Press, Arlington, Virginia, United States (
          <year>2004</year>
          ), http://portal.acm.org/citation.cfm?id=1036843.1036902
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Strohmaier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koerner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kern</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Why do users tag? Detecting users' motivation for tagging in social tagging systems</article-title>
          .
          <source>In: International AAAI Conference on Weblogs and Social Media (ICWSM2010)</source>
          , Washington, DC, USA, May 23-26. AAAI, Menlo Park, CA, USA (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Wagner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strohmaier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Exploring the wisdom of the tweets: Knowledge acquisition from social awareness streams</article-title>
          .
          <source>In: Proceedings of the Semantic Search 2010 Workshop (SemSearch2010)</source>
          ,
          in conjunction with the 19th International World Wide Web Conference (WWW2010), Raleigh, NC, USA, April 26-30, ACM (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mimno</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Rethinking LDA: Why priors matter</article-title>
          .
          <source>In: Proceedings of NIPS</source>
          (
          <year>2009</year>
          ), http://books.nips.cc/papers/files/nips22/NIPS2009_0929.pdf
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohanty</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Group and topic discovery from relations and text</article-title>
          .
          <source>In: Proc. 3rd International Workshop on Link Discovery</source>
          . pp.
          <fpage>28</fpage>
          -
          <lpage>35</lpage>
          . ACM
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>