<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta />
  </front>
  <body>
    <sec id="sec-1">
      <title>DATA SET</title>
      <p>The dataset was collected by Aragon et al. [1] using the
Twitter API, and we extended it with a crawl of the user
network. Our data set hence consists of two parts:</p>
      <p>Tweet dataset: tweet text and user metadata on the</p>
      <p>
        In case of a retweet, the Twitter API provides us with
the ID of the original tweet. By collecting retweets for a
given original tweet ID, we may obtain the set of users who
have retweeted a given tweet, with the corresponding retweet
timestamps. The Twitter API, however, does not tell us the
actual path of cascades if the original tweet was retweeted
several times. The information from the Twitter API on
the tweet needs to be combined with the follower network
to reconstruct the possible information pathways for a given
tweet. However, it can happen that for a given retweeter,
more than one friend has retweeted the corresponding tweet
before, and hence we do not know the exact information
source of the retweeter. The retweet ambiguity problem is
well described in [
        <xref ref-type="bibr" rid="ref1">3</xref>
        ]. In what follows we consider all friends
as possible information sources. In other words, for a given
tweet we consider all directed edges in the follower network
along which information flow could occur (see Fig. 2 (a)).
Footnote 1: http://en.wikipedia.org/wiki/Occupy_Wall_Street
      </p>
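      <p>The candidate-edge construction described above can be sketched as follows. This is a minimal illustration, assuming the retweets of one original tweet are available as (user, timestamp) pairs and the follower network as a mapping from each user to the set of friends the user follows. The function name and the data layout are hypothetical and not taken from the original system.</p>
      <preformat><![CDATA[
```python
def candidate_edges(root, retweets, friends):
    """Return all (source, retweeter) edges along which the tweet could have flowed.

    root: user who posted the original tweet (assumed input)
    retweets: list of (user, timestamp) pairs for one original tweet ID
    friends: dict mapping a user to the set of users he or she follows
    """
    # The root tweeter holds the tweet before every retweet.
    seen_at = {root: float("-inf")}
    edges = []
    for user, ts in sorted(retweets, key=lambda r: r[1]):
        for friend in friends.get(user, set()):
            # Every friend who held the tweet earlier is a possible source.
            if friend in seen_at and seen_at[friend] < ts:
                edges.append((friend, user))
        seen_at.setdefault(user, ts)
    return sorted(edges)


# Toy data: "c" follows both "a" and "b", so its source is ambiguous.
retweets = [("b", 1), ("c", 2), ("d", 3)]
friends = {"b": {"a"}, "c": {"a", "b"}, "d": {"x"}}
edges = candidate_edges("a", retweets, friends)
```
]]></preformat>
      <p>In this toy input, retweeter "d" follows nobody who has posted the tweet, so no edge is produced for "d" and the reconstructed cascade is disconnected.</p>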
    </sec>
    <sec id="sec-2">
      <title>Restoring missing cascade edges</title>
      <p>For a given tweet, the computed edges define a retweet
cascade. However, our dataset contains only a sample of
tweets on the given hashtags and hence may not be
complete: it can happen that a few intermediate retweeters are
missing from our data. As a result, the
reconstructed cascade graphs are sometimes disconnected. As detailed in
Fig. 2 (b) and (c), we handle this problem in two different
ways. One possible solution is to consider only the first
connected component of the cascade (see Fig. 2 (b)).
Another one is to connect each disconnected part to the root
tweeter with one virtual cascade edge (see Fig. 2 (c)). In
what follows, we work with cascades that contain virtual
edges, therefore every retweeter is included in the cascade.</p>
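      <p>The virtual-edge variant can be sketched as below. The text does not say which member of a disconnected part receives the virtual edge, so this sketch assumes it is the earliest retweeter of that part, and it checks connectivity on the undirected version of the cascade. All names are illustrative.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def add_virtual_edges(root, retweets, edges):
    """Return one virtual (root, user) edge per part not connected to the root.

    retweets: list of (user, timestamp) pairs
    edges: candidate (source, retweeter) edges of the cascade
    """
    neighbours = defaultdict(set)
    for src, dst in edges:
        # Connectivity is checked on the undirected version of the cascade.
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    def component(start):
        # Depth-first search collecting everything connected to start.
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    connected = component(root)
    virtual = []
    # Visit retweeters in time order so each part is attached at its earliest member.
    for user, _ts in sorted(retweets, key=lambda r: r[1]):
        if user not in connected:
            virtual.append((root, user))
            connected |= component(user)
    return virtual


retweets = [("b", 1), ("d", 3), ("e", 4), ("f", 5)]
edges = [("a", "b"), ("d", "e")]
virtual = add_virtual_edges("a", retweets, edges)
```
]]></preformat>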
    </sec>
    <sec id="sec-3">
      <title>Examples of highly retweeted messages</title>
      <p>In Table 3, we give a few examples of highly retweeted
messages with the actual URLs and names replaced by [url]
and [name].</p>
    </sec>
    <sec id="sec-4">
      <title>FEATURE ENGINEERING</title>
      <p>To train our models, we generate features for each root
tweet in the data and then predict the future cascade
size of the root tweet from these feature sets. For a given
root tweet, we compute features about
the author user and her follower network (network
features) and about
the textual content of the tweet itself (content
features).</p>
      <p>The first step of content processing is text normalization.
We converted the tokens to lower case, except
those that are fully upper-cased, and replaced tokens by
their stem given by the Porter stemming algorithm. We
replaced user mentions (starting with '@') and numbers by
placeholder strings and removed the punctuation marks.</p>
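      <p>A minimal sketch of this normalization step is given below. It assumes the placeholder strings are [user] and [number], strips punctuation only at token boundaries, and leaves out Porter stemming, which would need an external stemmer. The function name and regular expression are illustrative choices.</p>
      <preformat><![CDATA[
```python
import re
import string

NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
# Punctuation stripped from token boundaries; '#' is kept so hashtags survive.
PUNCT = string.punctuation.replace("#", "")


def normalize(text):
    """Whitespace-tokenize and normalize a tweet (stemming omitted in this sketch)."""
    out = []
    for tok in text.split():
        if tok.startswith("@") and len(tok) > 1:
            out.append("[user]")  # user mention placeholder
            continue
        core = tok.strip(PUNCT)
        if not core:
            continue  # the token consisted of punctuation only
        if NUMBER.fullmatch(core):
            out.append("[number]")  # number placeholder
        elif core.isupper():
            out.append(core)  # fully upper-cased tokens keep their case
        else:
            out.append(core.lower())
    return out


tokens = normalize("RT @OccupyPics: Photo of 3000 people, NOW!")
```
]]></preformat>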
      <p>
        The content features are extracted from the normalized
texts. The basic feature template in text analysis consists of
the terms of the message. We used a simple whitespace
tokenizer rather than a more sophisticated linguistic tokenizer,
as previous studies reported its empirical advantage [
        <xref ref-type="bibr" rid="ref17">19</xref>
        ].
We employed unigrams, bigrams and trigrams of tokens
because longer phrases only hurt the performance of the system
in our preliminary experiments.
      </p>
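      <p>Extracting unigrams, bigrams and trigrams from whitespace tokens can be sketched as follows. The function name and the tuple representation of an n-gram are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
def ngrams(tokens, max_n=3):
    """Return all n-grams of the token list for n = 1..max_n as tuples."""
    grams = []
    for n in range(1, max_n + 1):
        # Slide a window of width n over the token sequence.
        for i in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[i:i + n]))
    return grams


tokens = "posted a photo".split()
grams = ngrams(tokens)
```
]]></preformat>
      <p>For a three-token message this yields three unigrams, two bigrams and one trigram.</p>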
      <p>Besides terms, we extracted the following features
describing the orthography of the message:</p>
      <p>Hashtags are used to mark specific topics. They can
be appended after the tweet or placed inline in the content,
and are marked by #. The hashtags let the reader
guess the topic categories of the tweet content, but too
many hashtags can irritate readers because they
only cause confusion.</p>
      <p>Telephone number: If the tweet contains a telephone
number, it is more likely to be spam or an advertisement.</p>
      <p>URLs: The referred URLs can navigate the reader to text,
sound, and image information, such as media elements and
journals, and thus they can attract interested readers. We
distinguish between full and truncated URLs. A
truncated URL ends with three dots; it was probably copied
from another tweet, which suggests that somebody found it
interesting.</p>
      <p>The like sign is an illustrator that encourages others
to share the tweet.</p>
      <p>
        The presence of a question mark indicates uncertainty.
On Twitter, questions are usually rhetorical: people do
not seek answers on Twitter [
        <xref ref-type="bibr" rid="ref17">19</xref>
        ]. The author more
likely wants to make the reader think about the
message content.
      </p>
      <p>The exclamation mark highlights a part of the tweet
and expresses emotions and opinions.</p>
      <p>Numerical expressions: If numerical expressions are present, the facts are
quantified, and the tweet is more likely to have real information
content. The actual values of the numbers were ignored.</p>
      <p>Mentions: If a user is mentioned (referred to) in the tweet,
the content of the tweet is probably connected to the
mentioned user. It can have informal or private
content.</p>
      <p>Emoticons are short character sequences representing
emotions. We clustered the emoticons into positive,
negative and neutral categories.</p>
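      <p>Several of the orthographic features listed above reduce to simple pattern counts. The sketch below covers hashtags, mentions, full and truncated URLs, question and exclamation marks, and numbers. Telephone numbers, the like sign and emoticon clusters are left out, and the regular expressions and feature names are illustrative assumptions rather than the patterns used in the experiments.</p>
      <preformat><![CDATA[
```python
import re

URL = re.compile(r"https?://\S+")


def orthographic_features(text):
    """Count a few orthographic cues of a raw (not yet normalized) tweet."""
    urls = URL.findall(text)
    # A truncated URL ends with three dots or a single ellipsis character.
    truncated = [u for u in urls if u.endswith("...") or u.endswith("\u2026")]
    # Remove URLs so their characters do not count as marks or numbers.
    rest = URL.sub(" ", text)
    return {
        "hashtags": len(re.findall(r"#\w+", rest)),
        "mentions": len(re.findall(r"@\w+", rest)),
        "full_urls": len(urls) - len(truncated),
        "truncated_urls": len(truncated),
        "question_mark": "?" in rest,
        "exclamation_mark": "!" in rest,
        "numbers": len(re.findall(r"\b\d+\b", rest)),
    }


features = orthographic_features(
    "Is this real? 3000 people at #ows #occupywallstreet http://t.co/abc..."
)
```
]]></preformat>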
      <p>The last group of content features tries to capture the
modality of the message:</p>
      <p>Swear words influence the style and attractiveness of
the tweet. The reaction to swearing can be to ignore it
or to attack in return, which is not relevant in terms
of retweet cascade size prediction. We extracted 458
swear words from http://www.youswear.com.</p>
      <p>
        Weasel words and phrases<sup>2</sup> are aimed at creating an
impression that a specific and/or meaningful statement
has been made when in fact only a vague or
ambiguous claim has been communicated. We used the weasel
word lexicon of [
        <xref ref-type="bibr" rid="ref25">27</xref>
        ].
      </p>
      <p>
        We employed the Linguistic Inquiry and Word Count (LIWC) categories
[
        <xref ref-type="bibr" rid="ref23">25</xref>
        ] of the tweets' words as well. These categories
describe words from emotional, cognitive and structural
points of view. For example, the word "ask" is in the
Hear, Senses, Social and Present categories. Different
LIWC categories can have different effects on the
influence of the tweet in question.
      </p>
    </sec>
    <sec id="sec-5">
      <title>N-grams</title>
      <p>By using all the content features, we built n-grams as
consecutive sequences in the tweet text that may include
simply three terms ("posted a photo"), @-mentions,
hashtags, URLs ("@OccupyPics Photo http://t.co/..." coded as
[[user] Photo [url]]), numbers ("has [number] followers"),
non-alphanumeric characters ("right now !") as well as markers for
swear or weasel expressions ("[weasel word] people say").
We defined the following classes of n-grams, for n ≤ 3:</p>
      <p>Modality: The n-gram contains at least one swear or
weasel word or expression (overall 208,368);</p>
      <p>Orthographic: No swear or weasel word but at least
one orthographic term (overall 2,751,935);</p>
      <p>Terms: N-grams formed only of terms, with no swear or
weasel words and no orthographic features (overall 771,196).</p>
      <p>For efficiency, we selected the most frequent 1,000 n-grams
from each class. The entire feature set hence consists of
3,000 trigrams.</p>
      <p>Footnote 2: See http://en.wikipedia.org/wiki/Wikipedia:Embrace_weasel_words.</p>
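      <p>The three n-gram classes and the per-class frequency cut-off can be sketched as below. The sketch assumes swear and weasel expressions have already been replaced by marker tokens; [weasel word] appears in the examples of this section, while [swear word] is a hypothetical marker name. Any token that is not purely alphanumeric (placeholders, hashtags, punctuation) is treated as orthographic.</p>
      <preformat><![CDATA[
```python
from collections import Counter, defaultdict

# Marker tokens; "[swear word]" is an assumed name for illustration.
MODALITY = {"[swear word]", "[weasel word]"}


def ngram_class(gram):
    """Assign an n-gram (tuple of tokens) to one of the three classes."""
    if any(tok in MODALITY for tok in gram):
        return "modality"
    # Placeholders, hashtags and punctuation are not purely alphanumeric.
    if any(not tok.isalnum() for tok in gram):
        return "orthographic"
    return "terms"


def select_top(grams, k):
    """Keep the k most frequent n-grams of each class."""
    counts = defaultdict(Counter)
    for gram in grams:
        counts[ngram_class(gram)][gram] += 1
    return {cls: [g for g, _ in c.most_common(k)] for cls, c in counts.items()}


grams = [
    ("[weasel word]", "people", "say"),
    ("has", "[number]", "followers"),
    ("has", "[number]", "followers"),
    ("right", "now", "!"),
    ("posted", "a", "photo"),
    ("posted", "a", "photo"),
]
top = select_top(grams, 1)
```
]]></preformat>
      <p>With k set to 1,000, the same selection would give the 3,000 features mentioned above.</p>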
    </sec>
    <sec id="sec-6">
    </sec>
    <sec id="sec-7">
      <title>TEMPORAL TRAINING AND EVALUATION</title>
      <p>Here we describe how we generate training and test
sets for our algorithms detailed in Section 6. First, for each
root tweet we compute the corresponding network and
content features. We create daily re-trained models: for a given
day t, we train a model on all root tweets that were
generated before t but appeared later than t − τ, where τ
is the preset time frame. After training on the data
before a given day, we compute our predictions for all root
tweets that appeared on that day.</p>
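      <p>The daily re-training scheme can be sketched as a sliding window over root tweets. The sketch assumes each root tweet carries a creation timestamp and that the preset time frame (called tau in the code) is given as a time span. The field names and the sample dates are made up for illustration.</p>
      <preformat><![CDATA[
```python
from datetime import datetime, timedelta


def split_for_day(tweets, day_start, tau):
    """Return (train, test) root tweets for the day beginning at day_start.

    train: tweets created before the day but later than day_start - tau
    test: tweets created during the day itself
    """
    day_end = day_start + timedelta(days=1)
    train = [tw for tw in tweets if day_start - tau < tw["created"] < day_start]
    test = [tw for tw in tweets if day_start <= tw["created"] < day_end]
    return train, test


# Made-up sample data, not taken from the data set.
tweets = [
    {"id": 1, "created": datetime(2011, 10, 1, 12)},
    {"id": 2, "created": datetime(2011, 10, 6, 9)},
    {"id": 3, "created": datetime(2011, 10, 8, 23)},
    {"id": 4, "created": datetime(2011, 10, 9, 8)},
]
train, test = split_for_day(tweets, datetime(2011, 10, 9), timedelta(days=7))
```
]]></preformat>
      <p>Calling the function once per day of the testing period gives one training set and one test set per daily model.</p>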
      <p>In order to keep the features up to date, we recompute all
network properties online, on the fly, and use the new values
to give predictions. By this method, we may immediately
notice if a user starts gaining high attention or if a bursty
event happens.</p>
      <p>We pay special attention to defining the values used for
training and evaluation. For evaluation, we used the
information up to the end of the three-week data set collection
period, i.e. we used all the known tweets that belong to the
given cascade. However, for training, we are only allowed
to use and count the tweets up to the end of the training
period. Since the testing period is longer, we linearly
approximated the values for the remaining part of the testing
period.</p>
      <p>Our goal is to predict cascade size at the time when the
root tweet is generated. One method we use is regression,
which directly predicts the size of the retweet cascade. For
regression, we only use the global error measures:</p>
      <p>
        We also experiment with multiclass classification for ranges
of the cascade size. The cascade size follows a power law
distribution (see Fig. 3), and we defined three buckets: one with
0 to 10 (referred to as "low"), one with 11 to 100 ("medium"),
and the largest one with more than 100 ("high") retweeters
participating in the cascade. We evaluate performance by
AUC [
        <xref ref-type="bibr" rid="ref11">13</xref>
        ] averaged over the three classes. Note that AUC has
a probabilistic interpretation: for the example of the "high"
class, the value of the AUC is equal to the probability that a
random highly retweeted message is ranked before a random
non-highly retweeted one.
      </p>
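      <p>The bucket definition and the probabilistic reading of AUC can be made concrete with a small sketch. It computes AUC for one class directly as the fraction of (positive, negative) pairs ranked correctly, counting ties as one half. The scores and cascade sizes below are made-up illustrative values, and the function names are not from the original implementation.</p>
      <preformat><![CDATA[
```python
def bucket(size):
    """Map a cascade size to the low / medium / high range."""
    if size <= 10:
        return "low"
    if size <= 100:
        return "medium"
    return "high"


def auc(scores, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count correctly ordered pairs; a tie contributes one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up cascade sizes and classifier scores for the "high" class.
sizes = [3, 250, 40, 120, 7]
scores = [0.1, 0.9, 0.6, 0.4, 0.2]
labels = [bucket(s) == "high" for s in sizes]
high_auc = auc(scores, labels)
```
]]></preformat>
      <p>Here five of the six (high, non-high) pairs are ordered correctly, so the AUC of the "high" class is 5/6.</p>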
      <p>By the probabilistic interpretation of AUC, we may realize
that a classifier will perform well if it orders the users well,
with little consideration of their individual messages. Since
our goal is to predict the messages in time and not the rather
static user visibility and influence, we define new averaging
schemes for predicting the success of individual messages.</p>
      <p>We consider the classification of the messages of a single
user and define two aggregations of the individual AUC
values. First, we simply average the AUC values of users for
each day (user average)</p>
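      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>AUC_{user} = \frac{1}{N} \sum_{i=1}^{N} AUC_i .</tex-math>
        </disp-formula>
      </p>
      <p>Second, we weight the individual AUC values by
the activity of the user (number of tweets by the user on
the actual day)</p>
      <p>
        <disp-formula>
          <label>(2)</label>
          <tex-math>AUC_{wuser} = \frac{\sum_{i=1}^{N} AUC_i T_i}{\sum_{i=1}^{N} T_i} ,</tex-math>
        </disp-formula>
      </p>
      <p>where AUC<sub>i</sub> is the AUC value of the i-th user, N is the number of users,
and T<sub>i</sub> is the number of tweets by the i-th user.</p>
      <p>We may also obtain regressors from the multiclass classification
results. In order to make classification and regression
comparable, we give a very simple transformation that
replaces each class by a value that can be used as a regressor.</p>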
      <p>We select and use the training set average value in each class
as the ideal value for the prediction.</p>
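      <p>The class-to-regressor transformation can be sketched as follows: each predicted class is replaced by the average cascade size of that class in the training set. The size ranges follow the low, medium and high buckets defined earlier. The function names and the training sizes are made up for illustration.</p>
      <preformat><![CDATA[
```python
from statistics import mean


def bucket(size):
    """Map a cascade size to the low / medium / high range."""
    if size <= 10:
        return "low"
    if size <= 100:
        return "medium"
    return "high"


def class_averages(train_sizes):
    """Average training-set cascade size of each class."""
    by_class = {}
    for size in train_sizes:
        by_class.setdefault(bucket(size), []).append(size)
    return {cls: mean(values) for cls, values in by_class.items()}


def to_regression(predicted_classes, averages):
    """Replace each predicted class by its training-set average."""
    return [averages[cls] for cls in predicted_classes]


# Made-up training cascade sizes and predicted classes.
averages = class_averages([2, 4, 20, 60, 300])
predicted = to_regression(["low", "high", "medium"], averages)
```
]]></preformat>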
      <p>In this section, we train and evaluate first the classification
and then the regression models to predict the future cascade
size of tweets. We predict day by day, for each day in the
testing period. For classification, we also evaluate on the
user level by using equations (1) and (2). For classification,
we show the best performing features as well.</p>
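      <p>The two user-level aggregations referred to as equations (1) and (2) can be sketched as a plain mean and an activity-weighted mean of per-user AUC values. The dictionaries below hold made-up values for one day, and the function names are illustrative.</p>
      <preformat><![CDATA[
```python
def user_average(auc_by_user):
    """Unweighted mean of the per-user AUC values (user average)."""
    return sum(auc_by_user.values()) / len(auc_by_user)


def weighted_user_average(auc_by_user, tweets_by_user):
    """Mean of per-user AUC values weighted by the user's tweet count."""
    total = sum(tweets_by_user[u] for u in auc_by_user)
    weighted = sum(auc_by_user[u] * tweets_by_user[u] for u in auc_by_user)
    return weighted / total


# Made-up per-user AUC values and tweet counts for one day.
auc_by_user = {"u1": 0.9, "u2": 0.6, "u3": 0.5}
tweets_by_user = {"u1": 10, "u2": 2, "u3": 1}
plain = user_average(auc_by_user)
weighted = weighted_user_average(auc_by_user, tweets_by_user)
```
]]></preformat>
      <p>In this toy example the active user "u1" dominates the weighted value, which is why the two aggregations can differ noticeably.</p>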
      <p>As mentioned in Section 5, we may train our model with
different time frames. In Figure 4 we show the average AUC value for
different time frames. As Twitter trends change rapidly, we
achieve the best average results if we train our algorithms
on root tweets that were generated in the previous week
(approximately seven days), both for global and for user
level average evaluation.</p>
    </sec>
    <sec id="sec-8">
      <title>Cascade size by multiclass classification</title>
      <p>
        First, we measure classifier performance by computing
the average AUC values of the final results for the three
size ranges. We were interested in how different classifiers
perform and how different feature sets affect classifier
performance. For this reason, we repeated our experiments
with different feature subsets. Figure 5 shows our results.
For each day, the network features give a strong baseline.
The combination of these features with the content features results
in a strong improvement in classifier performance. In Table 5
we summarize the average AUC values for different feature
subsets over all four datasets. Our results are consistent:
in all cases, the content related features improve the
performance. Finally, we give the performance of other classifiers
in Table 6 and conclude the superiority of the Random
Forest classifier [
        <xref ref-type="bibr" rid="ref10">12</xref>
        ]. We use the classifier implementations of
Weka [
        <xref ref-type="bibr" rid="ref27">29</xref>
        ] and LibLinear [
        <xref ref-type="bibr" rid="ref9">11</xref>
        ].
      </p>
    </sec>
    <sec id="sec-9">
      <title>Cascade size by regression</title>
      <p>
        We give regression results for the linear regression,
multilayer perceptron and regression tree implementations
of Weka [
        <xref ref-type="bibr" rid="ref27">29</xref>
        ] in Table 7. As seen when compared to the
last three columns in Table 5, regression methods
outperform multiclass classification results transformed to
regressors. Note that for the transformation, we use class averages
obtained from the training data. If however we could
per
      </p>
      <p>
        We selected the most important network features by
running a LogitBoost classifier [
        <xref ref-type="bibr" rid="ref12">14</xref>
        ]. The best features were all
      </p>
    </sec>
    <sec id="sec-10">
      <title>Content feature contribution analysis</title>
      <p>We selected the most important content features by
running logistic regression over the 3,000 trigrams described in
Section 4.3. The features are complex expressions
containing elements from the three major groups of linguistic feature
sets, listed in the following order of absolute weight obtained by
logistic regression:</p>
      <p>1. Three words [marriage between democracy], in this
order;</p>
      <p>2. [at [hashtag occupywallstreet][url]]: the word "at",
followed by the hashtag "#occupywallstreet", and a
URL;</p>
      <p>3. [between democracy and];</p>
      <p>4. [capitalism is over];</p>
      <p>5. [[hashtag ows] pls];</p>
      <p>6. [[weasel word] marriage between]: the expression
"marriage between" on the weasel word list, which counts
as the third element of the trigram;</p>
      <p>7. [[hashtag zizek] at [hashtag occupywallstreet]];</p>
      <p>8. [[hashtag occupywallstreet][url][hashtag auspol]];</p>
      <p>9. [over [hashtag zizek] at];</p>
      <p>10. [calientan la]: means "heating up".</p>
      <p>Note that all these features have negative weight for the
upper two classes and positive or close to 0 weight for the lower
class. Hence the appearance of these trigrams decreases the
value obtained by the network feature based model. We may
conclude that the use of weasel words and uninformative
phrases reduces the chance of getting retweeted, as opposed
to the sample highly retweeted messages in Table 3.</p>
    </sec>
    <sec id="sec-11">
      <title>Frozen network features</title>
      <p>To illustrate the importance of the temporal training and
evaluation framework and the online update of the network
features, we ran an experiment where we replaced user
features by static ones. The results are summarized in
Table 9. Note that on the user level, all messages will have the
same network features and hence classification will be
random with AUC=0.5. In contrast, online updated network
features are already capable of distinguishing between the
messages of the same user, as seen in Tables 5 and 7.</p>
    </sec>
    <sec id="sec-12">
      <title>CONCLUSIONS</title>
      <p>In this paper we investigated the possibility of predicting
the future popularity of a recently appeared text message
in Twitter's social networking system. Besides the typical
user and network related features, we consider hashtag and
linguistic analysis based ones as well. Our results not only
confirm the possibility of predicting the future popularity
of a tweet, but also indicate that deep content analysis is
important to improve the quality of the prediction.</p>
      <p>In our experiments, we give high importance to the
temporal aspects of the prediction: we predict immediately after
the message is published, and we also evaluate on the user
level. We consider user level evaluation key in temporal
analysis, since the influence and popularity of a given user
is relatively stable while the retweet count of her particular
messages may vary greatly in time. On the user level, we
observe the importance of linguistic elements of the content.</p>
    </sec>
    <sec id="sec-13">
      <title>Acknowledgments</title>
      <p>We thank Andreas Kaltenbrunner for providing us with the
Twitter data set [1].</p>
    </sec>
    <sec id="sec-14">
      <title>REFERENCES</title>
      <p>[1] P. Aragon, K. E. Kappler, A. Kaltenbrunner,
D. Laniado, and Y. Volkovich. Communication
dynamics in twitter during political campaigns: The
case of the 2011 spanish national election. Policy &amp;
Internet, 5(2):183–206, 2013.</p>
      <p>[2] E. Bakshy, D. Eckles, R. Yan, and I. Rosenn. Social
influence in social advertising: evidence from field
experiments. In Proceedings of the 13th ACM
Conference on Electronic Commerce, pages 146–161.
ACM, 2012.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bakshy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hofman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Mason</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Watts</surname>
          </string-name>
          .
          <article-title>Everyone's an influencer: quantifying influence on twitter</article-title>
          .
          <source>In Proceedings of the fourth ACM international conference on Web search and data mining</source>
          , pages
          <volume>65</volume>
          –
          <fpage>74</fpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bakshy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hofman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Mason</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Watts</surname>
          </string-name>
          .
          <article-title>Identifying influencers on twitter</article-title>
          .
          <source>In Fourth ACM International Conference on Web Search and Data Mining (WSDM)</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bakshy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Karrer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Adamic</surname>
          </string-name>
          .
          <article-title>Social influence and the diffusion of user-created content</article-title>
          .
          <source>In Proceedings of the 10th ACM conference on Electronic commerce</source>
          , pages
          <volume>325</volume>
          –
          <fpage>334</fpage>
          . ACM,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bakshy</surname>
          </string-name>
          , I. Rosenn,
          <string-name>
            <given-names>C.</given-names>
            <surname>Marlow</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Adamic</surname>
          </string-name>
          .
          <article-title>The role of social networks in information diffusion</article-title>
          .
          <source>In Proceedings of the 21st international conference on World Wide Web</source>
          , pages
          <volume>519</volume>
          –
          <fpage>528</fpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          , E. Bakshy,
          <string-name>
            <given-names>M.</given-names>
            <surname>Burke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Karrer</surname>
          </string-name>
          .
          <article-title>Quantifying the invisible audience in social networks</article-title>
          .
          <source>In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</source>
          , pages
          <fpage>21</fpage>
          –
          <fpage>30</fpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Boyd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Golder</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Lotan</surname>
          </string-name>
          .
          <article-title>Tweet, tweet, retweet: Conversational aspects of retweeting on twitter</article-title>
          .
          <source>In System Sciences (HICSS)</source>
          ,
          <year>2010</year>
          43rd Hawaii International Conference on, pages
          <volume>1</volume>
          –
          <fpage>10</fpage>
          . IEEE,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Haddadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Benevenuto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Gummadi</surname>
          </string-name>
          .
          <article-title>Measuring user influence in twitter: The million follower fallacy</article-title>
          .
          <source>In 4th International AAAI Conference on Weblogs and Social Media (ICWSM)</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cheng</surname>
          </string-name>
          , L. Adamic,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Dow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          . Can cascades be predicted?
          <source>In Proceedings of the 23rd international conference on World wide web</source>
          , pages
          <volume>925</volume>
          –
          <fpage>936</fpage>
          . International World Wide Web Conferences Steering Committee,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.-E.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-R.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>Liblinear: A library for large linear classification</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <volume>9</volume>
          :
          <fpage>1871</fpage>
          –
          <lpage>1874</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [12] FastRandomForest.
          <article-title>Re-implementation of the random forest classifier for the weka environment</article-title>
          . http://code.google.com/p/fast-random-forest/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Fogarty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Baker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Hudson</surname>
          </string-name>
          .
          <article-title>Case studies in the use of roc curve analysis for sensor-based estimates in human computer interaction</article-title>
          .
          <source>In Proceedings of Graphics Interface</source>
          <year>2005</year>
          , GI '
          <volume>05</volume>
          , pages
          <fpage>129</fpage>
          –
          <fpage>136</fpage>
          , School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada,
          <year>2005</year>
          . Canadian Human-Computer Communications Society.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          .
          <article-title>Additive logistic regression: A statistical view of boosting</article-title>
          .
          <source>Annals of statistics</source>
          , pages
          <volume>337</volume>
          –
          <fpage>374</fpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Lerman</surname>
          </string-name>
          .
          <article-title>Predicting influential users in online social networks</article-title>
          .
          <source>arXiv preprint arXiv:1005.4882</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>V.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Kappen</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Kaltenbrunner</surname>
          </string-name>
          .
          <article-title>Modeling the structure and evolution of discussion cascades</article-title>
          .
          <source>In Proceedings of the 22nd ACM conference on Hypertext and hypermedia</source>
          , pages
          <volume>181</volume>
          –
          <fpage>190</fpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>V.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Kappen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Litvak</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Kaltenbrunner</surname>
          </string-name>
          .
          <article-title>A likelihood-based framework for the analysis of discussion threads</article-title>
          .
          <source>World Wide Web</source>
          , pages
          <volume>1</volume>
          –
          <fpage>31</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kumaraguru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Castillo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Meier</surname>
          </string-name>
          .
          <article-title>TweetCred: Real-time credibility assessment of content on Twitter</article-title>
          .
          <source>In Social Informatics</source>
          , volume
          <volume>8851</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>228</fpage>–<lpage>243</lpage>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>V.</given-names>
            <surname>Hangya</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Farkas</surname>
          </string-name>
          .
          <article-title>Filtering and polarity detection for reputation management on tweets</article-title>
          .
          <source>In Working Notes of CLEF 2013 Evaluation Labs and Workshop</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Dan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Davison</surname>
          </string-name>
          .
          <article-title>Predicting popular messages in Twitter</article-title>
          .
          <source>In Proceedings of the 20th International Conference Companion on World Wide Web, WWW '11</source>
          , pages
          <fpage>57</fpage>–<lpage>58</lpage>
          , New York, NY, USA,
          <year>2011</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kwak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Park</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Moon</surname>
          </string-name>
          .
          <article-title>What is Twitter, a social network or a news media?</article-title>
          <source>In Proceedings of the 19th international conference on World wide web</source>
          , pages
          <fpage>591</fpage>–<lpage>600</lpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lerman</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          .
          <article-title>Information contagion: An empirical study of the spread of news on Digg and Twitter social networks</article-title>
          .
          <source>In Proceedings of 4th International Conference on Weblogs and Social Media (ICWSM)</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>N.</given-names>
            <surname>Naveed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gottron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kunegis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Alhadi</surname>
          </string-name>
          .
          <article-title>Bad news travel fast: A content-based analysis of interestingness on Twitter</article-title>
          .
          <source>In Proceedings of the 3rd International Web Science Conference, WebSci '11</source>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>R.</given-names>
            <surname>Palovics</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Daroczy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Benczur</surname>
          </string-name>
          .
          <article-title>Temporal prediction of retweet count</article-title>
          .
          <source>In Cognitive Infocommunications (CogInfoCom), 2013 IEEE 4th International Conference on</source>
          , pages
          <fpage>267</fpage>–<lpage>270</lpage>
          . IEEE,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennebaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ireland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzales</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Booth</surname>
          </string-name>
          .
          <article-title>The development and psychometric properties of LIWC2007</article-title>
          .
          <source>Technical report</source>
          , University of Texas at Austin,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>S.</given-names>
            <surname>Petrovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Osborne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Lavrenko</surname>
          </string-name>
          .
          <article-title>RT to win! Predicting message propagation in Twitter</article-title>
          .
          <source>In ICWSM</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Gy.</given-names>
            <surname>Szarvas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vincze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Farkas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gy.</given-names>
            <surname>Mora</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          .
          <article-title>Cross-genre and cross-domain detection of semantic uncertainty</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>38</volume>
          (
          <issue>2</issue>
          ):
          <fpage>335</fpage>–<lpage>367</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kan</surname>
          </string-name>
          .
          <article-title>Re-tweeting from a linguistic perspective</article-title>
          .
          <source>In Proceedings of the Second Workshop on Language in Social Media</source>
          , pages
          <fpage>46</fpage>–<lpage>55</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Frank</surname>
          </string-name>
          .
          <source>Data Mining: Practical Machine Learning Tools and Techniques</source>
          . Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann, second edition,
          June
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>