<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Lexical and Machine Learning approaches toward Online Reputation Management</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chao Yang</string-name>
          <email>chao-yang@uiowa.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sanmitra Bhattacharya</string-name>
          <email>sanmitra-bhattacharya@uiowa.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Padmini Srinivasan</string-name>
          <email>padmini-srinivasan@uiowa.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Iowa</institution>
          ,
          <addr-line>Iowa City, IA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the popularity of social media, people are increasingly interested in mining opinions from it. Learning from social media not only has research value, but is also useful for business. RepLab 2012 comprised a Profiling task and a Monitoring task for understanding company-related tweets. The Profiling task aims to determine the Ambiguity and Polarity of tweets. To determine Ambiguity and Polarity for the tweets in the RepLab 2012 Profiling task, we built a Google AdWords filter for Ambiguity, and several approaches, such as SentiWordNet, Happiness Score and Machine Learning, for Polarity. We achieved good performance on the training set, and the performance on the test set is also acceptable.</p>
      </abstract>
      <kwd-group>
        <kwd>Polarity</kwd>
        <kwd>Ambiguity</kwd>
        <kwd>Company</kwd>
        <kwd>Twitter</kwd>
        <kwd>SentiWordNet</kwd>
        <kwd>Happiness Score</kwd>
        <kwd>Google Adwords</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Social media has become an integral part of our everyday life. The increasing
influence of social media on our daily life can be observed in various scenarios,
ranging from gathering movie reviews [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to understanding health beliefs [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Given
the number of online users voicing their personal opinions on various topics on
social media streams such as Twitter, it is now feasible to aggregate opinions
of the public to create meaningful inferences. Reputation management is one
such area where public opinion towards a topic (such as a company or product)
is aggregated. Traditional methods of reputation management are essentially
based on word-of-mouth or surveys, which are not only expensive but also time
consuming. With the advent of social media, reputation management can be done
rapidly, more extensively and at a lower cost. The "evaluation campaign for
Online Reputation Management Systems", or RepLab 2012, aimed towards this
goal of aggregating public views on a company to see how a company (or its
products) is perceived among online users. The goal was also to gauge company
strengths and weaknesses and, most importantly from the company's perspective,
to predict early threats to its reputation and thereby neutralize them before they
become widespread. Keeping this in mind, RepLab 2012 had two tasks using
Twitter data: a Profiling task and a Monitoring task.
      </p>
      <p>Our group only participated in the Profiling task. Here, systems were required
to automatically address two different aspects of tweets: Ambiguity and
Polarity. For Ambiguity, one needs to judge whether there is a relationship between the
tweet and the company. For example, in the tweet "Apple May Legally Force
Motorola To Destroy Their Phones http://nblo.gs/uFvFb", `apple' refers to the
company Apple, Inc. On the other hand, in the tweet "I need to get off the
coffee and eat my apple and carrots.", `apple' is a fruit. In this task, a tweet
needs to be judged as relevant or irrelevant w.r.t. a company name. Polarity
of a tweet is defined as the polarity w.r.t. the reputation of a company. For
instance, the tweet "Lufthansa announces major expansion in Berlin with
opening of new Brandenburg Airport in June 2012" entails a positive view towards
the company `Lufthansa', and hence has a positive influence on the company's
reputation. On the contrary, the tweet "#Freedomwaves - latest report, Irish
activists removed from a Lufthansa plane within the past hour." entails a
negative view towards `Lufthansa' and hence may have a negative influence on the same
company's reputation. As a third category, there can also be tweets that have
neither a positive nor a negative influence on a company's reputation (e.g. "I'm at
Lufthansa Aviation Center (LAC) (Airportring 1, Frankfurt am Main) w/ 2
others http://4sq.com/vTCDiA"). Such cases are identified as neutral.
Participating systems are required to declare each tweet as positive, negative or neutral
w.r.t. a company's reputation.</p>
      <p>
        This paper describes our five run submissions. In Run 1 and Run 2, we use the
popular sentiment lexicon SentiWordNet [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to identify the polarity of tweets.
To determine the ambiguity of tweets, we use a Google AdWords1 filter in Run
1, while in Run 2 we treat all tweets as relevant to some company. For Run 3
and Run 4, we use a Happiness lexicon (discussed later) to identify the polarity
of tweets. Similar to Run 1, Run 3 also uses the Google AdWords filter to judge the
ambiguity of tweets, while Run 4 treats all tweets as relevant to some company.
Finally, for Run 5, we again treat all tweets as relevant to some company,
and use a machine learning approach to classify the polarity of the test tweets. The
classifier was built using all the tweets in the training set.
      </p>
      <p>Below we describe our Google AdWords filter for judging `Ambiguity', our
SentiWordNet and `Happiness lexicon'-based approaches for polarity, and lastly our
classifier-based approach for polarity. We conclude with a discussion of the
performance of our submitted runs.</p>
    </sec>
    <sec id="sec-2">
      <title>Dataset Acquisition</title>
      <p>Similar to the TREC Microblog track datasets, tweets about the various
companies were not distributed directly due to Twitter's data sharing policies; instead,
participating teams were given tweet IDs and associated information for
accessing the tweets directly from Twitter. Tools for downloading the tweet contents
from the Twitter servers were provided. However, this task proved to be
challenging: since Twitter data is dynamic (users may delete tweets or even delete
accounts), the datasets collected by the different participating research groups
at different times differed.</p>
      <sec id="sec-2-1">
        <title>1 https://adwords.google.com/</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Dataset Statistics</title>
      <p>The training set comprised 300 tweets for each of 6 companies. Table 1 shows
the statistics of the training set.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption><p>Statistics of the training set</p></caption>
        <table>
          <thead>
            <tr><th>Company</th><th>apple</th><th>lufthansa</th><th>alcatel</th><th>armani</th><th>barclays</th><th>marriott</th></tr>
          </thead>
          <tbody>
            <tr><td>Total Tweets</td><td>300</td><td>300</td><td>300</td><td>300</td><td>300</td><td>300</td></tr>
            <tr><td>Relevant</td><td>281</td><td>299</td><td>289</td><td>179</td><td>298</td><td>294</td></tr>
            <tr><td>Non-relevant</td><td>19</td><td>1</td><td>11</td><td>21</td><td>2</td><td>6</td></tr>
            <tr><td>Null Tweets</td><td>33</td><td>37</td><td>32</td><td>92</td><td>16</td><td>46</td></tr>
            <tr><td>English</td><td>74</td><td>228</td><td>236</td><td>270</td><td>292</td><td>285</td></tr>
            <tr><td>Spanish</td><td>226</td><td>72</td><td>64</td><td>30</td><td>8</td><td>15</td></tr>
            <tr><td>Positive</td><td>70</td><td>242</td><td>221</td><td>24</td><td>248</td><td>94</td></tr>
            <tr><td>Neutral</td><td>195</td><td>35</td><td>64</td><td>155</td><td>24</td><td>192</td></tr>
            <tr><td>Negative</td><td>18</td><td>20</td><td>4</td><td>5</td><td>27</td><td>11</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The null tweets are the ones that could not be obtained from Twitter
because of the aforementioned problem (Section 2). In the training set, most of the
tweets are relevant to the company. Armani has the most non-relevant tweets,
but even those number only 21. English tweets dominate the training set, except
for Apple. We also observed that there are not many negative tweets for any
company.</p>
    </sec>
    <sec id="sec-4">
      <title>Methods</title>
      <sec id="sec-4-1">
        <title>Google AdWords Filter For `Ambiguity'</title>
        <p>Analysis of Tweets in Training Set. Before developing methods to determine
the ambiguity of tweets, we manually looked through the tweets in the training set.
We found that the tweets for `Apple' are judged as relevant or non-relevant
mostly by the factors shown in Table 2.</p>
        <p>This table shows the various factors we found in the training set that could
differentiate the `Ambiguity' for Apple Inc. Our hypothesis for determining
`Ambiguity' for a company is therefore: if a tweet has one or more keywords related to
company factors, it is labeled as relevant; otherwise, it is labeled as non-relevant.</p>
        <p>It is true that different companies have different factors. But generally, there
are some common company factors: specific products, generic products,
competitors, official tweet account name, company name hashtag, company leaders,
and official website.</p>
        <table-wrap id="tbl2">
          <label>Table 2</label>
          <caption><p>Factors differentiating the `Ambiguity' for Apple</p></caption>
          <table>
            <thead>
              <tr><th /><th>Factors</th><th>Example</th></tr>
            </thead>
            <tbody>
              <tr><td rowspan="6">Apple as Company</td><td>Specific products</td><td>iTunes, Apple TV, iPad, etc.</td></tr>
              <tr><td>Generic products</td><td>phone, tablet, etc.</td></tr>
              <tr><td>Apps &amp; Music</td><td>Angry Birds, etc.</td></tr>
              <tr><td>Competitors &amp; their products</td><td>Samsung, HTC, etc.</td></tr>
              <tr><td>Leader name</td><td>Steve Jobs</td></tr>
              <tr><td>Company related term</td><td>products, trademark dispute</td></tr>
              <tr><td rowspan="4">Apple as Fruit</td><td>Verb</td><td>eat</td></tr>
              <tr><td>Detailed fruit related term</td><td>apple sauce, apple soup, pie, etc.</td></tr>
              <tr><td>Generic fruit related term</td><td>fruit, food, etc.</td></tr>
              <tr><td>Other fruits</td><td>banana</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
          Automatically acquiring company factors for each company is not a trivial
task. In this paper we propose a new method for obtaining the company factors,
using the Google AdWords Keyword Tool [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The Google AdWords Keyword Tool is a
service from Google that helps advertisers choose search terms related to their
business. The keywords are the terms most frequently searched by internet
users. Using the Keyword Tool, we can easily get the popular products of the
company, generic products, and other company related terms.
        </p>
        <p>Table 3 shows the top 20 out of 787 English Google AdWords keywords for Apple, Inc.,
collected on June 6, 2012.</p>
        <table-wrap id="tbl3">
          <label>Table 3</label>
          <caption><p>Top 20 English Google AdWords keywords for Apple, Inc.</p></caption>
          <table>
            <thead>
              <tr><th>Rank</th><th>Keyword</th><th>Rank</th><th>Keyword</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>apple</td><td>11</td><td>apple iphone 8gb</td></tr>
              <tr><td>2</td><td>apple store</td><td>12</td><td>apple iphone 5</td></tr>
              <tr><td>3</td><td>apple iphone 4</td><td>13</td><td>apple iphone case</td></tr>
              <tr><td>4</td><td>apple support</td><td>14</td><td>apple i4</td></tr>
              <tr><td>5</td><td>apple iphone 4g</td><td>15</td><td>apple iphone covers</td></tr>
              <tr><td>6</td><td>apple iphone</td><td>16</td><td>apple iphone 4gs</td></tr>
              <tr><td>7</td><td>apple ipod touch</td><td>17</td><td>apple website</td></tr>
              <tr><td>8</td><td>apple iphone support</td><td>18</td><td>apple 3g iphone</td></tr>
              <tr><td>9</td><td>apple 3g</td><td>19</td><td>apple bumper case</td></tr>
              <tr><td>10</td><td>apple i phones</td><td>20</td><td>apple 4g phone</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Some keywords may refer to the same product, for
example `apple iphone' and `apple i phones'. Nevertheless, taken together, the 787 keywords
cover most of the popular products and Apple Inc. related keywords.</p>
        <p>We developed two strategies to match the keywords. In the first strategy
(Filter 1) we determine whether a tweet contains a whole keyword. In the second
strategy (Filter 2) we determine whether the tweet contains one or more tokens of a keyword.</p>
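        <p>As a sketch, the two matching strategies can be implemented as follows (a minimal illustration in Python; the function names, whitespace tokenization, and sample keyword list are ours, not part of the RepLab tooling):</p>

```python
def filter1_match(tweet, keywords):
    """Filter 1: the tweet must contain a whole keyword phrase."""
    text = tweet.lower()
    return any(kw.lower() in text for kw in keywords)


def filter2_match(tweet, keywords):
    """Filter 2 (more relaxed): a single token of any keyword suffices."""
    tokens = set(tweet.lower().split())
    keyword_tokens = {tok for kw in keywords for tok in kw.lower().split()}
    return bool(tokens & keyword_tokens)


keywords = ["apple store", "apple iphone 4"]       # illustrative AdWords keywords
tweet = "Long line at the store for the new iPhone"
print(filter1_match(tweet, keywords))  # no whole phrase appears in the tweet
print(filter2_match(tweet, keywords))  # single tokens 'store' and 'iphone' match
```

        <p>This shows why Filter 2 is the more relaxed of the two: it fires on any keyword token, not only on complete phrases.</p>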
      </sec>
      <sec id="sec-4-2">
        <title>Flow Chart of Google AdWords Filter</title>
        <p>Figure 1 shows the flowchart of our
Google AdWords Filter. For each company, we use the company name or website
as the search query to extract both English and Spanish keywords. This gives us 4
lists of keywords per company. We then merge them into one large list, and
also automatically add the Twitter account and hashtag for the company. For
example, we can add `@apple' and `#apple' as the official account and hashtag
for Apple Inc. Finally, if a tweet contains one or more keywords, it is labeled as
relevant.</p>
      </sec>
      <sec id="sec-4-3">
        <title>SentiWordNet Approach For Polarity</title>
        <p>Although polarity for reputation is substantially di erent from standard
sentiment analysis, it does have some similarity. The polarity of a tweet is, in essence,
expressed using certain sentiment-loaded keywords. For instance, in the tweet
`Lehmann Brothers goes bankrupt', the word `bankrupt' has a negative polarity
for reputation. Thus, if we can determine the polarity for each word in tweet,
we may be able to determine the polarity of tweet.</p>
        <p>SentiWordNet. Since there is no polarity score list specifically designed for
determining the polarity of reputation, we decided to use SentiWordNet to get the
polarity of individual words. SentiWordNet is a lexical resource for sentiment
analysis and opinion mining. SentiWordNet assigns to each synset of WordNet
three sentiment scores: positivity, negativity and objectivity. That is, SentiWordNet
has a list of negative and positive scores for words. One word may have
several negative and positive scores for its different senses. Table 4 shows the example of
`bankrupt'.</p>
        <table-wrap id="tbl4">
          <label>Table 4</label>
          <caption><p>SentiWordNet entries for `bankrupt'</p></caption>
          <table>
            <thead>
              <tr><th>POS &amp; ID</th><th>Word</th><th>PosScore</th><th>NegScore</th></tr>
            </thead>
            <tbody>
              <tr><td>n 09838370</td><td>bankrupt#1</td><td>0</td><td>-0.625</td></tr>
              <tr><td>v 02318165</td><td>bankrupt#1</td><td>0</td><td>0</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The pair (POS &amp; ID) uniquely identifies a WordNet (3.0) synset. Because the
word `bankrupt' has 2 entries in SentiWordNet, we calculate an average PosScore
and NegScore for this word.</p>
        <p>To determine the polarity of a tweet we use two strategies. Both strategies
assign a polarity score to the tweet, and then use two thresholds to determine its
polarity.</p>
        <p>In the first strategy (MaxS) we use the maximum SentiWordNet scores among
all the words to determine the polarity of the tweet. Here we first find the maximum
PosScore and NegScore over the words of the tweet. For example, in the
tweet `Lehmann Brothers goes bankrupt', after excluding the company name,
`bankrupt' has the maximum NegScore and `goes' has the maximum PosScore.
The sum of these two scores is the polarity score for the tweet.</p>
        <p>In the second strategy (SumS) we use the sum of the SentiWordNet scores of
all the words. For example, again in `Lehmann Brothers goes bankrupt', after
excluding the company name, the sum of the PosScores and NegScores of `goes' and
`bankrupt' becomes the polarity score of the tweet.</p>
        <p>Thus, both strategies give a polarity score for each tweet. We then set
two fixed thresholds: a positive threshold and a negative threshold. If the polarity
score is larger than the positive threshold, we declare the tweet positive. If the
polarity score is smaller than the negative threshold, we declare the tweet negative.
Otherwise, the tweet is neutral. For Spanish tweets, we use Google Translate2
to translate all the words in SentiWordNet into Spanish. We then use the
translated Spanish SentiWordNet to determine the polarity of Spanish tweets.</p>
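        <p>The two scoring strategies and the threshold rule can be sketched as follows (a minimal illustration; the per-word scores are made-up stand-ins for averaged SentiWordNet scores, with NegScores negative as in Table 4, and the threshold values are illustrative, not our tuned ones):</p>

```python
# Illustrative averaged (PosScore, NegScore) pairs; real values come from SentiWordNet.
SCORES = {"goes": (0.125, 0.0), "bankrupt": (0.0, -0.3125)}


def polarity_score(words, strategy):
    pos = [SCORES.get(w.lower(), (0.0, 0.0))[0] for w in words]
    neg = [SCORES.get(w.lower(), (0.0, 0.0))[1] for w in words]
    if strategy == "MaxS":
        # Maximum PosScore plus the largest-magnitude (most negative) NegScore.
        return max(pos) + min(neg)
    # SumS: sum of all PosScores and NegScores.
    return sum(pos) + sum(neg)


def classify(score, pos_th, neg_th):
    # Two fixed thresholds decide positive / negative / neutral.
    if score > pos_th:
        return "positive"
    if score < neg_th:
        return "negative"
    return "neutral"


words = ["goes", "bankrupt"]  # company name already excluded
score = polarity_score(words, "MaxS")
print(score, classify(score, pos_th=0.62, neg_th=-0.3))
```

        <p>With only two words the MaxS and SumS scores coincide; on longer tweets SumS accumulates evidence from every word, which is why its thresholds differ from those of MaxS.</p>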
      </sec>
      <sec id="sec-4-4">
        <title>Happiness Score Approach For Polarity</title>
        <p>
          In addition to the SentiWordNet score, we also tried another approach to
determine the polarity of words in a tweet: the Happiness score [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], developed by
Dodds et al. via crowdsourcing. It provides a list of words, each associated
with a score indicating the happiness of that word.
        </p>
        <p>We use a strategy similar to MaxS, with Happiness scores in place of
SentiWordNet scores. First, we get the maximum Happiness score among all
the tokens in the tweet and use it as the polarity score for the tweet.
Finally, we set two thresholds to determine the polarity of the tweet. We denote
this approach as HappyS.</p>
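        <p>HappyS follows the same pattern as MaxS (a sketch; the word scores, the neutral default for out-of-lexicon words, and the thresholds below are all made up for illustration, while the real scores come from the Dodds et al. lexicon and the real thresholds from training-set tuning):</p>

```python
# Illustrative happiness scores on the lexicon's 1-9 scale.
HAPPINESS = {"love": 8.42, "crash": 2.60, "plane": 5.70}
NEUTRAL_DEFAULT = 5.0  # assumed score for words missing from the lexicon


def happy_s(tokens, pos_th=7.0, neg_th=4.0):
    # The polarity score is the maximum happiness score among the tokens.
    score = max(HAPPINESS.get(t.lower(), NEUTRAL_DEFAULT) for t in tokens)
    if score > pos_th:
        return "positive"
    if score < neg_th:
        return "negative"
    return "neutral"


print(happy_s("I love this airline".split()))
```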
      </sec>
      <sec id="sec-4-5">
        <title>Machine Learning Approach For Polarity</title>
        <p>Besides the SentiWordNet and Happiness score approaches for polarity, we also
developed a machine learning approach. Figure 2 shows the flowchart of the
Classifier. We take all the labeled tweets from the 6 companies (6 * 300 = 1800 tweets),
search for the Google AdWords keywords and replace them with `xxxx', and also replace
the company names with `aaaa'. We then split this set into English and
Spanish tweets and build 2 separate classifiers, one for English and one for Spanish.
To evaluate the performance of the Classifier approach, we train the classifiers
on the training tweets of 5 companies and test on the remaining company's
tweets in the training set. We repeat this 6 times so that every company is covered.</p>
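        <p>The masking step and the n-gram feature extraction can be sketched as follows (the keyword list is illustrative, and in our runs the resulting features are fed to Weka rather than used directly like this):</p>

```python
ADWORDS = ["apple store", "iphone"]  # illustrative keywords; real ones come from AdWords


def preprocess(tweet, company):
    """Mask AdWords keywords as 'xxxx' and the company name as 'aaaa'."""
    text = tweet.lower()
    for kw in sorted(ADWORDS, key=len, reverse=True):  # longest keyword first
        text = text.replace(kw, "xxxx")
    return text.replace(company.lower(), "aaaa")


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text):
    # Unigram, bigram and 3-gram features for the classifier.
    tokens = text.split()
    return [g for n in (1, 2, 3) for g in ngrams(tokens, n)]


masked = preprocess("Apple store sells the new iPhone", "Apple")
print(masked)             # xxxx sells the new xxxx
print(features(masked)[:5])
```

        <p>Masking company names and product keywords keeps the classifier from memorizing company-specific vocabulary, which matters for the leave-one-company-out evaluation described above.</p>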
        <p>
          Specifically, we use Weka [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to test the performance of the classifier. We use
the Bagging [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] classifier with an SVM [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] base learner (SMO [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] in Weka). We extract unigram,
bigram and 3-gram features for the classifier.
        </p>
        <sec id="sec-4-5-1">
          <title>2 translate.google.com/</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <sec id="sec-5-1">
        <title>Performance of Google AdWords Filter in Training Set</title>
        <p>We used precision, recall and F-score to evaluate performance on the
training set. In particular, we use the average F-score (avgF) to evaluate overall
performance. We see that Filter 2 performs better: not only is its
avgF higher, but its F-scores for every company are higher, except for Armani. The
reason is that, as expected, the Google AdWords keywords do not cover all
the company factors. For example, `ios 5' is an AdWords keyword for Apple Inc,
but `ios' is not. Filter 2, which favors more relaxed filtering, covers more cases,
resulting in better performance. Therefore, we use Filter 2 for our submitted
runs on the test set.</p>
        <table-wrap id="tbl-thresholds">
          <caption><p>Average F-scores and tuned thresholds for the polarity approaches on the training set</p></caption>
          <table>
            <thead>
              <tr><th /><th colspan="2">MaxS</th><th colspan="2">SumS</th><th colspan="2">HappyS</th></tr>
              <tr><th /><th>En</th><th>Es</th><th>En</th><th>Es</th><th>En</th><th>Es</th></tr>
            </thead>
            <tbody>
              <tr><td>avgF</td><td>0.34</td><td>0.31</td><td>0.336</td><td>0.30</td><td>0.27</td><td>0.35</td></tr>
              <tr><td>Positive threshold</td><td>0.62</td><td>1.26</td><td>0.34</td><td>4.08</td><td>23.4</td><td>27.3</td></tr>
              <tr><td>Negative threshold</td><td>-0.3</td><td>-0.83</td><td>-2.59</td><td>-1.11</td><td>16.1</td><td>16.7</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>We submitted five runs with different combinations of our `Ambiguity'
approach and our Polarity approach. We denote as `AllRel' the strategy of treating all
tweets as relevant to some company. Table 9 shows the details of the five different runs.</p>
        <table-wrap id="tbl9">
          <label>Table 9</label>
          <caption><p>Description of the five submitted runs</p></caption>
          <table>
            <thead>
              <tr><th>Run ID</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>Run1</td><td>Filter2 + MaxS</td></tr>
              <tr><td>Run2</td><td>AllRel + MaxS</td></tr>
              <tr><td>Run3</td><td>Filter2 + HappyS</td></tr>
              <tr><td>Run4</td><td>AllRel + HappyS</td></tr>
              <tr><td>Run5</td><td>AllRel + Classifier</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Performance on the Test Set</title>
      <p>
        We describe the performance on the test set separately for the Filtering (`Ambiguity')
and Polarity tasks. In the Filtering task, our Run1 and Run3 ranked 7th and
8th out of 33 runs (5th of 9 teams) by the F(R,S) score [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], where R and S refer to
Reliability and Sensitivity, respectively. Because we treat all tweets as relevant
in Run2, Run4, and Run5, these runs have the same results as the baseline (all
relevant).
      </p>
      <p>Table 10 shows the top 10 results of the Filtering task.</p>
      <table-wrap id="tbl10">
        <label>Table 10</label>
        <caption><p>Top 10 results of the Filtering task</p></caption>
        <table>
          <thead>
            <tr><th>Run ID</th><th>F(R,S)</th><th>R</th><th>S</th><th>Accuracy</th></tr>
          </thead>
          <tbody>
            <tr><td>replab2012 related Daedalus 2</td><td>0.263922126</td><td>0.243482396</td><td>0.432991032</td><td>0.722763591</td></tr>
            <tr><td>replab2012 related Daedalus 3</td><td>0.253463929</td><td>0.235162463</td><td>0.422129397</td><td>0.702232013</td></tr>
            <tr><td>replab2012 related Daedalus 1</td><td>0.250619268</td><td>0.23968238</td><td>0.403657787</td><td>0.718006364</td></tr>
            <tr><td>replab2012 related CIRGDISCO 1</td><td>0.227595261</td><td>0.217923478</td><td>0.336440429</td><td>0.701870318</td></tr>
            <tr><td>replab2012 profiling kthgavagai 1</td><td>0.222829043</td><td>0.253419399</td><td>0.357636447</td><td>0.774061038</td></tr>
            <tr><td>replab2012 profiling OXY 2</td><td>0.196601614</td><td>0.234666227</td><td>0.272356458</td><td>0.809025193</td></tr>
            <tr><td>replab2012 profiling uiowa 1 (Run1)</td><td>0.177919294</td><td>0.181556704</td><td>0.292220139</td><td>0.679680848</td></tr>
            <tr><td>replab2012 profiling uiowa 3 (Run3)</td><td>0.177919294</td><td>0.181556704</td><td>0.292220139</td><td>0.679680848</td></tr>
            <tr><td>replab2012 profiling ilps 4</td><td>0.15730978</td><td>0.157010828</td><td>0.223508777</td><td>0.599100149</td></tr>
            <tr><td>replab2012 profiling ilps 3</td><td>0.155698416</td><td>0.155160491</td><td>0.25552382</td><td>0.657567983</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>One reason the test set results are not as good as those on the training set is
that some of the companies in the test set have no ambiguity. For example, if a
tweet mentions Google or Microsoft, it is definitely relevant to the company it
mentions. The right approach is to first determine whether the company name is ambiguous;
if not, all tweets should be treated as relevant.</p>
      <p>In the Polarity task, our Run2 ranked 4th out of 31 runs (4th of 9 teams)
by the F(R,S) score. The results show that the SentiWordNet approach is better than
the Happiness score and Classifier approaches.</p>
      <p>Table 11 shows the top 10 results of the Polarity task and all of our other runs.</p>
      <table-wrap id="tbl11">
        <label>Table 11</label>
        <caption><p>Top 10 results of the Polarity task and all of our other runs</p></caption>
        <table>
          <thead>
            <tr><th>Rank</th><th>Run ID</th><th>F(R,S)</th><th>R</th><th>S</th><th>Accuracy</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>replab2012 polarity Daedalus 1</td><td>0.401818195</td><td>0.392370769</td><td>0.449091977</td><td>0.479550085</td></tr>
            <tr><td>2</td><td>replab2012 profiling uned 5</td><td>0.341946295</td><td>0.340229898</td><td>0.374731432</td><td>0.449501547</td></tr>
            <tr><td>3</td><td>replab2012 profiling BMedia 2</td><td>0.341946295</td><td>0.340229898</td><td>0.374731432</td><td>0.449501547</td></tr>
            <tr><td>4</td><td>replab2012 profiling uiowa 2 (Run2)</td><td>0.341946295</td><td>0.340229898</td><td>0.374731432</td><td>0.449501547</td></tr>
            <tr><td>5</td><td>replab2012 profiling uned 2</td><td>0.341946295</td><td>0.340229898</td><td>0.374731432</td><td>0.449501547</td></tr>
            <tr><td>6</td><td>replab2012 profiling uned 4</td><td>0.341946295</td><td>0.340229898</td><td>0.374731432</td><td>0.449501547</td></tr>
            <tr><td>7</td><td>replab2012 profiling BMedia 3</td><td>0.341946295</td><td>0.340229898</td><td>0.374731432</td><td>0.449501547</td></tr>
            <tr><td>8</td><td>replab2012 profiling OPTAH 1</td><td>0.341946295</td><td>0.340229898</td><td>0.374731432</td><td>0.449501547</td></tr>
            <tr><td>9</td><td>replab2012 profiling OPTAH 2</td><td>0.341946295</td><td>0.340229898</td><td>0.374731432</td><td>0.449501547</td></tr>
            <tr><td>10</td><td>replab2012 profiling BMedia 5</td><td>0.341946295</td><td>0.340229898</td><td>0.374731432</td><td>0.449501547</td></tr>
            <tr><td>19</td><td>replab2012 profiling uiowa 1 (Run1)</td><td>0.255176622</td><td>0.315109492</td><td>0.249941079</td><td>0.274533823</td></tr>
            <tr><td>23</td><td>replab2012 profiling uiowa 4 (Run4)</td><td>0.240957995</td><td>0.264677783</td><td>0.249820237</td><td>0.397726112</td></tr>
            <tr><td>26</td><td>replab2012 profiling uiowa 5 (Run5)</td><td>0.211165461</td><td>0.375737392</td><td>0.177001887</td><td>0.425064303</td></tr>
            <tr><td>30</td><td>replab2012 profiling uiowa 3 (Run3)</td><td>0.150727485</td><td>0.231986766</td><td>0.139879816</td><td>0.321687051</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>In RepLab 2012, we explored using Google AdWords as a filter to determine the
ambiguity of tweets. We also developed several approaches, namely SentiWordNet,
Happiness Score and a Classifier, to determine the polarity of tweets. The results on the test
set show that our approaches performed well. However, our work still has some
limitations. Google AdWords does provide good company-related keywords, but
it is not a free service. We did not receive approval from Google to use the AdWords API
before submitting our results, so we manually downloaded the English and
Spanish keyword lists retrieved with the company name as the query. The limit on queries
restricted the AdWords keywords, which in turn limited the performance of the
`Ambiguity' filter. Another limitation is that SentiWordNet is a general-purpose list
of word sentiments; it is not optimized for determining the polarity of
companies. For example, `expand' is a positive word for judging polarity, but it
is almost neutral in SentiWordNet. Thus, exploring Google AdWords API queries
to get more company-related keywords, and customizing a new polarity word
list based on SentiWordNet, could be future work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><given-names>Sitaram</given-names> <surname>Asur</surname></string-name>
          and
          <string-name><given-names>Bernardo A.</given-names> <surname>Huberman</surname></string-name>
          .
          <article-title>Predicting the Future with Social Media</article-title>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><given-names>Sanmitra</given-names> <surname>Bhattacharya</surname></string-name>
          ,
          <string-name><given-names>Hung</given-names> <surname>Tran</surname></string-name>
          ,
          <string-name><given-names>Padmini</given-names> <surname>Srinivasan</surname></string-name>
          and
          <string-name><given-names>Jerry</given-names> <surname>Suls</surname></string-name>
          .
          <article-title>Belief Surveillance with Twitter</article-title>
          .
          <source>In Proceedings of the Fourth ACM Web Science Conference (WebSci12)</source>
          , pages
          <fpage>55</fpage>-<lpage>58</lpage>
          , Evanston, IL, USA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><given-names>Andrea</given-names> <surname>Esuli</surname></string-name>
          and
          <string-name><given-names>Fabrizio</given-names> <surname>Sebastiani</surname></string-name>
          .
          <article-title>SentiWordNet: A Publicly Available Lexical Resource For Opinion Mining</article-title>
          .
          <source>In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC06)</source>
          , pages
          <fpage>417</fpage>-<lpage>422</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Google AdWords
          <article-title>Keyword Tool</article-title>
          . URL http://support.google.com/adwords/bin/answer.py?hl=en&amp;answer=
          <fpage>147602</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Dodds</surname> <given-names>P.S.</given-names></string-name>
          ,
          <string-name><surname>Harris</surname> <given-names>K.D.</given-names></string-name>
          ,
          <string-name><surname>Kloumann</surname> <given-names>I.M.</given-names></string-name>
          ,
          <string-name><surname>Bliss</surname> <given-names>C.A.</given-names></string-name>
          and
          <string-name><surname>Danforth</surname> <given-names>C.M.</given-names></string-name>
          .
          <article-title>Temporal Patterns of Happiness and Information in A Global Social Network: Hedonometrics and Twitter</article-title>
          .
          <source>PLoS ONE</source>
          ,
          <volume>6</volume>
          (
          <issue>12</issue>
          ):
          <fpage>e26752</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><given-names>Mark</given-names> <surname>Hall</surname></string-name>
          ,
          <string-name><given-names>Eibe</given-names> <surname>Frank</surname></string-name>
          ,
          <string-name><given-names>Geoffrey</given-names> <surname>Holmes</surname></string-name>
          ,
          <string-name><given-names>Bernhard</given-names> <surname>Pfahringer</surname></string-name>
          ,
          <string-name><given-names>Peter</given-names> <surname>Reutemann</surname></string-name>
          and
          <string-name><given-names>Ian H.</given-names> <surname>Witten</surname></string-name>
          .
          <article-title>The Weka Data Mining Software: An Update</article-title>
          .
          <source>SIGKDD Explorations</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ),
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Breiman</surname> <given-names>L.</given-names></string-name>
          .
          <article-title>Bagging Predictors</article-title>
          .
          <source>Machine Learning</source>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Vapnik</surname> <given-names>V.N.</given-names></string-name>
          .
          <article-title>The Nature of Statistical Learning Theory</article-title>
          . Springer,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><given-names>John C.</given-names> <surname>Platt</surname></string-name>
          .
          <article-title>Fast Training of Support Vector Machines Using Sequential Minimal Optimization</article-title>
          . In
          <string-name><given-names>B.</given-names> <surname>Schoelkopf</surname></string-name>
          ,
          <string-name><given-names>C.</given-names> <surname>Burges</surname></string-name>
          and
          <string-name><given-names>A.</given-names> <surname>Smola</surname></string-name>
          , editors,
          <source>Advances in Kernel Methods - Support Vector Learning</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><given-names>Enrique</given-names> <surname>Amigo</surname></string-name>
          ,
          <string-name><given-names>Julio</given-names> <surname>Gonzalo</surname></string-name>
          and
          <string-name><given-names>Felisa</given-names> <surname>Verdejo</surname></string-name>
          .
          <article-title>Reliability and Sensitivity: Generic Evaluation Measures For Document Organization Tasks</article-title>
          .
          <source>Technical report</source>
          , Departamento de Lenguajes y Sistemas Informaticos, UNED, Madrid, Spain,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>