<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LyS at CLEF RepLab 2014: Creating the State of the Art in Author Influence Ranking and Reputation Classification on Twitter</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David Vilares</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel Hermo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel A. Alonso</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Gómez-Rodríguez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesús Vilares</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Grupo LyS, Departamento de Computación, Universidade da Coruña, Campus de A Coruña s/n</institution>
          ,
          <addr-line>15071, A Coruña, España</addr-line>
        </aff>
      </contrib-group>
      <fpage>1468</fpage>
      <lpage>1478</lpage>
      <abstract>
        <p>This paper describes our participation at RepLab 2014, a competitive evaluation for reputation monitoring on Twitter. The following tasks were addressed: (1) categorisation of tweets with respect to standard reputation dimensions and (2) characterisation of Twitter profiles, which includes: (2.1) identifying the type of those profiles, such as journalist or investor, and (2.2) ranking the authors according to their level of influence on this social network. We consider an approach based on the application of natural language processing techniques in order to take into account part-of-speech, syntactic and semantic information. However, each task is addressed independently, since they respond to different requirements. The official results confirm the competitiveness of our approaches, which achieve 2nd place, tied in practice with 1st place, at the author ranking task; and 3rd place at the reputation dimensions classification task.</p>
      </abstract>
      <kwd-group>
        <kwd>Reputation Monitoring</kwd>
        <kwd>Author Ranking</kwd>
        <kwd>Twitter</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In recent years, Twitter has become a vast information network, where millions
of users share their views about products and services. This microblogging social
network is an important source of information for companies and organisations,
which aim to know what people think about their articles. In this way, identifying
how people relate aspects and traits such as performance, services or leadership
with their business, is a good starting point for monitoring the perception of
the public via sentiment analysis applications. In a similar line, companies are
interested in user profiling: identifying the profession, cultural level, age or the
level of influence of authors in a specific domain may have potential benefits
when making decisions with respect to advertisement policies, for example.</p>
      <p>
        The RepLab 2014 evaluation campaign on Twitter [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] focusses on these challenges, providing
standard metrics and test collections where both academic and commercial systems
can be evaluated. The collections contain tweets written in English and
Spanish. Two main tasks were proposed: (1) categorisation of tweets with respect
to standard reputation dimensions and (2) characterisation of Twitter profiles.
The first task consisted of classifying tweets into the standard reputation
dimensions: products&amp;services, innovation, workplace, citizenship, governance,
leadership, performance and undefined. The characterization of Twitter profiles is
composed of two subtasks: (2.1) author categorisation and (2.2) author ranking. The
author categorisation task covers up to 7 user types: journalist, professional,
authority, activist, investor, company or celebrity. With respect to the author
ranking task, the goal is to detect influential and non-influential users,
ranking them according to this aspect (from the most to the least influential). Our
approaches achieve state-of-the-art results for the classification on reputation
dimensions and author ranking.
      </p>
      <p>The remainder of the paper is structured as follows. Section 2 describes the
main features of our methods. Sections 3, 4 and 5 show how we tackle the
proposed tasks, illustrating and discussing the official results. Finally, we present
our conclusions in Section 6.
</p>
    </sec>
    <sec id="sec-2">
      <title>System description</title>
      <p>The major part of our models rely on natural language processing (NLP)
approaches which include steps such as: preprocessing, part-of-speech (PoS) tagging
and parsing. The obtained syntactic trees act as a starting point for extracting
the features which feed the supervised classifier employed for tasks 1
(reputation dimensions classification) and 2.1 (author categorisation). We built different
models for each task and for each language considered in the evaluation
campaign. With respect to task 2.2 (author ranking), a simple but effective method
was used. Differences between tasks and languages are explained in the following
sections. We describe below the high-level architecture of our NLP pipeline.</p>
      <sec id="sec-2-1">
        <title>NLP for online reputation</title>
        <p>
          Preprocessing We carry out an ad-hoc preprocessing to normalise some of the
most common features of the Twitter jargon, which may have an influence on
the performance of the tasks proposed at RepLab 2014:
– Replacement of URLs: References to external links and resources are
replaced by the string ‘URL’.
– Hashtags: The use of hashtags may be helpful for classification tasks, since
they are often used to label tweets. In this way, we only delete the symbol
‘#’ in order to give these elements the same treatment as words.
– Twitter usernames: In this social network, the usernames are preceded by
the symbol ‘@’. In order not to cause confusion at the tokenisation or tagging
steps, we delete that symbol, to then capitalise the first character and give
these elements the same treatment as actual proper names.
Part-of-speech tagging In order to be able to obtain the syntactic structure
of tweets, we first need to label each token of the message with its respective
part-of-speech tag. We used the Ancora [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and the Penn Treebank [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] corpora
to train the Spanish and the English taggers, respectively. The Spanish tagger
relies on the Brill tagger [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] implementation included with NLTK1, following the
configuration described at [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. With respect to English we used an averaged
perceptron discriminative sequence model [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] which presents state-of-the-art results
for the Penn Treebank. Specifically, we took the trained model provided with
the TextBlob2 framework. During the process of PoS tagging we also obtain the
lemma of each word.
        </p>
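The three normalisation rules above can be sketched with regular expressions. This is an illustrative reconstruction, not the authors' code; the function name `normalise_tweet` is hypothetical.

```python
import re

def normalise_tweet(text):
    # Replace references to external links with the string 'URL'.
    text = re.sub(r'https?://\S+', 'URL', text)
    # Hashtags: delete '#' so tags are treated like ordinary words.
    text = re.sub(r'#(\w+)', r'\1', text)
    # Usernames: delete '@' and capitalise the first character,
    # so they are treated like actual proper names.
    text = re.sub(r'@(\w)(\w*)',
                  lambda m: m.group(1).upper() + m.group(2), text)
    return text

print(normalise_tweet('Great #performance by @bmw http://t.co/x1'))
# → 'Great performance by Bmw URL'
```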
        <p>
          Dependency parsing Given a sentence S = w1...wn, where wi represents the
word at the position i in the sentence, a dependency parser returns a
dependency tree, a set of triplets {(wi, arcij, wj)} where wi is the head term, wj is
the dependent and arcij represents the dependency type, which denotes the
syntactic function that relates the head and the dependent. In this way, the phrase
‘best performance’ could be represented syntactically as (performance, modifier,
best). We rely on MaltParser [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], a data-driven dependency parser generator, to
build our parsers. We used again the Ancora and the Penn Treebank corpora
to train the Spanish and the English parser, respectively. Our aim is to
employ dependency parsing to capture the non-local relations between words that
lexical-based approaches cannot handle properly.
        </p>
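As a minimal illustration of this representation, a dependency tree can be stored as a set of (head, relation, dependent) triplets; the data below is the toy example from the text, not real parser output.

```python
# The phrase 'best performance' as a set of dependency triplets.
tree = {('performance', 'modifier', 'best')}

def dependents_of(head, tree):
    """Return the dependents attached to a given head word."""
    return [dep for h, rel, dep in tree if h == head]

print(dependents_of('performance', tree))  # → ['best']
```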
      </sec>
      <sec id="sec-2-2">
        <title>Feature extraction</title>
        <p>
          Our classifiers are fed with three different types of features:
– N-grams: This type of feature detects the presence of sequences of
contiguous words, where n is the number of concatenated terms. In this paper,
we consider both 1-grams and 2-grams (which make it possible to capture
some contextual information based on word proximity). Simple
normalisation techniques such as converting words to their lowercase form are applied.
In addition to n-grams of words, we also consider n-grams of lemmas3. The
aim is to reduce sparsity and train more accurate classifiers, especially
for Spanish, where verbs, adjectives and nouns present gender and
number inflections.
– Psychometric properties: LIWC [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is a software tool that can be used to
identify psychometric word properties present in a text. Among other languages,
it provides dictionaries for both Spanish and English. We use those
dictionaries in this work to relate words with psychological features such as insight,
anger or happiness, but also with topics such as money, sports or religion.
In this way, we match the words of a text, returning all their psychometric
dimensions.
– Generalised dependency triplets: In this paper, we apply an enriched
version, presented at [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], of the initial method described at [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Given a
dependency triplet of the form (wi, arcij, wj), a generalised triplet has the form
(g(wi, x), d(arcij), g(wj, x)), where g is a generalisation function and x the
desired type of generalisation, which can be: the word itself, its lemma, its
psychometric properties, its part-of-speech tag or none, if we decide to
completely delete the content of the token. On the other hand, the function d
can be defined to keep or remove the dependency type of a triplet. For
example, the triplet (performance, modifier, best) can be generalised as (optimism,
modifier, adjective) by applying the generalisation functions (g(performance,
psychometric properties), modifier, g(best, part-of-speech tag)). The goal is
to reduce the sparsity of standard dependency triplets, generalising concepts
and ideas in a homogeneous way.
1 http://www.nltk.org/
2 http://textblob.readthedocs.org/en/dev/
3 Lemmas are the canonical forms of words. For example, the lemma of ‘walking’ is ‘walk’.
        </p>
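The triplet generalisation can be sketched as follows. The lexicons are toy stand-ins for the LIWC dictionaries and PoS tags, chosen to reproduce the example above; the names `g` and `generalise` follow the notation in the text but the implementation is ours.

```python
# Toy stand-in lexicons (not the real LIWC dictionaries or tagger output).
PSYCHO = {'performance': 'optimism'}
POS = {'best': 'adjective'}
LEMMA = {'best': 'good'}

def g(word, kind):
    """Generalisation function g(w, x): map a token to the requested level."""
    table = {'psycho': PSYCHO, 'pos': POS, 'lemma': LEMMA}
    if kind == 'word':
        return word
    if kind == 'none':
        return '_'  # completely delete the content of the token
    return table[kind].get(word, word)

def generalise(triplet, head_kind, dep_kind, keep_arc=True):
    """Apply g to head and dependent; d keeps or removes the dependency type."""
    head, arc, dep = triplet
    return (g(head, head_kind), arc if keep_arc else '_', g(dep, dep_kind))

print(generalise(('performance', 'modifier', 'best'), 'psycho', 'pos'))
# → ('optimism', 'modifier', 'adjective')
```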
        <p>In all cases, we use the number of occurrences as the weighting factor for the
supervised classifier.
</p>
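The occurrence-count weighting over 1-grams and 2-grams can be sketched as follows; this is an illustrative feature extractor, not the authors' code.

```python
from collections import Counter

def ngram_counts(tokens, n_values=(1, 2)):
    """Occurrence counts of word n-grams, used to weight classifier features."""
    feats = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            feats[' '.join(tokens[i:i + n])] += 1
    return feats

print(ngram_counts(['the', 'best', 'performance']))
```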
      </sec>
      <sec id="sec-2-3">
        <title>Classifier</title>
        <p>
          We use the WEKA [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] framework for building our classifiers. For each task, we
tuned the weights and the kernel of the classifier in order to maximise
performance, as detailed in the following sections.
3
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Task 1: Reputation Dimensions Categorisation</title>
      <p>The task consisted of relating tweets to the standard reputation
dimensions proposed by the Reputation Institute and the RepTrak model4:
products&amp;services, innovation, workplace, citizenship, governance, leadership,
performance and undefined (if a tweet is not assigned to any of the other
dimensions).</p>
      <p>Dataset The RepLab 2014 corpus is composed of English and Spanish tweets
extracted from the RepLab 2013 corpus, which contained a collection of tweets
referring to up to 61 entities. The RepLab 2014 corpus only takes into account those
that refer to banking or automotive entities, where each one is labelled with one
of the standard reputation dimensions. To create the collection the canonical
name of the entity was used as a query to retrieve the tweets which talk about
it. Thus, each tweet contains the name of an entity. In addition, the corpus
provides information about the author of each tweet, the content of external
links that appear in a message and a flag to know if the tweet is written in
English or Spanish.
</p>
      <p>4 http://www.reputationinstitute.com/about-reputation-institute/the-reptrakframework</p>
      <p>Evaluation metrics This task is evaluated as a multi-class categorisation
problem. Thus, precision, recall and accuracy are the official metrics:</p>
      <p>Precision = TP / (TP + FP) (1)</p>
      <p>Recall = TP / (TP + FN) (2)</p>
      <p>Accuracy = (TP + TN) / (TP + TN + FP + FN) (3)</p>
      <p>where TP and TN refer to the true positives and negatives and FP and FN
indicate the false positives and negatives, respectively. The organisers sorted the
official results by accuracy.</p>
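As a minimal worked illustration of these metrics (the confusion counts below are made up):

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion counts for one class:
print(precision(40, 10), recall(40, 20), accuracy(40, 100, 10, 20))
```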
      <p>
        Runs We sent two runs. For each run, we trained two different LibLinear
classifiers [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]: one for English and another one for Spanish language. We tuned
the weights for the majority classes (products, citizenship, undefined and
governance) using a value of 0.75, giving the less frequent categories a weight of
1. In both cases, our approaches only handle the content of a tweet, discarding
the user information and the content of the external links. In the latter case, we
think processing the content of the web pages referred to in a tweet may
excessively increase the cost of analysing a tweet. In addition, we believe the
reputation dimension of a tweet is not necessarily related to the content of the
link, where many unrelated concepts and ideas probably appear. The results presented
below seem to confirm our hypothesis, since we ranked 3rd, very close
to the best-performing system. More specifically, our contributions were:
– Run 1 : The English model took as features: unigrams of lemmas, bigrams
of lemmas, and word psychometric properties. With respect to the Spanish
classifier, the experimental setup showed that the best-performing model
over Spanish messages was composed of: unigrams of lemmas, bigrams of
lemmas and generalised triplets of the form ( , dependency type, lemma),
i.e., dependency triplets where the head is omitted. In both cases, we tried
to obtain the best sets of features via greedy search on the training corpus
and a 5-fold cross-validation.
– Run 2 : This model uses the same classifier and the same sets of features
as run 1, but excluding those which include the name of any of the entities
used to create the training corpus. Our main aim was to protect our model
from a possible bias in the training corpus. We observed that many tweets
belonging to certain entities were labelled almost exclusively with a single
reputation dimension. We were concerned that this fact could produce an overfitted
model which would not work properly on the test set. In this respect, this
run also allowed us to measure the impact on performance of using the name
of entities on the test set.
Results Table 3 shows the ranking of the systems for the reputation dimension
task, based on their accuracy. The baseline of the RepLab organisation is a naive
bag-of-words approach trained on a Support Vector Machine (SVM). Our run 1
ranked 3rd, confirming the effectiveness of our perspective. The second run also
worked acceptably, although performance dropped by almost two percentage
points. This confirms a slight bias on the test set, since it contains tweets that
refer to the same entities as the training set and they were collected in the same
interval of time. Table 3 shows the detailed performance for our best run. Our
model obtains both an acceptable recall and precision for the most prevalent
classes, but the same is not true for minority classes, due to the small number
of samples in the training set. The majority of the participants exhibited this
same weakness.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Task 2.1: Author categorisation</title>
      <p>The goal of the task was to assign Twitter profiles to one of these categories:
journalist, professional, authority, activist, investor, company and celebrity. An
additional class undecidable was proposed to place all those users that did not
match any of the proposed categories.</p>
      <p>Dataset The training and the test set are composed of the authors who wrote
the automotive and banking tweets that we mentioned previously. In addition to
user information, the organisers included the identifiers of the last 600 tweets of
each user at the moment of the creation of the corpus. Due to the lack of time,
we decided to download only 100 tweets for each author. In order to obtain
these tweets faster, we used the capabilities of the Twitter API to download the
timeline of an author instead of downloading the tweets one by one. However,
that API method only allows retrieving the 3,200 most recent tweets
of each author, so we were unable to find the tweets included in the corpus for
many of them (the most active ones). More specifically, we could retrieve no
tweets for around 1,000 authors.</p>
      <p>
        Evaluation metrics The official results are the average accuracy between the
categories corresponding to automotive and banking. Only the authors
categorised as influential in the gold standard of task 2.2 are taken into account.
Runs This task is addressed as follows: given a set of tweets for an author,
they are collected into a single file, which is used to finally classify the user
according to the proposed categories. Since many of the categories in the training
corpus only contained a few authors, we discarded those classes in order to avoid
confusing the machine learning algorithm. We trained two classifiers, one for each
language. After testing different Support Vector Machine implementations, we
obtained the best performance on the training set (5-fold cross-validation) using
an SMO [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
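The per-author procedure, including the back-off for authors with no downloaded tweets, can be sketched as follows; the classifier stand-ins are hypothetical toys, not our trained models.

```python
def classify_author(tweets, profile_description, main_clf, backoff_clf):
    """Merge an author's tweets into a single document and classify it;
    if no tweets could be downloaded, fall back to a classifier over
    the author's profile description."""
    if tweets:
        return main_clf('\n'.join(tweets))
    return backoff_clf(profile_description)

# Toy stand-in classifiers, for illustration only:
main = lambda doc: 'journalist' if 'news' in doc else 'professional'
backoff = lambda bio: 'undecidable'

print(classify_author([], 'Car enthusiast', main, backoff))  # → 'undecidable'
print(classify_author(['breaking news today'], '', main, backoff))  # → 'journalist'
```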
      <p>To identify which authors are Spanish and which ones are English, for each
author we counted how many of their last 600 tweets included in the corpus
are written in each language, assigning the author to the most frequent one. This
information is provided by the RepLab 2014 organisation, without any need to
download the tweets. More specifically, as we did in task 1, we sent two runs:
– Run 1 : Both the Spanish and the English models use unigrams of lemmas and
psychometric properties as features. We selected these features via greedy
search on our processed training corpus (where all tweets of a user are merged
into a single file). Since we did not have any tweet for many authors, we
trained a back-off machine learner: a bag-of-words classifier which categorises
these authors according to their profile description.
– Run 2 : The only difference with respect to run 1 is the back-off classifier.</p>
      <p>Authors for which we have not downloaded any tweet are always assigned
to the majority class in the training corpus: undecidable.</p>
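The language-assignment rule described above can be sketched as:

```python
from collections import Counter

def author_language(tweet_langs):
    """Assign an author to the language most of their tweets are written in."""
    return Counter(tweet_langs).most_common(1)[0][0]

print(author_language(['EN', 'ES', 'EN', 'EN', 'ES']))  # → 'EN'
```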
      <p>Results Table 4 shows the performance of the systems participating in this
task. We think that our poor performance is due to the small size of the training
corpus that we were able to collect and process. The baseline proposed by the
RepLab organisers reinforces our hypothesis, since they used an SVM approach
based on a bag-of-words. They also included another baseline which assigns all
authors to the majority class in the training corpus.</p>
    </sec>
    <sec id="sec-5">
      <title>Task 2.2: Author ranking</title>
      <p>The task focusses on classifying authors as influential or non-influential, as well
as ranking them according to that level of influence.</p>
      <p>Dataset It is the same as the one employed in task 2.1: Author Categorisation.
The proportion in the training set is about 30% of influential users, with the
remaining 70% being non-influential.</p>
      <p>Evaluation metrics The organisers address the problem as a traditional
information-retrieval ranking problem, using Mean Average Precision (MAP) as the standard
metric. The experimental results are ordered according to the average of the
automotive and banking MAP measures.</p>
      <p>Runs Classification of influential and non-influential users is made via a
LibLinear classifier, following a machine learning perspective. To rank the authors
we take as the starting point the confidence factor reported by the classifier for
each sample. The confidence is then used to rank the users according to their
level of influence. A higher confidence should indicate a higher influence. With
respect to non-influential users, we firstly negate that factor, obtaining in this
way lower values for the least influential authors. We again sent two models to
evaluate this task, although in this case the runs present significant differences:
– Run 1 : A bag-of-words model which takes each word of the Twitter profile
descriptions to feed the supervised classifier. The weights of the classes were
tuned taking 1.8 and 1.3 for influential and non-influential users, respectively.
Since the corpus is domain-dependent (automotive and banking tweets) we
hypothesise that the brief biography of the user may be an acceptable
indicator of influence. We observed that words such as ‘car’, ‘business’ or ‘magazine’
were some of the most relevant tokens in terms of information gain.
– Run 2 : This run follows a meta-information perspective, taking the
information provided by the Twitter API for any user. More specifically, we used
binary features such as: URL in the Twitter profile, verified account, profile
user background image, default profile, geo enabled, default profile image,
notifications, is translation enabled and contributors enabled. In addition the
following numeric features are taken into account: listed count, favourites
count, followers count, statuses count, friends count and following.
Results Table 5 illustrates the official results for this task. The baseline of the
RepLab organisers ranks the authors by their number of followers. Our run 1
achieved the 2nd place, tied in practice with the 1st place, reinforcing the validity
of the proposal for a specific domain. On the other hand, our second run did not
work as expected, although it outperformed the baseline.
</p>
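The ranking procedure built on the classifier's confidence factor can be sketched as follows; the confidence values are illustrative, not real classifier output.

```python
def rank_authors(predictions):
    """predictions: list of (author, label, confidence) tuples, with label in
    {'influential', 'non-influential'}. The confidence of non-influential
    users is negated so the least influential authors sink to the bottom."""
    scored = [(author, conf if label == 'influential' else -conf)
              for author, label, conf in predictions]
    return [author for author, score in sorted(scored, key=lambda x: -x[1])]

print(rank_authors([('u1', 'influential', 0.9),
                    ('u2', 'non-influential', 0.8),
                    ('u3', 'influential', 0.4),
                    ('u4', 'non-influential', 0.2)]))
# → ['u1', 'u3', 'u4', 'u2']
```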
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>This paper describes the participation of the LyS research group at RepLab
2014. We sent runs for all tasks proposed. The classification for the reputation
dimensions task is addressed from a NLP perspective, including preprocessing,
part-of-speech tagging and dependency parsing. We use the output obtained
by our NLP pipeline for extracting lexical, psychometric and syntactic-based
features, which are used to feed a supervised classifier. We ranked 3rd, very close
to the best performing system, confirming the effectiveness of the approach.</p>
      <p>The author categorisation task is addressed from the same perspective.
However, we could not properly exploit the approach due to problems obtaining much
of the content of the training corpus.</p>
      <p>On the other hand, the author ranking challenge was addressed from a
different perspective. We obtained the second best-performing system, tied in practice
with the 1st place, by training a bag-of-words classifier which takes the
Twitter profile description of the users as features. This model clearly outperformed
our second run based on metadata such as the number of favourited tweets or
followers.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>Research reported in this paper has been partially funded by Ministerio de
Economía y Competitividad and FEDER (Grant TIN2010-18552-C03-02) and
by Xunta de Galicia (Grant CN2012/008).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. E. Amigó,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carrillo-de-Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Chugur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Corujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          , E. Meij,
          <string-name>
            <surname>M. de Rijke</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , Overview of RepLab 2014:
          <article-title>author profiling and reputation dimensions for Online Reputation Management</article-title>
          ,
          <source>in: Proceedings of the Fifth International Conference of the CLEF initiative</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. M. Taulé,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Mart</surname>
          </string-name>
          <article-title>í, M. Recasens, AnCora: Multilevel Annotated Corpora for Catalan and Spanish</article-title>
          , in: N.
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Choukri</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Maegaard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mariani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Odjik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Piperidis</surname>
          </string-name>
          , D. Tapias (Eds.),
          <source>Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)</source>
          , Marrakech, Morocco,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Marcinkiewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Santorini</surname>
          </string-name>
          ,
          <article-title>Building a large annotated corpus of English: The Penn treebank</article-title>
          ,
          <source>Computational linguistics 19 (2)</source>
          (
          <year>1993</year>
          )
          <fpage>313</fpage>
          -
          <lpage>330</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>E.</given-names>
            <surname>Brill</surname>
          </string-name>
          ,
          <article-title>A simple rule-based part of speech tagger</article-title>
          ,
          <source>in: Proceedings of the workshop on Speech and Natural Language</source>
          , HLT'91,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computational Linguistics, Stroudsburg, PA, USA,
          <year>1992</year>
          , pp.
          <fpage>112</fpage>
          -
          <lpage>116</lpage>
          . doi:
          <volume>10</volume>
          .3115/1075527.1075553.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Vilares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Gómez-Rodríguez, A syntactic approach for opinion mining on Spanish reviews, Natural Language Engineering</article-title>
          . Available on CJO2013. doi:
          <volume>10</volume>
          .1017/S1351324913000181.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. M. Collins,
          <article-title>Discriminative training methods for hidden markov models: Theory and experiments with perceptron algorithms</article-title>
          ,
          <source>in: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing -</source>
          Volume
          <volume>10</volume>
          , EMNLP '02,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computational Linguistics, Stroudsburg, PA, USA,
          <year>2002</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . doi:
          <volume>10</volume>
          .3115/1118693.1118694. URL http://dx.doi.org/10.3115/1118693.1118694
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>J.</given-names>
            <surname>Nivre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nilsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chanev</surname>
          </string-name>
          , G. Eryigit, S. Kübler,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marinov</surname>
          </string-name>
          , E. Marsi,
          <article-title>Maltparser: A language-independent system for data-driven dependency parsing</article-title>
          .,
          <source>Natural Language Engineering</source>
          <volume>13</volume>
          (
          <issue>2</issue>
          ) (
          <year>2007</year>
          )
          <fpage>95</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennebaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Francis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Booth</surname>
          </string-name>
          ,
          <article-title>Linguistic Inquiry and Word Count: LIWC 2001</article-title>
          , Mahwah: Lawrence Erlbaum Associates (
          <year>2001</year>
          )
          <fpage>71</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>D.</given-names>
            <surname>Vilares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gómez-Rodríguez</surname>
          </string-name>
          ,
          <article-title>On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages</article-title>
          ,
          <source>Journal of the Association for Information Science and Technology</source>
          , to appear.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Penstein-Rosé</surname>
          </string-name>
          ,
          <article-title>Generalizing dependency features for opinion mining</article-title>
          ,
          <source>in: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, ACLShort '09</source>
          , Association for Computational Linguistics, Suntec, Singapore,
          <year>2009</year>
          , pp.
          <fpage>313</fpage>
          -
          <lpage>316</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>M.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Holmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reutemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          ,
          <article-title>The WEKA data mining software: an update</article-title>
          ,
          <source>SIGKDD Explorations</source>
          <volume>11</volume>
          (
          <issue>1</issue>
          ) (
          <year>2009</year>
          )
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          . doi:10.1145/1656274.1656278. URL http://doi.acm.org/10.1145/1656274.1656278
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>R.-E.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>LIBLINEAR: A library for large linear classification</article-title>
          ,
          <source>The Journal of Machine Learning Research</source>
          <volume>9</volume>
          (
          <year>2008</year>
          )
          <fpage>1871</fpage>
          -
          <lpage>1874</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Platt</surname>
          </string-name>
          ,
          <source>Advances in kernel methods</source>
          , MIT Press, Cambridge, MA, USA,
          <year>1999</year>
          , Ch.
          <article-title>Fast training of support vector machines using sequential minimal optimization</article-title>
          , pp.
          <fpage>185</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>