<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the 1st Classification of Spanish Election Tweets Task at IberEval 2017</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maite Gimenez</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomas Baviera</string-name>
          <email>tomas.baviera@campusviu.es</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>German Llorca</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jose Gamir</string-name>
          <email>jose.gamirg@uv.es</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dafne Calvo</string-name>
          <email>dafne.calvo@uva.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <email>prossog@dsic.upv.es</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francisco Rangel</string-name>
          <email>francisco.rangel@autoritas.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Autoritas Consulting</institution>
          ,
          <addr-line>S.A.</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Mediaflows Research Group, Universidad de Valladolid</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Mediaflows Research Group, Universitat de Valencia</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Mediaflows Research Group, Valencian International University</institution>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Pattern Recognition and Human Language Technology (PRHLT) Research Center, Universitat Politecnica de Valencia</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>This paper summarises the COSET shared task organised as part of the IberEval workshop. The aim of this task is to classify the topic discussed in a tweet into one of five topics related to the Spanish 2015 electoral cycle. A new dataset was curated for this task and hand-labelled by experts. Moreover, the results of the 17 participants in the task and a review of their proposed systems are presented. In a second-phase evaluation, we provided the participants with 15.8 million tweets in order to test the scalability of their systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Topic Classification</kwd>
        <kwd>Twitter</kwd>
        <kwd>Elections</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Nowadays, politics has been upended by the use of social media. A political
campaign cannot be strategised using only the traditional media. During the election
cycle, both politicians and voters engage in conversations about different topics.
Politicians and their campaign staff share their policy approaches and bits of the
candidates' personal lives. Characterising the influence processes in the public
space is one of the most interesting topics in political communication research.
Political parties, media and citizens send messages through a complicated media
network, where knowing who has the power of agenda setting becomes critical. In
this sense, the social media logic has boosted a more active user participation in
delivering political messages, accessing more sources, and mobilising for political
action. The analysis of this complex media network requires innovative research
tools capable of evaluating the different elements in the political information
flow [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>To create a shared framework, we have proposed a shared task: the Classification
of Spanish Election Tweets (COSET) task, which tackles the problem of
topic classification of political tweets in five categories.</p>
      <p>
        The political background of the COSET project was one of the most
uncertain electoral contests in Spain's recent political history: the December
20, 2015 General Elections. The European Elections of the previous year had
consolidated two new parties in the national political landscape. Both sought
to challenge the bipartisanship entrenched in Spanish democracy. For the 2015
General Elections, the campaign uncertainty, as well as the increased number
of candidates with possibilities of success, made the citizenry more interested
in the campaign than ever in recent history. The traditional media, particularly
TV, and social media widely covered politics during the weeks prior to Election
Day [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>The remainder of this paper is organised as follows. Section 2 reviews the
state of the art on the topic. Next, Section 3 describes the corpus and the
process for collecting the tweets from the political conversations on Twitter
related to the 2015 Spanish General Elections, as well as the evaluation framework
proposed for evaluating the participants' models. Section 4 summarises the
approaches submitted by the participants and discusses the results achieved by
the evaluated models. Finally, Section 5 presents the conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>The following sections describe the work related to topic classification as well as
the work of Natural Language Processing (NLP) in political campaigns.</p>
      <sec id="sec-2-1">
        <title>Topic Classification Using Natural Language Processing</title>
        <p>
          Topic classification is one of the classical problems of NLP. In the literature, we
find that this task has been tackled following a wide variety of approaches (for a survey, see [
          <xref ref-type="bibr" rid="ref1">1</xref>
          , chap. 6]). The
task at hand has been studied in depth because it can be used as a first step for
extracting relevant information from a text [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. The work of Hillard et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]
depicts an example of how automatic classification systems can assist human
annotators in labelling the topic discussed in a document. In structured text, the
state of the art has achieved satisfactory results in most domains. However, this
task can be challenging when dealing with the short texts with many
grammatical mistakes found on social media [
          <xref ref-type="bibr" rid="ref21 ref36 ref6">21, 36, 6</xref>
          ]. Furthermore, social media
has recently been used extensively during elections, which has aroused the interest of
researchers working on both computational linguistics and social science studies
[
          <xref ref-type="bibr" rid="ref12 ref20 ref42">20, 12, 42</xref>
          ].
        </p>
        <p>
          Content classification of tweets in political research has been addressed mainly
with lexicon-based methods. A set of issues selected beforehand from the campaign provides
the list of topics into which tweets are classified [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. This method has also
been used for identifying political influencers [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Other classification approaches use
methods based on network graphs for uncovering word patterns [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ]. Moreover, these
works have explored the impact of different machine learning algorithms in order
to predict the outcome of the elections (e.g. Support Vector Machines (SVMs) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ],
Linear Discriminant Analysis (LDA) [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ], etc.). Likewise, some works linked the
outcome of the election with the sentiments expressed on Twitter [
          <xref ref-type="bibr" rid="ref38 ref39 ref41">39, 41, 38</xref>
          ].
        </p>
        <p>The utility of these methodologies relies on the set of words that distinguish
among the topics, such as economy or national security. Nevertheless, these
methods miss critical issues within the political conversation, as they usually
focus on sectorial policies. To address the broader spectrum of political topics
discussed on Twitter, researchers need to develop more refined
machine-learning-based methods able to detect more abstract topics.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Topic Labelling in Political Campaigns</title>
        <p>
          To label the data set that we have collected, we followed the topic classification
proposed by Mazzoleni [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], as this is the baseline for the content analysis carried
out by the entire Mediaflows research project. Patterson [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] distinguishes among
four kinds of basic issues present in the media during the campaign. Mazzoleni
[
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] assumes this taxonomy in his studies on mediatised politics.
        </p>
        <p>
          According to Patterson [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], the media's messages during the campaign fall
into four categories based on their political content (see http://mediaflows.es/coset/): (i) political issues, dealing
with the most abstract aspects of electoral confrontation; (ii) policy issues,
dealing with sectorial policies; (iii) personal issues, regarding the candidates' lives
and pastimes; and (iv) campaign issues, dealing with the evolution of the
campaign. Although we had set some filtering criteria in the process of extraction,
we may have collected some tweets unrelated to the Spanish Elections or the
political campaign. Thus, we decided to introduce a fifth category, (v) other issues,
for this kind of content.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluation Framework</title>
      <p>This section defines the task at hand, outlines the construction of the corpus,
highlighting the details of the annotation process, and describes the performance metric
used to evaluate the participants' models.</p>
      <sec id="sec-3-1">
        <title>Corpus: Tweet Collection and Annotation</title>
        <p>In order to carry out this task, we gathered a collection of tweets from November
2, 2015, to December 21, 2015. Of these 50 days, 32 correspond to the
pre-campaign, 15 to the electoral campaign, one to reflection day, one to
Election Day, and one to the following day. This last day is useful because
the conversations after the results became known on Election Day ended at midnight.
The tweets were obtained through the Twitter API. The data mining and the
pre-processing of tweets were conducted using Python.</p>
        <p>We established three criteria for filtering tweets: a pair of general terms
related to the elections (#20D; 20-D); the names of the four major political parties
along with their Twitter handles (PP; PPopular; PSOE; @PSOE;
ahorapodemos; Ciudadanos; CiudadanosCs; Cs); and the names of the four prime
minister candidates along with their Twitter handles (Rajoy; @marianorajoy;
Pedro Sanchez; Pedro Sánchez; @sanchezcastejon; Pablo Iglesias; @Pablo_Iglesias_;
Rivera; Albert Rivera). It was impossible to include the name of the political
party Podemos as a filter element. This word works poorly in constructing a
corpus through a selective extraction process because, given that it means "we
can", it can be used in many contexts other than political conversations. We also
filtered out messages written in languages other than Spanish.</p>
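        <p>The keyword criteria above can be illustrated with a small sketch. This is not the extraction code used to build the corpus: the term list is only a subset of the terms quoted in the text, the function names (tokenize, contains_term, matches_filter) are hypothetical, and whole-token, case-insensitive matching is an assumption about how such a filter might behave. The language filter is not covered.</p>

```python
import re

# Subset of the filter terms quoted in the text (assumption: whole-token,
# case-insensitive matching; the original extraction rules are not specified).
FILTER_TERMS = [
    "#20D", "20-D",                                  # general election terms
    "PP", "PPopular", "PSOE", "@PSOE",               # parties and handles
    "ahorapodemos", "Ciudadanos", "CiudadanosCs", "Cs",
    "Rajoy", "@marianorajoy", "@sanchezcastejon",    # candidates and handles
    "Pablo Iglesias", "Rivera", "Albert Rivera",
]


def tokenize(text):
    """Lowercase the text and split it into words, hashtags and handles."""
    return re.findall(r"[#@]?\w+(?:-\w+)*", text.lower())


def contains_term(tokens, term):
    """True if the (possibly multi-word) term occurs as consecutive tokens."""
    term_tokens = tokenize(term)
    n = len(term_tokens)
    return any(tokens[i:i + n] == term_tokens
               for i in range(len(tokens) - n + 1))


def matches_filter(text):
    """True if the tweet text contains at least one filter term."""
    tokens = tokenize(text)
    return any(contains_term(tokens, term) for term in FILTER_TERMS)


print(matches_filter("Vota el #20D"))                   # hashtag term
print(matches_filter("Sí se puede, podemos hacerlo"))   # generic use of "podemos"
```

        <p>Matching on whole tokens rather than substrings keeps short terms such as PP or Cs from firing inside unrelated words, which is the same kind of ambiguity that led to Podemos being left out of the filter.</p>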
      </sec>
      <sec id="sec-3-2">
        <title>Task definition</title>
        <p>As established in the Introduction, political campaigns currently monitor political
conversations on Twitter, particularly when an electoral cycle is approaching.
This task is usually carried out in a semi-automatic fashion. The focus of the
proposed COSET task is on improving this process. Therefore, participants were
asked to classify tweets written in Spanish based on the political topic discussed.
As mentioned in Section 2.2, we considered five categories:
          <list list-type="order">
            <list-item><p>Political Issues (PI): Tweets related to the most abstract elements of electoral confrontation.</p></list-item>
            <list-item><p>Policy Issues (PoI): Tweets about sectorial policies.</p></list-item>
            <list-item><p>Campaign Issues (CI): Tweets related to the evolution of the campaign.</p></list-item>
            <list-item><p>Personal Issues (PeI): Tweets about the candidates' personal lives and pastimes.</p></list-item>
            <list-item><p>Other Issues (O): Tweets that did not fit in any of the previous categories.</p></list-item>
          </list>
        </p>
        <p>In summary, the objective of the task is that, given a tweet, the
proposed system should predict the tweet's topic automatically.</p>
        <p>Participants were provided with password-protected labelled data sets for
training and developing their systems. Later, their systems were evaluated against
a test data set. Table 1 presents the distribution of tweets for each topic and
data set, and Figure 1 shows the distribution of the topics over the whole dataset
(including the training, development, and test partitions).</p>
      </sec>
      <sec id="sec-3-3">
        <title>Performance measures</title>
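        <table-wrap>
          <label>Table 1</label>
          <caption><p>Distribution of tweets for each topic and data set.</p></caption>
          <table>
            <thead>
              <tr><th>Topic</th><th>Training</th><th>Development</th><th>Testing</th></tr>
            </thead>
            <tbody>
              <tr><td>PI</td><td>530 (23.64%)</td><td>57 (22.8%)</td><td>151 (24.2%)</td></tr>
              <tr><td>PoI</td><td>786 (35.06%)</td><td>88 (35.2%)</td><td>228 (36.54%)</td></tr>
              <tr><td>CI</td><td>511 (22.79%)</td><td>71 (28%)</td><td>136 (21.79%)</td></tr>
              <tr><td>PeI</td><td>152 (6.78%)</td><td>9 (4%)</td><td>38 (6.09%)</td></tr>
              <tr><td>O</td><td>263 (11.73%)</td><td>25 (10%)</td><td>71 (11.38%)</td></tr>
              <tr><td>Total</td><td>2242</td><td>250</td><td>624</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
          Given that the corpora were heavily unbalanced, as illustrated in the
previous section, we proposed ranking the participants' models using the macro
F1-score. The F-score can be interpreted as a weighted average of the precision
and the recall, whereas the F1-score is the harmonic mean of the precision and
recall metrics, as seen in Formula 1,
where |L| is the number of samples, ŷ<sub>l</sub> is the true label for the sample l, and
y<sub>l</sub> is the predicted label for the sample l [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ].
        </p>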
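        <p>To make the ranking metric concrete, the sketch below computes a macro-averaged F1-score from scratch. It assumes the standard definitions, which are not spelled out in the text: per label, F1 = 2PR / (P + R), equivalently 2TP / (2TP + FP + FN), and the macro score is the unweighted mean of the per-label values. The function name macro_f1 and the toy labels are illustrative only.</p>

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-label F1-scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[gold] += 1   # true label was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); taken as 0 when the label never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


LABELS = ["PI", "PoI", "CI", "PeI", "O"]
gold = ["PoI", "PoI", "PoI", "PI", "CI", "PeI", "O"]   # invented toy labels
majority = ["PoI"] * len(gold)                          # always predict PoI

print(round(macro_f1(gold, majority, LABELS), 3))  # prints 0.12
```

        <p>On this toy input, a system that always predicts the largest class is right on three of seven tweets yet scores only 0.12, because four of the five labels contribute an F1 of zero. This is the behaviour that makes the macro average suitable for unbalanced data.</p>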
        <p>In multi-class tasks, we also need to decide how to average
the F1-scores of the individual classes. Since we wanted to penalise those systems
that are biased towards the most populated classes, we used the macro
average, which calculates the unweighted mean of the per-label F1-scores, as described in
Formula 2.</p>
        <p>Hereafter, we present a summary of the proposed models as well as the results
that each model achieved. Note that each participant was allowed
to submit up to five different proposals in order to test different
approaches. In total, 17 teams participated in the task, and a total of 39
models were submitted.</p>
        <p>
          <bold>Pre-processing.</bold> Most of the participants did not pre-process the tweets from
the data sets and worked with the raw data. The techniques used by
those who did pre-process the data sets were: tokenisation (carried out by teams
LuSer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], Carl Os Duty [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], UC3M [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], and ivsanro1 [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]), conversion to
lowercase (teams LuSer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], UC3M [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], and Electa [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]), and removal of several
tokens such as user handles (teams LuSer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], ELiRF-UPV [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], and slovak [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]),
numbers (teams ELiRF-UPV [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], and slovak [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]), punctuation marks (teams
Electa [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], slovak [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], and ivsanro1 [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]), URLs (teams Electa [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], slovak [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ],
and ivsanro1 [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]), stopwords (teams Electa [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], UC3M [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], and slovak [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]),
flooding characters (teams slovak [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], and UC3M [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]), and emoticons (team
Electa [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]).
        </p>
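        <p>As an illustration of the kind of pre-processing listed above, the following sketch chains several of the steps (lowercasing; removal of URLs, user handles, numbers, flooding characters and punctuation; whitespace tokenisation). It does not reproduce any participant's pipeline: the regular expressions, the order of the steps, and the function name preprocess are assumptions, and stopword and emoticon removal are omitted.</p>

```python
import re
import string


def preprocess(tweet):
    """Apply a few common tweet-cleaning steps and return a list of tokens."""
    text = tweet.lower()                          # conversion to lowercase
    text = re.sub(r"https?://\S+", " ", text)     # remove URLs
    text = re.sub(r"@\w+", " ", text)             # remove user handles
    text = re.sub(r"\b\d+\b", " ", text)          # remove standalone numbers
    text = re.sub(r"(.)\1{2,}", r"\1", text)      # collapse runs of 3+ repeated characters
    # Strip ASCII punctuation plus the Spanish inverted marks.
    text = text.translate(str.maketrans("", "", string.punctuation + "¡¿"))
    return text.split()                           # whitespace tokenisation


example = "¡¡Graciaaaas @marianorajoy!! 350 propuestas en http://example.com #20D"
print(preprocess(example))  # prints ['gracias', 'propuestas', 'en', '20d']
```

        <p>The order matters in this sketch: URLs and handles are removed before punctuation is stripped, since otherwise the markers that identify them would already be gone.</p>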
        <p>
          <bold>Features.</bold> The features used to train the participants' classifiers were
diverse. Participants' models used some classical NLP features such as word
n-grams (teams LuSer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], LTRC IIITH [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], ConradCR [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], Electa [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], Team
17 [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ], Carl Os Duty [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], Citripio [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], LichtenwalterOlsan [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], slovak [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ],
Puigcerver [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ], and ivsanro1 [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]), character n-grams (team LTRC IIITH [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]),
Tf-Idf (teams CD team [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ], Carl Os Duty [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], LichtenwalterOlsan [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], and
Puigcerver [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]); but some of them used more recent techniques such as word
embeddings (teams LTRC IIITH [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], ELiRF-UPV [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], atoppe [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], UC3M [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], and
M Val [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]), sentence embeddings (Team 17 [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ]), and a multi-dimensional vector
approach (team UT text miners [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]). Moreover, the work of LTRC IIITH [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]
used an extensive set of handcrafted features that included top tokens, hashtags,
hashtag decomposition, mentions, and URLs among others.
        </p>
        <p>
          <bold>Classification approaches.</bold> The most common approach to the task
was a model based on Neural Networks (NNs) (teams LTRC IIITH [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], ELiRF-UPV
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], Team 17 [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ], and UT text miners [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]); LuSer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] added regularisation
techniques such as Gaussian noise to the NN architecture, and Carl Os Duty
[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] included batch normalisation with dropout in their NN model. In
addition, other approaches were also considered such as Support Vector Machines
(teams LTRC IIITH [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], M Val [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], and Citripio [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]), Random Forests (teams
LTRC IIITH [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], ConradCR [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], and Electa [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]), Nave Bayes (teams slovak
[
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] and ivsanro1 [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]), and Logistic Regression (team Puigcerver [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]); CD team
[
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] proposed a combination of classifiers that included a Logistic Regression, an
SVM, Naive Bayes, and a K-Nearest Neighbours classifier. Deep learning models
were also considered in the work of team atoppe [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]; they experimented with
Convolutional Neural Networks, Long Short Term Memory Networks (LSTMs),
Bidirectional Long Short-Term Memory Networks, etc. Also, team UC3M [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]
addressed this task using LSTMs and Gated Recurrent Units. Furthermore,
Team 17 [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ] trained five different language models, one for each topic, and then
classified each tweet by choosing the topic whose language model gave the lowest perplexity.
        </p>
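        <p>The idea of classifying by language-model perplexity can be sketched with a deliberately simple model. The example below is not Team 17's system: it assumes an add-one smoothed unigram model per topic, whitespace tokenisation, and invented training sentences, and the names UnigramLM and classify are hypothetical. It only shows the decision rule, which assigns each text to the topic whose model finds it least surprising.</p>

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one smoothed unigram language model over whitespace tokens."""

    def __init__(self, texts, vocabulary):
        self.counts = Counter(tok for text in texts for tok in text.lower().split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocabulary) + 1  # one extra slot for unseen tokens

    def perplexity(self, text):
        tokens = text.lower().split()
        log_prob = sum(
            math.log((self.counts[tok] + 1) / (self.total + self.vocab_size))
            for tok in tokens
        )
        # exp of the average negative log-probability per token
        return math.exp(-log_prob / max(len(tokens), 1))


def classify(text, models):
    """Return the topic whose language model gives the lowest perplexity."""
    return min(models, key=lambda topic: models[topic].perplexity(text))


# Invented toy training data, two topics only.
train = {
    "PoI": ["bajar impuestos y subir pensiones", "mas empleo y sanidad"],
    "CI": ["mitin de cierre de campaña", "debate electoral esta noche"],
}
vocabulary = {tok for texts in train.values() for text in texts
              for tok in text.lower().split()}
models = {topic: UnigramLM(texts, vocabulary) for topic, texts in train.items()}

print(classify("propuesta sobre impuestos y pensiones", models))  # prints PoI
```

        <p>A practical system would use a stronger language model and far more data per topic; the decision rule stays the same.</p>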
      </sec>
      <sec id="sec-3-4">
        <title>Evaluation and Discussion of the Submitted Approaches</title>
        <p>
          First, we developed three baselines to cover different difficulty levels. The
first baseline is the simplest one: it always predicts the most common
class, Policy Issues (PoI). The second is a traditional machine learning approach
that uses a Bag of Words (BOW) and an SVM with a linear kernel. Finally, the
last baseline applies a slightly better representation of words,
term frequency–inverse document frequency (Tf-idf) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and Random Forests
(RF) for classification. None of these baselines had its
hyperparameters tuned for the task, and all were developed using the scikit-learn
package [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. The results of all the participants' models are presented in Table
2.
        </p>
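        <p>The three baselines can be approximated with a few lines of scikit-learn. The sketch below is a reconstruction under assumptions, not the organisers' code: the toy tweets and labels are invented, the estimators use library defaults apart from the linear kernel named in the text, and random_state is fixed only to make the Random Forest repeatable.</p>

```python
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Invented stand-ins for the training and test partitions.
train_texts = [
    "bajaremos los impuestos y subiremos las pensiones",
    "nuestra propuesta de empleo y sanidad publica",
    "mas inversion en educacion y sanidad",
    "hay que regenerar la democracia y acabar con el bipartidismo",
    "el debate de esta noche marca la campaña",
    "mitin de cierre de campaña en Valencia",
]
train_labels = ["PoI", "PoI", "PoI", "PI", "CI", "CI"]
test_texts = ["propuesta de impuestos y pensiones", "gran mitin de campaña"]
test_labels = ["PoI", "CI"]

baselines = {
    # 1. Always predict the most frequent training class.
    "majority": make_pipeline(CountVectorizer(),
                              DummyClassifier(strategy="most_frequent")),
    # 2. Bag of Words with a linear-kernel SVM.
    "bow_svm": make_pipeline(CountVectorizer(), SVC(kernel="linear")),
    # 3. Tf-idf with a Random Forest (seed fixed for repeatability only).
    "tfidf_rf": make_pipeline(TfidfVectorizer(),
                              RandomForestClassifier(random_state=0)),
}

scores = {}
for name, model in baselines.items():
    model.fit(train_texts, train_labels)
    predictions = model.predict(test_texts)
    scores[name] = f1_score(test_labels, predictions, average="macro")
    print(name, round(scores[name], 3))
```

        <p>Wrapping the vectoriser and the classifier in one pipeline ensures that the vocabulary is learned from the training texts only and then reused unchanged on the test texts.</p>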
        <p>
          Overall, this is a complicated task since several topics are similar and,
therefore, share parts of the vocabulary. Only the top ten systems achieve
a macro F1 above 0.6. The best result was obtained by ELiRF-UPV [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], who
used NNs and word embeddings to train their systems, but also included a
technique for handling the imbalance present in the data. Also, LuSer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] applied
NNs, but in this case, they used 3-grams as features and included Gaussian
noise, which is reported to help minimise the effect of overfitting in NNs. It
is worth noting that some systems were unable to improve on the results achieved
by some of the baseline systems.
        </p>
        <p>
          We have studied the confusion matrices of the three best-performing systems:
the first and fourth runs from the ELiRF-UPV [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] team and the run from the
team LuSer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], which correspond to Figures 2, 3, and 4, respectively. It can
be observed that the predictions made for the topics PI, PoI, and CI present
certain confusion between them. Remarkably, PoI is the easiest topic to classify.
In contrast, the topic PeI is the most challenging.
        </p>
        <p>
          We offered the participants the opportunity to test the scalability of their
approaches with a bigger dataset of 15.8 million tweets. Since it is practically
impossible to label such a large corpus manually, we built a silver standard
with pooling techniques [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]. Four teams submitted runs. The
best performing team [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] submitted two runs and the other teams [
          <xref ref-type="bibr" rid="ref19 ref2 ref40">2, 40, 19</xref>
          ]
submitted one run each. We prepared a pool formed by these five runs and
labelled the corpus with the agreement of at least four runs (80% agreement).
The corpus size before and after labelling, together with the distribution of labels, is
shown in Table 3. As can be seen, the labelled corpus with the agreement of
three runs comprises 65.91% of the original corpus.
        </p>
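        <p>The pooling step can be sketched as a simple vote over the submitted runs. The example below is an illustration under assumptions rather than the procedure actually used: each run is represented as a hypothetical dictionary from tweet identifier to predicted label, the function name silver_labels is invented, and the default threshold of four follows the "at least four runs" criterion stated in the text.</p>

```python
from collections import Counter


def silver_labels(runs, min_agreement=4):
    """Keep a tweet only if at least `min_agreement` runs give it the same label."""
    silver = {}
    for tweet_id in runs[0]:
        votes = Counter(run[tweet_id] for run in runs)
        label, count = votes.most_common(1)[0]
        if count >= min_agreement:
            silver[tweet_id] = label
    return silver


# Five invented runs over three tweets.
runs = [
    {"t1": "PoI", "t2": "PI", "t3": "CI"},
    {"t1": "PoI", "t2": "PI", "t3": "PI"},
    {"t1": "PoI", "t2": "CI", "t3": "CI"},
    {"t1": "PoI", "t2": "PI", "t3": "O"},
    {"t1": "PoI", "t2": "PI", "t3": "PeI"},
]

silver = silver_labels(runs)
print(silver)                        # prints {'t1': 'PoI', 't2': 'PI'}
print(len(silver) / len(runs[0]))    # fraction of the pool that received a label
```

        <p>Raising or lowering the threshold trades label reliability against coverage: a stricter agreement requirement leaves fewer tweets in the silver standard.</p>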
        <p>
          In Table 4, results for the second phase are shown. As can be seen, the best
performing team also obtains the highest F1 value. Team 17, in turn,
increased its performance owing to the use of fastText in this second-phase
evaluation.
        </p>
        <p>
          This paper summarises the first edition of the COSET task on topic classification
during the 2015 electoral cycle. COSET was one of the tasks of the IberEval
workshop, which was part of the annual conference held by the Spanish Society
for Natural Language Processing (SEPLN in Spanish). Given a set of tweets,
participants were asked to classify the topic discussed in them from a list of five
topics: political issues, policy issues, campaign issues, personal
issues, and other issues. Seventeen participants performed the task, and the best
result was achieved by ELiRF-UPV [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], who scored 0.6482 in the macro F1.
They applied NNs and word embeddings, and handled the imbalance present in the
data. The results achieved by the participants confirm that topic classification
of tweets is a difficult task, particularly when the topics are similar. Hence, a
shared task for evaluating different systems, like the ones proposed in this task,
can help improve the results of automatic classification or at least assist human
labelling. This has been the aim of the second-phase evaluation.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work was conducted under the auspices of the CSO2016-77331-C2-1-R
research project "Strategies, agendas and discourses in the electoral
cybercampaigns: media and citizens" (Mediaflows), funded by the Spanish Ministry of
Economy, Industry and Competitiveness (MINECO in Spanish), and under the
auspices of the TIN2015-71147-C2-1-P research project "SOcial Media
language understanding-EMBEDing contexts" (SomEMBED), funded by MINECO.
The work of the first author is financed by Grant PAID-01-2461 2015, from the
Universitat Politecnica de Valencia.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Mining text data</article-title>
          . Springer Science &amp; Business Media
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ambrosini</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nicolo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Comparative study of neural models for the COSET shared task at IberEval 2017</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval</source>
          <year>2017</year>
          ).
          <source>CEUR Workshop Proceedings. CEUR-WS.org, Murcia (Spain) (September 19</source>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bernath</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Submission to the 1st Classification of Spanish Election Tweets task at IberEval 2017</article-title>
          . http://mediaflows.es/coset/. Team: ConradCR. Spain
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cebrian Chulia</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrer Sanchez</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Classification Of Spanish Election Tweets (COSET) with neural networks</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval</source>
          <year>2017</year>
          ).
          <source>CEUR Workshop Proceedings. CEUR-WS.org, Murcia (Spain) (September 19</source>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chadwick</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The hybrid media system: Politics and power</article-title>
          . Oxford University Press (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chua</surname>
          </string-name>
          , T.s.,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.:</given-names>
          </string-name>
          <article-title>A semisupervised Bayesian network model for microblog topic classification</article-title>
          .
          <source>In: Coling</source>
          . pp.
          <volume>561</volume>
          –
          <issue>576</issue>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Conover</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goncalves</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ratkiewicz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flammini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menczer</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Predicting the political alignment of twitter users</article-title>
          .
          <source>In: Privacy, Security, Risk and Trust (PASSAT)</source>
          and
          <source>2011 IEEE Third International Conference on Social Computing (SocialCom)</source>
          ,
          <source>2011 IEEE Third International Conference on</source>
          . pp.
          <volume>192</volume>
          –
          <fpage>199</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Conway</surname>
            ,
            <given-names>B.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kenski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The rise of Twitter in the political campaign: Searching for intermedia agenda-setting effects in the presidential primary</article-title>
          .
          <source>Journal of Computer-Mediated Communication</source>
          <volume>20</volume>
          (
          <issue>4</issue>
          ),
          <volume>363</volume>
          –
          <fpage>380</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Diez</given-names>
            <surname>Alba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Vieco</surname>
          </string-name>
          <string-name>
            <surname>Perez</surname>
          </string-name>
          , J.:
          <article-title>IberEval 2017, COSET task: a basic approach</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval</source>
          <year>2017</year>
          ).
          <source>CEUR Workshop Proceedings. CEUR-WS.org, Murcia (Spain) (September 19</source>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dubois</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaffney</surname>
          </string-name>
          , D.:
          <article-title>The multiple facets of influence: identifying political influentials and opinion leaders on Twitter</article-title>
          .
          <source>American Behavioral Scientist</source>
          <volume>58</volume>
          (
          <issue>10</issue>
          ),
          <volume>1260</volume>
          –
          <fpage>1277</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Fernandez</given-names>
            <surname>Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Segura</surname>
          </string-name>
          <string-name>
            <surname>Bedmar</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
          :
          <article-title>Submission to the 1st Classification of Spanish Election Tweets task at IberEval 2017</article-title>
          . http://mediaflows.es/coset/.
          <source>Team: UC3M</source>
          . Spain
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Gayo</given-names>
            <surname>Avello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Metaxas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.T.</given-names>
            ,
            <surname>Mustafaraj</surname>
          </string-name>
          , E.:
          <article-title>Limits of electoral predictions using Twitter</article-title>
          .
          <source>In: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media. Association for the Advancement of Artificial Intelligence</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Gharavi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bijari</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Short text classification using deep representation: A case study of Spanish tweets in COSET Shared Task</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval</source>
          <year>2017</year>
          ).
          <source>CEUR Workshop Proceedings. CEUR-WS.org, Murcia (Spain) (September 19</source>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Gonzalez</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pla</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hurtado</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          :
          <article-title>ELiRF-UPV at IberEval 2017: Classification Of Spanish Election Tweets (COSET)</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval</source>
          <year>2017</year>
          ).
          <source>CEUR Workshop Proceedings. CEUR-WS.org, Murcia (Spain) (September 19</source>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Hillard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Purpura</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilkerson</surname>
          </string-name>
          , J.:
          <article-title>Computer-assisted topic classification for mixed-methods social science research</article-title>
          .
          <source>Journal of Information Technology &amp; Politics</source>
          <volume>4</volume>
          (
          <issue>4</issue>
          ),
          <volume>31</volume>
          –
          <fpage>46</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Juarez</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peralta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Submission to the 1st Classification of Spanish Election Tweets task at IberEval 2017</article-title>
          . http://mediaflows.es/coset/. Team: Electa. Spain
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          :
          <article-title>Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Kao</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poteet</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          :
          <source>Natural language processing and text mining</source>
          . Springer Science &amp; Business Media
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Khandelwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swami</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhtar</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shrivastava</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Classification Of Spanish Election Tweets (COSET) 2017: Classifying Tweets using Character and Word Level Features</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval</source>
          <year>2017</year>
          ).
          <source>CEUR Workshop Proceedings. CEUR-WS.org, Murcia (Spain) (September 19</source>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Larsson</surname>
            ,
            <given-names>A.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moe</surname>
          </string-name>
          , H.:
          <article-title>Studying political microblogging: Twitter users in the 2010 Swedish election campaign</article-title>
          .
          <source>New Media &amp; Society</source>
          <volume>14</volume>
          (
          <issue>5</issue>
          ),
          <volume>729</volume>
          –
          <fpage>747</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palsetia</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patwary</surname>
            ,
            <given-names>M.M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choudhary</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Twitter trending topic classification</article-title>
          .
          <source>In: Data Mining Workshops (ICDMW)</source>
          ,
          <year>2011</year>
          IEEE 11th International Conference on. pp.
          <volume>251</volume>
          –
          <fpage>258</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Lichtenwalter</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Olšan, T.:
          <article-title>Submission to the 1st Classification of Spanish Election Tweets task at IberEval 2017</article-title>
          . http://mediaflows.es/coset/. Team: LichtenwalterOlsan. Czech Republic and Germany
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23. López García, G.,
          <string-name>
            <surname>Valera</surname>
            <given-names>Ordaz</given-names>
          </string-name>
          ,
          <string-name>
            <surname>L.</surname>
          </string-name>
          :
          <article-title>Pantallas electorales</article-title>
          . El discurso de partidos,
          <source>medios y ciudadanos en la campaña de 2015</source>
          .
          <article-title>Editorial UOC (</article-title>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>Mahiques</given-names>
            <surname>Sifres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            ,
            <surname>Lyeuta Tykhovod</surname>
          </string-name>
          , V.:
          <article-title>Submission to the 1st Classification of Spanish Election Tweets task at IberEval 2017</article-title>
          . http://mediaflows.es/coset/. Team: slovak. Spain
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>Maluenda</given-names>
            <surname>Maez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>García Ferrando</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.A.</surname>
          </string-name>
          :
          <article-title>Submission to the 1st Classification of Spanish Election Tweets task at IberEval 2017</article-title>
          . http://mediaflows.es/coset/. Team: Citripio. Spain
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Mazzoleni</surname>
          </string-name>
          , G.:
          <article-title>La comunicación política</article-title>
          .
          <source>Alianza Editorial</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27. Míguez,
          <string-name>
            <given-names>O.</given-names>
            ,
            <surname>Valdiviezo</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Submission to the 1st Classification of Spanish Election Tweets task at IberEval 2017</article-title>
          . http://mediaflows.es/coset/. Team:
          <string-name>
            <given-names>M</given-names>
            <surname>Val</surname>
          </string-name>
          . Spain
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Patterson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The Mass Media Election: How Americans Choose Their President</article-title>
          , vol.
          <volume>75</volume>
          . New York: Praeger Special Studies. (
          <year>1980</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
          </string-name>
          , E.:
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <volume>2825</volume>
          –
          <fpage>2830</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Puigcerver</surname>
          </string-name>
          , J.:
          <article-title>Submission to the 1st Classification of Spanish Election Tweets task at IberEval 2017</article-title>
          . http://mediaflows.es/coset/. Team: Puigcerver. Spain
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Quercia</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Askham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crowcroft</surname>
          </string-name>
          , J.:
          <article-title>TweetLDA: supervised topic classification and link prediction in Twitter</article-title>
          .
          <source>In: Proceedings of the 4th Annual ACM Web Science Conference</source>
          . pp.
          <volume>247</volume>
          –
          <fpage>250</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Sanchez</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Submission to the 1st Classification of Spanish Election Tweets task at IberEval 2017</article-title>
          . http://mediaflows.es/coset/. Team: ivsanro1. Spain
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33. De la Peña Sarracen,
          <string-name>
            <surname>G.L.</surname>
          </string-name>
          :
          <article-title>Ensembles of methods for Tweet Topic Classification</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval</source>
          <year>2017</year>
          ).
          <source>CEUR Workshop Proceedings. CEUR-WS.org, Murcia (Spain) (September 19</source>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Sokolova</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lapalme</surname>
          </string-name>
          , G.:
          <article-title>A systematic analysis of performance measures for classification tasks</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>45</volume>
          (
          <issue>4</issue>
          ),
          <volume>427</volume>
          –
          <fpage>437</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Sparck Jones</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Report on the need for and provision of an 'ideal' information retrieval test collection</article-title>
          .
          <source>Computer Laboratory</source>
          (
          <year>1975</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Sriram</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fuhry</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demir</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferhatosmanoglu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demirbas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Short text classification in Twitter to improve information filtering</article-title>
          .
          <source>In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval</source>
          . pp.
          <volume>841</volume>
          –
          <fpage>842</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <surname>Sudhahar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veltri</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cristianini</surname>
          </string-name>
          , N.:
          <article-title>Automated analysis of the US presidential elections using big data and network analysis</article-title>
          .
          <source>Big Data &amp; Society</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ) (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Taboada</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brooke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Tofiloski, M.,
          <string-name>
            <surname>Voll</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stede</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Lexicon-based methods for sentiment analysis</article-title>
          .
          <source>Computational Linguistics 37(2)</source>
          ,
          <volume>267</volume>
          –
          <fpage>307</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <surname>Tumasjan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sprenger</surname>
            ,
            <given-names>T.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sandner</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welpe</surname>
            ,
            <given-names>I.M.:</given-names>
          </string-name>
          <article-title>Predicting elections with Twitter: What 140 characters reveal about political sentiment</article-title>
          .
          <source>ICWSM</source>
          <volume>10</volume>
          (
          <issue>1</issue>
          ),
          <volume>178</volume>
          –
          <fpage>185</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <given-names>Villar</given-names>
            <surname>Lafuente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Garcés Díaz-Munío</surname>
          </string-name>
          , G.:
          <article-title>Several approaches for tweet topic classification in COSET - IberEval 2017</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval</source>
          <year>2017</year>
          ).
          <source>CEUR Workshop Proceedings. CEUR-WS.org, Murcia (Spain) (September 19</source>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Can</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazemzadeh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bar</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A system for real-time Twitter sentiment analysis of 2012 US presidential election cycle</article-title>
          .
          <source>In: Proceedings of the ACL 2012 System Demonstrations</source>
          . pp.
          <volume>115</volume>
          –
          <fpage>120</fpage>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
          <string-name>
            <surname>Zirn</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glavas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nanni</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eichorst</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stuckenschmidt</surname>
          </string-name>
          , H.:
          <article-title>Classifying topics and detecting topic shifts in political manifestos</article-title>
          .
          <source>PolText</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>