<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the Task on Stance and Gender Detection in Tweets on Catalan Independence at IberEval 2017</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Mariona Taulé</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>M. Antònia Martí</string-name>
          <email>amartig@ub.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francisco Rangel</string-name>
          <email>francisco.rangel@autoritas.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <email>prosso@dsic.upv.es</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristina Bosco</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viviana Patti</string-name>
          <email>pattig@di.unito.it</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Autoritas Consulting</institution>
          ,
          <addr-line>S.A.</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CLiC-UBICS, Universitat de Barcelona</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
<institution>PRHLT Research Center, Universitat Politècnica de València</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
<institution>Università degli Studi di Torino</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>157</fpage>
      <lpage>177</lpage>
      <abstract>
<p>Stance and Gender Detection in Tweets on Catalan Independence (StanceCat) is a new shared task proposed for the first time at the IberEval 2017 evaluation campaign. The participating systems must automatically detect the tweeter's stance (in favor, against or neutral) towards the target "independence of Catalonia" in Twitter messages written in Spanish or Catalan, as well as, where possible, the author's gender. We received a total of 31 submitted runs from 10 different teams from 5 countries. We present the datasets, which include annotations for stance and gender, describe the evaluation methodology, and discuss the results and the participating systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Stance detection</kwd>
        <kwd>Twitter</kwd>
        <kwd>Spanish</kwd>
        <kwd>Catalan</kwd>
<kwd>Gender identification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>The aim of the task of Stance and Gender Detection in Tweets on Catalan
Independence at IberEval 2017 (StanceCat) is to detect the author's gender and
stance with respect to the independence of Catalonia in tweets written in Spanish
or Catalan. Classical sentiment analysis tasks carried out in recent years in
evaluation campaigns for different languages have mostly involved the detection of
the subjectivity and polarity of microblogs at the message level, i.e. determining
whether a tweet is subjective or not, and, if subjective, determining its positive
or negative semantic orientation. However, comments and opinions are usually
directed towards a specific target or issue, and therefore give rise to finer-grained
tasks such as stance detection, in which the focus is on detecting what particular
stance (in favor, against or neutral) a user takes with respect to a specific target.</p>
      <p>
        Stance detection is related to sentiment analysis, but there are significant
differences, as is stressed in [9]: in sentiment analysis, the systems detect whether
the sentiment polarity of a text is positive, negative or neutral, while in stance
detection, the systems detect whether a given text is favorable or unfavorable
to a given target, which may or may not be explicitly mentioned in the text.
Stance detection is particularly interesting for studying political debates in which
the topic is controversial. For this task we have therefore chosen to focus on a
specific political target: the independence of Catalonia [5]. Stance detection is
also related to textual inference, since the position of the tweeter is often
expressed implicitly and the stance therefore has to be inferred in many cases.
See, for instance, the following tweet (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ).
      </p>
    </sec>
    <sec id="sec-2">
      <title>1. Language: Catalan</title>
      <p>Target: Catalan Independence
Stance: FAVOR
Tweet: Avui #27S2015 tot està per fer... Un nou país és possible k*k A les urnes...
#27S http://t.co/ls2nkRWt2b
(Today #27S2015 the future is ours to make... A new country is possible k*k
Get out and vote... #27S http://t.co/ls2nkRWt2b)
(where k*k stands for the Catalan independence flag)</p>
      <p>Stance detection and author profiling tasks on microblogging texts are
currently being carried out in several evaluation forums, including SemEval-2016
(Task 6) [9] and PAN@CLEF [12]. However, these two tasks have never been
performed together for Spanish and Catalan as part of one single task. The
results obtained will be of interest not only for sentiment analysis but also for
author profiling and for socio-political studies.</p>
      <sec id="sec-2-1">
        <title>Task description</title>
        <p>The StanceCat task includes two subtasks that are meant to be independent,
namely stance detection and the identification of the gender of the author.
Moreover, each team could participate in each subtask for one or both of the
languages involved in the contest, i.e. Spanish and Catalan.</p>
        <p>Given that the reference data were filtered using hashtags and keywords
related to a specific topic, i.e. the independence of Catalonia, the stance
detection subtask consists of deciding whether each message is neutral or oriented
in favor of or against the given target. The three labels representing the stance
of the author in writing the message are mutually exclusive.</p>
        <p>The second subtask consists of identifying the gender of the author of each
message, labeling it with one of the mutually exclusive labels male or female.
Section 3.2 provides further explanation and examples of the labels included
in the annotation scheme applied to the dataset.</p>
        <p>The distribution of the gender labels (shown in Table 2) is balanced in both
the training and test sets: half of the data were produced by female authors and
the other half by male authors. In contrast, the distribution of the stance labels
is not balanced. Participation also varied by subtask: all teams tackled stance
detection, but not all of them took part in gender classification.</p>
        <p>Based on the experience of previous contests, different metrics were adopted
for the different subtasks (see Section 4), and different rankings of the
participants' scores were generated for the evaluation of each subtask.</p>
        <p>As far as language is concerned, half of the data are in Spanish and the
other half in Catalan, and each of the previously described subtasks had to be
performed separately for Spanish and Catalan. Each team could decide to
perform the task for a single language or for both. Given that most teams
performed the selected subtasks in both Spanish and Catalan, an evaluation of
performance across the two languages was carried out, showing relevant
differences in scores.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Development and Test Data</title>
        <sec id="sec-2-2-1">
          <title>Corpus Description</title>
          <p>As has become usual in debates on social and political topics, the discussion
on Catalan separatism involved massive use of social media by users interested
in the debate. In order to draw attention to the related issues, as also happens
with commercial products and political elections, users created new hashtags to
give greater visibility to information and opinions on the subject.</p>
          <p>Among them, #Independencia and #27S are two hashtags that were widely
adopted within the dialogical and social context growing around the topic, and
were widely used within the debate. At the current stage of the development of
our project, we exploited the hashtags #Independencia and #27S as the first
two keywords for filtering the data to be included in the TW-CaSe corpus. We
selected the #27S hashtag because the Catalan autonomous elections, which
the pro-independence parties considered a plebiscite, were held on that date.
These hashtags allowed us to select 10,800 original messages, 5,400 written in
Catalan (TW-CaSe-ca) and 5,400 written in Spanish (TW-CaSe-es), collected
between the end of September and December 2015; these messages were also
widely retweeted. The dataset was collected with the Cosmos tool by Autoritas
(http://www.autoritas.net) and annotated by the CLiC group at the
University of Barcelona (http://clic.ub.edu). Half of the tweets in each language
were written by female authors and half by male authors.</p>
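          <p>The keyword-based selection step can be illustrated with a minimal Python
sketch. This is not the actual collection pipeline (which used the Cosmos tool);
the tweet field names (lang, text) are illustrative assumptions.</p>
          <preformat>
# Minimal sketch of the hashtag-based filtering step described above.
# The field names ("lang", "text") are illustrative assumptions; the
# actual collection was performed with the Cosmos tool by Autoritas.
KEYWORDS = ("#independencia", "#27s")

def matches_topic(text):
    """True if the tweet mentions one of the seed hashtags."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)

def filter_tweets(tweets):
    """Keep messages in Spanish or Catalan that match the topic."""
    for tweet in tweets:
        if tweet["lang"] in ("es", "ca") and matches_topic(tweet["text"]):
            yield tweet
          </preformat>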
        </sec>
        <sec id="sec-2-2-2">
          <title>Annotation Scheme</title>
          <p>
            This section describes the scheme adopted for the annotation of the TW-CaSe
corpus with the author's stance and gender. In order to annotate the stance, we
used the following tags, adopting the annotation scheme proposed in [5] and [9]:
- FAVOR: positive stance towards the independence of Catalonia (
            <xref ref-type="bibr" rid="ref2">2</xref>
            ).
- AGAINST: negative stance towards the independence of Catalonia (
            <xref ref-type="bibr" rid="ref3">3</xref>
            ).
- NONE: neutral stance towards the independence of Catalonia, and cases in
which the stance cannot be inferred (
            <xref ref-type="bibr" rid="ref4">4</xref>
            ).
          </p>
          <p>
            The possible gender labels are FEMALE (
            <xref ref-type="bibr" rid="ref2">2</xref>
            ) and MALE (
            <xref ref-type="bibr" rid="ref3">3</xref>
            ). These tags were automatically extracted from proper-noun
dictionaries (INE, http://www.ine.es) and manually reviewed to remove
ambiguous names. The following are examples of tweets labelled for both the
author's stance and gender in both languages.
          </p>
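          <p>A minimal sketch of this dictionary-based labelling follows, assuming the
INE name lists have been loaded into two sets; the names shown are
illustrative placeholders.</p>
          <preformat>
# Hedged sketch of assigning gender labels from proper-noun dictionaries.
# FEMALE_NAMES and MALE_NAMES stand in for the INE name lists; in the
# task, ambiguous names were removed by manual review.
FEMALE_NAMES = {"maria", "montserrat", "laura"}
MALE_NAMES = {"jordi", "josep", "david"}

def gender_from_name(first_name):
    """Return FEMALE or MALE, or None for unknown or ambiguous names."""
    name = first_name.lower()
    if name in FEMALE_NAMES and name not in MALE_NAMES:
        return "FEMALE"
    if name in MALE_NAMES and name not in FEMALE_NAMES:
        return "MALE"
    return None  # left for manual review, as in the annotation process
          </preformat>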
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Language: Catalan</title>
      <p>Target: Catalan Independence
Stance: FAVOR
Gender: FEMALE
Tweet: 15 diplomàtics internacionals observen les plebiscitàries, serà que
interessen a tothom menys a Espanya #27S
('15 international diplomats observe the plebiscite; perhaps it is of interest to
everybody except Spain #27S')</p>
    </sec>
    <sec id="sec-4">
      <title>3. Language: Spanish</title>
      <p>Target: Catalan Independence
Stance: AGAINST
Gender: MALE
Tweet: #27S cuál fue la diferencia en 2012 entre los resultados de la encuesta de
TV3 y los resultados finales? Nos serviría para hacernos una idea
(In 2012, what was the difference between the results of the TV3 poll and the
final results? That would give us an idea)</p>
    </sec>
    <sec id="sec-5">
      <title>4. Language: Catalan</title>
      <p>Target: Catalan Independence
Stance: NONE
Gender: MALE
Tweet: 100% escrutat a Arbúcies #27S http://t.co/avMzng6iyV
(100% of votes counted in Arbúcies #27S http://t.co/avMzng6iyV)</p>
      <p>Although tweets are very short pieces of text, they tend to be complex in
their internal structure and often carry considerable informational content. It
should be pointed out that for the annotation of stance we took into account all
the information appearing in the written text (including emoticons), as well as
the information conveyed by mentioned users and hashtags. Mentioned users are
identified with the symbol @, and are also known as mentions; hashtags are
semantic labels (introduced with #) which are important for understanding the
tweet and often denote the content highlighted by the author.</p>
      <p>It is worth noting that hashtags, like mentions, can appear in any position
within the text, playing a syntactic-semantic role within the tweet.</p>
      <p>We consider that all of these components play a role in the interpretation
of the whole tweet, and we took them into account in the annotation of stance.
Links (web addresses pointing to photographs, videos and webpages) are also
very useful for interpreting the stance, and are especially relevant for the
interpretation of ironic tweets, but in this version of the corpus we did not take
them into account, since the automatic systems do not do so either. It is worth
noting that we are currently working on a new version of the TW-CaSe corpus
in which irony and humor are also being annotated, as well as information on
the role played by links in the tweet.</p>
      <sec id="sec-5-1">
        <title>Annotation procedure</title>
        <p>In this section, we present the methodology applied in the annotation of the
tweets and the results of the inter-annotator agreement test carried out, and,
finally, we analyse the different sources of disagreement.</p>
        <p>Three trained annotators, supervised by two senior researchers, carried out
the whole manual annotation of TW-CaSe. The annotation process was
performed in the following way: 1) first, the three trained annotators tagged the
stance of 500 tweets in Catalan and 500 tweets in Spanish, working in parallel
and following the guidelines [5]; 2) we then conducted an inter-annotator
agreement test on the 500 tweets tagged in each language in order to test the
validity of the annotation (see Table 1), and to detect and solve disagreements
and possible inconsistencies; 3) finally, the annotators went on to annotate the
whole corpus individually. During the annotation process, we met once a week;
problematic cases were discussed by all the people involved and solved by
consensus.</p>
        <p>
          Table 1 presents the pairwise and average agreement percentages obtained
in the inter-annotator agreement test on TW-CaSe-ca and TW-CaSe-es. In the
first four rows (
          <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2-5</xref>
          ), we show the observed agreement for each pair of annotators (pairwise
agreement) and the average agreement (79.26% in TW-CaSe-ca and 78.4% in
TW-CaSe-es). The last row shows the Fleiss' kappa coefficient (0.60 in both
subcorpora). These results indicate moderate agreement, demonstrating the
complexity of the task. The annotation of the corpus was completed in 16 weeks.
        </p>
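        <p>The agreement statistics in Table 1 can be reproduced as follows; this is a
minimal sketch assuming the three annotators' labels are available as parallel
lists.</p>
        <preformat>
# Minimal sketch of the agreement statistics reported in Table 1:
# pairwise observed agreement and Fleiss' kappa.
from collections import Counter

def pairwise_agreement(labels_a, labels_b):
    """Observed agreement (%) between two annotators' label lists."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)

def fleiss_kappa(items):
    """Fleiss' kappa for a list of per-item label lists (each inner
    list holds the labels assigned by all annotators to one tweet)."""
    n_items, n_raters = len(items), len(items[0])
    category_totals = Counter()
    p_bar = 0.0
    for labels in items:
        counts = Counter(labels)
        category_totals.update(counts)
        # proportion of agreeing annotator pairs for this item
        p_bar += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1))
    p_bar /= n_items
    # chance agreement from the marginal category distribution
    p_e = sum((total / (n_items * n_raters)) ** 2
              for total in category_totals.values())
    return (p_bar - p_e) / (1 - p_e)
        </preformat>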
        <p>
          Regarding disagreements, the most problematic cases in the annotation of
stance arise when the author's communicative intentions are not clear. For
instance, one annotator tagged tweet (
          <xref ref-type="bibr" rid="ref5">5</xref>
          ) as being AGAINST independence, probably influenced by the language
used in the tweet (Spanish), whereas the other two annotators tagged it as
NONE. However, after collectively discussing this case, we agreed to tag tweet (
          <xref ref-type="bibr" rid="ref5">5</xref>
          ) with the NONE stance, because it was not clear enough to which flag
(Spanish or Catalan) the writer was referring.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Language: Spanish</title>
      <p>Target: Catalan Independence
Stance: NONE
Gender: MALE
Tweet: #27s voy a denunciar a todo aquel q me siga insultando usando ls red. Yo
no soy imbécil, ni mi bandera es un trapo
(#27s I'm going to denounce anyone who continues to insult me on the web. I'm
not stupid, nor is my flag a rag)</p>
    </sec>
    <sec id="sec-7">
      <title>6. Language: Catalan</title>
      <p>Target: Catalan Independence
Stance: NONE
Gender: MALE
Tweet: La @cupnacional té la clau de Matrix
(The @cupnacional has the key to the Matrix)</p>
      <p>
        The same problem occurs with tweet (
        <xref ref-type="bibr" rid="ref6">6</xref>
        ), for which each annotator assigned a different stance tag. This is an
example of total disagreement. In the end, it was also annotated as NONE, since
the stance could not be clearly inferred. In cases of total disagreement, we
tended to assign the neutral NONE tag.
      </p>
      <p>Stance is domain-dependent information, and the annotators' knowledge of
the domain is therefore crucial. Frequently, the annotators have to infer the
stance, and to do so they need to know the socio-political context and the social
agents involved in the debate, in our case about Catalan independence, which
is not equally true of all annotators.</p>
      <sec id="sec-7-1">
        <title>Format and Distribution</title>
        <p>We provided participants with a single development set for training, which
consists of a collection of 4,319 tweets in Spanish and 4,319 tweets in Catalan,
with annotations for the two subtasks: stance detection and gender
identification. For each language, we distributed two files: the first includes tweet IDs
and textual contents, in the format id ::: contents; the second includes the
truth labels for the two subtasks, in the format id ::: stance ::: gender (see
Section 3.2 for a description of the possible labels). The language was encoded
in the file name.</p>
        <p>The test data consist of 1,081 tweets in Spanish and 1,081 tweets in Catalan
in the same format: id ::: contents. Participants therefore did not need to
detect the language. Tweets were provided to the participants in two
independent files, one per language, as in the training set. The blind version of the
test data did not include the truth files. (Data will be available for download at
http://stel.ub.edu/Stance-IberEval2017/data.html; in the first stage, access
has been restricted to participants registered for the task. To access the dataset,
request the password by emailing stancetask2017@gmail.com.)</p>
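        <p>A minimal sketch for reading the distributed files is given below; the exact
whitespace around the ::: separator is an assumption.</p>
        <preformat>
# Hedged sketch for reading the distributed files, following the
# "id ::: contents" and "id ::: stance ::: gender" formats described
# above (the exact spacing around the separator is an assumption).
def read_tweets(path):
    """Map tweet id to textual content from a tweets file."""
    tweets = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            tweet_id, contents = line.rstrip("\n").split(" ::: ", 1)
            tweets[tweet_id] = contents
    return tweets

def read_truth(path):
    """Map tweet id to (stance, gender) from a truth file."""
    truth = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            tweet_id, stance, gender = line.rstrip("\n").split(" ::: ")
            truth[tweet_id] = (stance, gender)
    return truth
        </preformat>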
        <p>The data exploited for the stance subtask are split between training and
test sets in an 80/20 proportion: 80% for training and 20% for testing. The
distribution of both training and test data by stance, gender and language is
given in Table 2.</p>
        <p>The evaluation was performed according to standard metrics. In particular,
we used the macro-average of the F-scores of FAVOR and AGAINST to
evaluate stance, in accordance with the metric proposed at SemEval 2016 Task 6
(http://alt.qcri.org/semeval2016/task6/index.php?id=data-and-tools).
Gender was evaluated in terms of accuracy, in accordance with the metric
proposed at the Author Profiling task at PAN@CLEF
(http://pan.webis.de/clef16/pan16-web/author-profiling.html).</p>
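        <p>The official measures can be written compactly as follows; this is a minimal
sketch over parallel lists of gold and predicted labels.</p>
        <preformat>
# Minimal sketch of the official measures: the stance score is the
# macro-average of the F-scores of FAVOR and AGAINST (NONE contributes
# only through errors), and gender is scored with accuracy.
def f_score(gold, pred, label):
    """F-score of one stance class from parallel gold/predicted lists."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / sum(p == label for p in pred)
    recall = tp / sum(g == label for g in gold)
    return 2 * precision * recall / (precision + recall)

def stance_score(gold, pred):
    """Macro-average of F(FAVOR) and F(AGAINST)."""
    return (f_score(gold, pred, "FAVOR") +
            f_score(gold, pred, "AGAINST")) / 2

def accuracy(gold, pred):
    """Gender metric: proportion of correctly labelled tweets."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
        </preformat>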
        <p>Four different rankings are produced, depending on the subtask and
language: stance rankings for Spanish and Catalan, and gender rankings for
Spanish and Catalan. Two baselines are provided for comparison purposes: a
baseline that returns the majority class, and the Low Dimensionality
Representation (LDR) [11] approach. The key concept of LDR is a weight representing
the probability of each term belonging to each of the categories: for stance (in
favor vs. against) and gender (female vs. male). The distribution of weights for
a given document should be close to the weights of its corresponding category.
LDR takes advantage of the whole vocabulary; however, in order to work
properly, it needs a sufficient amount of information per author.</p>
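        <p>The intuition behind LDR can be illustrated with a deliberately simplified
sketch; the full method in [11] derives several distribution statistics per
category, whereas here only accumulated term weights are kept.</p>
        <preformat>
# Deliberately simplified sketch of the LDR intuition: each term gets a
# weight per category (an estimate of how strongly it indicates that
# category), and a document is scored by accumulating those weights.
# The full method in [11] is richer than this.
from collections import Counter, defaultdict

def term_weights(texts, labels):
    """Estimate P(category | term) from term frequencies per class."""
    per_class = defaultdict(Counter)
    for text, label in zip(texts, labels):
        per_class[label].update(text.lower().split())
    weights = {}
    vocabulary = set().union(*per_class.values())
    for term in vocabulary:
        total = sum(per_class[c][term] for c in per_class)
        weights[term] = {c: per_class[c][term] / total for c in per_class}
    return weights

def classify(text, weights, default):
    """Assign the category whose term weights dominate the document."""
    scores = Counter()
    for term in text.lower().split():
        for category, weight in weights.get(term, {}).items():
            scores[category] += weight
    return scores.most_common(1)[0][0] if scores else default
        </preformat>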
        <sec id="sec-7-1-1">
          <title>Overview of the Submitted Approaches</title>
          <p>Ten teams from five countries participated in the shared task, submitting
thirty-one runs. Table 3 provides an overview of the teams, their country of
origin (C) and the tasks they took part in, i.e. stance (S) and gender (G), for
the two languages: Spanish (ES) and Catalan (CA).</p>
          <p>All the teams participated in the stance subtask in Spanish, and nine of them
in Catalan. Four teams participated in the gender subtask in both Catalan and
Spanish, whereas one further team participated in the gender subtask only in
Spanish. Eight teams sent a description of their systems, and all of them used
only the training data provided for the task. In what follows, we analyse their
approaches from two perspectives: classification approaches, and features used
to represent the authors' texts.</p>
          <p>Classification approaches. Most participants used SVM: i) ltl uni due, which
also applied LSTM and a hybrid system that uses a decision tree to choose which
algorithm to apply; ii) iTACOS, which also experimented with logistic regression,
decision trees, random forests and multinomial Naive Bayes; iii) ARA1337 and
ELiRF-UPV, which also used neural networks; and iv) LTRC IIITH, which used
RBF kernels. Neural networks and deep learning approaches were widely used by
participants such as ltl uni due (LSTM), ARA1337, ELiRF-UPV, LuSer
(multilayer perceptron), and atoppe (CNN, LSTM, MLP, FASTTEXT, KIM and
BI-LSTM).</p>
          <p>Features. N-grams and embeddings were the most used features. Teams
using SVM represented texts with n-gram based approaches, whereas teams
using different kinds of deep approaches basically used word embeddings. For
instance, ltl uni due used combinations of word and character n-grams with SVM,
and word embeddings with LSTM. LTRC IIITH used character and word
n-grams with SVM, as well as specific stance- and gender-indicative tokens. In
contrast, teams using deep approaches represented texts with bag-of-words
embeddings (deepCybErNet) and word and n-gram embeddings (atoppe). ELiRF-UPV
used one-hot vectors to train its networks. Other teams used neural networks as
classification algorithms, but with features such as word, token and hashtag
unigrams (ARA1337) or bags of n-grams (LuSer). Finally, iTACOS combined
bag of words with bag of part-of-speech tags, bag of lemmas, bag of hashtags,
bag of words in hashtags and mentions, character n-grams, number of hashtags,
number of words starting with a capital letter, language, number of words,
number of characters, average word length, and bag of words extracted from
URLs.</p>
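          <p>A hedged sketch of the most frequent participant recipe, a linear SVM over
combined word and character n-grams, is shown below; the n-gram ranges and
regularization strength are illustrative and not those of any particular team.</p>
          <preformat>
# Hedged sketch of the most frequent participant recipe: a linear SVM
# over combined word and character n-grams (scikit-learn). The n-gram
# ranges and C value are illustrative, not those of any team.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("features", FeatureUnion([
        ("word_ngrams", TfidfVectorizer(analyzer="word",
                                        ngram_range=(1, 2))),
        ("char_ngrams", TfidfVectorizer(analyzer="char_wb",
                                        ngram_range=(2, 5))),
    ])),
    ("svm", LinearSVC(C=1.0)),
])

# train_texts and train_stances would come from the files described in
# Section 3.4, e.g.:
# pipeline.fit(train_texts, train_stances)
# predictions = pipeline.predict(test_texts)
          </preformat>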
        </sec>
        <sec id="sec-7-1-2">
          <title>Evaluation and Discussion of the Submitted</title>
        </sec>
        <sec id="sec-7-1-3">
          <title>Approaches</title>
          <p>We evaluated both subtasks (stance and gender) independently. We show
results separately for each subtask and each language. Results are given as
F-scores for stance and as accuracy for gender.</p>
        </sec>
      </sec>
      <sec id="sec-7-2">
        <title>Stance Subtask</title>
        <p>Ten teams participated in the Spanish subtask, presenting thirty-one runs,
and nine teams participated in the Catalan subtask, presenting twenty-nine runs.
In Table 4, the F-scores achieved by all runs are shown, as well as the two
baselines. At the bottom of the table some basic statistics are provided:
minimum (min), maximum (max), mean, median, standard deviation (stdev), first
quartile (q1) and third quartile (q3).</p>
        <p>In the Catalan subtask, the majority of the runs (29 out of 31) obtained worse
results than the majority class prediction (F-score 0.4882). The only runs that
improved on the majority class prediction belong to the same team (iTACOS),
with F-scores of 0.4901 and 0.4885. They approached the task with different
machine learning algorithms, such as SVM, logistic regression and decision trees,
among others, with combinations of different kinds of features (bag of words, bag
of part-of-speech tags, n-grams) and stylistic features (word length, number of
words, number of hashtags, number of words starting with capital letters, and
so on). The worst results were obtained with deep learning approaches, with
F-scores between 0.2710 (atoppe.1) and 0.3790 (deepCybErNet.2).</p>
        <p>In the Spanish subtask, twelve runs obtained better results than the majority
class baseline (0.4479). The best result was also obtained by the iTACOS team,
with an F-score of 0.4888. The next best results were obtained by different runs
of LTRC IIITH (0.4679 and 0.4640) and ELiRF-UPV (0.4637). While
LTRC IIITH used SVM learning from character and word n-grams besides
specific stance features, ELiRF-UPV used neural networks and SVM with one-hot
vectors and bags of words. The worst results were obtained by the atoppe team,
with word embeddings and combinations of neural network models (between
0.1906 and 0.2466).</p>
        <p>As can be seen in Figure 1, the mean, max and q3 statistics are similar for
both languages, although the results for Spanish are more spread out and reach
lower values for the worst systems. Results for Catalan range between 0.4901
and 0.2710, with an average of 0.4053. Results for Spanish range between 0.4888
and 0.1906, with an average of 0.3843.</p>
        <p>LDR obtained worse results than the majority class prediction. Since this
task was focused on the tweet level instead of the author level, these low results
might be expected, given that LDR needs a large amount of data per author
in order to normalise frequency distributions. Something similar might have
happened with the deep learning approaches, which need large amounts of data
to learn their models; the provided dataset is small and biased towards a
majority class.</p>
      </sec>
      <sec id="sec-7-3">
        <title>Gender Subtask</title>
        <p>Five teams participated in the Spanish subtask, presenting nineteen runs, and
four teams in the Catalan subtask, presenting seventeen runs. In Table 5 the
accuracies achieved by all runs are shown, together with the two baselines. At
the bottom of the table some basic statistics are also provided: minimum (min),
maximum (max), mean, median, standard deviation (stdev), first quartile (q1)
and third quartile (q3).</p>
        <p>In the Catalan subtask, all the runs obtained worse results than the majority
class (0.5005) and LDR (0.6068) predictions. The best results were obtained
by deepCybErNet (0.4857, 0.4829 and 0.4653) and LTRC IIITH (0.4459 and
0.4440), which used deep learning methods and SVM with combinations of
character and word n-grams together with specific gender indicators,
respectively. The worst results were obtained by UPF-LaSTUS (0.3571 and 0.4043)
and iTACOS (0.3996 and 0.3987). iTACOS used different machine learning
algorithms with a combination of different bags of features, and UPF-LaSTUS
did not provide a description of their system.</p>
        <p>In the Spanish subtask, most runs obtained better results than the majority
class prediction, although they remained below LDR. The best results were
obtained by LTRC IIITH (between 0.6485 and 0.6401) and iTACOS (between
0.6161 and 0.6124). The worst results were obtained by deepCybErNet (0.4764,
0.4903 and 0.5014). It is noteworthy that the latter team obtained the best
results in Catalan but the worst in Spanish. However, the obtained accuracies
were similar (0.4857, 0.4829, 0.4656 vs. 0.5014, 0.4903, 0.4764) for both
languages, which demonstrates the stability of this system when applied to
different datasets.</p>
        <p>As can be seen in Figure 2, results for Catalan are less spread out than for
Spanish, though all of them are below the majority class, with an average
accuracy of 0.4459. There are three outliers, corresponding from above to LDR
(0.6068) and the majority class (0.5050), and from below to UPF-LaSTUS
(0.3571). Most results for Spanish are between 0.5495 and 0.6448, with an
average accuracy of 0.5935. The maximum value of 0.6855 was obtained by
ELiRF-UPV and the minimum of 0.4764 by deepCybErNet.</p>
        <p>LDR obtained the best result for Catalan and the second best result for
Spanish, despite the low amount of data per author. The majority class
prediction coincides with a random classification, since the dataset is balanced in
terms of gender. Deep learning approaches such as deepCybErNet maintained
their stability, though with values below those of the majority class.</p>
        <p>In this section, the performance of the systems with respect to both subtasks
is analysed jointly. The aim is to determine whether systems that perform well
in one subtask do the same in the other. The analysis is carried out separately
per language.</p>
        <p>The results for Catalan are shown in Figure 3. In this language, results for
gender were below the majority class and LDR. DeepCybErNet achieved the
best results in gender identification and the worst in stance; this team
approached the task with deep learning techniques. On the other hand, the systems
that obtained some of the best results for stance (iTACOS.1, iTACOS.2 and
iTACOS.3) obtained some of the worst results for gender. Systems such as
UPF-LaSTUS.3 and UPF-LaSTUS.4 obtained some of the worst results for both
gender and stance; this team did not provide a description of their system.</p>
        <p>The results for Spanish are shown in Figure 4. In this language, results for
gender are higher than in Catalan, with most systems above the majority class
baseline. There is a clearly observable trend for the systems that obtained better
results for gender to do the same for stance. For example, ELiRF-UPV.1
obtained the best result for gender and the third-best for stance; the authors
approached the task with one-hot vectors and neural networks. Similarly,
iTACOS.1 obtained the best result for stance, with a value at the median for
gender, by using combinations of features and SVM. Finally, the results
obtained by LTRC IIITH are among the best for both subtasks; they learned RBF
kernels for SVM with combinations of character and word n-grams and
indicative tokens per subtask. On the other hand, deepCybErNet and UPF-LaSTUS
obtained the worst results in both subtasks. There is no information for
UPF-LaSTUS, but deepCybErNet used different deep learning-based approaches.</p>
        <p>We also analysed errors in stance detection based on the author's gender. We
observed two kinds of errors: i) the systems predicted "in favor" when the true
value was "against" (F -&gt; A); and ii) they predicted "against" when it was
actually "in favor" (A -&gt; F). We analysed the error rate for these two kinds of
error depending on the gender of the author who wrote the tweet. As can be
seen in Table 6, for both kinds of error the rate is higher when the tweets were
written by males. The greatest difference occurs with error A -&gt; F in Catalan,
with a difference of more than 8%. In the case of Catalan, such differences are
highly significant (test statistics equal to 4.24 and 5.33, respectively). In the
case of Spanish, they are only significant when the error type is F -&gt; A (test
statistic equal to 2.16); for error type A -&gt; F, the difference is not statistically
significant at the 0.05 level (test statistic equal to 1.38).</p>
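        <p>The per-gender error rates discussed above can be recomputed with the
small sketch below; normalising by the number of tweets per gender is our
assumption about how the rates in Table 6 were obtained.</p>
        <preformat>
# Minimal sketch of the per-gender directional error rates discussed
# above. Normalising by the number of tweets per gender is an
# assumption about how Table 6 was computed.
def directional_error_rates(gold, pred, genders):
    """FAVOR misread as AGAINST and vice versa, split by gender."""
    rates = {}
    for gender in ("FEMALE", "MALE"):
        rows = [(g, p) for g, p, a in zip(gold, pred, genders)
                if a == gender]
        total = len(rows)
        if total == 0:
            continue
        rates[gender] = {
            "favor_as_against": sum(g == "FAVOR" and p == "AGAINST"
                                    for g, p in rows) / total,
            "against_as_favor": sum(g == "AGAINST" and p == "FAVOR"
                                    for g, p in rows) / total,
        }
    return rates
        </preformat>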
        <p>In the case of Catalan, the A -&gt; F error rate is higher than in Spanish,
where it is close to 2%. This may be due to a bias resulting from the difference
in the number of tweets classified according to the sentiment expressed: there is
a higher number of tweets in favor of independence written in Catalan, whereas
there is a higher number of tweets against independence written in Spanish.</p>
        <p>Tables 7 and 8 show the tweets that were most often wrongly classified. The
tables show five examples per gender, with female examples at the top and male
examples at the bottom. Taking into account the results shown in Table 6, it
seems more difficult to detect the stance of tweets written by males.</p>
        <p>Considering that the average agreement percentage obtained in the
inter-annotator agreement test is moderate (around 79%), there probably exists a
percentage of inconsistency in the training sets, which could explain the
moderately low results obtained by the systems. Moreover, the analysis of the 40
tweets in Tables 7 and 8, namely those that were most often wrongly classified,
does not allow us to infer the reasons for the low performance of the systems.
These facts highlight the difficulty of this task, in which there is an important
subjective component and the linguistic content of the tweets is very scarce.</p>
        <p>Table 7. Tweets in Catalan most often wrongly classified (female authors at
the top, male authors at the bottom).
Favor -&gt; Against:
Bastanta por em fa l'actitut de @InesArrimadas @CiudadanosCs De que criden
#libertat? #27S #Eleccions27S
"@FinancialTimes: Independence parties win in Catalonia
http://t.co/pOmcTAG70b" @InesArrimadas Prou de mentides. Ha guanyat el
sí. #27STV3
En quina nit electoral parlen els números 1, 4 i 5? I la Carme Forcadell i la
Muriel Casals? De floreros? #JuntsXsiLlistaCiutadana #27STV3
Els polítics diuen "catalanes i catalans", per què? No em sento exclosa en el
masculí... Eufòria pels resultats! #llengua #27S
El Sí no ha estat aclaparador. Em sap greu de debò perquè ho desitjava, però
la victòria que celebra @JuntsPelSi no és tal... I ara? #27S
Bon dia Catalunya! ll*ll #27s #votar #araeslhora de @srta_borrat. Opina a:
http://t.co/8WK4JrTOqj http://t.co/Siqnqvz01G
@Bioleg @JuntsPelSi @cupnacional #hovolemtot
Bon article d'@eduardvoltas resumint el #27S: Gran victòria independentista
http://t.co/e5vlcc8W9z
Bon dia, #catalunya. Com ho duis? #27S #27SCatRadio #27S2015
Bufen nous vents!! #catalunya #27S #muntanya #montaña #mountain
#trekking #ig_catalonia... https://t.co/XEVU11L1ae
Against -&gt; Favor:
#27S ???????? No volem independència. Visca Catalunya i visca Espanya ????
#27S Unió té un problema, i es diu 3%. Au va!!!
#Eleccions27S ERC + CiU perden 9 diputats i amb tot el suport mediàtic i el
bombo i plateret d'aquests dies #QuinExit!
Escoltar els crits "Cataluña es España" de Ciutadans i que se'm posi la pell de
gallina #NO #independencia
Gràcies @JuntsPelSi pel resultat de @CiudadanosCs. Sou uns cracks!
#eleccionescatalanas #27S
Avui més que mai, Catalunya és Espanya. #27S
BON DIA A TOTS ELS TONTOS DEL CUL QUE EM VOTARAN. UN
PETONET, IMBECILS!! #27S #GuanyemJunts http://t.co/YABQAUzdX1
Catalans!!! Heu de follar més i votar menys!! #FollemJunts #27S
#GuanyemJunts http://t.co/RZM3cUIsCU
avui és el primer dia de la meva vida que he de dir amb tristesa que
m'avergonyo de ser del meu poble. #elprat #27s @CiudadanosCs
Avui votaré per les valencianes que porten anys de lluita perquè la nostra
llengua i cultura seguisquen ben vives. #27S #somdelSud #SomPPCC</p>
        <p>Table 8. Tweets in Spanish most often wrongly classified (female authors at
the top, male authors at the bottom).
Favor -&gt; Against:
Si como dijo @PSOE no era un plebiscito, porque ahora @sanchezcastejon dice
que Mas ha perdido el plebiscito?? Mi no entender #marxem #27s
Señora @InesArrimadas que dimisión pide si todavía no hay presidente?! #27S
#CatalunyaIndependent #27STV3
Ho acabes de dir, @Albert_Rivera: "Empieza una nueva política para España".
#independencia #27S #27STV3
@_anapastor @InesArrimadas. No le han pasado bien los apuntes. Ganan
#JuntsPelSi# con un doble apoteósico
@InesArrimadas te equivoques nena. Dónde ves la mayoría??? Bocazas
#JuntesPelSi
@Albert_Rivera @CiudadanosCs ha sido quien ha votado la ruptura de España
y no la vieja política" #eleccionescatalanas
#27STV3 en serio @Albert_Rivera @InesArrimadas @CiudadanosCs alguien os
ha enseñado los resultados? Sabéis contar? http://t.co/ccajELgsE4
Ahora @CiutadansCs pide nuevas elecciones que sean verdaderamente
autonómicas. Al final sí eran un plebiscito? Decídanse #27S
Que alguien le diga a Rivera Arrimadas que los reyes son los padres. #27S
A los que decían que esto no era un plebiscito lo utilizan ahora al saber los
resultados. Me encanta esa lógica. #27S
Against -&gt; Favor:
#27S #L6cat. Es evidente que desde Madrid se sigue sin entender nada de
nada. Que sordera, que ceguera... es surrealista
#27STV3 CUP dice, no se acostará un catalán sin comer 3 platos al día, señor
Mas yo no he comido! Pues NO te acuestes!
Campeón: @Albiol_XG "Llevo en política muchos años. No he perdido nunca"
2012 471.681 2015 337.645 97% escrut #27STV3 http://t.co/PRSQ2QIA5F
Hola @InesArrimadas Soy una más de las orgullosas personas simpatizantes de
@CsTorredembarra y con este #Ciutadans25, http://t.co/tNby9XL6zV
#27STV3 Pero la Cup no decía que no apoyaría un proceso sin mayoría de
votos??????
Pues yo quería una independencia de Cataluña, que así puedo decir que tengo
familia en el extranjero. #YloqueMolaDecirEsoQue #democracia #27S
Puedo entender el deseo de muchos independentistas pero el discurso de
Romeva es el nuevo Alicia en el país de las maravillas. #27S
CUP rechaza la Unión Europea (Prog #27S pág 13) Romeva: JxSí negociará
reingreso con Unión Europea "desde dentro" #Catalunya #InesPresidenta
@catsiqueespot no perdamos el rumbo. (Aunque una encuesta no es un
referéndum) #CSQEP http://t.co/g3bfHdDtpX
Ciutadans gritando: "España unida jamás será vencida" véase la regeneración
política. #27Stv3 #CataloniaVotes</p>
        <p>In order to improve the results, we should probably work with a higher
number of tweets, take into account the information included in the links (to see
whether they contribute to detecting the stance of the tweet), and take into
consideration other aspects such as the presence of irony and humor in the
tweets. For instance, in our current research on stance and irony, we have
observed that tweets against independence tend to be more ironic than those in
favor of independence, and that irony is more common among men than among
women.</p>
        <sec id="sec-7-3-1">
          <title>Conclusion</title>
          <p>We have described a new shared task on detecting the stance towards
Catalan independence and the author's gender in tweets written in Spanish and
Catalan, the two languages used by the users directly involved in the political
debate. Unlike previous evaluation campaigns, we decided to perform stance and
gender detection together as part of one single shared task. We encouraged
participants to address both subtasks, but participation in stance detection
alone, which constitutes the main focus of the shared task, was also allowed.
Interestingly, we observed a clear trend showing that systems that participated
in both subtasks and obtained better results for gender also did so for stance.</p>
          <p>StanceCat was proposed for the first time at the IberEval evaluation
campaign and was one of the tasks with the highest participation in the 2017 edition.
We received submissions from ten teams from five countries, collecting more
than thirty runs, with systems utilizing a wide range of methods, features and
resources. Overall, the results confirm that stance detection in micro-blogging
texts is challenging, with large room for improvement, as was also observed in
the shared task organized at SemEval 2016 for English. We hope that the
dataset made available as part of the StanceCat task will foster further research
on this topic, also in the context of under-resourced languages such as Catalan.</p>
        </sec>
        <sec id="sec-7-3-2">
          <title>Acknowledgements</title>
          <p>This work has been carried out in the framework of the SOMEMBED project
(TIN2015-71147), funded by the Ministerio de Economía y Competitividad,
Spain. The work of the third author has been partially funded by Autoritas
Consulting. The work of Cristina Bosco and Viviana Patti was partially funded
by Progetto di Ateneo/CSP 2016 "Immigrants, Hate and Prejudice in Social
Media" (S1618_L2_BOSC_01).</p>
          <p>We would like to thank Enrique Amigó and Jorge Carrillo de Albornoz from
UNED (http://portal.uned.es) for their help during the evaluation with the
EvALL platform [3].</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Aineto-García, D., Larriba-Flor, A.M.: Stance detection at IberEval 2017: A
biased representation for a biased problem. In: Proceedings of the Second Workshop
on Evaluation of Human Language Technologies for Iberian Languages (IberEval
2017), Murcia, Spain, September 19. CEUR Workshop Proceedings, CEUR-WS.org
(2017)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Ambrosini, L., Nicolo, G.: Neural models for the StanceCat shared task at
IberEval 2017. In: Proceedings of the Second Workshop on Evaluation of Human
Language Technologies for Iberian Languages (IberEval 2017), Murcia, Spain,
September 19. CEUR Workshop Proceedings, CEUR-WS.org (2017)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Amigó Cabrera, E., Carrillo-de Albornoz, J., Gonzalo Arroyo, J., Verdejo Maillo,
M.F.: EvALL: A framework for information systems evaluation (2016)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Barbieri, F.: Shared task on stance and gender detection in tweets on Catalan
independence - LaSTUS system description. In: Proceedings of the Second Workshop
on Evaluation of Human Language Technologies for Iberian Languages (IberEval
2017), Murcia, Spain, September 19. CEUR Workshop Proceedings, CEUR-WS.org
(2017)</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Bosco, C., Lai, M., Patti, V., Rangel, F., Rosso, P.: Tweeting in the debate about
Catalan elections. In: Proceedings of the International Workshop on Emotion and
Sentiment Analysis (co-located with LREC 2016). ELSA, Portoroz, Slovenia
(2016)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Chulia, L.C.: Submission to the 1st task on stance and gender detection in tweets
on Catalan independence at IberEval 2017.
http://stel.ub.edu/stance-ibereval2017/. Team: LuSer. Spain.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Gonzalez, J.A., Pla, F., Hurtado, L.F.: ELiRF-UPV at IberEval 2017: Stance and
gender detection in tweets. In: Proceedings of the Second Workshop on Evaluation
of Human Language Technologies for Iberian Languages (IberEval 2017), Murcia,
Spain, September 19. CEUR Workshop Proceedings, CEUR-WS.org (2017)</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Lai, M., Cignarella, A.T., Hernandez-Farias, D.I.: iTACOS at IberEval 2017:
Detecting stance in Catalan and Spanish tweets. In: Proceedings of the Second
Workshop on Evaluation of Human Language Technologies for Iberian Languages
(IberEval 2017), Murcia, Spain, September 19. CEUR Workshop Proceedings,
CEUR-WS.org (2017)</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Mohammad, S.M., Kiritchenko, S., Sobhani, P., Zhu, X., Cherry, C.: SemEval-2016
task 6: Detecting stance in tweets. In: Proceedings of the International Workshop
on Semantic Evaluation, SemEval '16, pp. 31-41. ACL, San Diego, California (June
2016), http://aclweb.org/anthology/S/S16/S16-1003.pdf</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. R, V.: Submission to the 1st task on stance and gender detection in tweets
on Catalan independence at IberEval 2017.
http://stel.ub.edu/stance-ibereval2017/. Team: deepCybErNet. India.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Rangel, F., Rosso, P., Franco-Salvador, M.: A low dimensionality representation
for language variety identification. In: 17th International Conference on Intelligent
Text Processing and Computational Linguistics, CICLing. Springer-Verlag, LNCS,
arXiv:1705.10754 (2016)</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Rangel, F., Rosso, P., Verhoeven, B., Daelemans, W., Potthast, M., Stein, B.:
Overview of the 4th author profiling task at PAN 2016: Cross-genre evaluations.
In: Balog, K., Cappellato, L., Ferro, N., Macdonald, C. (eds.) Working Notes of
CLEF 2016 - Conference and Labs of the Evaluation Forum, Evora, Portugal, 5-8
September, 2016. CEUR Workshop Proceedings, vol. 1609, pp. 750-784.
CEUR-WS.org (2016), http://ceur-ws.org/Vol-1609/16090750.pdf</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Swami, S., Khandelwal, A., Shrivastava, M., Sarfaraz-Akhtar, S.: LTRC IIITH
at IberEval 2017: Stance and gender detection in tweets on Catalan independence.
In: Proceedings of the Second Workshop on Evaluation of Human Language
Technologies for Iberian Languages (IberEval 2017), Murcia, Spain, September 19.
CEUR Workshop Proceedings, CEUR-WS.org (2017)</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. Verdu, C.: Submission to the 1st task on stance and gender detection in tweets
on Catalan independence at IberEval 2017.
http://stel.ub.edu/stance-ibereval2017/. Team: ATeam. Spain.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. Wojatzki, M., Zesch, T.: Neural, non-neural and hybrid stance detection in
tweets on Catalan independence. In: Proceedings of the Second Workshop on
Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017),
Murcia, Spain, September 19. CEUR Workshop Proceedings, CEUR-WS.org
(2017)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>