<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SardiStance @ EVALITA2020: Overview of the Task on Stance Detection in Italian Tweets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessandra Teresa Cignarella</string-name>
          <email>cigna@di.unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirko Lai</string-name>
          <email>lai@di.unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristina Bosco</string-name>
          <email>bosco@di.unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viviana Patti</string-name>
          <email>patti@di.unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <email>prosso@dsic.upv.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Dipartimento di Informatica, Università degli Studi di Torino</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>PRHLT Research Center, Universitat Politècnica de València</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>English. SardiStance is the first shared task for Italian on the automatic classification of stance in tweets. It is articulated in two different settings: A) Textual Stance Detection, exploiting only the information provided by the text of the tweet, and B) Contextual Stance Detection, which adds information on the tweet itself, such as the number of retweets, the number of favours, or the date of posting; contextual information about the author, such as follower count, location, and user's biography; and additional knowledge extracted from the user's network of friends, followers, retweets, quotes and replies. The task was among the most popular at EVALITA 2020 (Basile et al., 2020), with a total of 22 submitted runs for Task A and 13 for Task B, from 12 different participating teams spanning both academia and industry.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction/Motivation</title>
      <p>
        Interest in detecting people's opinions towards particular targets, and in monitoring politically polarized debates on Twitter, has grown considerably in recent years, as attested by the proliferation of online questionnaires and polls
        <xref ref-type="bibr" rid="ref1 ref15 ref22 ref5">(Küçük and Can, 2020)</xref>
        . In fact, through the constant monitoring of people's opinions, desires, complaints and beliefs about the political agenda or public services, policy makers could better meet the population's needs.
      </p>
      <p>In the fields of Natural Language Processing and Sentiment Analysis, this translates into the creation of a specifically dedicated task, namely Stance Detection (SD), which is defined as the task of automatically determining from the text whether the author of a given textual content is in favor of, against, or neutral towards a certain target. Research on this topic, beyond mere academic interest, could have an impact on different aspects of everyday life such as public administration, policy-making, marketing or security strategies.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>Although SD is a fairly recent research topic, considerable effort has been devoted to the creation of stance-annotated datasets. In their recent survey on this topic, Küçük and Can (2020) describe a variety of stance-annotated datasets (covering different text types such as tweets, posts in online forums, news articles, and news comments) for at least eleven languages.</p>
      <p>
        The first shared task on SD was held for English at SemEval in 2016, i.e. Task 6 “Detecting Stance in Tweets”
        <xref ref-type="bibr" rid="ref13 ref14">(Mohammad et al., 2016b)</xref>
        , which addressed stance towards six different targets of interest: “Hillary Clinton”, “Feminist Movement”, “Legalization of Abortion”, “Atheism”, “Donald Trump”, and “Climate Change is a Real Concern”. A more recent evaluation of SD systems was proposed at IberEval 2017 for both Catalan and Spanish (Taulé et al., 2017), with a single target, i.e. “Independence of Catalonia”. A re-run was proposed the following year at the IberEval 2018 evaluation campaign, regarding the target “Catalan first of October Referendum”, furthermore encouraging the exploration of multimodal expressions such as audio, video and images (Taulé et al., 2018).
      </p>
      <p>
        SardiStance@EVALITA2020 is the first shared task for SD in Italian tweets. The motivation behind this task is multi-faceted. On the one hand, we aimed at the creation of a new annotated dataset for SD in Italian, which would enrich the panorama of available resources for this language, such as CONREF-STANCE-ITA (Lai et al., 2018) and X-STANCE
        <xref ref-type="bibr" rid="ref1 ref15 ref22 ref5">(Vamvas and Sennrich, 2020)</xref>
        . On the other hand, the organization of this task allows for a deeper investigation of SD at the contextual level, by encouraging the participants and the research community to follow a research line that has proved promising in previous work, see e.g. Lai et al. (2019), Lai et al. (2020) and Del Tredici et al. (2019). In fact, with the data distributed in Task B, different types of social network communities, based on friendships, retweets, quotes, and replies, can be investigated in order to analyze the communication among users with similar and divergent viewpoints.
      </p>
      <p>
        The efficacy of approaches based on contextual features paired with textual information has been widely attested in the literature on SD
        <xref ref-type="bibr" rid="ref12 ref17 ref7">(Magdy et al., 2016; Rajadesingan and Liu, 2014)</xref>
        , and is additionally confirmed by the results obtained in this shared task, especially by those teams who participated in both settings (see Section 5).
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Definition of the Task</title>
      <p>With this task proposal, we wanted to invite
participants to explore features based on the textual
content of the tweet, such as structural, stylistic, and
affective features, but also features based on
contextual information that does not emerge directly
from the text, such as knowledge about the
domain of the political debate or information about
the user’s community. For these reasons, we
proposed two different settings:</p>
      <sec id="sec-2-1">
        <title>Task A - Textual Stance Detection:</title>
        <p>The first task was a three-class classification
task where the system had to predict whether a
tweet is in FAVOUR, AGAINST or NONE towards
the given target, exploiting only textual
information, i.e. the text of the tweet.</p>
        <p>
          From reading the tweet, which of the options below is
most likely to be true about the tweeter’s stance towards
the target?
          <xref ref-type="bibr" rid="ref13 ref14">(Mohammad et al., 2016a)</xref>
          1. FAVOUR: We can infer from the tweet that the
tweeter supports the target.
2. AGAINST: We can infer from the tweet that the
tweeter is against the target.
3. NONE: We can infer from the tweet that the
tweeter has a neutral stance towards the target or
there is no clue in the tweet to reveal the stance of
the tweeter towards the target.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Task B - Contextual Stance Detection:</title>
        <p>The second task was the same as the first one: a three-class classification task where the system had to predict whether a tweet is in FAVOUR, AGAINST or NONE towards the given target. Here, however, participants had access to a wider range of contextual information based on the post, such as: the number of retweets, the number of favours, the number of replies and the number of quotes received by the tweet, the type of posting source (e.g. iOS or Android), and the date of posting. Furthermore, we shared (and encouraged the exploitation of) contextual information related to the user, such as: the number of tweets ever posted, the user's bio, the user's number of followers, and the user's number of friends. Additionally, we shared users' contextual information about their social network, namely: friend, reply, retweet, and quote relations. The personal ids of the users were anonymized but their network structures were kept intact. Participants could take part in both tasks or in only one, although they were encouraged to participate in both.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Data</title>
      <p>We chose to gather the data from the social networking platform Twitter, due to the free availability of a huge amount of user-generated data and because it allowed us to explore different types of relations among the users involved in a debate.</p>
      <sec id="sec-3-1">
        <title>3.1 Collection and annotation of the data</title>
        <p>We collected around 700K tweets written in Italian about the “Movimento delle Sardine” (Sardines movement1), retrieving tweets containing the keywords “sardina” and “sardine” and the homonymous hashtags. Furthermore, we collected all the conversation threads to which each tweet belongs, iteratively following the reply tree. We also collected the quoted tweets and the list of all the retweets of each previously recovered tweet, obtaining about 1M tweets. Finally, we collected the friend list of all the users included in the annotated dataset.</p>
        <p>The tweets were gathered between the 46th week of 2019 (November) and the 5th week of 2020 (January), corresponding to a 12-week time window. Drawing on the experience gained as participants in previous shared tasks on SD, and in order to reduce noise in the text, we collected data taking into account the following constraints: only one tweet per author for each week, no retweets, no replies, no quotes, no tweets containing URLs, no tweets containing pictures or videos.</p>
        <p>1https://en.wikipedia.org/wiki/Sardines_movement.</p>
        <p>Then, we included only Italian tweets posted using a limited number of “sources” (utilities used to post the tweet, such as iOS, Android, etc.), in order to avoid including pre-written tweets posted using a Tweet button.2 Furthermore, we verified that all the collected tweets presented a Jaccard similarity coefficient &lt; 0.8. From about 25K filtered tweets, we then randomly selected around 300 tweets for each week (only the first week of 2020 does not reach 300 tweets), thus obtaining 3,600 tweets in total.</p>
        <p>We created a web platform for annotation purposes (see Figure 1), in order to facilitate the labelling task for the annotators, unifying the visualization mode and shuffling the tweets in a random order.3 12 different native Italian speakers with an interest in news and politics were involved in the annotation, following detailed guidelines we provided, with annotation examples in their native language. We randomly shuffled the annotators and matched them into 66 pairs, with each pair annotating 55 tweets. As a result, each annotator labelled 605 tweets independently and each tweet was annotated by two annotators, who had to choose among four different labels: AGAINST, FAVOUR, NONE/NEUTRAL and OUT OF TOPIC.</p>
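        <p>The near-duplicate filtering step based on the 0.8 Jaccard threshold can be sketched as follows. This is an illustrative reconstruction, not the organizers' actual code, and the example tweets are invented:</p>

```python
# Near-duplicate filter: keep a tweet only if its token-set Jaccard
# similarity with every already-kept tweet is below the threshold.

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def filter_near_duplicates(tweets, threshold=0.8):
    kept, kept_tokens = [], []
    for text in tweets:
        tokens = set(text.lower().split())
        if all(jaccard(tokens, t) < threshold for t in kept_tokens):
            kept.append(text)
            kept_tokens.append(tokens)
    return kept

corpus = [
    "le sardine riempiono la piazza",
    "le sardine riempiono la piazza stasera",  # Jaccard 5/6 with the first: dropped
    "domani si vota in emilia",
]
print(filter_near_duplicates(corpus))
```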
        <p>2https://developer.twitter.com/en/
docs/twitter-for-websites/tweet-button/
overview.</p>
        <p>3In this way, each annotator was guaranteed to see emojis (which we believe are essential for understanding the correct stance) in the same way as the other annotators, independently of the device used.</p>
        <p>Furthermore, as can also be seen in Figure 1 (Tonight we are all sardines in Bologna #bolognanonsilega), we asked the annotators to mark whether, in their opinion, the tweet was IRONIC or NOT IRONIC. However, we were not able to obtain satisfactory results on this front, so we did not include this dimension in the task.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Analysis of the annotation</title>
        <p>At the end of the first phase of annotation, which lasted about a month, we obtained 2,256 tweets in agreement, with a clear decision on one of the three main classes. A further 917 tweets presented a light disagreement (i.e. FAVOUR vs. NEUTRAL or AGAINST vs. NEUTRAL), and the remaining 457 tweets were discarded because the annotators considered them out of topic or because they were in strong disagreement (i.e. FAVOUR vs. OUT OF TOPIC).</p>
        <p>We then proceeded with the resolution of those 917 tweets whose disagreement was deemed “light”, in order to obtain a bigger dataset. We resorted once again to the annotation platform used in the first phase, revised the annotation guidelines, and asked the annotators to label the tweets again. In this phase, we ensured that the tweets in disagreement were not assigned to the same pair of annotators that had previously labelled them; furthermore, we chose to show the two annotations in contrast, along with any comment (if present), to the annotator who had to resolve the disagreement.</p>
        <p>After the second phase, we computed the inter-annotator agreement (IAA) through Cohen's kappa coefficient (over the three main classes), resulting in κ = 0.493 (weak agreement). The same coefficient was also used to compute the IAA among annotators over the two most significant classes (AGAINST and FAVOUR, excluding the NEUTRAL class), resulting in a higher score: κ = 0.769 (moderate agreement). Notably, we observed that in the first phase of the annotation the IAA changes significantly depending on the observed pair of annotators (it ranges from 0.873 to 0.473). We also noticed that the average IAA, computed for each annotator against the remaining 11 annotators, can also change significantly (ranging from 0.704 to 0.609). In other words, some annotators tend to strongly agree with all the others, while others tend to disagree with the majority. As future work, we aim to shed more light on this phenomenon by exploring the background of the annotators and the social relationships among them.</p>
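        <p>The Cohen's kappa computation used for the IAA can be sketched as follows for a single pair of annotators; the two annotation lists below are invented for illustration:</p>

```python
# Cohen's kappa for two annotators: observed agreement corrected by
# the agreement expected from each annotator's label distribution.
from collections import Counter

def cohen_kappa(ann1, ann2):
    assert len(ann1) == len(ann2)
    n = len(ann1)
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    c1, c2 = Counter(ann1), Counter(ann2)
    expected = sum(c1[l] * c2[l] for l in set(ann1) | set(ann2)) / (n * n)
    return (observed - expected) / (1 - expected)

a1 = ["AGAINST", "FAVOUR", "NONE", "AGAINST", "FAVOUR", "AGAINST"]
a2 = ["AGAINST", "FAVOUR", "FAVOUR", "AGAINST", "NONE", "AGAINST"]
print(round(cohen_kappa(a1, a2), 3))  # → 0.455
```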
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Task A</title>
      <p>The training data for Task A (TRAIN.csv) was released in the format described below.</p>
      <sec id="sec-4-1">
        <title>3.3 Composition of the dataset</title>
        <p>After the second round of annotation we were finally able to create the official dataset for the SardiStance shared task. It is composed of a total of 3,242 tweets, 1,770 of which belong to the class AGAINST, 785 to the class FAVOUR, and 687 to the class NONE. In Table 1 we show the distribution of these instances across the training set and the test set, and in Table 2 we report an example tweet for each class.</p>
        <p>Table 1: Distribution of instances across the training and test sets.
TRAINING SET: AGAINST 1,028, FAVOUR 589, NONE 515 (total 2,132)
TEST SET: AGAINST 742, FAVOUR 196, NONE 172 (total 1,110)</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.4 Data Release</title>
        <p>
          We shared data following the methodology recommended in
          <xref ref-type="bibr" rid="ref18 ref21">(Rangel and Rosso, 2018)</xref>
          in order to comply with GDPR privacy rules and Twitter's policies. The identifiers of tweets and users have been anonymized and replaced by unique identifiers. From the user's location and biography fields, we released exclusively the emojis they contained, in order to make it very hard to trace users and to preserve everybody's privacy. The data was released in the following format:
tweet_id user_id text label
where tweet_id is the Twitter ID of the message, user_id is the Twitter ID of the user who posted the message, text is the content of the message, and label is AGAINST, FAVOUR or NONE.
        </p>
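        <p>A minimal loading sketch for the released training file, assuming a comma-separated layout with a header row (the actual delimiter is not specified in the paper, and the sample rows are invented):</p>

```python
# Parse a TRAIN.csv-style file and count the label distribution.
import csv
import io
from collections import Counter

sample = """tweet_id,user_id,text,label
1,10,"le sardine in piazza",FAVOUR
2,11,"contro le sardine",AGAINST
3,12,"oggi piove",NONE
"""

rows = list(csv.DictReader(io.StringIO(sample)))
labels = Counter(r["label"] for r in rows)
print(labels)  # distribution over AGAINST / FAVOUR / NONE
```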
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Task B</title>
      <p>In order to allow participation in Task B, we released additional contextual information.</p>
      <p>the file TWEET.csv, containing contextual
information regarding the tweet, with the following
format:</p>
      <p>tweet_id user_id retweet_count favorite_count source created_at
where tweet_id is the Twitter ID of the message, user_id is the Twitter ID of the user who posted the message, retweet_count indicates the number of times the tweet has been retweeted, favorite_count indicates the number of times the tweet has been liked, source indicates the type of posting source (e.g. iOS or Android), and created_at displays the time of creation in a yyyy-mm-dd hh:mm:ss format. Minutes and seconds have been set to zero for privacy reasons.</p>
      <p>the file USER.csv, containing contextual information regarding the user. It was released in the following format:
user_id statuses_count friends_count followers_count created_at emoji
where user_id is the Twitter ID of the user who posted the message, statuses_count indicates the number of tweets ever posted by the user, friends_count indicates the number of friends of the user, followers_count indicates the number of followers of the user, created_at displays the time of the user's registration on Twitter, and emoji shows a list of the emojis in the user's bio (if present; otherwise the field is left empty).</p>
      <p>The files FRIEND.csv, QUOTE.csv, REPLY.csv
and RETWEET.csv containing contextual info
about the social network of the user. Each file was
released in the following format:</p>
      <p>source target weight
where source and target indicate the two nodes of a social interaction between two Twitter users. More specifically, the source user performs one of the considered social relations towards the target user. Two users are tied by a friend relationship if the source user follows the target user (the friend relationship does not have a weight, because it is either present or absent), while two users are tied by a quote, retweet, or reply relationship if the source user respectively quoted, retweeted, or replied to the target user. Table 4 shows some metrics about the shared networks.</p>
      <p>Weight indicates the number of interactions existing between two users. Note that this information is not available for the friend relation (hence, this column was not present in the FRIEND.csv file), since it is a present/absent relationship and cannot be described through a weight. In all the files, users are identified by their anonymized user IDs.</p>
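      <p>A sketch of how such an edge list can be loaded into a weighted adjacency map, using only the Python standard library. The header and sample edges below are assumptions based on the source/target/weight format described above:</p>

```python
# Load a RETWEET.csv-style edge list into a directed weighted adjacency map.
import csv
import io
from collections import defaultdict

sample = """source,target,weight
1,2,3
1,3,1
2,3,5
"""

adj = defaultdict(dict)
for row in csv.DictReader(io.StringIO(sample)):
    # directed edge: the source user performed the relation towards the target
    adj[row["source"]][row["target"]] = int(row["weight"])

# total outgoing interaction weight per user
out_weight = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
print(out_weight)  # → {'1': 4, '2': 5}
```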
      <p>Regrettably, we did not think to anonymize the screen names contained in the text of the tweets with the same numeric strings used to anonymize the users, which would have allowed matching them with the users' ids and exploring a network based on mentions. We will certainly take this into account in future work.</p>
    </sec>
    <sec id="sec-6">
      <title>4 Evaluation Measures</title>
      <p>Each participating team was allowed to submit a maximum of 4 runs for each sub-task: two constrained runs and two unconstrained runs. Submitting at least one constrained run was compulsory. We decided to provide two separate official rankings for Task A and Task B, and two separate rankings for constrained and unconstrained runs. Systems have been evaluated using the F1-score computed over the two main classes (FAVOUR and AGAINST). Therefore, the submissions have been ranked by the averaged F1-score over the two classes, according to the following equation: F1avg = (F1favour + F1against) / 2.</p>
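      <p>The official metric, averaging the per-class F1 over FAVOUR and AGAINST only (NONE is ignored in the ranking), can be sketched as follows; the gold and predicted labels below are invented for illustration:</p>

```python
# F1avg = (F1_favour + F1_against) / 2, computed from raw label lists.

def f1_avg(gold, pred):
    scores = []
    for cls in ("FAVOUR", "AGAINST"):
        tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
        fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
        fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores.append(f1)
    return sum(scores) / 2

gold = ["FAVOUR", "AGAINST", "NONE", "AGAINST"]
pred = ["FAVOUR", "AGAINST", "AGAINST", "AGAINST"]
print(round(f1_avg(gold, pred), 3))  # → 0.9
```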
      <sec id="sec-6-1">
        <title>4.1 Baselines</title>
        <p>We computed a baseline for Task A using a simple machine learning model: a Support Vector Classifier based on token uni-gram features. A second baseline, computed for Task B, is a system based on our previous work on Stance Detection: a Logistic Regression classifier paired with token n-gram features (unigrams, bigrams and trigrams), plus features based on a binary one-hot encoding representation of the communities extracted from the network of retweets and the network of friends (see the best system for Italian in Lai et al. (2020)).</p>
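        <p>A minimal sketch of the one-hot community features used in the Task B baseline, assuming communities have already been extracted from the retweet and friend networks (the membership below is invented):</p>

```python
# Binary one-hot encoding of community membership: each user is mapped
# to a vector with a 1 in the position of the community they belong to.

def one_hot_communities(user_community, communities):
    """Map user -> binary vector indicating community membership."""
    index = {c: i for i, c in enumerate(sorted(communities))}
    vectors = {}
    for user, com in user_community.items():
        vec = [0] * len(index)
        if com in index:
            vec[index[com]] = 1
        vectors[user] = vec
    return vectors

membership = {"u1": "A", "u2": "B", "u3": "A"}
vecs = one_hot_communities(membership, {"A", "B", "C"})
print(vecs["u1"])  # → [1, 0, 0]
```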
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5 Participants and results</title>
      <p>A total of 12 teams, from both academia and industry, participated in at least one of the two tasks of SardiStance. In Table 3 we provide an overview of the teams in alphabetical order.</p>
      <p>Teams were allowed to submit up to four runs (2 constrained and 2 unconstrained) in case they implemented different systems. Furthermore, each team had to submit at least a constrained run. Participants were invited to submit multiple runs to experiment with different models and architectures; however, they were discouraged from submitting slight variations of the same model. Overall we have 22 runs for Task A and 13 runs for Task B.</p>
      <p>
        Table 3: Overview of the participating teams (team: institution, report).
deepreading: UNED, Spain (Espinosa et al., 2020)
GhostWriter: You Are My Guide, Italy <xref ref-type="bibr" rid="ref4">(Bennici, 2020)</xref>
IXA: UPV/EHU, Spain (Espinosa et al., 2020)
MeSoVe: ISASI, Italy
QMUL-SDS: QMUL-SDS-EECS, UK <xref ref-type="bibr" rid="ref1 ref15 ref22 ref5">(Alkhalifa and Zubiaga, 2020)</xref>
SSN_NLP: CSE Department/SSNCE, India (Kayalvizhi et al., 2020)
SSNCSE-NLP: SSN College of Engineering, India <xref ref-type="bibr" rid="ref5">(Bharathi et al., 2020)</xref>
TextWiller: UNIPD, Italy (Ferraccioli et al., 2020)
UNED: UPV/EHU and UNED, Spain (Espinosa et al., 2020)
UninaStudents: UNINA, Italy <xref ref-type="bibr" rid="ref15">(Moraca et al., 2020)</xref>
UNITOR: UNIROMA2, Italy (Giorgioni et al., 2020)
Venses: UNIVE, Italy (Delmonte, 2020)
      </p>
      <sec id="sec-7-1">
        <title>5.1 Task A: Textual Stance Detection</title>
        <p>The best results are achieved by the UNITOR team which, with an unconstrained run, ranked in 1st position with F1avg = 0.6853. The best result among the constrained runs is achieved once again by the UNITOR team, with F1avg = 0.6801.</p>
        <p>The best results for the two main classes AGAINST and FAVOUR are obtained by the three best systems in the ranking, which are all submissions by the team UNITOR. On the other hand, the Deepreading team, ranking 4th, obtained the best F1-score for the NONE class, with F1none = 0.4213.</p>
        <p>Among the 12 participating teams, at least 6 show an improvement over the baseline, which was computed using an SVM paired with token unigrams as its only feature and which proved an already strong result to beat (F1avg = 0.5784).</p>
      </sec>
      <sec id="sec-7-2">
        <title>5.2 Task B: Contextual Stance Detection</title>
        <p>The best scores are achieved by the IXA team, which with a constrained run obtained the highest score of F1avg = 0.7445. The best F1-scores for the main classes AGAINST and FAVOUR are also achieved by the team ranked 1st, IXA, with F1against = 0.8562 and F1favour = 0.6329, respectively. Once again, the Deepreading team, ranking 3rd and 4th, obtained the best F1-score for the NONE class, with F1none = 0.4251.</p>
        <p>
          Almost all participating systems show an improvement over the baseline, which was computed using a Logistic Regression classifier paired with token n-gram features (unigrams, bigrams and trigrams), features based on the network of retweets, and features based on the network of friends
          <xref ref-type="bibr" rid="ref10">(Lai et al., 2020)</xref>
          .
        </p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>6 Discussion</title>
      <p>
        In this section we compare the participating systems according to the following main dimensions: system architecture, features, use of additional annotated data for training, and use of external resources (e.g. sentiment lexica, NLP tools, etc.). We also make a distinction between runs submitted in Task A and those submitted in Task B. This discussion is based on the participants' reports and on the answers the participants provided to a questionnaire proposed by the organizers. Two teams, namely TextWiller and Venses, wrote a joint report covering both this task and the HaSpeeDe 2 task
        <xref ref-type="bibr" rid="ref19">(Sanguinetti et al., 2020)</xref>
        , as they participated in both competitions. Three further teams, Deepreading, IXA, and UNED, also wrote a single report, as the participants belong to the same research project and wanted to compare their three different approaches.
      </p>
      <sec id="sec-8-1">
        <title>6.1 Systems participating in Task A</title>
        <p>System architecture. Among all submitted runs we counted a great variety of architectures, ranging from classical machine learning classifiers to recent state-of-the-art approaches and statistically-based models. For instance, regarding the use of classical ML, the team UninaStudents used an SVM, and the team MeSoVe used Logistic Regression in one run. Regarding the use of neural networks, the QMUL-SDS team used a bidirectional LSTM, a 2D CNN, and a biLSTM with attention. SSN_NLP also exploited an LSTM neural network.</p>
        <p>
          Four teams exploited different variants of the BERT model: GhostWriter used AlBERTo, trained on Italian tweets; IXA used GilBERTo and UmBERTo4, while UNITOR adopted only the latter model. Finally, the Deepreading team made use of transformers such as BERT XXL and XLM-RoBERTa, paired with linear classifiers. TextWiller is the only team to have exploited the xg-boost algorithm, and ItVenses relied on supervised models based on statistics and semantics. The UNED team instead proposed a voting system over the outputs of different models.
          Features. Besides exploring a variety of system architectures, the teams participating in Task A also used many different textual features, in most cases based on n-grams or char-grams. MeSoVe and TextWiller additionally engineered features based on emoticons. The team UNED, in one of their runs, proposed a system relying on psychological and social features, while UninaStudents proposed features based on uni-grams of hashtags. Interestingly, UNITOR added special tags to the texts, which are the result of a classification with respect to some so-called “auxiliary tasks”. In particular, they trained three classifiers, based respectively on SENTIPOLC 2016
          <xref ref-type="bibr" rid="ref2">(Barbieri et al., 2016)</xref>
          for sentiment analysis classification, on HaSpeeDe 2018
          <xref ref-type="bibr" rid="ref6">(Bosco et al., 2018)</xref>
          for hate speech detection, and on IronITA 2018
          <xref ref-type="bibr" rid="ref8">(Cignarella et al., 2018)</xref>
          for irony detection; they then added three tags to each instance of the SardiStance dataset with respect to these three dimensions: sentiment, hate and irony. ItVenses proposed features collected automatically from a unique dictionary list, the frequency of occurrence of emojis and emoticons, and semantic features investigating the propositional level, factivity and speech act type.
        </p>
        <p>4https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1.</p>
        <p>Additional training data. The only team that participated in the unconstrained setting of SardiStance is UNITOR. They proposed two unconstrained runs in addition to two constrained ones. For the unconstrained setting, they downloaded and labelled about 3,200 tweets using distant supervision and used the additional data to train their systems. In particular, they created the following subsets:
- 1,500 AGAINST: tweets from 2019 containing the hashtag #gatticonsalvini;
- 1,000 FAVOUR: tweets from 2019 containing the hashtags #nessunotocchilesardine, #iostoconlesardine, #unmaredisardine, #vivalesardine and #forzasardine;
- 700 NONE/NEUTRAL: texts derived from news titles, retrieved by querying Google News with the keyword “sardine”.</p>
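        <p>The distant-supervision labelling described above can be sketched as follows; the hashtag lists come from the text, while the helper function and the example tweets are illustrative:</p>

```python
# Distant supervision: assign a stance label from the hashtags a tweet
# contains; tweets matching neither list are left unlabelled.

AGAINST_TAGS = {"#gatticonsalvini"}
FAVOUR_TAGS = {"#nessunotocchilesardine", "#iostoconlesardine",
               "#unmaredisardine", "#vivalesardine", "#forzasardine"}

def distant_label(text):
    tokens = set(text.lower().split())
    if tokens & FAVOUR_TAGS:
        return "FAVOUR"
    if tokens & AGAINST_TAGS:
        return "AGAINST"
    return None  # unlabelled: not used for training

print(distant_label("tutti in piazza #iostoconlesardine"))  # → FAVOUR
print(distant_label("#gatticonsalvini sempre"))             # → AGAINST
```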
        <p>
          Other resources. Five teams declared that they also used other resources, such as lexica, word embeddings, or others. In particular, GhostWriter used a grammar model to rephrase the tweets. MeSoVe exploited SenticNet
          <xref ref-type="bibr" rid="ref7">(Cambria et al., 2014)</xref>
          and the “Nuovo vocabolario di base della lingua italiana” (New basic vocabulary of the Italian language).5 QMUL-SDS took advantage of temporal embeddings and FastText, while only one team, UninaStudents, used a sentiment lexicon: AFINN
          <xref ref-type="bibr" rid="ref16">(Nielsen, 2011)</xref>
          . Lastly, Venses used a proprietary lexicon of Italian, enriched with conceptual, semantic and syntactic information; similarly, TextWiller's approach relies on a self-created vocabulary and on word embeddings trained on the PAISÀ corpus
          <xref ref-type="bibr" rid="ref11">(Lyding et al., 2014)</xref>
          .
        </p>
      </sec>
      <sec id="sec-8-2">
        <title>6.2 Systems participating in Task B</title>
        <p>Seven teams participated in Task B, submitting a total of 13 runs. Most teams extensively explored the additional features available for Task B; GhostWriter, on the contrary, proposed the same two approaches presented in Task A. Notably, the three runs with a score lower than the baseline did not benefit from any features based on the users' social network.</p>
        <p>5https://dizionario.internazionale.it.</p>
<p>System architecture. Most teams enriched the
models they submitted in Task A by taking advantage
of the contextual information available in Task B.
UNED, DeepReading, and TextWiller exploited the
XGBoost algorithm, selecting different features
from the contextual data. The language model BERT
was used in different variants by SSNCSE-NLP,
DeepReading, and IXA. In particular, the last
two teams proposed three voting-based ensemble
methods that combine two or more models
exploiting the XGBoost algorithm. Furthermore, the
neural network framework proposed by
QMUL-SDS combines four different
embedding methods into a dense layer, generating the
final label with a softmax activation function.</p>
        <p>Features. Not every team took full advantage of
the contextual information. For example,
SSNCSE-NLP only exploited the number of friends in run
1, and the number of quotes and friends in run
2. In its run 1, UNED also exploited some
features based on the tweets, in addition to the
psychological and emotional ones, using the XGBoost
algorithm. The other teams explored different
approaches for learning vector representations of the
nodes of the available networks. DeepReading,
IXA, and UNED proposed a feature that computes
the mean distance of each user to the rest of the users
whose stance is known. TextWiller experimented with
multi-dimensional scaling (MDS), retaining
the first and second dimensions for each of the four
available networks. Node2vec and DeepWalk were
used for learning vector representations of the
network nodes in QMUL-SDS’s runs 1 and 2,
respectively.</p>
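<p>The user-distance feature described above can be sketched as follows; a minimal illustration in plain Python, where the toy friend network, the user identifiers, and the choice of BFS shortest-path distance are our own assumptions, not the teams’ exact implementations.</p>

```python
from collections import deque

def mean_distance_to_labelled(adj, user, labelled):
    """Mean shortest-path distance from `user` to the users whose
    stance is known, over an undirected social graph.

    `adj` maps each user to the set of its neighbours; labelled users
    unreachable from `user` are skipped, and None is returned when no
    labelled user can be reached at all.
    """
    dist = {user: 0}
    queue = deque([user])
    while queue:  # breadth-first search from `user`
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    reached = [dist[u] for u in labelled if u in dist and u != user]
    return sum(reached) / len(reached) if reached else None

# Toy network: u2 and u4 are the users with a known stance
adj = {"u1": {"u2", "u3"}, "u2": {"u1"}, "u3": {"u1", "u4"}, "u4": {"u3"}}
print(mean_distance_to_labelled(adj, "u1", {"u2", "u4"}))  # → 1.5
```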
<p>The comparison between the approaches
used for Task A and Task B
clearly highlights the benefits of
exploiting information from different and heterogeneous
sources. In particular, it is interesting to
observe that all the teams that participated in both
tasks also produced better results in the second
setting. Experimenting with different classifiers
trained on the textual content of the tweets as
well as on features based on contextual
information (additional information on the tweets, on the users, or
on their social networks) therefore seems to lead
to better overall results.</p>
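<p>The voting-based combination of several classifiers’ outputs, as used by some of the ensemble runs, can be illustrated with a simple hard majority vote; a minimal sketch in plain Python, where the model runs and the FAVOR/AGAINST/NONE labels serve purely as an example, not as any team’s actual configuration.</p>

```python
from collections import Counter

def majority_vote(runs):
    """Hard majority vote over the label predictions of several models.

    `runs` is a list of prediction lists, one per model, each giving one
    stance label per tweet; ties are broken by the first-seen label.
    """
    assert runs and all(len(r) == len(runs[0]) for r in runs)
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*runs)]

# Three hypothetical models voting on four tweets
run_a = ["FAVOR", "AGAINST", "NONE", "AGAINST"]
run_b = ["FAVOR", "AGAINST", "AGAINST", "NONE"]
run_c = ["AGAINST", "AGAINST", "NONE", "AGAINST"]
print(majority_vote([run_a, run_b, run_c]))
# → ['FAVOR', 'AGAINST', 'NONE', 'AGAINST']
```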
<p>In particular, among the 6 teams that
participated in both tasks, only 4 fully explored the social
network relations of the tweets’ authors. The
only two runs that outperform the baseline
without investigating the structure of the social graphs
are those submitted by the SSNCSE-NLP team.
Only one team participated in both tasks
with the same architecture. This allowed us to
compare the F1-scores obtained in the first
setting with those obtained in the second,
highlighting that adding contextual features can increase
performance by +0.2432, in terms of F1avg.</p>
<p>Additionally, we calculated the increase in
performance between the score obtained by the
run ranked 1st in Task A (UNITOR,
Favg = 0.6853) and the score of the run ranked
1st in Task B (IXA, Favg = 0.7445),
showing that taking advantage of contextual features
can increase performance by up to 8.6% in terms
of F1avg.</p>
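<p>The relative gain follows directly from the two scores; a quick check (the scores come from the comparison above, the variable names are ours):</p>

```python
f1_task_a = 0.6853  # best Task A run (UNITOR)
f1_task_b = 0.7445  # best Task B run (IXA)

# Relative improvement of the contextual (Task B) winner over the
# text-only (Task A) winner, in terms of F1avg
relative_gain = (f1_task_b - f1_task_a) / f1_task_a
print(f"{relative_gain:.1%}")  # → 8.6%
```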
      </sec>
    </sec>
    <sec id="sec-9">
      <title>Conclusions</title>
<p>We presented the first shared task on Stance
Detection for Italian, discussing the development of the
datasets used and the participation of the teams. The
task opened a valuable venue for discussing techniques
and state-of-the-art approaches, which can guide
future research directions.</p>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgments</title>
<p>The work of C. Bosco, M. Lai and V. Patti is
partially funded by the project “Be Positive!”
(under the 2019 “Google.org Impact Challenge on
Safety” call). The work of C. Bosco and V.
Patti is also partially funded by Progetto di
Ateneo/CSP 2016 Immigrants, Hate and Prejudice in
Social Media (S1618_L2_BOSC_01). The work
of P. Rosso is partially funded by the Spanish
MICINN under the research project
MISMIS-FAKEnHATE on Misinformation and
Miscommunication in social media: FAKE news and HATE
speech (PGC2018-096212-B-C31) and by the project
PROMETEO/2019/121 (DeepPattern) of the Generalitat
Valenciana.</p>
<p>A special mention goes to the people who helped
us with the annotation of the dataset. In random
order: Matteo, Luca, Ylenia, Simona, Elisa,
Sebastiano, Francesca, Simona, Komal and Angela,
thank you very much for your great help.</p>
<p>Marco Del Tredici, Diego Marcheggiani, Sabine
Schulte im Walde, and Raquel Fernández. 2019.
You Shall Know a User by the Company It Keeps:
Dynamic Representations for Social Media Users in
NLP. In Proceedings of the 2019 Conference on
Empirical Methods in Natural Language Processing
and the 9th International Joint Conference on
Natural Language Processing (EMNLP-IJCNLP 2019).
ACL.</p>
<p>Maria S. Espinosa, Rodrigo Agerri, Alvaro Rodrigo,
and Roberto Centeno. 2020. DeepReading @
SardiStance: Combining Textual, Social and
Emotional Features. In Proceedings of the 7th
Evaluation Campaign of Natural Language Processing and
Speech Tools for Italian (EVALITA 2020).
CEUR-WS.org.</p>
<p>Federico Ferraccioli, Andrea Sciandra, Mattia Da Pont,
Paolo Girardi, Dario Solari, and Livio Finos. 2020.
TextWiller @ SardiStance, HaSpeede2: Text or
Con-text? A smart use of social network data in
predicting polarization. In Proceedings of the 7th
Evaluation Campaign of Natural Language
Processing and Speech Tools for Italian (EVALITA 2020).
CEUR-WS.org.</p>
<p>Simone Giorgioni, Marcello Politi, Samir Salman,
Danilo Croce, and Roberto Basili. 2020.
UNITOR@Sardistance2020: Combining
Transformer-based architectures and Transfer Learning for robust
Stance Detection. In Proceedings of the 7th
Evaluation Campaign of Natural Language Processing and
Speech Tools for Italian (EVALITA 2020).
CEUR-WS.org.</p>
<p>S. Kayalvizhi, D. Thenmozhi, and Chandrabose
Aravindan. 2020. SSN_NLP@SardiStance: Stance
Detection from Italian Tweets using RNN and
Transformers. In Valerio Basile, Danilo Croce,
Maria Di Maro, and Lucia C. Passaro, editors,
Proceedings of the 7th Evaluation Campaign of Natural
Language Processing and Speech Tools for Italian
(EVALITA 2020). CEUR-WS.org.</p>
<p>Dilek Küçük and Fazli Can. 2020. Stance detection: A
survey. ACM Computing Surveys, 53(1):1–37.</p>
      <p>Mirko Lai, Viviana Patti, Giancarlo Ruffo, and Paolo
Rosso. 2018. Stance evolution and Twitter
interactions in an Italian political debate. In
Proceedings of the 23rd International Conference on
Natural Language &amp; Information Systems (NLDB 2018).
Springer.</p>
<p>Mirko Lai, Marcella Tambuscio, Viviana Patti,
Giancarlo Ruffo, and Paolo Rosso. 2019. Stance polarity
in political debates: A diachronic perspective of
network homophily and conversations on Twitter. Data
&amp; Knowledge Engineering, 124:101738.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Rabab</given-names>
            <surname>Alkhalifa</surname>
          </string-name>
          and
          <string-name>
            <given-names>Arkaitz</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>QMUL-SDS @ SardiStance: Leveraging Network Interactions to Boost Performance on Stance Detection using Knowledge Graphs</article-title>
          .
          <source>In Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2020</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the EVALITA 2016 SENTIment POLarity Classification task</article-title>
          .
          <source>In Proceedings of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2016</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <given-names>Lucia C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>EVALITA 2020: Overview of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          <article-title>Online</article-title>
          .
          <source>CEUR-WS.org.</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Mauro</given-names>
            <surname>Bennici</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>ghostwriter19 @ SardiStance: Generating new tweets to classify SardiStance EVALITA 2020 political tweets</article-title>
          .
          <source>In Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2020</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Bharathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bhuvana</surname>
          </string-name>
          , and Nitin Nikamanth Appiah Balaji.
          <year>2020</year>
          .
          <article-title>SardiStance@EVALITA2020: Textual and Contextual stance detection from Tweets using machine learning approach</article-title>
          .
          <source>In Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2020</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Felice Dell'Orletta, Fabio Poletto, Manuela Sanguinetti, and
          <string-name>
            <given-names>Maurizio</given-names>
            <surname>Tesconi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the Evalita 2018 Hate Speech Detection Task</article-title>
          .
          <source>In Proceedings of 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2018</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Erik</given-names>
            <surname>Cambria</surname>
          </string-name>
          , Daniel Olsher, and
          <string-name>
            <given-names>Dheeraj</given-names>
            <surname>Rajagopal</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>SenticNet 3: a Common and Commonsense Knowledge Base for Cognition-driven Sentiment Analysis</article-title>
          .
          <source>In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI</source>
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Alessandra Teresa</given-names>
            <surname>Cignarella</surname>
          </string-name>
          , Simona Frenda, Valerio Basile, Cristina Bosco, Viviana Patti, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 task on Irony Detection in Italian Tweets (IronITA)</article-title>
          .
          <source>In Proceedings of 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2018</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Del Tredici</surname>
          </string-name>
          , Diego Marcheggiani, Sabine Schulte im Walde, and
          <string-name>
            <given-names>Raquel</given-names>
            <surname>Fernández</surname>
          </string-name>
          .
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Mirko</given-names>
            <surname>Lai</surname>
          </string-name>
          , Alessandra Teresa Cignarella, Delia Irazú Hernández Farías, Cristina Bosco, Viviana Patti, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Multilingual stance detection in social media political debates</article-title>
          .
          <source>Computer Speech &amp; Language</source>
          ,
          <volume>63</volume>
          (
          <issue>101075</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Verena</given-names>
            <surname>Lyding</surname>
          </string-name>
          , Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell'Orletta, Henrik Dittmann, Alessandro Lenci, and
          <string-name>
            <given-names>Vito</given-names>
            <surname>Pirrelli</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>The PAISÀ Corpus of Italian Web Texts</article-title>
          .
          <source>In Proceedings of the 9th Web as Corpus Workshop (WaC-9) @ the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014)</source>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Walid</given-names>
            <surname>Magdy</surname>
          </string-name>
          , Kareem Darwish, Norah Abokhodair, Afshin Rahimi, and
          <string-name>
            <given-names>Timothy</given-names>
            <surname>Baldwin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>#isisisnotislam or #deportallmuslims?: Predicting unspoken views</article-title>
          .
          <source>In Proceedings of the 8th ACM Conference on Web Science (WebSci</source>
          <year>2016</year>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Saif</given-names>
            <surname>Mohammad</surname>
          </string-name>
          , Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and
          <string-name>
            <given-names>Colin</given-names>
            <surname>Cherry</surname>
          </string-name>
          .
          <year>2016a</year>
          .
          <article-title>A Dataset for Detecting Stance in Tweets</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC</source>
          <year>2016</year>
          ). ELRA.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Saif</given-names>
            <surname>Mohammad</surname>
          </string-name>
          , Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and
          <string-name>
            <given-names>Colin</given-names>
            <surname>Cherry</surname>
          </string-name>
          .
          <year>2016b</year>
          .
          <article-title>SemEval-2016 Task 6: Detecting Stance in Tweets</article-title>
          .
          <source>In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</source>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Maurizio</given-names>
            <surname>Moraca</surname>
          </string-name>
          , Gianluca Sabella, and
          <string-name>
            <given-names>Simone</given-names>
            <surname>Morra</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>UninaStudents @ SardiStance: Stance detection in Italian tweets - Task A</article-title>
          .
          <source>In Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2020</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Finn</given-names>
            <surname>Årup Nielsen</surname>
          </string-name>
          .
          <year>2011</year>
          . AFINN. Richard Petersens Plads, Building,
          <volume>321</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Ashwin</given-names>
            <surname>Rajadesingan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Huan</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Identifying users with opposing opinions in Twitter debates</article-title>
          .
          <source>In Proceedings of the 7th Social Computing, Behavioral-Cultural Modeling and Prediction International Conference (SBP-BRiMS 2014)</source>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Francisco</given-names>
            <surname>Rangel</surname>
          </string-name>
          and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>On the implications of the general data protection regulation on the organisation of evaluation tasks</article-title>
          .
          <source>Language and Law / Linguagem e Direito</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>95</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , Gloria Comandini, Elisa Di Nuovo, Simona Frenda, Marco Stranisci, Cristina Bosco, Tommaso Caselli, Viviana Patti, and
          <string-name>
            <given-names>Irene</given-names>
            <surname>Russo</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>HaSpeeDe 2@EVALITA2020: Overview of the EVALITA 2020 Hate Speech Detection Task</article-title>
          .
          <source>In Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2020</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Mariona</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Antònia</given-names>
            <surname>Martí</surname>
          </string-name>
          , Francisco M. Rangel Pardo, Paolo Rosso, Cristina Bosco, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Overview of the Task on Stance and Gender Detection in Tweets on Catalan Independence</article-title>
          .
          <source>In Proceedings of the 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval</source>
          <year>2017</year>
          )
          <article-title>co-located with 33th Conference of the Spanish Society for Natural Language Processing (SEPLN</article-title>
          <year>2017</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Mariona</given-names>
            <surname>Taulé</surname>
          </string-name>
          , Francisco M. Rangel Pardo,
          <string-name>
            <given-names>M. Antònia</given-names>
            <surname>Martí</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the Task on Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum</article-title>
          .
          <source>In Proceedings of the 3rd Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval</source>
          <year>2018</year>
          )
          <article-title>co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN</article-title>
          <year>2018</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Jannis</given-names>
            <surname>Vamvas</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rico</given-names>
            <surname>Sennrich</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>X-Stance: A Multilingual Multi-Target Dataset for Stance Detection</article-title>
          .
          <source>In Proceedings of the 5th Swiss Text Analytics Conference (SwissText 2020) &amp; 16th Conference on Natural Language Processing (KONVENS</source>
          <year>2020</year>
          ).
          <article-title>CEUR-WS.org</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>