<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evalita 2018: Overview on the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tommaso Caselli</string-name>
          <email>t.caselli@gmail.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicole Novielli</string-name>
          <email>nicole.novielli@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viviana Patti</string-name>
          <email>patti@di.unito.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <email>prosso@dsic.upv.es</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Informatica, Università degli Studi di Bari Aldo Moro</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Informatica, Università degli Studi di Torino</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>PRHLT Research Center, Universitat Politècnica de València</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Rijksuniversiteit Groningen</institution>
          ,
          <addr-line>Groningen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>7</lpage>
      <abstract>
        <p>EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. Since 2007, the general objective of EVALITA has been to promote the development and dissemination of language resources and technologies for Italian, providing a shared framework where different systems and approaches can be evaluated in a consistent manner. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC) and it is endorsed by the Italian Association for Artificial Intelligence (AI*IA) and the Italian Association for Speech Sciences (AISV).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <sec id="sec-1-1">
        <title>Hate Speech</title>
        <p>
          iLISTEN - itaLIan Speech acT labEliNg. This task consists in automatically annotating dialogue
turns with speech act labels, i.e. with the communicative intention of the speaker, such as statement,
request for information, agreement, opinion expression, general answer
          <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref3 ref5 ref6 ref8">(Basile and Novielli, 2018)</xref>
          .
        </p>
        <p>
          IDIAL - Italian DIALogue systems evaluation. The task develops and applies evaluation protocols
for the quality assessment of dialogue systems for the Italian language. The targets of the evaluation
are existing task-oriented dialogue systems, both from industry and academia
          <xref ref-type="bibr" rid="ref11 ref9">(Cutugno et al., 2018)</xref>
          .
        </p>
        <p>
          AMI - Automatic Misogyny Identification. This task focuses on the automatic identification of
misogynous content in both English and Italian on Twitter. More specifically, it is a
two-fold task. It includes: (i) a Misogyny Identification subtask consisting in a binary classification
of tweets as being either misogynous or not; (ii) a Misogynistic Behaviour and Target Classification
subtask aimed at classifying tweets according to different finer-grained types of misogynistic
behaviour detected, such as sexual harassment or discredit, and the target of the message (individuals
or groups of people)
          <xref ref-type="bibr" rid="ref12 ref13">(Fersini et al., 2018a)</xref>
          .
        </p>
        <p>
          HaSpeeDe - Hate Speech Detection. This task is organized into three sub-tasks, concerning: (i)
the identification of hate speech on Facebook (HaSpeeDe-FB), (ii) the identification of hate speech
on Twitter (HaSpeeDe-TW), and (iii) the cross-dataset setting, which assesses the
performance of the hate speech recognition system developed when it is trained on Facebook data
and evaluated on Twitter data, and vice versa
          <xref ref-type="bibr" rid="ref7">(Bosco et al., 2018)</xref>
          .
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>Semantics4AI</title>
        <p>
          NLP4FUN - Solving language games. This task consists in designing a solver for “The Guillotine”
game, inspired by an Italian TV show. The game involves a single player, who is given a set of five
words - the clues - each linked in some way to a specific word that represents the unique solution
of the game. Words are unrelated to each other, but each of them has a hidden association with
the solution. Once the clues are given, the player has to provide the unique word representing the
solution. Participants are required to build an artificial player able to solve the game
          <xref ref-type="bibr" rid="ref3 ref5 ref6 ref8">(Basile et al., 2018b)</xref>
          .
        </p>
        <p>
          SUGAR - Spoken Utterances Guiding Chef's Assistant Robots. The goal of this task is to develop a
voice-controlled robotic agent to act as a cooking assistant. To this aim, a training corpus of spoken
commands is collected and annotated using a 3D virtual environment that simulates a real kitchen where
users can interact with the robot. The task specifically focuses on a set of commands, whose
semantics is defined according to the various possible combinations of actions, items (i.e. ingredients),
tools and different modifiers
          <xref ref-type="bibr" rid="ref11 ref9">(Di Maro et al., 2018)</xref>
          .
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Fostering Reproducibility and Cross-community Engagement</title>
      <p>Open access to resources and research artifacts, such as data, tools, and dictionaries, is deemed crucial for
the advancement of the state of the art in scientific research. Accessibility of resources and experimental
protocols enables both full and partial replication of studies in order to further validate their findings,
towards building new knowledge based on solid empirical evidence. To foster reproducibility and
encourage follow-up studies leveraging the resources built within EVALITA 2018, we introduced two
novelties this year. First of all, we intend to distribute all datasets used as benchmarks for the tasks
of this edition. To this aim, we have set up a repository on GitHub (https://github.com/evalita2018/data), in line with the good practices
already applied by the organizers of the previous edition (https://github.com/evalita2016/data). Also, the datasets for all the tasks will be
hosted and distributed by the European Language Resources Association (ELRA). In addition, we
decided to further encourage the sharing of resources by making availability of the systems an eligibility
requirement for the best system award (see Section 4).</p>
      <p>
        In the same spirit, we encouraged cross-community involvement in both task organization and
participation. We welcomed the initiative of the organizers of AMI, the Automatic Misogyny Identification
task
        <xref ref-type="bibr" rid="ref12 ref13">(Fersini et al., 2018a)</xref>
        , focusing on both English and Italian tweets. This task was first proposed
at IberEval 2018 for Spanish and English
        <xref ref-type="bibr" rid="ref12 ref13">(Fersini et al., 2018b)</xref>
        , and then re-proposed at EVALITA
for Italian, and again for English with a new dataset for training and testing. The ITAmoji shared task
was also a re-proposal for the Italian language of the Multilingual Emoji Prediction Task at the International
Workshop on Semantic Evaluation (SemEval 2018)
        <xref ref-type="bibr" rid="ref14 ref2">(Barbieri et al., 2018)</xref>
        , which focused on English
and Spanish. Here the re-proposal of the task at EVALITA was driven by a twofold aim: to widen the setting
for cross-language comparisons for emoji prediction on Twitter, and to experiment with novel metrics to
better assess the quality of the automatic predictions, also proposing a comparison with human
performance on the same task.
      </p>
      <p>
        In the 2016 edition, task organizers were encouraged to collaborate on the creation of a shared test
set across tasks
        <xref ref-type="bibr" rid="ref4">(Basile et al., 2017)</xref>
        . We were happy to observe that this practice was
maintained this year as well. In particular, a portion of the dataset of IronITA
        <xref ref-type="bibr" rid="ref8">(Cignarella et al., 2018)</xref>
        , the task on irony
detection in Twitter, partially overlaps with the dataset of the hate speech detection task (HaSpeeDe)
        <xref ref-type="bibr" rid="ref7">(Bosco et al., 2018)</xref>
        . The intersection includes tweets related to three social groups deemed
potential targets of hate speech online: immigrants, Muslims and Roma. Also, the sentiment corpora
with multi-layer annotations developed in recent years by the EVALITA community, which also include
morpho-syntactic and entity linking annotations, were exploited by some ABSITA
        <xref ref-type="bibr" rid="ref3 ref5 ref6 ref8">(Basile et al., 2018a)</xref>
        and IronITA
        <xref ref-type="bibr" rid="ref8">(Cignarella et al., 2018)</xref>
        participants to address the finer-grained sentiment-related tasks
proposed this year under the Affect, Creativity and Style track.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Award: Best System Across-tasks</title>
      <p>For the first time, this year we decided to award the best system across tasks, especially that of young
researchers. The award was introduced with the aim of fostering student participation in the evaluation
campaign and in the workshop, and received funding from Google Research, CELI (https://www.celi.it/), and the
European Language Resources Association (ELRA, http://elra.info/en/).</p>
      <p>The eligibility criteria are (i) the availability of the system as open source software by the end of the
evaluation period, when the results are due to the task organizers, and (ii) the presence of at least one PhD
candidate, master's student or bachelor's student among the authors of the final report describing the system. The
systems will be evaluated based on:
novelty, understood as novelty of the approach with respect to the state of the art (e.g. a new
model or algorithm), or novelty of features (for discrete classifiers);
originality, understood as the identification of new linguistic resources employed to solve the task
(for instance, using WordNet should not be considered as a new resource), the identification of
linguistically motivated features, or the implementation of a theoretical framework grounded in linguistics;
critical insight, understood as a deep error analysis that highlights the limits of the current system
and paves the way to future challenges; and technical soundness and methodological rigor.</p>
      <p>We collected 7 system nominations from the organizers of 5 tasks belonging to the Affect, Creativity
and Style track and to the Hate Speech track. 14 students were involved in the development of the systems
that received a mention: 7 PhD students and 7 master's students. All but 5 of the students are enrolled in
Italian universities. The award recipient(s) will be announced during the final EVALITA
workshop, co-located with CLiC-it 2018, the Fifth Italian Conference on Computational Linguistics
(http://clic2018.di.unito.it/it/home/).</p>
    </sec>
    <sec id="sec-4">
      <title>Participation</title>
      <p>The tasks and the challenge of EVALITA 2018 attracted the interest of a large number of researchers from
academia and industry, for a total of 237 single preliminary registrations. Overall, 50 teams composed
of 115 individuals from 13 different countries participated in one or more tasks, submitting a total of 34
system descriptions.</p>
      <p>A breakdown of the figures per task is shown in Table 1. Compared with the 2016 edition, we collected
a significantly higher number of preliminary registrations (237 vs. 96 in 2016), teams (50 vs. 34 in 2016),
and participants (115 vs. 60 in 2016). Note that only 68 unique participants also submitted a report; this
drop is mainly due to participation in more than one task, which resulted in the submission of a single
report from the same team. The growth can be interpreted as
a signal that we succeeded in reaching a wider audience of researchers interested in participating in the
campaign, as well as a further indication of the growth of the NLP community at large. This result may
also have been positively affected by the novelties introduced this year to involve cross-community participation,
represented by the ITAmoji and AMI tasks. Indeed, of the 50 teams that submitted at least one run, 12
include researchers from foreign institutions. In addition, this year all tasks received at least
one submission.</p>
      <p>
        A further reason for the success of this edition may lie in the tasks themselves, especially those in the
“Affect, Creativity and Style” and “Hate Speech” tracks. Although these two tracks cover 60% of
all tasks, they attracted 82% of the teams (41 teams). This is clearly a sign of
growing interest in the NLP community at large in the study and analysis of new text types such as those
produced on social media platforms and (online) user-generated content, also reflecting the outcome of
the 2016 survey
        <xref ref-type="bibr" rid="ref15 ref4">(Sprugnoli et al., 2016)</xref>
        .
      </p>
      <p>
        We also consider the new protocol for the submission of participants’ runs, consisting of three
non-overlapping evaluation windows, as a further factor that may have positively impacted participation.
Indeed, the 2016 survey shows that the main reasons for not participating in the evaluation
refer to either personal issues or preferences (“I gave priority to other EVALITA tasks”), the latter also owing to the
difficulty of participating in the evaluation step of all tasks simultaneously, as the evaluation period was
perceived as too short to enable participation in more than one task
        <xref ref-type="bibr" rid="ref15 ref4">(Sprugnoli et al., 2016)</xref>
        . Although
appreciated by the EVALITA participants, this novelty is not a major cause of the increased participation: out of
50 teams, only 6 participated in more than one task.
      </p>
      <p>Finally, it is worth reflecting on the distinction between constrained and unconstrained
submissions to the tasks. Half of the tasks, namely ABSITA, ITAmoji, IronITA, and
AMI, paid attention to this distinction, while the other half did not take it into account. In early
evaluation campaigns, the distinction used to be very relevant, as it aimed at distinguishing the contribution
of features or the learning approach from that of external sources of information, mainly intended as lexical
resources. In recent years, the spread and extensive use of pre-trained word embedding representations,
especially as a strategy to initialize neural network architectures, challenges this distinction at its very
heart. Furthermore, this distinction is also challenged by the development of multi-task learning
architectures. A multi-task system could definitely represent an instance of an unconstrained system, although
it exploits data from a different task, rather than a lexical resource or additional data annotated with the
same information as that in the main task. As a contribution to the discussion on this topic, we think that
proponents of tasks that aim at differentiating between constrained and unconstrained runs must specify
what the actual boundaries are, in terms of extra training data, auxiliary tasks, use of word embeddings
and lexical resources.</p>
    </sec>
    <sec id="sec-5">
      <title>Final Remarks</title>
      <p>For this edition of EVALITA we introduced novelties to support reproducibility and
cross-community engagement, and to advance methodology and techniques for natural language and
speech processing tasks beyond performance improvement, which is typically used as the metric to
assess state-of-the-art approaches in benchmarking and shared task organization. In particular, the
decision to award the best system across tasks is inspired by this vision and aims at emphasizing the value of
critical reflection and insightful discussion beyond the metric-based evaluation of participating systems.</p>
      <p>
        In line with the suggestion provided by the organizers of the previous edition in 2016
        <xref ref-type="bibr" rid="ref1 ref15 ref3 ref4 ref4">(Basile et al.,
2016; Sprugnoli et al., 2016)</xref>
        , we introduced a novel organization of the evaluation period based on
non-overlapping windows, in order to help those who want to participate in more than one task. This
year EVALITA has reached a new milestone concerning the participation of industry. Overall, we
registered a total of 9 industrial participants: 7 participated directly in tasks, 6 of which submitted a paper,
and 2 were involved as “targets” of an evaluation exercise
        <xref ref-type="bibr" rid="ref11 ref9">(Cutugno et al., 2018)</xref>
        .
      </p>
      <p>
        Finally, a new trend that has emerged this year is the presence of tasks, namely GxG and HaSpeeDe, that
aimed at testing the robustness of systems across text genres, further challenging the participants in
developing their systems. This “extra challenge” aspect is a new trend in EVALITA that started with the
2016 SENTIPOLC task
        <xref ref-type="bibr" rid="ref1">(Barbieri et al., 2016)</xref>
        , where the text genre was not changed but the test data
was partially created using tweets that do not exactly match the selection procedure used for the creation
of the training set.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We would like to thank our sponsors CELI (https://www.celi.it/), Google Research and the European Language
Resources Association (ELRA, http://elra.info/en/) for their support to the event and to the best system across tasks award.
Further thanks go to ELRA for its offer and support in hosting the task datasets and systems’ results. We
also thank Agenzia per l’Italia Digitale (AGID, https://www.agid.gov.it) for its endorsement.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the Evalita 2016 SENTIment POLarity Classification Task</article-title>
          . In Pierpaolo Basile, Franco Cutugno, Malvina Nissim, Viviana Patti, and Rachele Sprugnoli, editors,
          <source>Proceedings of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2016</year>
          ), Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Jose Camacho-Collados, Francesco Ronzano, Luis Espinosa Anke, Miguel Ballesteros, Valerio Basile, Viviana Patti, and
          <string-name>
            <given-names>Horacio</given-names>
            <surname>Saggion</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>SemEval 2018 Task 2: Multilingual Emoji Prediction</article-title>
          . In
          <source>Proceedings of The 12th International Workshop on Semantic Evaluation</source>
          , pages
          <fpage>24</fpage>
          -
          <lpage>33</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nicole</given-names>
            <surname>Novielli</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the Evalita 2018 itaLIan Speech acT labEliNg (iLISTEN) Task</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
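          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , Turin, Italy. CEUR.org.
        </mixed-citation>
        <mixed-citation>
          Pierpaolo Basile, Franco Cutugno, Malvina Nissim, Viviana Patti, and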
          <string-name>
            <given-names>Rachele</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>EVALITA 2016: Overview of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          . In Pierpaolo Basile, Franco Cutugno, Malvina Nissim, Viviana Patti, and Rachele Sprugnoli, editors,
          <source>Proceedings of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2016</year>
          ), Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Malvina Nissim, Rachele Sprugnoli, Viviana Patti, and
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Cutugno</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Evalita goes social: Tasks, data, and community at the 2016 edition</article-title>
          .
          <source>Italian Journal of Computational Linguistics</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Valerio Basile, Danilo Croce, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Polignano</surname>
          </string-name>
          . 2018a.
          <article-title>Overview of the EVALITA 2018 Aspect-based Sentiment Analysis task (ABSITA)</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Marco de Gemmis, Lucia Siciliani, and
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          . 2018b.
          <article-title>Overview of the EVALITA 2018 Solving language games (NLP4FUN) Task</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Felice Dell'Orletta, Fabio Poletto, Manuela Sanguinetti, and
          <string-name>
            <given-names>Maurizio</given-names>
            <surname>Tesconi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 Hate Speech Detection Task</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Alessandra Teresa</given-names>
            <surname>Cignarella</surname>
          </string-name>
          , Simona Frenda, Valerio Basile, Cristina Bosco, Viviana Patti, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 Task on Irony Detection in Italian Tweets (IronITA)</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Cutugno</surname>
          </string-name>
          , Maria Di Maro, Sara Falcone, Marco Guerini, Bernardo Magnini, and Antonio Origlia.
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 Evaluation of Italian DIALogue systems (IDIAL) Task</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Felice</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          and
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 Cross-Genre Gender Prediction (GxG) Task</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Maria</given-names>
            <surname>Di Maro</surname>
          </string-name>
          , Antonio Origlia, and
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Cutugno</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 Spoken Utterances Guiding Chef's Assistant Robots (SUGAR) Task</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Debora Nozza, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          . 2018a.
          <article-title>Overview of the Evalita 2018 Task on Automatic Misogyny Identification (AMI)</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Paolo Rosso, and
          <string-name>
            <given-names>Maria</given-names>
            <surname>Anzovino</surname>
          </string-name>
          . 2018b.
          <article-title>Overview of the Task on Automatic Misogyny Identification at IberEval 2018</article-title>
          . In Paolo Rosso, Julio Gonzalo, Raquel Martínez, Soto Montalvo, and Jorge Carrillo de Albornoz, editors,
          <source>Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018)</source>
          , Sevilla, Spain, September 18th, 2018, volume
          <volume>2150</volume>
          of CEUR Workshop Proceedings, pages
          <fpage>214</fpage>
          -
          <lpage>228</lpage>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Ronzano</surname>
          </string-name>
          , Francesco Barbieri, Endang Wahyu Pamungkas, Viviana Patti, and
          <string-name>
            <given-names>Francesca</given-names>
            <surname>Chiusaroli</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 Italian Emoji Prediction (ITAMoji) Task</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Rachele</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          , Viviana Patti, and
          <string-name>
            <given-names>Franco</given-names>
            <surname>Cutugno</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Raising Interest and Collecting Suggestions on the EVALITA Evaluation Campaign</article-title>
          . In Pierpaolo Basile, Franco Cutugno, Malvina Nissim, Viviana Patti, and Rachele Sprugnoli, editors,
          <source>Proceedings of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2016</year>
          ), Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>