<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>EVALITA 2016: Overview of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierpaolo Basile</string-name>
          <email>basilepp@di.uniba.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Franco Cutugno</string-name>
          <email>cutugno@unina.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Malvina Nissim</string-name>
          <email>m.nissim@rug.nl</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viviana Patti</string-name>
          <email>patti@di.unito.it</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rachele Sprugnoli</string-name>
          <email>sprugnoli@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FBK and University of Trento</institution>
          ,
          <addr-line>Via Sommarive, 38123 Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University Federico II</institution>
          ,
          <addr-line>Via Claudio 21, 80126 Naples</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Bari</institution>
          ,
          <addr-line>Via E.Orabona, 4, 70126 Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Groningen</institution>
          ,
          <addr-line>Oude Kijk in 't Jatstraat 26, 9700 AS Groningen, NL</addr-line>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Turin</institution>
          ,
          <addr-line>c.so Svizzera 185, I-10149 Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for the Italian language. The aim of the campaign is to improve and support the development and dissemination of resources and technologies for Italian. Indeed, many shared tasks, covering the analysis of both written and spoken language at various levels of processing, have been proposed within EVALITA since its first edition in 2007. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC) and it is endorsed by the Italian Association of Speech Science (AISV) and by the NLP Special Interest Group of the Italian Association for Artificial Intelligence (AI*IA). Following the success of the four previous editions, we organised EVALITA 2016 around a set of six shared tasks and an application challenge. In EVALITA 2016 several novelties were introduced on the basis of the outcome of two questionnaires and of the fruitful discussion that took place during the panel “Raising Interest and Collecting Suggestions on the EVALITA Evaluation Campaign” held in the context of the second Italian Computational Linguistics Conference (CLiC-it 2015) (Sprugnoli et al., 2016). Examples of these novelties are a greater involvement of industrial companies in the organisation of tasks, the introduction of a task and a challenge that are strongly application-oriented, and the creation of cross-task shared data. Also, a strong focus has been placed on using social media data, so as to promote the investigation into the portability and adaptation of existing tools, up to now mostly developed for the newswire domain.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        FactA – Event Factuality Annotation. In this task, the factuality profiling of events is represented
by means of three attributes associated with event mentions, namely: certainty, time, and polarity.
Participating systems were required to provide the values for these three attributes
        <xref ref-type="bibr" rid="ref6">(Minard et al.,
2016)</xref>
        .
      </p>
      <p>
        NEEL-it – Named Entity rEcognition and Linking in Italian Tweets. The task consists in
automatically annotating each named entity mention (belonging to the following categories: Thing, Event,
Character, Location, Organization, Person and Product) in a tweet by linking it to the DBpedia
knowledge base
        <xref ref-type="bibr" rid="ref3">(Basile et al., 2016)</xref>
        .
      </p>
      <p>
        PoSTWITA – POS tagging for Italian Social Media Texts. The task consists in Part-Of-Speech
tagging of tweets, rather than more standard texts; the tweets are provided in already tokenised form
        <xref ref-type="bibr" rid="ref4">(Bosco et al., 2016)</xref>
        .
      </p>
      <p>
        QA4FAQ – Question Answering for Frequently Asked Questions. The goal of this task is to develop
a system retrieving a list of relevant FAQs and corresponding answers related to a query issued by
a user
        <xref ref-type="bibr" rid="ref5">(Caputo et al., 2016)</xref>
        .
      </p>
      <p>
        SENTIPOLC – SENTIment POLarity Classification. The task consists in automatically annotating
tweets with a tuple of boolean values indicating the message's subjectivity, its polarity (positive or
negative), and whether it is ironic or not
        <xref ref-type="bibr" rid="ref2">(Barbieri et al., 2016)</xref>
        .
      </p>
      <p>
        Application Challenge. In addition to the more standard tasks described above, for the first time
EVALITA included a challenge, organised by IBM Italy. The aim of the IBM Watson Services Challenge
is to create the most innovative app on Bluemix services that leverages at least one Watson Service,
with a specific focus on NLP and speech services for Italian
        <xref ref-type="bibr" rid="ref1">(http://www.evalita.it/2016/tasks/ibm-challenge)</xref>
        .
      </p>
    </sec>
    <sec id="sec-2">
      <title>Participation</title>
      <p>The tasks and the challenge of EVALITA 2016 attracted the interest of a large number of researchers,
for a total of 96 single registrations. Overall, 34 teams composed of more than 60 individual participants
from 10 different countries submitted their results to one or more tasks of the campaign.</p>
      <p>A breakdown of the figures per task is shown in Table 1. With respect to the 2014 edition, we collected a
significantly higher number of registrations (96 registrations vs 55 registrations collected in 2014), which can be
interpreted as a signal that we succeeded in reaching a wider audience of researchers interested in participating in the
campaign. This result may also have been positively affected by the novelties introduced this year to improve the
dissemination of information on EVALITA, e.g. the use of social media such as Twitter and Facebook. The
number of teams that actually submitted their runs also increased in 2016 (34 teams vs 23 teams participating in the 2014
edition), even if we observed a substantial gap between the number of actual participants and those who registered.</p>
      <table-wrap>
        <label>Table 1</label>
        <caption>
          <p>Registered and actual participants</p>
        </caption>
        <table>
          <thead>
            <tr><th>task</th><th>registered</th><th>actual</th></tr>
          </thead>
          <tbody>
            <tr><td>ARTIPHON</td><td>6</td><td>1</td></tr>
            <tr><td>FactA</td><td>13</td><td>0</td></tr>
            <tr><td>NEEL-IT</td><td>16</td><td>5</td></tr>
            <tr><td>QA4FAQ</td><td>13</td><td>3</td></tr>
            <tr><td>PoSTWITA</td><td>18</td><td>9</td></tr>
            <tr><td>SENTIPOLC</td><td>24</td><td>13</td></tr>
            <tr><td>IBM Challenge</td><td>6</td><td>3</td></tr>
            <tr><td>total</td><td>96</td><td>34</td></tr>
          </tbody>
        </table>
      </table-wrap>
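      <p>The counts in Table 1 can be cross-checked with a few lines of code. What follows is a minimal sketch in Python, not part of the campaign's own tooling: the names TABLE_1, submission_rate and totals are hypothetical, and the figures are copied from Table 1. It recomputes the column totals and, assuming that the ratio of actual to registered participants is a reasonable measure of the gap, the share of registrations that led to a submission.</p>
      <preformat>
```python
# Registered vs. actual participants per task, copied from Table 1.
# The dictionary and function names are illustrative assumptions.
TABLE_1 = {
    "ARTIPHON": (6, 1),
    "FactA": (13, 0),
    "NEEL-IT": (16, 5),
    "QA4FAQ": (13, 3),
    "PoSTWITA": (18, 9),
    "SENTIPOLC": (24, 13),
    "IBM Challenge": (6, 3),
}


def submission_rate(registered, actual):
    """Fraction of registrations that led to an actual submission."""
    return actual / registered if registered else 0.0


def totals(table):
    """Sum the registered and the actual columns of the table."""
    registered = sum(reg for reg, _ in table.values())
    actual = sum(act for _, act in table.values())
    return registered, actual


if __name__ == "__main__":
    for task, (reg, act) in TABLE_1.items():
        print(f"{task:14s} {reg:3d} {act:3d} {submission_rate(reg, act):6.1%}")
    reg_total, act_total = totals(TABLE_1)
    print(f"{'total':14s} {reg_total:3d} {act_total:3d} "
          f"{submission_rate(reg_total, act_total):6.1%}")
```
      </preformat>
      <p>Under these assumptions the overall ratio is 34 out of 96, roughly 35%, with per-task values ranging from 0% for FactA to about 54% for SENTIPOLC.</p>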
      <p>To investigate this issue and gain some insight into the reasons for the significant drop
in the number of participants with respect to the registrations collected, we ran an online questionnaire specifically
designed for those who did not submit any run to the task for which they had registered. In two weeks
we collected 14 responses, which show that the main obstacles to actual participation in a task were
related to personal issues (“I had an unexpected personal or professional problem outside EVALITA” or
“I underestimated the effort needed”) or personal choices (“I gave priority to other EVALITA tasks”). As
for this last point, NEEL-it and SENTIPOLC were preferred to FactA, which did not have any participants.
Another problem mentioned by some of the respondents is that the evaluation period was too short: this
issue was highlighted mostly by those who registered for more than one task.</p>
    </sec>
    <sec id="sec-3">
      <title>Making Cross-task Shared Data</title>
      <p>As an innovation in this year’s edition, we aimed at creating datasets that would be shared across tasks so
as to provide the community with multi-layered annotated data to test end-to-end systems. To this end,
we encouraged task organisers to annotate the same instances, each task adding its own layer. The
tasks involved were: SENTIPOLC, PoSTWITA, NEEL-it and FactA.</p>
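      <p>The testsets for all four tasks comprise exactly the same 301 tweets, although SENTIPOLC has a larger
testset of 2000 tweets, and FactA has an additional non-social media testset of 597 newswire sentences. Moreover,
the training sets of PoSTWITA and NEEL-it are almost entirely subsets of SENTIPOLC. 989 tweets
from the 1000 that make up NEEL-it’s training set are in SENTIPOLC, and 6412 of PoSTWITA (out of 6419)
are also included in the SENTIPOLC training set.</p>
      <table-wrap>
        <label>Table 2</label>
        <caption>
          <p>Overview of cross-task shared data. Numbers of tweets are reported. When the figure is marked with a *, it is instead the number of sentences from newswire documents.</p>
        </caption>
        <table>
          <tbody>
            <tr><th colspan="5">TRAIN</th></tr>
            <tr><th></th><th>SENTIPOLC</th><th>NEEL-it</th><th>PoSTWITA</th><th>FactA</th></tr>
            <tr><th>SENTIPOLC</th><td>7410</td><td>989</td><td>6412</td><td>0</td></tr>
            <tr><th>NEEL-it</th><td>989</td><td>1000</td><td>0</td><td>0</td></tr>
            <tr><th>PoSTWITA</th><td>6412</td><td>0</td><td>6419</td><td>0</td></tr>
            <tr><th>FactA</th><td>0</td><td>0</td><td>0</td><td>2723*</td></tr>
            <tr><th colspan="5">TEST</th></tr>
            <tr><th></th><th>SENTIPOLC</th><th>NEEL-it</th><th>PoSTWITA</th><th>FactA</th></tr>
            <tr><th>SENTIPOLC</th><td>2000</td><td>301</td><td>301</td><td>301</td></tr>
            <tr><th>NEEL-it</th><td>301</td><td>301</td><td>301</td><td>301</td></tr>
            <tr><th>PoSTWITA</th><td>301</td><td>301</td><td>301</td><td>301</td></tr>
            <tr><th>FactA</th><td>301</td><td>301</td><td>301</td><td>597*+301</td></tr>
          </tbody>
        </table>
      </table-wrap>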
      <p>
        The matrix in Table 2
shows both the total number
of instances per task (on the diagonal) as well as the number of overlapping instances for each task pair.
Please note that while SENTIPOLC, NEEL-it, and PoSTWITA provided training and test sets made up
entirely of tweets, FactA included tweets in only one of its test sets, as a pilot task. FactA’s training
and standard test sets are composed of newswire data, which we report in terms of number of sentences
        <xref ref-type="bibr" rid="ref6">(Minard et al., 2016)</xref>
        . For this reason the number of instances in Table 2 is broken down for FactA’s test
set: 597 newswire sentences and 301 tweets, the latter being the same as the other tasks.
      </p>
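      <p>The claim that the NEEL-it and PoSTWITA training sets are almost entirely subsets of SENTIPOLC can be made concrete with a short calculation. The following Python sketch is only illustrative: the constant and function names are hypothetical, and the numbers are copied from the text and from Table 2. It computes, for each of the two tasks, the fraction of training tweets shared with SENTIPOLC and the number that are not shared, and adds up the two parts of FactA's test data.</p>
      <preformat>
```python
# Training-set sizes (diagonal of the TRAIN block of Table 2).
TRAIN_SIZE = {"SENTIPOLC": 7410, "NEEL-it": 1000, "PoSTWITA": 6419}

# Number of training tweets each task shares with the SENTIPOLC training set.
SHARED_WITH_SENTIPOLC = {"NEEL-it": 989, "PoSTWITA": 6412}

# FactA test data: newswire sentences plus the tweets shared with the other tasks.
FACTA_TEST_NEWSWIRE = 597
SHARED_TEST_TWEETS = 301


def coverage(task):
    """Fraction of a task's training tweets that also appear in SENTIPOLC."""
    return SHARED_WITH_SENTIPOLC[task] / TRAIN_SIZE[task]


def not_shared(task):
    """Number of a task's training tweets that are not in SENTIPOLC."""
    return TRAIN_SIZE[task] - SHARED_WITH_SENTIPOLC[task]


if __name__ == "__main__":
    for task in SHARED_WITH_SENTIPOLC:
        print(f"{task}: {coverage(task):.2%} shared, {not_shared(task)} not shared")
    print("FactA test instances:", FACTA_TEST_NEWSWIRE + SHARED_TEST_TWEETS)
```
      </preformat>
      <p>With these figures, only 11 NEEL-it and 7 PoSTWITA training tweets fall outside the SENTIPOLC training set, and FactA's test data amounts to 898 instances when newswire sentences and tweets are counted together.</p>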
    </sec>
    <sec id="sec-4">
      <title>Towards Future Editions</title>
      <p>On the basis of this edition’s experience, we would like to conclude with a couple of observations that
prospective organisers might find useful when designing future editions.</p>
      <p>Many novelties introduced in EVALITA 2016 proved to be fruitful in terms of cooperation between
academic institutions and industrial companies, balance between research and applications, and quantity and
quality of annotated data provided to the community. In particular, the involvement of representatives
from companies in the organisation of tasks, the development of shared data, and the presence of
application-oriented tasks and a challenge are all elements that could easily be proposed again in future EVALITA
editions.</p>
      <p>Other innovations can be envisaged for the next campaign. For example, in order to help those who
want to participate in more than one task, different evaluation windows for different tasks could be
planned instead of having the same evaluation deadlines for all. This kind of flexibility could foster the
participation of teams in multiple tasks, but its impact on the workload of the EVALITA
organisers should not be underestimated. Moreover, social media texts turned out to be a very attractive
domain, but others could be explored as well. For instance, the Humanities emerged as one of the most
appealing domains in the questionnaires for industrial companies and former participants, and other countries
are organising evaluation exercises on it (see, for example, the Translating Historical Text shared task at
CLIN 27).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Leonardo</given-names>
            <surname>Badino</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>The ArtiPhon Challenge at Evalita 2016</article-title>
          . In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors,
          <source>Proceedings of Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ) &amp;
          <article-title>Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ).
          <article-title>Associazione Italiana di Linguistica Computazionale (AILC).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the EVALITA 2016 SENTiment POLarity Classification Task</article-title>
          . In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors,
          <source>Proceedings of Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ) &amp;
          <article-title>Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ).
          <article-title>Associazione Italiana di Linguistica Computazionale (AILC).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Annalina Caputo, Anna Lisa Gentile, and
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Rizzo</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the EVALITA 2016 Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) Task</article-title>
          . In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors,
          <source>Proceedings of Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ) &amp;
          <article-title>Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ).
          <article-title>Associazione Italiana di Linguistica Computazionale (AILC).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Fabio Tamburini, Andrea Bolioli, and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Mazzei</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the EVALITA 2016 Part Of Speech on TWitter for ITAlian Task</article-title>
          . In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors,
          <source>Proceedings of Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ) &amp;
          <article-title>Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ).
          <article-title>Associazione Italiana di Linguistica Computazionale (AILC).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Annalina</given-names>
            <surname>Caputo</surname>
          </string-name>
          , Marco de Gemmis, Pasquale Lops, Franco Lovecchio, and
          <string-name>
            <given-names>Vito</given-names>
            <surname>Manzari</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the EVALITA 2016 Question Answering for Frequently Asked Questions (QA4FAQ) Task</article-title>
          . In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors,
          <source>Proceedings of Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ) &amp;
          <article-title>Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ).
          <article-title>Associazione Italiana di Linguistica Computazionale (AILC).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Anne-Lyse</given-names>
            <surname>Minard</surname>
          </string-name>
          , Manuela Speranza, and
          <string-name>
            <given-names>Tommaso</given-names>
            <surname>Caselli</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>The EVALITA 2016 Event Factuality Annotation Task (FactA)</article-title>
          . In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors,
          <source>Proceedings of Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ) &amp;
          <article-title>Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ).
          <article-title>Associazione Italiana di Linguistica Computazionale (AILC).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Rachele</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          , Viviana Patti, and
          <string-name>
            <given-names>Franco</given-names>
            <surname>Cutugno</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Raising Interest and Collecting Suggestions on the EVALITA Evaluation Campaign</article-title>
          . In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors,
          <source>Proceedings of Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ) &amp;
          <article-title>Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ).
          <article-title>Associazione Italiana di Linguistica Computazionale (AILC).</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>