<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Computer Science Review</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>NERMuD at EVALITA 2023: Overview of the Named-Entities Recognition on Multi-Domain Documents Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessio Palmero Aprosio</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Teresa Paccosi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Psicologia e Scienze Cognitive, Università di Trento</institution>
          ,
          <addr-line>Corso Bettini 84, I-38068 Rovereto (TN)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Via Sommarive 18, I-38121 Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>29</volume>
      <issue>2018</issue>
      <fpage>282</fpage>
      <lpage>289</lpage>
      <abstract>
        <p>In this paper, we describe NERMuD, a Named-Entities Recognition (NER) shared task presented at the EVALITA 2023 evaluation campaign. NERMuD is organized into two diferent sub-tasks: a domain-agnostic classification and a domainspecific one. We display the evaluation of the system presented by the only task participant, ExtremITA. ExtremITA proposes a unified approach for all the tasks of EVALITA 2023, and it addresses in our case only the domain-agnostic sub-task. We present an updated version of KIND, the dataset distributed for the training of the system. We then provide the baselines proposed, the results of the evaluation, and a brief discussion.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
      <p>Named-entity recognition (NER) is one of the most common and important tasks in the field of Natural Language Processing (NLP). It involves identifying and classifying mentions of entities in texts, and it is widely used in applications such as text understanding [<xref ref-type="bibr" rid="ref1">1</xref>], information retrieval [<xref ref-type="bibr" rid="ref2">2</xref>], knowledge base construction [<xref ref-type="bibr" rid="ref3">3</xref>], and the protection of personal data [<xref ref-type="bibr" rid="ref4">4</xref>]. These entities can belong to a set of predefined categories, with people, locations, and organizations being the most common ones.</p>
      <p>Manually annotated data play a crucial role in training and evaluating NER systems, similar to other NLP tasks. Systems trained on datasets from specific domains often do not perform well when applied to different types of texts [5].</p>
      <p>NER has been addressed in almost all languages, indicating a significant interest in the topic [6]. It is an important task in its own right, as it can be used to process large archival collections. While NER is considered a solved task, some studies have shown that there is always room for improvement depending on factors such as labels, languages, and topics [7]. It is worth noting that, despite the great number of studies on this topic, datasets and tasks for NER often focus on news and, more recently, social media, as seen in initiatives like I-CAB [8], NEEL-IT 2016 [9] and NER 2011 [10].</p>
      <p>The rest of this article is structured as follows. Section 2 describes the task, and Section 3 gives an overview of the dataset provided. In Section 4 we portray the baseline and the evaluation metric, while in Section 5 we describe the work of the participant ExtremITA. In the end, Section 6 contains a brief discussion, while in Section 7 we draw some conclusions.</p>
    </sec>
    <sec id="sec-1-1">
      <title>2. Task description</title>
      <p>In this Section, we describe NERMuD, a task presented at EVALITA 2023 [11] that involves the extraction and classification of named entities – including persons, organizations, and locations – from documents in various domains.</p>
      <p>NERMuD 2023 includes two different sub-tasks:
• Domain-agnostic classification (DAC). Participants are required to select and classify entities into three categories (person, organization, location) from different types of texts (news, fiction, political speeches) using a single general model.
• Domain-specific classification (DSC). Participants are required to make use of a different model for each of the above types, trying to increase the accuracy of every considered type.</p>
      <p>Each participant can submit up to 3 runs for each sub-task.</p>
      <p>The runs should be contained in a TSV file with fields delimited by a tab, and they should follow the same format as the training dataset. No missing data are allowed: a label should be predicted for each token in the test set.</p>
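The "no missing data" constraint above can be checked mechanically before submission. The sketch below is illustrative only: the helper name is ours, and it assumes the CoNLL-style layout used by the training data (one token and one tab-separated label per non-empty line, blank lines separating sentences).

```python
# Hedged sketch of a run-file check, assuming a CoNLL-style TSV layout:
# one "token<TAB>label" pair per non-empty line.
def check_run(lines):
    """Return (line_number, reason) pairs for malformed lines of a run."""
    problems = []
    for i, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:                 # blank lines separate sentences
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append((i, "expected 2 tab-separated fields"))
        elif not fields[1]:
            problems.append((i, "missing label"))
    return problems
```

A run passes only if the returned list is empty, i.e. every token in the test set received exactly one label.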
    </sec>
    <sec id="sec-2">
      <title>3. Available dataset</title>
      <p>The corpus that can be used for training is the Kessler Italian Named-entities Dataset (KIND) [12], presented in 2021 at the Language Resources and Evaluation Conference (LREC). KIND is available and freely downloadable on Github (https://github.com/dhfbk/KIND).</p>
      <p>The original dataset comprises over one million tokens and includes annotations for three entity classes: person, location, and organization. The majority of the dataset, approximately 600K tokens, features manual gold annotations across three distinct domains: news, literature, and political discourses. This specific subset can be used as the training data for the NERMuD 2023 task, which focuses on Named Entity Recognition and Multi-domain Classification.</p>
      <p>All the texts used for the annotation are publicly available, under a license that allows both research and commercial use. In particular, the texts used for the NERMuD task come from:
• Wikinews (WN), a source providing news texts from the last few decades;
• some Italian fiction books (FIC) in the public domain, freely accessible for use;
• writings and speeches from Alcide De Gasperi (ADG), a collection of texts including the works and speeches of the Italian politician.</p>
      <p>Since the dataset is already publicly released and available, a new set of data has been annotated and shared using the same guidelines (available on the KIND repository on Github).</p>
      <p>The dataset has been collected in full compliance with ethical standards, ensuring that it aligns with the terms of use of the sources and that it respects the intellectual property and privacy rights of the original authors of the texts. Table 1 displays an overview of the dataset.</p>
      <p>In the next subsections, we provide a quick description of the domains included in the dataset. For more information about the creation of the dataset, the text processing, and the annotation guidelines, please refer to [12].</p>
      <sec id="sec-2-1">
        <title>3.1. Wikinews (WN)</title>
        <p>Wikinews is a multi-language free-content project of collaborative journalism. The Italian chapter contains more than 11,000 news articles (https://it.wikinews.org/wiki/Speciale:Statistiche), released under the Creative Commons Attribution 2.5 License (https://creativecommons.org/licenses/by/2.5/).</p>
        <p>In building the dataset, we randomly choose 1,198 articles evenly distributed in the last 20 years, for a total of 364,816 tokens.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Literature (FIC)</title>
        <p>For the annotation of fiction literature, we have included 86 book chapters from a collection of 11 publicly available Italian-authored books. This annotated dataset comprises a total of 219,638 tokens. While the majority of the selected books are novels, we have also included a mix of epistles and biographies. The plain texts come from the Liber Liber website (https://www.liberliber.it/).</p>
        <p>In particular, we select: Il giorno delle Mésules (Ettore Castiglioni, 1993, 12,853 tokens), L’amante di Cesare (Augusto De Angelis, 1936, 13,464 tokens), Canne al vento (Grazia Deledda, 1913, 13,945 tokens), 1861-1911 - Cinquant’anni di vita nazionale ricordati ai fanciulli (Guido Fabiani, 1911, 10,801 tokens), Lettere dal carcere (Antonio Gramsci, 1947, 10,655 tokens), Anarchismo e democrazia (Errico Malatesta, 1974, 11,557 tokens), L’amore negato (Maria Messina, 1928, 31,115 tokens), La luna e i falò (Cesare Pavese, 1950, 10,705 tokens), La coscienza di Zeno (Italo Svevo, 1923, 56,364 tokens), Le cose più grandi di lui (Luciano Zuccoli, 1922, 20,989 tokens), L’occhio del lago (Tullio Giordana, 1899, 27,190 tokens).</p>
        <p>We prioritized selecting texts in the public domain that are as recent as possible (considering that, under the current legislation, copyright expires 70 years after the death of the author). This choice was made to ensure that the model trained on this data would be well-suited for application to novels written in recent years. By focusing on more contemporary texts, the language used in these novels is expected to be more similar to the language used in present-day novels. Additionally, for the test data, we specifically chose works by the author Tullio Giordana. His works are therefore not included in the train or the dev sets, so as not to have a model possibly biased in terms of style.</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.3. Alcide De Gasperi’s Writings (ADG)</title>
        <p>Finally, we annotate 173 documents (164,537 tokens) from the corpus described in [13], spanning 50 years of European history. The corpus is composed of a comprehensive collection of Alcide De Gasperi’s public documents, 2,762 in total, written or transcribed between 1901 and 1954, and it is available for consultation on the Alcide Digitale website (https://alcidedigitale.fbk.eu/).</p>
      </sec>
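For readers who want to experiment with the data, a minimal reader can be sketched as follows; it assumes the CoNLL-style layout commonly used for this kind of dataset (one "token TAB label" pair per line, blank lines separating sentences), and the helper name is ours, not part of the KIND distribution.

```python
# Hedged sketch: group "token<TAB>label" lines into sentences, assuming
# a CoNLL-style layout with blank lines as sentence separators.
def read_sentences(lines):
    """Return a list of sentences, each a list of (token, label) pairs."""
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:                 # sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        token, label = line.split("\t")
        current.append((token, label))
    if current:                      # flush the last sentence
        sentences.append(current)
    return sentences
```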
    </sec>
    <sec id="sec-3">
      <title>4. Baseline and Evaluation</title>
      <p>During the definition of the task, we proposed two baselines: an old-style Conditional Random Field [14], and a plain BERT [15] implementation. These options represent the most effective algorithms that can be implemented without the use of GPUs, as well as the simplest algorithms that can be performed using transformers. Both implementations of the baselines can be found on Github (https://github.com/dhfbk/bert-ner).</p>
      <p>The CRF model is based on the classifier available in scikit-learn out-of-the-box. In addition to standard features extracted from the text, including vector information from fastText models [16], we also used a set of gazetteers (lists of persons, organizations, and locations) collected from the Italian Wikipedia using some of the classes contained in DBpedia [17]: Person, Organization, and Place, respectively.</p>
      <p>The BERT NER classification model is inspired by the blog post of Tobias Sterbak (https://bit.ly/ner-bert), using BertForTokenClassification (https://bit.ly/BertForTokenClassification) from Hugging Face.</p>
      <p>Final results will be calculated in terms of macro-average F1. The evaluation script is released in the official KIND Github project (https://github.com/dhfbk/KIND).</p>
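The macro-average F1 used for scoring can be illustrated with a small dependency-free sketch. This is not the official evaluation script released in the KIND repository; the per-label, token-level averaging shown here is our assumption for illustration.

```python
# Illustrative macro-average F1 over token labels, ignoring the "O" class.
# Each label's F1 is computed from its own TP/FP/FN counts, then the
# per-label scores are averaged with equal weight (macro-averaging).
from collections import defaultdict

def macro_f1(gold, pred, ignore="O"):
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            if g != ignore:
                tp[g] += 1
        else:
            if p != ignore:
                fp[p] += 1
            if g != ignore:
                fn[g] += 1
    labels = set(tp) | set(fp) | set(fn)
    scores = []
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

Because every label contributes equally, a class that is rare in the test set (such as ORG in the fiction domain) weighs as much as a frequent one, which explains why missing a single rare entity can visibly move the final score.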
    </sec>
    <sec id="sec-4">
      <title>5. Participants</title>
      <p>The task has only one participant, the “ExtremITA” group [18], who participated in all the tasks presented at EVALITA 2023 with two unified multi-task learning approaches.</p>
      <p>The purpose of ExtremITA is to investigate how the adoption of a Large Language Model can be taken to its extreme consequences by proposing a single model capable of tackling a wide array of heterogeneous tasks (among them, NERMuD). The authors tested two different models:</p>
      <p>extremIT5 - An Encoder-Decoder model based on IT5 [19] consisting of approximately 110 million parameters. This model is trained by concatenating the name of the task and the input sentence/paragraph in the input texts, each representing an example from a generic EVALITA task. Its purpose is to generate a piece of text that solves the target task. For NERMuD, in particular, the list of expected Named Entities is reported as a sequence of text spans, each associated with the corresponding entity type (in the form [〈entity_type〉] 〈text_span_that_evokes_entity〉).</p>
      <p>extremITLLaMA - An instruction-tuned Decoder-only model, built upon the LLaMA foundational models [20], with a total of 7 billion parameters. The initial model was trained using the LoRA technique [21] on Italian translations of Alpaca [22] instruction data. The adapters are then merged into the original model. A final fine-tuning phase using LLaMA is then performed. For each example from EVALITA, an input text is paired with a manually crafted question that simulates an instruction to be solved, representing the specific task. The natural language instruction used in NERMuD is “Scrivi le menzioni di entità nel testo, indicandone il tipo: [PER] (persona), [LOC] (luogo), [ORG] (organizzazione).” (“Write the entities’ mentions in the text, indicating their type: [PER] (person), [LOC] (location), [ORG] (organization).”)</p>
      <p>In both cases, NERMuD was transformed into a sequence-to-sequence task from its original token classification format.</p>
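The token-classification-to-sequence reformulation can be sketched as follows: BIO-style token labels are turned into the "[entity_type] text_span" target format described above. The function name and the single-space separator between entities are our assumptions for illustration, not details taken from the ExtremITA system.

```python
# Hedged sketch: convert BIO-tagged tokens into a "[TYPE] span" target
# string for a sequence-to-sequence model. Contiguous B-/I- tags of the
# same entity are merged into one span.
def bio_to_target(tokens, tags):
    entities, span, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if span:                         # flush the previous entity
                entities.append((etype, " ".join(span)))
            span, etype = [tok], tag[2:]
        elif tag.startswith("I-") and span:  # continue the current entity
            span.append(tok)
        else:                                # "O" tag ends any open entity
            if span:
                entities.append((etype, " ".join(span)))
            span, etype = [], None
    if span:
        entities.append((etype, " ".join(span)))
    return " ".join(f"[{t}] {s}" for t, s in entities)
```

The inverse step, parsing the generated string back into labeled spans, is what the evaluation ultimately needs; a generation error there (a misspelled span, a missing bracket) directly costs recall, which is one practical downside of the sequence-to-sequence recasting.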
    </sec>
    <sec id="sec-5">
      <title>6. Discussion</title>
      <p>The evaluation of ORG entities for the fiction domain is missing, as none of the classifiers was able to correctly identify the only ORG entity present in the test set (the word “Borsa” in the sentence “Ha avuto disgrazie alla Borsa”). Overall, the BERT baseline outperforms ExtremITA in most runs, with the exception of LOC extraction in fictional texts, where extremITLLaMA performs better. This difference in performance can likely be attributed to the textual data used to train the models.</p>
      <p>In general, it is possible to notice that the best ExtremITA run almost always comes out on top of the classification in terms of precision.</p>
    </sec>
    <sec id="sec-6">
      <title>7. Conclusions</title>
      <p>In this paper we described the first evaluation task for multi-domain named-entity recognition in Italian texts. The task evaluated the performance of participant systems in terms of extracting entities that refer to persons, organizations, and locations. The texts used for the task cover three different domains: news, political speeches, and fiction.</p>
      <p>Unfortunately, the task attracted only one participant, ExtremITA, who however presented an interesting and very innovative multi-task approach, probably the first one dealing with so many different tasks in Italian. Although in general the results of ExtremITA do not overcome the two strong baselines proposed (CRF w/ gazetteers, and BERT), the difference in terms of F1 is very small, demonstrating a promising future for this kind of approach.</p>
      <p>As an outcome of the task, a new version of the KIND dataset is released, increasing its size with respect to the previous version.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , X. Han,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          , Q. Liu,
          <article-title>ERNIE: Enhanced language representation with informative entities</article-title>
          , in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy,
          <year>2019</year>
          , pp.
          <fpage>1441</fpage>
          -
          <lpage>1451</lpage>
          . URL: https://aclanthology.org/P19-1139. doi:10.18653/v1/P19-1139.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Xu</surname>
          </string-name>
          , X. Cheng, H. Li,
          <article-title>Named entity recognition in query</article-title>
          ,
          <source>in: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '09, Association for Computing Machinery, New York, NY, USA,
          <year>2009</year>
          , p.
          <fpage>267</fpage>
          -
          <lpage>274</lpage>
          . URL: https://doi.org/10.1145/1571941.1571989. doi:10.1145/1571941.1571989.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cafarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-M.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shaked</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soderland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <article-title>Unsupervised named-entity extraction from the web: An experimental study</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>165</volume>
          (
          <year>2005</year>
          )
          <fpage>91</fpage>
          -
          <lpage>134</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0004370205000366. doi:10.1016/j.artint.2005.03.001.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Paccosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Palmero Aprosio</surname>
          </string-name>
          ,
          <article-title>Redit: A tool and</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>