<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of CAPITEL Shared Tasks at IberLEF 2020: Named Entity Recognition and Universal Dependencies Parsing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jordi Porta-Zamorano</string-name>
          <email>porta@rae.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luis Espinosa-Anke</string-name>
          <email>espinosa-anke@cardiff.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro de Estudios de la Real Academia Española</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science and Informatics, Cardiff University</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <fpage>31</fpage>
      <lpage>38</lpage>
      <abstract>
        <p>We present the results of the CAPITEL-EVAL shared task, held in the context of the IberLEF 2020 competition series. CAPITEL-EVAL consisted of two subtasks: (1) Named Entity Recognition and Classification and (2) Universal Dependency parsing. For both, the source data was a newly annotated corpus, CAPITEL, a collection of Spanish articles in the newswire domain. A total of seven teams participated in CAPITEL-EVAL, with 13 runs submitted across all subtasks. Data, results and further information about this task can be found at sites.google.com/view/capitel2020.</p>
      </abstract>
      <kwd-group>
        <kwd>IberLEF</kwd>
        <kwd>named entity recognition and classification</kwd>
        <kwd>NERC</kwd>
        <kwd>Universal Dependencies parsing</kwd>
        <kwd>evaluation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Sub-task 1: NERC</title>
      <sec id="sec-2-1">
        <title>2.1. Description</title>
        <p>Information extraction tasks, formalized in the late 1980s, are designed to evaluate systems which
capture information present in free text, with the goal of enabling better and faster information and
content access. One important subset of this information comprises named entities (NE), which, roughly
speaking, are textual elements corresponding to names of people, places, organizations and others.
Three processes can be applied to NEs: recognition or identification (NER), categorization, i.e.,
assigning a type according to a predefined set of semantic categories (NERC), and linking, which consists of
disambiguating the in-text mention against a knowledge base or sense inventory (NEL). Since their
advent, NER tasks have had notable success, but despite the relative maturity of this subfield, work and
research continues to evolve, and new techniques and models appear alongside challenging datasets
in different languages, domains and textual genres. The aim of this sub-task, thus, was to challenge
participants to apply their systems or solutions to the problem of identifying and classifying NEs in
Spanish news articles. This two-stage process falls within the NERC evaluation framework.</p>
        <p>
          The following NE categories were evaluated: Person (PER), Location (LOC), Organization (ORG)
and Other (OTH), as defined in the Annotation Guidelines [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] that were shared with participants. The
criteria for the identification and classification of entities were based on the capitalization chapter of
the Spanish language orthography [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Contextual meaning was taken into account in the
classification of entities, so that an entity such as Madrid can be classified as PER (a surname), LOC (the
city), ORG (the football team) or even OTH (a book title). Moreover, in terms of nesting, only the
longest-spanning entities were considered, and coordinated entities were treated as one single entity,
except for those where the name indicating the nature of the NE is used in the plural to introduce several
entities ([Islas Baleares]LOC y [Canarias]LOC).
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Dataset</title>
        <p>A one-million-word subset of the CAPITEL corpus was randomly sampled into three subsets:
training, development and test. The training set comprises 60% of the corpus, whereas the development
and test sets roughly amount to 20% each. Descriptive statistics for these splits are provided in Table 1.
Together with the test set release, an additional collection of documents (background set) was
delivered to ensure that participating teams were not able to perform manual corrections, and also to
encourage features such as scalability to larger data collections. Finally, all documents were tokenized
and tagged with NEs following the IOBES format.</p>
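        <p>For illustration, the following minimal sketch (a hypothetical sentence and tags, not taken from the corpus) shows how a tokenized sentence is labelled under IOBES, where S- marks a single-token entity, B-/I-/E- mark the beginning, inside and end of a multi-token entity, and O marks tokens outside any entity:</p>
        <preformat>
# Hypothetical IOBES-tagged sentence (our illustration, not corpus data).
tokens = ["El", "Real", "Madrid", "visitó", "Islas", "Baleares", "."]
tags   = ["O",  "B-ORG", "E-ORG", "O",      "B-LOC", "E-LOC",    "O"]
</preformat>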
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Evaluation Metrics</title>
        <p>The metrics used for evaluation were Precision (the percentage of named entities in the system’s
output that are correctly recognized and classified), Recall (the percentage of named entities in the
test set that were correctly recognized and classified) and macro-averaged F1 score (the harmonic
mean of Precision and Recall), with the latter being used as the official evaluation score and for the
final ranking of the participating teams.</p>
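        <p>As an illustration, the sketch below computes entity-level scores with the open-source seqeval library; this is our own example and not necessarily the official scorer:</p>
        <preformat>
# Sketch: entity-level Precision/Recall/F1 over IOBES sequences with seqeval.
from seqeval.metrics import f1_score, precision_score, recall_score

gold = [["B-LOC", "E-LOC", "O", "S-PER"]]  # hypothetical gold tags
pred = [["B-LOC", "E-LOC", "O", "O"]]      # hypothetical system output

print(precision_score(gold, pred))            # fraction of predicted entities that are correct
print(recall_score(gold, pred))               # fraction of gold entities that are found
print(f1_score(gold, pred, average="macro"))  # macro F1, the ranking metric in this sub-task
</preformat>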
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Systems and Results</title>
        <p>We had 22 registrations and 5 final participants, who submitted 9 systems and 4 system description papers.</p>
        <p>
          The Ragerri Team from HiTZ Center-Ixa UPV/EHU presents in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] the combination of several
systems based on Flair [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and Transformer architectures [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. They perform experiments with
Multilingual BERT (mBERT), XLM-RoBERTa (base), BETO (a BERT-based model pre-trained on
Spanish texts [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]), off-the-shelf Flair models for Spanish, and a monolingual model trained on
the OSCAR corpus. All the individual systems’ F1 scores were within 88.29–89.95%, and the combination of
five of them using a simple three-vote agreement scheme achieved the first rank with 90.30% F1.
        </p>
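        <p>A minimal sketch of such an agreement-based combination (our reconstruction, assuming token-level voting over aligned tag sequences):</p>
        <preformat>
# Sketch: keep a tag when at least 3 of the 5 systems agree on it (assumed scheme).
from collections import Counter

def combine(predictions, min_agreement=3):
    """predictions: one tag sequence per system, all over the same tokens."""
    combined = []
    for token_tags in zip(*predictions):
        tag, votes = Counter(token_tags).most_common(1)[0]
        combined.append(tag if votes >= min_agreement else "O")
    return combined

systems = [
    ["B-LOC", "E-LOC", "O"],
    ["B-LOC", "E-LOC", "O"],
    ["B-ORG", "E-ORG", "O"],
    ["B-LOC", "E-LOC", "O"],
    ["O", "O", "O"],
]
print(combine(systems))  # ['B-LOC', 'E-LOC', 'O']
</preformat>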
        <p>
          The Vicomtech Team presents in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] a system based on the BERT architecture and several
experiments using multilingual BERT (mBERT) and BETO pre-trained models. The BERT models are used to
give each token a contextual embedding, which is then passed to a fully connected layer to classify
the token. Their work also addresses several interesting issues with the BETO vocabulary
and tokenizer, namely: punctuation marks missing from the tokenizer’s vocabulary and problems with
certain diacritics and characters. Their systems were fine-tuned with CAPITEL training data, and their
results were 2–3% F1 below the best-performing system.
        </p>
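        <p>A minimal sketch of this token-classification setup with the Hugging Face transformers API (the label count and other details are our assumptions):</p>
        <preformat>
# Sketch: BETO with a token-classification head (a fully connected layer on top).
from transformers import AutoModelForTokenClassification, AutoTokenizer

name = "dccuchile/bert-base-spanish-wwm-cased"  # BETO; mBERT would be "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(
    name,
    num_labels=17,  # assumed: IOBES tags for PER/LOC/ORG/OTH plus O
)
inputs = tokenizer("El Real Madrid visitó Canarias.", return_tensors="pt")
logits = model(**inputs).logits  # one score per label for every word piece
</preformat>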
        <p>
          The Yanghao Team from Huawei Translation Service Center presents in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] a system that uses
Multilingual BERT as the encoder and a linear layer as the classifier, and is trained with an additional 38,000
sentences from the WMT news translation corpus [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] annotated using spaCy [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Their experimental
results suggest that pre-training on the augmented set and then fine-tuning on CAPITEL improves
performance compared to training on either dataset separately or on a mixture of both.
        </p>
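        <p>A sketch of how such silver-standard annotations can be produced with spaCy (the specific Spanish model below is our assumption; the team's exact pipeline is described in [<xref ref-type="bibr" rid="ref8">8</xref>]):</p>
        <preformat>
# Sketch: annotating extra sentences with a pre-trained spaCy Spanish pipeline.
import spacy

nlp = spacy.load("es_core_news_lg")  # assumed model; any Spanish NER pipeline would do

def silver_entities(sentence):
    doc = nlp(sentence)
    return [(ent.text, ent.label_) for ent in doc.ents]

print(silver_entities("El Gobierno de España se reunió en Madrid."))
</preformat>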
        <p>
          The Lirondos Team from ISI-USC presents in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] two sequence labelling systems: a CRF model
with handcrafted features and a BiLSTM-CRF model with word and character embeddings. A feature
ablation study demonstrated that all features contribute positively to the CRF model, with word
embeddings being the most informative feature, yielding an F1 score of 84.39%. On the other hand, their
BiLSTM-CRF model obtained an F1 score of 83.01%. An interesting error analysis showed that many
of the errors correspond to OTH entities, contextual annotation of some entities (OTH versus ORG
or LOC versus ORG), nested entities, and person nicknames with unusual typographical shapes.
        </p>
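        <p>For reference, a CRF with handcrafted features of this kind can be sketched with the sklearn-crfsuite library; the feature set below is illustrative only, not the authors' exact features:</p>
        <preformat>
# Sketch: CRF sequence labeller over simple handcrafted token features.
import sklearn_crfsuite

def token_features(tokens, i):
    w = tokens[i]
    return {
        "lower": w.lower(),
        "is_title": w.istitle(),
        "is_upper": w.isupper(),
        "suffix3": w[-3:],
        "prev_lower": tokens[i - 1].lower() if i > 0 else "&lt;s&gt;",
    }

# Toy training data (hypothetical); the real systems train on the CAPITEL corpus.
sentences = [["Visitó", "Islas", "Baleares", "."]]
tags = [["O", "B-LOC", "E-LOC", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, tags)
print(crf.predict(X))
</preformat>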
        <p>Finally, the LolaZarra Team was ranked last and did not submit a system description paper.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Sub-task 2: UD Parsing</title>
      <sec id="sec-3-1">
        <title>3.1. Description</title>
        <p>Dependency-based syntactic parsing has become popular in NLP in recent years. One of the reasons
for this popularity is the transparent encoding of predicate-argument structures, which is useful in
many downstream applications. Another reason is that it is better suited than phrase-structure
grammars for languages with free or flexible word order. Universal Dependencies (UD) is a framework for
consistent annotation of grammar (parts of speech, morphological features and syntactic
dependencies) across different human languages. Moreover, the UD initiative is an open community effort with
over 200 contributors which has produced more than 100 treebanks in over 70 languages.</p>
        <p>The aim of this sub-task was to challenge participants to apply their systems or solutions to the
problem of Universal Dependency parsing of Spanish news articles as defined in the Annotation
Guidelines [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] shared with participants.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset</title>
        <p>A 300,000-word subset of CAPITEL was provided for this sub-task. In addition to head and
dependency relations in CoNLL-U format, this subset was also tokenized and annotated with lemmas and
UD tags and features. Similarly to the NERC dataset, we randomly sampled it into three subsets:
training, development and test. The training set comprises about 50% of the corpus, whereas the
development and test sets roughly amount to 25% each. The description of the data sets can be found in
Table 3. In addition, the distribution of labels in the test set is given in Table 5 along with the results
of the sub-task. Together with the test set release, an additional collection of documents (background
set) was included to ensure that participating teams were not able to perform manual corrections,
and also to encourage features such as scalability to larger data collections.</p>
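        <p>For illustration, a minimal (hypothetical) CoNLL-U fragment with the ten standard columns, including lemma, UD tag, morphological features, head index and dependency relation:</p>
        <preformat>
# sent_id = example-1 (hypothetical sentence, not taken from CAPITEL)
# text = Llueve en Madrid.
1   Llueve   llover   VERB    _   Mood=Ind|Number=Sing|Person=3|Tense=Pres   0   root    _   _
2   en       en       ADP     _   _                                          3   case    _   _
3   Madrid   Madrid   PROPN   _   _                                          1   obl     _   _
4   .        .        PUNCT   _   _                                          1   punct   _   _
</preformat>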
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Evaluation Metrics</title>
        <p>The metrics for the evaluation phase were Unlabeled Attachment Score (UAS), the percentage of
words that have the correct head, and Labeled Attachment Score (LAS), the percentage of words that
have the correct head and dependency label, with the latter being used as the official evaluation score
and for the final ranking of the participating teams.</p>
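        <p>A minimal sketch of both scores over aligned gold and predicted analyses (our illustration; the official scorer may handle tokenization mismatches differently):</p>
        <preformat>
# Sketch: UAS/LAS over per-word (head, deprel) pairs.
def attachment_scores(gold, pred):
    """gold, pred: lists of (head_index, deprel) pairs, one per word."""
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las

gold = [(0, "root"), (3, "case"), (1, "obl"), (1, "punct")]
pred = [(0, "root"), (3, "case"), (1, "nmod"), (1, "punct")]
print(attachment_scores(gold, pred))  # (1.0, 0.75)
</preformat>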
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Systems and Results</title>
        <p>In this sub-task, we had 12 registrations and 2 final participants, who submitted 4 systems and 2 system
description papers.</p>
        <p>
          The Vicomtech Team presents in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] a system based on the BERT architecture and several
experiments using multilingual BERT (mBERT) and BETO pre-trained models. The BERT models are used
to build a matrix of all-vs-all token encoding vectors, which is then passed to several classification layers
predicting the connectivity of tokens and their relation types. Their work also addresses some issues
already discussed above (Section 2.4). Their systems were fine-tuned with CAPITEL training data, and results on
the development set were slightly better using BETO (UAS: 91.540, LAS: 88.410) than mBERT
(UAS: 91.220, LAS: 87.860), so only the BETO results were submitted as their official run.
        </p>
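        <p>The all-vs-all idea can be sketched as follows (our heavily simplified reconstruction; the actual system uses several classification layers for arcs and relation types):</p>
        <preformat>
# Sketch: score every (head, dependent) token pair from contextual embeddings.
import torch

def pair_scores(H, W):
    """H: (n, d) token embeddings from BERT; W: (d, d) learned weights.
    Returns an (n, n) matrix of head-dependent connectivity scores."""
    return H @ W @ H.T

H = torch.randn(5, 768)         # hypothetical embeddings for a 5-token sentence
W = torch.randn(768, 768)       # hypothetical learned parameter
print(pair_scores(H, W).shape)  # torch.Size([5, 5])
</preformat>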
        <p>The MartínLendinez Team presents in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] the combination of the output of different UD parsing
toolkits using a voting scheme, and the augmentation of the training set with 14,305 annotated sentences
from the AnCora annotated corpus [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].² Three different toolkits were selected, not because of their
performance in similar tasks, but for their accessibility and documentation. These toolkits were
UDPipe [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], NLP-Cube [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and Stanza [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. As we can see in the summary provided in Table 4, the final
submitted results were obtained with Stanza trained on CAPITEL (4), Stanza trained on CAPITEL and
AnCora (3), and the combination of the previous two plus NLP-Cube trained on CAPITEL (1).</p>
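        <p>A sketch of this kind of output combination (our reconstruction, assuming identical tokenization across toolkits; note that naive per-word voting does not guarantee a well-formed tree):</p>
        <preformat>
# Sketch: per-word majority vote over (head, deprel) pairs from several parsers.
from collections import Counter

def vote(parses):
    """parses: one list of (head, deprel) pairs per toolkit, same tokenization."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*parses)]

p1 = [(0, "root"), (1, "obj")]
p2 = [(0, "root"), (1, "obl")]
p3 = [(0, "root"), (1, "obj")]
print(vote([p1, p2, p3]))  # [(0, 'root'), (1, 'obj')]
</preformat>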
        <p>As can be seen in Table 4, the results of this sub-task are very tight, with the first and second systems
being only 0.06% apart, and with only 0.193% separating the first and the fourth. The submission by
MartínLendinez was the highest ranked, while Vicomtech’s was the simplest, acknowledged and described by
its authors as a sort of BERT-based baseline. We provide a breakdown of the results by relation type
in Table 5.</p>
        <p>²There is also a discussion of some differences in terms of tokenization and analysis between CAPITEL and AnCora.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>Most of the submitted systems obtained good results overall. In both sub-tasks, the majority of them
use BERT, either multilingual or monolingual, and some systems combine the output of several
models. Augmenting the training data with material from other corpora, or with data produced by other
annotation systems, whether added to the training set or used to fine-tune the models, has also shown
some modest improvements, despite the heterogeneity of the annotations and differences in domain.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Acknowledgements</title>
      <p>We would especially like to thank David Pérez Fernández, Doaa Samy, and all the people involved
in the PlanTL, for their contribution in making these shared tasks possible, and José-Luis
Sancho-Sánchez and Rafael-J. Ureña-Ruiz from the Centro de Estudios de la RAE for their help in preparing
the data. We would also like to thank the task participants, who provided helpful input to improve
the quality of the dataset and the task itself.</p>
      <table-wrap id="tab5">
        <label>Table 5</label>
        <caption>
          <p>Breakdown of the results of the UD parsing sub-task by relation type: Labeled Attachment Score (LAS) and Unlabeled Attachment Score (UAS), in %.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Label</th><th>LAS</th><th>UAS</th></tr>
          </thead>
          <tbody>
            <tr><td>acl</td><td>67.27</td><td>80.04</td></tr>
            <tr><td>acl:relcl</td><td>78.19</td><td>75.50</td></tr>
            <tr><td>advcl</td><td>71.02</td><td>78.69</td></tr>
            <tr><td>advmod</td><td>83.75</td><td>86.23</td></tr>
            <tr><td>amod</td><td>94.24</td><td>96.81</td></tr>
            <tr><td>appos</td><td>74.50</td><td>84.40</td></tr>
            <tr><td>aux</td><td>46.72</td><td>48.26</td></tr>
            <tr><td>aux:pass</td><td>83.93</td><td>100.00</td></tr>
            <tr><td>case</td><td>98.24</td><td>98.83</td></tr>
            <tr><td>cc</td><td>92.72</td><td>95.14</td></tr>
            <tr><td>ccomp</td><td>84.21</td><td>90.73</td></tr>
            <tr><td>compound</td><td>45.45</td><td>59.09</td></tr>
            <tr><td>conj</td><td>74.29</td><td>76.37</td></tr>
            <tr><td>cop</td><td>89.84</td><td>93.95</td></tr>
            <tr><td>csubj</td><td>63.96</td><td>82.88</td></tr>
            <tr><td>dep</td><td>3.57</td><td>75.00</td></tr>
            <tr><td>det</td><td>99.17</td><td>99.33</td></tr>
            <tr><td>discourse</td><td>8.33</td><td>77.78</td></tr>
            <tr><td>expl</td><td>41.30</td><td>97.83</td></tr>
            <tr><td>expl:impers</td><td>20.69</td><td>93.10</td></tr>
            <tr><td>expl:pass</td><td>82.50</td><td>99.44</td></tr>
            <tr><td>expl:pv</td><td>74.05</td><td>97.38</td></tr>
            <tr><td>fixed</td><td>65.75</td><td>70.32</td></tr>
            <tr><td>flat</td><td>53.85</td><td>91.54</td></tr>
            <tr><td>flat:foreign</td><td>70.17</td><td>91.44</td></tr>
            <tr><td>goeswith</td><td>0.00</td><td>0.00</td></tr>
            <tr><td>iobj</td><td>72.95</td><td>93.62</td></tr>
            <tr><td>mark</td><td>86.75</td><td>92.12</td></tr>
            <tr><td>mark:iobj</td><td>44.44</td><td>100.00</td></tr>
            <tr><td>mark:mod</td><td>83.69</td><td>93.62</td></tr>
            <tr><td>mark:obj</td><td>55.46</td><td>90.76</td></tr>
            <tr><td>mark:subj</td><td>87.01</td><td>94.61</td></tr>
            <tr><td>nmod</td><td>88.15</td><td>88.28</td></tr>
            <tr><td>nsubj</td><td>89.27</td><td>93.61</td></tr>
            <tr><td>nsubj:pass</td><td>55.17</td><td>96.55</td></tr>
            <tr><td>nummod</td><td>95.36</td><td>97.53</td></tr>
            <tr><td>obj</td><td>90.65</td><td>98.39</td></tr>
            <tr><td>obl</td><td>81.78</td><td>87.48</td></tr>
            <tr><td>obl:agent</td><td>85.11</td><td>100.00</td></tr>
            <tr><td>orphan</td><td>0.00</td><td>70.59</td></tr>
            <tr><td>parataxis</td><td>60.39</td><td>72.99</td></tr>
            <tr><td>punct</td><td>88.02</td><td>88.64</td></tr>
            <tr><td>root</td><td>93.32</td><td>93.23</td></tr>
            <tr><td>xcomp</td><td>72.31</td><td>72.04</td></tr>
            <tr><td>Total</td><td>88.600</td><td>91.773</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Porta-Zamorano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Romeu Fernández</surname>
          </string-name>
          , Esquema de anotación sintáctica de CAPITEL, Technical Report
          , Centro de Estudios de la Real Academia Española,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>RAE</surname>
          </string-name>
          ,
          <string-name>
            <surname>ASALE</surname>
          </string-name>
          , Ortografía de la lengua española,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agerri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rigau</surname>
          </string-name>
          ,
          <article-title>Projecting Heterogeneous Annotations for Named Entity Recognition</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Akbik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Blythe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vollgraf</surname>
          </string-name>
          ,
          <article-title>Contextual string embeddings for sequence labeling</article-title>
          ,
          <source>in: Proceedings of the 27th International Conference on Computational Linguistics</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cañete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chaperon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fuentes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <article-title>Spanish Pre-Trained BERT Model and Evaluation Data</article-title>
          ,
          <source>in: Proceedings of the Practical ML for Developing Countries Workshop at the Eighth International Conference on Learning Representations (ICLR 2020)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>García Pablos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cuadros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zotova</surname>
          </string-name>
          ,
          <article-title>Vicomtech at CAPITEL 2020: Facing Entity Recognition and Universal Dependency Parsing of Spanish News Articles with BERT models</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>System Report of HW-TSC on the CAPITEL NER Evaluation</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tiedemann</surname>
          </string-name>
          ,
          <article-title>Parallel Data, Tools and Interfaces in OPUS</article-title>
          ,
          <source>in: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Honnibal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Montani</surname>
          </string-name>
          ,
          <article-title>spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing</article-title>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Álvarez Mellado</surname>
          </string-name>
          ,
          <article-title>Two Models for Named Entity Recognition in Spanish: Submission for the CAPITEL Shared Task at IberLEF 2020</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Porta-Zamorano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Romeu Fernández</surname>
          </string-name>
          , Esquema de anotación de entidades nombradas de CAPITEL, Technical Report
          , Centro de Estudios de la Real Academia Española,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Sánchez-León</surname>
          </string-name>
          ,
          <article-title>Combining Different Parsers and Datasets for CAPITEL UD Parsing</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Martí</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Recasens</surname>
          </string-name>
          ,
          <article-title>AnCora: Multilevel Annotated Corpora for Catalan and Spanish</article-title>
          ,
          <source>in: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Straka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Straková</surname>
          </string-name>
          ,
          <article-title>Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe</article-title>
          ,
          <source>in: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>Boros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Dumitrescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burtica</surname>
          </string-name>
          ,
          <article-title>NLP-Cube: End-to-End Raw Text Processing With Neural Networks</article-title>
          ,
          <source>in: Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bolton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>Stanza: A Python Natural Language Processing Toolkit for Many Human Languages</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>