<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of CLEF eHealth Task 1 - SpRadIE: A challenge on information extraction from Spanish Radiology Reports</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viviana Cotik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Alonso Alemany</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Darío Filippo</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Franco Luque</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roland Roller</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jorge Vivaldi</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ammer Ayach</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fernando Carranza</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucas Defrancesca</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonella Dellanzo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Macarena Fernández Urquiza</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Departamento de Computación, FCEyN, Universidad de Buenos Aires</institution>
          ,
          <country country="AR">Argentina</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>FFyL, Universidad de Buenos Aires</institution>
          ,
          <country country="AR">Argentina</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>German Research Center for Artificial Intelligence (DFKI)</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Hospital de Pediatría 'Prof. Dr. Juan P. Garrahan'</institution>
          ,
          <country country="AR">Argentina</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Institut de Lingüística Aplicada, Universitat Pompeu Fabra</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Instituto de Filología y Literaturas Hispánicas "Dr. Amado Alonso", Universidad de Buenos Aires</institution>
          ,
          <addr-line>CONICET</addr-line>
          ,
          <country country="AR">Argentina</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>Universidad Nacional de Córdoba</institution>
          ,
          <country country="AR">Argentina</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper provides an overview of SpRadIE, the Multilingual Information Extraction Task of the CLEF eHealth 2021 evaluation lab. The challenge targets information extraction from Spanish radiology reports and aims to provide a standard evaluation framework that contributes to the advancement of clinical natural language processing in Spanish. Overall, seven different teams participated, trying to detect seven types of named entities and hedge cues. Information extraction from radiology reports poses particular challenges, such as domain-specific language, telegraphic style, an abundance of non-standard abbreviations, and a large number of discontinuous as well as overlapping entities. Participants addressed these challenges using a variety of classifiers and introduced multiple solutions. The most successful approaches rely on multiple neural classifiers in order to deal with overlapping entities. As a result of the challenge, a manually annotated dataset of radiology reports in Spanish has been made available. To our knowledge, this is the first public challenge for named entity recognition and hedge cue detection for radiology reports in Spanish.</p>
      </abstract>
      <kwd-group>
        <kwd>Spanish Information Extraction</kwd>
        <kwd>BioNLP</kwd>
        <kwd>eHealth</kwd>
        <kwd>radiology reports</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
      <p>
        In recent years, the volume of digitally available information in the medical domain has been
in constant growth, due especially to the widespread adoption of clinical information
systems and electronic health records (EHRs). This is leading to the progressive
adoption of natural language processing applications in healthcare because of their recognized
potential to search, analyze and interpret patient datasets. Physicians spend a lot of time
inputting patient data into EHR systems, most of it stored as free-text narratives.
Extracting the information contained in these texts is useful for many purposes,
some of the most relevant being: diagnostic surveillance through automated detection of critical
observations; query-based case retrieval; quality assessment of radiologic practice; automatic
content analysis of report databases; and clinical support services integrated in the clinical
workflow [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>There are many types of medical reports within an electronic health record, such as chart
notes, case notes, progress notes, radiology reports and discharge reports. Some of them are
written in highly specialized and local vocabulary and, in the special case of radiology reports,
they may contain non-standard abbreviations, typos and ill-formed sentences. Because of the
particularities of the medical domain, clinical corpora are difficult to obtain. Clinical records
are sensitive in nature, so they are usually not published and, when they are, they have to be
anonymized. Moreover, the highly specialized and local vocabulary makes annotation a
difficult and expensive task.</p>
      <p>Most of the currently available resources on clinical report processing are for English. For
Spanish, the availability of resources is much more limited, despite its being one of the languages
with the most native speakers in the world. In particular, there are very few available annotated
corpora (see Section 2).</p>
      <p>
        In this context, we publish a novel corpus through the organization of the SpRadIE challenge,
a task proposed in the context of the CLEF eHealth 2021 evaluation lab [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]<sup>1</sup>. This corpus is a
reviewed version of a previously annotated and anonymized corpus of Spanish radiology reports
[
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. With SpRadIE we intend to contribute to the advancement of the automatic processing
of medical texts in Spanish, while offering participants the opportunity to submit novel systems
and compare their results using the same dataset and a standard evaluation framework. To our
knowledge, SpRadIE is the first information extraction challenge on Spanish radiology reports.
      </p>
      <p>More concretely, the SpRadIE challenge dataset consists of a corpus of pediatric ultrasound
reports from an Argentinian public hospital. These reports are generally written within a hospital
information system by typing directly into a computer, as a single section of plain text in which the
most relevant findings are described. When there are no anomalous findings, the reports are written
using standard boilerplate that guides physicians on their structure. However, most of
the time they are written in free text in order to describe the findings discovered in anomalous
studies. Because the input is free text and anomalies are often found and reported, the reports vary
greatly in content and in size, ranging from 8 to 193 words.</p>
      <p>SpRadIE offers multiple challenges that need to be addressed with creative solutions:</p>
      <sec id="sec-1-1">
        <title>1 https://sites.google.com/view/spradie-2020/</title>
        <p>Low resources: The availability of linguistic resources in Spanish is greatly limited in
comparison to high-resource languages such as English. In particular, there are no specific
terminologies for the radiology domain in Spanish.</p>
        <p>Domain-specific language: The vocabulary used in radiology reports is specific to the
radiology domain. The Radiological Society of North America (RSNA) produced RadLex
(Radiology Lexicon)<sup>2</sup>, an extensive, dedicated and comprehensive set of radiology terms
in English, for use in radiology reporting, decision support, data mining, data registries,
education and research. In addition, SNOMED CT (Systematized Nomenclature of Medicine,
Clinical Terms)<sup>3</sup> is considered to be the most comprehensive multilingual clinical
healthcare terminology (it includes Spanish) and has mappings to RadLex.</p>
        <p>Ill-formed texts: Reports usually present ill-formed sentences, misspellings, inconsistencies
in the usage of abbreviations, and a lack of punctuation and line breaks, as can be seen in
Figure 3.</p>
        <p>Semantic split: Training, development and test sets cover different semantic fields, so that
various topics and their corresponding entities that occur in the test dataset have not
been previously seen in the training dataset.</p>
        <p>Small data: To approach realistic deployment conditions, only a small number of annotated
reports was available during training, and the rest was used for evaluation.</p>
        <p>Complex entities: The linguistic form of entities presents some particular difficulties:
lengthier entities with inner structure, embedded entities and discontinuities. Examples can be
found in Section 3.</p>
        <p>In the past, some challenges have been organized for information extraction in the medical
domain in Spanish (see Section 2). However, our proposal covers a part of the domain spectrum
that was not covered by previous work: actual short reports written in haste with mistakes and
variability.</p>
        <p>In this article we present the SpRadIE challenge and its results. The challenge aims at the
detection of seven different named entities as well as hedge cues in ultrasound reports.
Targeted entities include anatomical entities and findings that describe a pathological or anomalous
event. Negations and indicators of probability or future outcomes are also to be detected. We
provide training, development and test datasets, and evaluate the participating systems using
metrics based on lenient and exact match.</p>
        <p>Overall, seven different teams participated in the task, with participants from Spain, Italy,
the United Kingdom and Colombia. More than 70% of the teams include at least one native Spanish
speaker. Most teams experimented with different variations of neural networks, particularly
BERT-based approaches. However, there were also submissions based on Conditional Random
Fields (CRFs) and pattern rules. The presence of overlapping and discontinuous entities was
one of the biggest challenges of the task. To overcome this problem, several teams
developed multiple classifiers running in parallel, together with pre- and post-processing steps
for the input and output of the classifiers.</p>
      </sec>
      <sec id="sec-1-2">
        <title>2 RadLex: http://radlex.org/ 3 SNOMED CT: https://www.snomed.org/</title>
        <p>The remainder of the paper is structured as follows. In Section 2 we present
previous work on Spanish medical information extraction and Spanish corpora in the medical
domain. In Section 3 we describe the target of the annotation, detailing the types of entities, their
distribution in the annotated dataset, some of their most prominent features and the difficulty
of the task as measured by human inter-annotator agreement. Then, Section 4 presents the
evaluation setting. Participating systems and baselines are described in Section 5, while results
are discussed in Section 6. We finish with some conclusions and a hope for the advancement of
the automatic treatment of medical text in Spanish.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Previous work</title>
      <p>In clinical care, much important patient-related information is stored in textual format,
supplementing the structured information of electronic health records. To automatically access
and make use of this valuable information, natural language processing (NLP) methods
such as named entity recognition, relation extraction and negation detection can be applied. In
order to train such methods, domain-related corpora have to be available. Medicine has many
sub-domains, such as radiology. The availability of specific corpora for handling them is of
utmost importance for the advancement of the BioNLP area.</p>
      <p>
        Given the sensitive nature of medical data and the difficulty of its annotation process, only
very few corpora are available, and most of them are in English (e.g., CheXpert [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and
MIMIC-CXR [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]). Moreover, most existing tools to process clinical text are also developed for English.
Nevertheless, in recent years the interest in and need for processing non-English clinical text has
been increasing, in particular for Spanish, one of the languages with the most native speakers in
the world.
      </p>
      <p>
        Annotation of medical texts is a difficult task, particularly for clinical records. Wilbur et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
defined annotation guidelines to categorize segments of scientific sentences in research articles
of the biomedical domain. The first published guideline for the annotation of radiology reports
that we are aware of [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] has been reviewed and enhanced for the annotation of the dataset
provided in this challenge. Besides, there are some corpora of negations in Spanish clinical
reports [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. Finally, PadChest, a corpus of 27,593 annotated Spanish radiology
reports, has recently been published [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        In the past, several challenges have been organized for information extraction in the medical
domain in Spanish. The CodiEsp shared task in CLEF-2020 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] addressed clinical cases (longer
sentences and paragraphs, more consistent use of vocabulary and fewer typos than radiology
reports). The target of CodiEsp was to assign tags at the document level and to identify the text
spans that support the assignment of the tags. The eHealth-KD challenge<sup>4</sup> and the CANTEMIST
shared task<sup>5</sup>, both part of IberLEF-SEPLN 2020, targeted the identification of named entities
and relations (at different levels of granularity in the types of entities) but in medical research
papers instead of clinical reports. Other challenges that targeted information extraction from
Spanish biomedical texts were PharmaCoNER [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] (detection of drug and chemical entities),
MEDDOCAN [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] (anonymization), TASS eHealth-KD 2018 Task 3 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and IberLEF eHealth-KD 2019 and 2020 [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>4 eHealth-KD: https://knowledge-learning.github.io/ehealthkd-2020/ 5 CANTEMIST: https://temu.bsc.es/cantemist/</title>
        <p>
          Spanish negation detection in the biomedical domain is also a current subject of interest (see
the NEGES 2018 Workshop on Negation in Spanish<sup>6</sup>, which includes some work on the medical domain,
and [
          <xref ref-type="bibr" rid="ref17 ref18 ref19 ref20 ref21">17, 18, 19, 20, 21</xref>
          ].
        </p>
        <p>
          Besides the approaches used in the previously mentioned challenges, not much work has been
done on NER in Spanish clinical reports so far. Focusing on the radiology domain, only a few
publications target NER in the context of Spanish radiology reports [
          <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
          ]. General overviews
about NLP in radiology can be seen in [
          <xref ref-type="bibr" rid="ref1 ref24">1, 24</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Target of the challenge</title>
      <p>The target of the task is Named Entity Recognition and Classification. As mentioned above,
these entities present several challenges. We describe the types of entities we are targeting and
then exemplify some of the challenges.</p>
      <sec id="sec-3-1">
        <title>3.1. Classes of entities</title>
        <p>Seven different classes of concepts in the radiology domain are distinguished. Since these
entities refer to very precise, complex concepts, they are realized by correspondingly complex
textual forms. Entities may be very long, sometimes even spanning sentence boundaries, may be
embedded within other entities of different types, and may be discontinuous. Moreover, different
text strings may be used to refer to the same entity, including abbreviations and typos.</p>
        <p>Entities are formed by a word or a sequence of words, not necessarily continuous, and entities
can be embedded within other entities. The following entities are distinguished: Anatomical
Entity, Finding, Location, Measure, Type of Measure, Degree, and Abbreviation. Hedge cues are
also identified, distinguishing: Negation, Uncertainty, and Conditional-Temporal. Examples can
be seen in Figure 1.</p>
        <p>As mentioned before, these entities present several challenges. Examples of longer,
discontinuous and overlapping entities can be found in Figure 2.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Annotated dataset</title>
        <p>
          The data consists of 513 ultrasonography reports provided by a public pediatric hospital in
Argentina. Reports are semi-structured and have orthographic and grammatical errors. They
have been anonymized in order to remove patient IDs, names and the enrollment numbers of
the physicians [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. An example of a report can be seen in Figure 3. The annotated training and
development partitions of the dataset, and the unannotated test partition are available at the
webpage of the SpRadIE challenge https://sites.google.com/view/spradie-2020/.
        </p>
        <p>
          Reports were annotated by clinical experts and then revised by linguists, using the brat
annotation tool [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. Annotation guidelines and training were provided for both rounds of
annotations (see [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] for the first round). An example of an annotated excerpt can be seen in
Figure 4.
        </p>
        <p>6 NEGES 2018: https://aclweb.org/portal/content/neges-2018-workshop-negation-spanish</p>
        <p>Anatomical Entity: Entities corresponding to an anatomical part, for example "breast" (pecho), "liver" (hígado), "right thyroid lobe" (lóbulo tiroideo derecho). Example: vejiga llena (full bladder).</p>
        <p>Finding: A pathological finding or diagnosis, for example "cyst", "cyanosis". Example: No se detectaron adenomegalias (No adenomegalies were detected).</p>
        <p>Location: It refers to a location in the body. The location could by itself indicate which part of the body is being talked about, or it could have a relation to an anatomical entity. Examples of locations are "walls" and "cavity". Example: quistes en región biliar (cysts in biliary region).</p>
        <p>Measure: Expression of measure. Example: Diametro longitudinal: 8.1 cm. (Longitudinal diameter: 8.1 cm.)</p>
        <p>Type of Measure: Expression indicating a kind of measure. Example: Diametro longitudinal: 8.1 cm. (Longitudinal diameter: 8.1 cm.)</p>
        <p>Degree: It indicates the degree of a finding or some other property of an entity, for example “leve”, “levemente” (slight), “mínimo” (minimal). Example: ligera esplenomegalia (slight splenomegaly).</p>
        <p>Three subtypes of hedge cues are identified:</p>
        <p>Negation. Example: No se detectaron adenomegalias (No adenomegalies were detected).</p>
        <p>Conditional-Temporal: Hedge cues indicating that something occurred in the past or may occur in the future, or indicating a conditional form. Example: antecedentes de atresia (history of atresia).</p>
        <p>Uncertainty: Hedge cues indicating a probability (not a certainty) that some finding may be present in a given patient. Example: compatible con hipertrofia pilórica (compatible with pyloric hypertrophy).</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Distribution of entities</title>
          <p>The distribution of entities is shown in Figure 5. The most frequent type, Anatomical Entity, has
more than 2,000 occurrences, and there are almost 1,500 Findings, but there are only 163 hedges,
and only 15 Conditional-Temporal hedges. Automatic systems can be expected to perform
poorly on entity types with so few examples.</p>
          <p>Moreover, the different types of entities differ considerably among themselves. Entities of the
Finding type have an average length of 2.35 words and Anatomical Entities are on average 1.9
words long, which is not a big difference. However, there is a big difference in the number
of times words are repeated within each type of entity. Figure 6 shows that most of the
words in Type of Measure and Negation occur at least 10 times, as indicated by the long box,
meaning that the majority of words occur up to 10 times or even more. The quite tall box
for Anatomical Entities shows that their words also tend to occur a high number of times. In
contrast, most of the words in Findings or Locations occur fewer than 2 or 3 times. If entities are
more repetitive in their wording, it is easier for an automatic classifier to identify them.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Inter-annotator Agreement</title>
          <p>Automatic classifiers can be expected to perform well in those cases where human annotators
have strong agreement, and worse in cases that are difficult for human annotators to identify
consistently.</p>
          <p>We carried out a small study of inter-annotator agreement to assess the difficulty of the
task for trained human experts. Three trained linguists independently annotated 20 reports
(totalling 2,000 words and over 1,700 annotated entities) after reading the annotation guidelines
and sharing the annotations for two reports. The mean inter-annotator agreement was 0.85.</p>
          <p>In Figure 7 it can be seen that, among the frequent entities, Location is the one with the lowest
agreement and variation in agreement. Degree and the Uncertainty hedge also have low agreement.
No figures for Conditional-Temporal were obtained because there were few cases in the
dataset.</p>
          <p>Thus, it can be expected that automatic classifiers perform worse in these categories than in
others that are more easily identified by humans, like Abbreviation, Anatomical Entity or Finding.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation setting</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset Partitions</title>
        <p>Since reports are highly repetitive, almost half of the annotated corpus (207 reports) was used
as the test set for evaluation. The test set was created by identifying terms belonging to a given
semantic field within the reports, and selecting all reports containing those terms. Thus, the
test set was guaranteed to contain words not in the training corpus, which was useful to assess
portability to (slightly) different domains. An additional development partition (45 reports)
was created with reports containing terms not in the training set or in the test set. The words
occurring only in development and test partitions can be seen in Table 1.</p>
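        <p>The selection by semantic field described above can be sketched as follows. This is only an illustration under stated assumptions, not the organizers' script: the function name split_by_semantic_field, the whitespace tokenization and the sample reports are hypothetical.</p>
        <preformat>
```python
def split_by_semantic_field(reports, held_out_terms):
    """Send every report that mentions a held-out term to the test set."""
    terms = {term.lower() for term in held_out_terms}
    train, test = [], []
    for text in reports:
        # Assumption: naive whitespace tokenization is good enough for a sketch.
        tokens = set(text.lower().split())
        if tokens.intersection(terms):
            test.append(text)
        else:
            train.append(text)
    return train, test


reports = ["higado de forma y tamaño normal", "vejiga llena", "higado homogeneo"]
train, test = split_by_semantic_field(reports, ["vejiga"])
```
        </preformat>
        <p>Because every report mentioning a held-out term leaves the training pool, the held-out terms themselves never appear in training, which is the property the partition relies on.</p>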
        <p>The remaining part of the dataset consisted of a training partition (175 reports), a development
partition (47 reports) and a test partition (45 reports).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Metrics</title>
        <p>Submissions were evaluated using precision, recall and F1 scores, using both an exact and a
lenient matching scheme. Metrics were computed separately for each entity type. Therefore,
no credit is given for predictions with correct span but incorrect type. Global results were
obtained by micro-averaging. This way, the influence of each entity type in the global results is
proportional to its frequency in the test corpus.</p>
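        <p>To make the effect of micro-averaging concrete, the following sketch pools the counts of all entity types before computing F1. The function name micro_f1, the tuple layout and the example counts are hypothetical; under a lenient scheme the first component would be a sum of similarity values rather than an integer count.</p>
        <preformat>
```python
def micro_f1(counts):
    """counts maps entity type to (matched, n_predicted, n_reference)."""
    matched = sum(c[0] for c in counts.values())
    n_pred = sum(c[1] for c in counts.values())
    n_ref = sum(c[2] for c in counts.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_ref if n_ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts: one frequent type that is easy, one rare type that is hard.
counts = {"Anatomical Entity": (90, 100, 100), "Uncertainty": (1, 10, 10)}
score = micro_f1(counts)
```
        </preformat>
        <p>In this made-up example the frequent type dominates: the pooled score is 91/110, whereas averaging the per-type F1 values (0.9 and 0.1) would give 0.5.</p>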
        <p>We believe that small variations in the span of named entities do not severely affect the
quality of the results. Minor differences in the spans still provide useful information for possible
applications. Lenient matching can be used to give credit to partial matches in named entities.
For this challenge, we used as the main metric a micro-averaged F1 based on lenient match. We
also computed scores based on exact match as secondary metrics.</p>
        <p>
          The lenient scores are calculated using the Jaccard index and based on the metrics used in
the Bacteria Biotope task of BioNLP 2013 [
          <xref ref-type="bibr" rid="ref26">26</xref>
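          ]. The Jaccard index is used as a similarity measure
between a reference and a predicted entity. It is defined as the ratio of intersection over union
as follows:
          <disp-formula><tex-math>J(\mathit{ref}, \mathit{pred}) = \frac{\mathrm{overlap}(\mathit{ref}, \mathit{pred})}{\mathrm{length}(\mathit{ref}) + \mathrm{length}(\mathit{pred}) - \mathrm{overlap}(\mathit{ref}, \mathit{pred})}</tex-math></disp-formula>
        </p>
        <p>where <italic>ref</italic> represents the reference string of the gold standard and <italic>pred</italic> the corresponding
string which was predicted. Both overlap and length are measured in characters. An exact
match has a value of 1.</p>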
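        <p>As an illustration of the definition above, a character-level Jaccard index can be computed from character offsets. This sketch assumes that each entity is given as a list of (start, end) fragments with exclusive end offsets, so that discontinuous entities are covered; the function names are hypothetical and this is not the official scoring script.</p>
        <preformat>
```python
def char_offsets(fragments):
    """Expand (start, end) fragments, end exclusive, into a set of character offsets."""
    return {i for start, end in fragments for i in range(start, end)}


def jaccard(ref_fragments, pred_fragments):
    """overlap / (length(ref) + length(pred) - overlap), measured in characters."""
    ref = char_offsets(ref_fragments)
    pred = char_offsets(pred_fragments)
    overlap = len(ref.intersection(pred))
    union = len(ref) + len(pred) - overlap
    return overlap / union if union else 0.0


exact = jaccard([(0, 10)], [(0, 10)])                   # identical spans
partial = jaccard([(0, 10)], [(5, 15)])                 # 5 shared characters out of 15
discontinuous = jaccard([(0, 4), (8, 12)], [(0, 12)])   # 8 shared characters out of 12
```
        </preformat>
        <p>Working on sets of offsets keeps the same formula valid for discontinuous entities, at the cost of some memory for very long spans.</p>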
        <p>To compute the lenient metrics, we first match reference and predicted entities pairwise. The
Jaccard index is used as the point-wise similarity to be optimized in the matching process. To
guarantee a global optimal matching in the general case, a bipartite graph matching algorithm
would be required. Instead, for simplicity, we implemented a greedy matching algorithm that
iterates over the ordered predicted entities and chooses the best matching reference entity. This
approach was tested using hand-crafted test cases specifically designed for complex situations,
and it always gave the expected matchings.</p>
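        <p>A greedy matching of this kind might look like the sketch below. Several details are assumptions made for illustration and are not stated in the text: predictions are ordered by start offset, each reference entity can be matched at most once, pairs without any overlap are discarded, and entities are contiguous (start, end) spans of a single entity type.</p>
        <preformat>
```python
def jaccard(ref, pred):
    """Character-level Jaccard index of two (start, end) spans, end exclusive."""
    overlap = max(0, min(ref[1], pred[1]) - max(ref[0], pred[0]))
    union = (ref[1] - ref[0]) + (pred[1] - pred[0]) - overlap
    return overlap / union if union else 0.0


def greedy_match(references, predictions):
    """Pair each prediction, in order, with its best still-unmatched reference."""
    free = list(references)
    pairs = []
    for pred in sorted(predictions):
        best = max(free, key=lambda ref: jaccard(ref, pred), default=None)
        if best is not None and jaccard(best, pred) > 0:
            pairs.append((best, pred))
            free.remove(best)
    return pairs


refs = [(0, 6), (10, 20)]
preds = [(0, 6), (12, 20), (30, 35)]
pairs = greedy_match(refs, preds)
```
        </preformat>
        <p>Here the third prediction stays unmatched, so it contributes nothing to the sum of similarities while still counting towards the number of predicted entities.</p>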
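        <p>The matching process returns a set <inline-formula><tex-math>M</tex-math></inline-formula> of matching pairs of reference and predicted entities.
Then, lenient precision and recall are computed as follows:
          <disp-formula><tex-math>P_{\mathit{lenient}} = \frac{\sum_{(\mathit{ref}, \mathit{pred}) \in M} J(\mathit{ref}, \mathit{pred})}{N_{\mathit{pred}}}, \qquad R_{\mathit{lenient}} = \frac{\sum_{(\mathit{ref}, \mathit{pred}) \in M} J(\mathit{ref}, \mathit{pred})}{N_{\mathit{ref}}}</tex-math></disp-formula>
where <inline-formula><tex-math>N_{\mathit{pred}}</tex-math></inline-formula> and <inline-formula><tex-math>N_{\mathit{ref}}</tex-math></inline-formula> are the total number of predicted and reference entities respectively.</p>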
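        <p>Exact precision and recall are computed using only exact matches, that is, those matches in
<inline-formula><tex-math>M</tex-math></inline-formula> with a similarity value of 1:
          <disp-formula><tex-math>P_{\mathit{exact}} = \frac{|\{(\mathit{ref}, \mathit{pred}) \in M : J(\mathit{ref}, \mathit{pred}) = 1\}|}{N_{\mathit{pred}}}, \qquad R_{\mathit{exact}} = \frac{|\{(\mathit{ref}, \mathit{pred}) \in M : J(\mathit{ref}, \mathit{pred}) = 1\}|}{N_{\mathit{ref}}}</tex-math></disp-formula>
where the denominators are, as for the lenient scores, the total numbers of predicted and reference entities.</p>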
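        <p>Given the similarity values of the matched pairs, the lenient and exact scores can be derived side by side. The sketch below is illustrative only: the function name precision_recall, the dictionary keys and the example values are hypothetical.</p>
        <preformat>
```python
def precision_recall(similarities, n_pred, n_ref):
    """similarities holds the Jaccard value of every matched pair."""
    lenient_sum = sum(similarities)
    exact_count = sum(1 for value in similarities if value == 1.0)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "lenient_precision": ratio(lenient_sum, n_pred),
        "lenient_recall": ratio(lenient_sum, n_ref),
        "exact_precision": ratio(exact_count, n_pred),
        "exact_recall": ratio(exact_count, n_ref),
    }


# Two matched pairs (one exact, one partial), three predictions, two references.
scores = precision_recall([1.0, 0.8], n_pred=3, n_ref=2)
```
        </preformat>
        <p>With these made-up values the lenient precision is 1.8/3 and the lenient recall is 1.8/2, while the exact scores count only the single perfect match, giving 1/3 and 1/2.</p>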
        <p>Our official scripts to compute the metrics were offered to the participants before evaluation
and are published in a public repository.<sup>7</sup></p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Participating systems</title>
      <p>Overall, seven different teams participated in the shared task, with participants belonging to
institutions from Spain (4), Italy (2), UK (1) and Colombia (1). Most participating teams
experimented with different variations of neural networks, particularly transformer-based
approaches.</p>
      <p>
        Team EdIE [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] (University of Edinburgh and Health Data Research, UK) and SINAI [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]
(Universidad de Jaén, Spain) rely on a pre-trained BERT model for Spanish, namely BETO [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ].
EdIE uses an ensemble method, combining multiple BERT classifiers, with a dictionary, while
SINAI uses a single multiclass model for all entities. SWAP [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] (Università di Bari Aldo Moro,
Italy) instead relies on XLM-RoBERTa [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ], and CTB [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] (Universidad Politécnica de Madrid,
Spain and Universidad del Valle, Colombia) on a multilingual version of BERT.
      </p>
      <p>
        As an alternative to transformer-based models, team LSI [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] (Universidad Nacional de
Educación a Distancia and Escuela Nacional de Sanidad, Spain) uses a neural architecture with
a Bi-LSTM followed by a CRF layer.
      </p>
      <p>
        Aside from neural approaches, a classical CRF approach was used by team HULAT [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]
(Universidad Carlos III de Madrid, Spain), and team IMS [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] (Università di Padova, Italy)
applied a pattern-based approach. Moreover, most teams also explored the usage of different
techniques and different models. Each team was allowed to submit up to four different runs.
      </p>
      <p>Various teams opted for combinations of specialized classifiers instead of a single multiclass
model. This is the case of EdIE, combining multiple BETOs. CTB trained separate instances of
the same model to predict up to three overlapping entity types on the same token. The CRF
layer of the architecture implemented by the LSI team was actually a combination of parallel
CRFs specialized for different entity types. For negation, they used a separate model based on
transfer learning.</p>
      <p>Most teams put considerable effort into pre-processing and particularly post-processing, which accounts for most of the
differences among the four submissions of a team. Others submitted different architectures
or parameterizations of their neural architectures. This is the case for the SWAP team, which
experimented with architectures that are partially specialized for clinical text, partially optimized
for Spanish, and also multilingual approaches.</p>
      <p>More detail on the particulars of each system can be seen in the individual papers in the
proceedings of the challenge.</p>
      <sec id="sec-5-1">
        <title>5.1. Baselines</title>
        <p>Two baselines were constructed: an instance-based learner based on string matching and an
off-the-shelf neural learner.</p>
        <p>As previously mentioned, the annotated entities in the SpRadIE dataset are very repetitive. For
this reason, the first baseline is an instance-based learner that relies on simple string
matching. For each entity in the training set, we extract the different annotated strings
with a minimum length of two characters. Whenever one of these strings is found in the test set, it is
assigned the entity type with which it was seen during training.</p>
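        <p>The idea behind this baseline can be sketched in a few lines of Python. This is a minimal illustration rather than the script used in the challenge: the function names (build_lexicon, tag), case-insensitive matching, matching at word boundaries, and resolving ambiguous strings to their most frequent training type are assumptions, and the training pairs below are invented.</p>
        <preformat><![CDATA[
```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Annotation = Tuple[str, str]  # (annotated string, entity type)
Match = Tuple[int, int, str]  # (start offset, end offset, entity type)


def build_lexicon(training: List[Annotation], min_length: int = 2) -> Dict[str, str]:
    """Map each annotated training string to the type it was seen with most often."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for text, entity_type in training:
        key = text.strip().lower()
        if len(key) >= min_length:  # skip strings shorter than two characters
            counts[key][entity_type] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def tag(text: str, lexicon: Dict[str, str]) -> List[Match]:
    """Label every occurrence of a known string with its training type."""
    matches: List[Match] = []
    for key, entity_type in lexicon.items():
        # Assumption: only match whole words, ignoring case.
        pattern = r"(?<!\w)" + re.escape(key) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((m.start(), m.end(), entity_type))
    return sorted(matches)


# Invented training annotations and test sentence.
training = [
    ("quiste", "Finding"),
    ("riñón derecho", "Anatomical_Entity"),
    ("no", "Negation"),
    ("a", "Abbreviation"),  # dropped: shorter than two characters
]
lexicon = build_lexicon(training)
result = tag("No se observa quiste en riñón derecho.", lexicon)
print(result)
```
]]></preformat>
        <p>Such a learner does not generalize beyond the strings it has seen, which is why it serves only as a lower reference point.</p>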
        <p>
          A second baseline system uses the Flair framework [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ]. Very limited effort was put into
pre- and post-processing. Only spans of text tagged with overlapping entities were simplified
to a single entity, the most frequent one. The data was fed to a neural sequence tagger with
256 hidden units and 0.3 locked dropout probability, including a CRF decoder appended at the
end. The model also uses a stack of Spanish fastText embeddings as well as contextual string
embeddings [37]. The model was trained for a maximum of 20 epochs with a mini-batch size of 40,
        </p>
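        <p>The simplification of overlapping entities mentioned above could look roughly like the following Python sketch. How "the most frequent one" was determined is not detailed here, so the sketch assumes that frequency is counted over the entity types in the given annotations and that ties are broken in favour of the longer span; the function name simplify_overlaps and the example offsets are hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from typing import List, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset exclusive, entity type)


def simplify_overlaps(entities: List[Entity]) -> List[Entity]:
    """Keep one entity per overlapping group, preferring the most frequent type."""
    frequency = Counter(entity_type for _, _, entity_type in entities)
    # Visit entities from most to least frequent type; ties go to the longer span.
    ranked = sorted(
        entities,
        key=lambda e: (-frequency[e[2]], -(e[1] - e[0]), e[0]),
    )
    kept: List[Entity] = []
    for start, end, entity_type in ranked:
        # Accept the entity only if it does not overlap anything already kept.
        if all(end <= s or start >= e for s, e, _ in kept):
            kept.append((start, end, entity_type))
    return sorted(kept)


# Invented annotations: the Location span lies inside a Finding span.
annotations = [
    (0, 5, "Finding"),
    (10, 30, "Finding"),
    (17, 30, "Location"),
    (40, 44, "Abbreviation"),
    (50, 60, "Finding"),
]
flat = simplify_overlaps(annotations)
print(flat)
```
]]></preformat>
        <p>After this step every stretch of text carries at most one label, which is what a standard sequence tagger with a single output layer expects.</p>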
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Analysis of Results</title>
      <p>In this section we present the results obtained by the seven teams in the task, together with
the baselines. Each team could submit up to four runs; however, we report only the results of
the best performing run of each team in Table 2. Details for the rest of the runs can be found
in the individual papers for each system. Best scores for lenient precision, recall and F1 are
highlighted in bold. Table 3 details the performance for the 5 most frequent entity
types, which cover more than 80% of all entity mentions.</p>
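      <p>To make the difference between exact and lenient scores concrete, the following Python sketch computes precision, recall and F1 under both criteria. It is not the official evaluation script: purely as an assumption, a prediction counts here as a lenient match when it has the same type as a gold entity and shares at least one character with it, whereas an exact match requires identical offsets and type. The function names and the example spans are invented.</p>
      <preformat><![CDATA[
```python
from typing import Callable, List, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset exclusive, entity type)


def exact(pred: Entity, gold: Entity) -> bool:
    """Same offsets and same type."""
    return pred == gold


def lenient(pred: Entity, gold: Entity) -> bool:
    """Assumed lenient criterion: same type and at least one shared character."""
    return pred[2] == gold[2] and pred[0] < gold[1] and gold[0] < pred[1]


def prf(
    predicted: List[Entity],
    gold: List[Entity],
    match: Callable[[Entity, Entity], bool],
) -> Tuple[float, float, float]:
    """Precision, recall and F1 under the given matching criterion."""
    matched_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    matched_gold = sum(any(match(p, g) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: one exact hit, one boundary error, one spurious prediction.
gold = [(0, 6, "Finding"), (10, 23, "Location"), (30, 33, "Abbreviation")]
predicted = [(0, 6, "Finding"), (12, 23, "Location"), (40, 45, "Finding")]

exact_scores = prf(predicted, gold, exact)
lenient_scores = prf(predicted, gold, lenient)
print("exact  ", exact_scores)
print("lenient", lenient_scores)
```
]]></preformat>
      <p>In this toy case the boundary error on the Location span is penalized only under the exact criterion, which is why lenient scores are never lower than exact ones.</p>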
      <p>Baselines. The string matching baseline provides a reference for a very naïve approach to the
task, without any kind of generalization. The Flair baseline is a reference for the performance that
can be obtained with a more sophisticated learning architecture but without putting effort into
optimization, pre- or post-processing. This second baseline already shows that, for our problem,
machine learning quickly outperforms the simple string matching approach: recall is
similar, but precision clearly improves.</p>
      <p>Participating Teams. Overall, EdIE achieved the best results in the challenge, for both lenient
and exact F1 score and recall. EdIE achieves the best results for all five most frequent entity
types. However, its performance is worse for less frequent concepts, such as Degree (54%),
Uncertainty (34%) or ConditionalTemporal (0%).</p>
      <p>The overall outcome of LSI is very close to EdIE in terms of F1, particularly for exact match.
In contrast to EdIE, LSI achieves a higher precision. Overall results are solid across entity types.
Like EdIE, LSI has problems dealing with ConditionalTemporal, with a performance of 0%.</p>
      <p>In contrast, CTB, the third ranked system, performs particularly well for ConditionalTemporal
(67%), as well as for Anatomical Entities and Location. For the remaining concepts, results tend to
be about 5-10 points below the best system.</p>
      <p>HULAT, unlike the other three systems, uses a CRF instead of a neural architecture. It
achieves lenient F1 results very similar to those of CTB, but has a drop in exact
F1. Overall, the system performs quite well on Location, Finding, Negation, Uncertainty
and ConditionalTemporal. For Measure, the system shows a strong drop in performance
compared to the best system. More focus on this concept would likely have improved
performance further.</p>
      <p>The SINAI team, while fifth in overall performance, achieved the highest-scoring results for two
of the most challenging entity types, namely Finding and Location. Location was one of the
concepts where human annotators showed less consistency. Conversely, the system shows
a strong drop in performance for Abbreviations, about 70 points below the
lenient F1 of the best system. This might have strongly influenced the overall performance of
the system, as abbreviations occur very frequently.</p>
      <p>SWAP achieved mostly fair performance for all concepts. It performs well above the string
matching baseline and better than the Flair baseline for exact match. IMS provides a simple
pattern-based approach, similar to our string matching baseline. It shows how such a simple
approach can easily obtain a lenient recall around 60%, which may be useful for applications
like information retrieval.</p>
      <p>Performance across entity types. It can be seen that the performance across entity types
has some correlation with repetitiveness of strings within entities of a given type and with the
consistency of human annotators.</p>
      <p>Indeed, for entities where annotators were less consistent, mainly Location (see Figure 7),
performance was lower, with a drop of more than 10% with respect to overall performance
in most cases, and 15% in the best performing systems. We believe this may be due to these
entities having a less well-defined reference than others, like Anatomical Entities.</p>
      <p>The other major entity type with lower performance is Finding. In this case, we believe
less well-defined semantics may be a cause of difficulty, but also the form of these entities itself;
as described in Figure 6, words in Findings occur much less frequently than in other kinds of
entities.</p>
      <p>Discussion. Participating systems can be divided into three coarse groups with respect to
performance. The first group consists of the first two teams, which obtain quite similar results
and are some distance ahead of the middle group, consisting of the next three teams.
The remaining two teams show lower performance. While IMS describes an easy and quick
system to start with, and therefore achieves baseline performance, SWAP might have chosen an
inadequate architecture for the task. The authors finally submitted a run with XLM-RoBERTa,
although other systems performed better during development. However, while
multilingual BERT performed best, BETO might have been a better option, as it is trained solely
on Spanish-language data.</p>
      <p>With respect to the best performing systems, what seems to have the biggest impact on
performance is an architecture based on multiple classifiers instead of a single multiclass model. The
only exception would be CTB, scoring third with a multiclass model. This seems to be
related to the phenomenon of overlapping entities, which is pervasive in the dataset.</p>
      <p>Pre- and post-processing also made slight differences in performance, but smaller than differences
in the architecture of learners.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>We have presented the results of the SpRadIE challenge for detection and classification of named
entities and hedge cues in radiology reports in Spanish.</p>
      <p>Seven teams participated in the challenge, achieving good performance, with results well
above baselines. Although challenging entity types, like Finding, Location or hedge cues like
Conditional-Temporal barely reach 74% F1, Anatomical Entities, Measure or Abbreviation can
be recognized at almost 90% F1. This shows promising performance for integration within
productive workflows.</p>
      <p>Among the different approaches to the problem, we have found that combinations of multiple
classifiers clearly outperform single multiclass models. Neural approaches specifically trained
for Spanish also tend to perform better than generic or multilingual approaches. Pre- and
post-processing also have a positive impact on performance.</p>
      <p>Although Spanish has hundreds of millions of native speakers worldwide, not much work
has been done on information extraction from Spanish medical reports. It is important to note
that at least five of the seven participating teams include at least one native Spanish speaker. With
this challenge, we provided a standard evaluation framework for a domain of Spanish medical
text processing, namely radiology reports, that had not been previously addressed in this kind
of effort.</p>
      <p>We hope that this challenge and the promising results obtained by participating systems
encourage other institutions to make resources publicly available, and thus contribute to the
advancement of the automatic processing of medical texts, especially in Spanish.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We want to thank Mariana Neves for helping us shape the construction of the challenge.</p>
    </sec>
    <sec id="sec-9">
      <title>Author Contribution</title>
      <p>VC &amp; RR conceived the idea of the challenge. With equal contribution, VC &amp; LAA co-led the
Task, and with JV they reviewed the previously created annotation guidelines. LAA re-annotated
reports. VC, LAA, FL &amp; RR organized the task. RR &amp; FL proposed the evaluation method and
FL implemented the evaluation scripts. DF solved annotation criteria issues, AD calculated
inter-annotator agreement, AA &amp; LDF implemented the baselines, and FC &amp; MFU annotated
for inter-annotator agreement metrics. LAA, RR, FL, JV, DF &amp; VC discussed the results and
contributed to the final manuscript.</p>
      <p>framework for state-of-the-art nlp, in: NAACL 2019, 2019 Annual Conference of the North
American Chapter of the Association for Computational Linguistics (Demonstrations),
2019, pp. 54–59.
[37] A. Akbik, D. Blythe, R. Vollgraf, Contextual string embeddings for sequence labeling,
in: COLING 2018, 27th International Conference on Computational Linguistics, 2018, pp.
1638–1649.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Pons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Hunink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Kors</surname>
          </string-name>
          ,
          <article-title>Natural language processing in radiology: a systematic review</article-title>
          ,
          <source>Radiology</source>
          <volume>279</volume>
          (
          <year>2016</year>
          )
          <fpage>329</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Suominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Alemany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Brew-Sam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cotik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Filippo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Saez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Luque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          , G. Pasi,
          <string-name>
            <given-names>R.</given-names>
            <surname>Roller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Seneviratne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vivaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Viviani</surname>
          </string-name>
          , C. Xu,
          <article-title>CLEF eHealth 2021 Evaluation Lab</article-title>
          ,
          <source>in: Advances in Information Retrieval - 43rd European Conference on IR Research</source>
          , Springer, Heidelberg, Germany,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cotik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Filippo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Roller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Annotation of Entities and Relations in Spanish Radiology Reports</article-title>
          ,
          <source>in: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP</source>
          <year>2017</year>
          ,
          <year>2017</year>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cotik</surname>
          </string-name>
          ,
          <article-title>Information extraction from Spanish radiology reports</article-title>
          , in:
          <source>PhD Thesis</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Irvin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rajpurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ciurea-Ilcus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chute</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Marklund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Haghgoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ball</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shpanskaya</surname>
          </string-name>
          , et al.,
          <article-title>Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>33</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>590</fpage>
          -
          <lpage>597</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Pollard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Berkowitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. R.</given-names>
            <surname>Greenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Lungren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-y.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Mark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Horng</surname>
          </string-name>
          ,
          <article-title>Mimic-cxr, a de-identified publicly available database of chest radiographs with free-text reports</article-title>
          ,
          <source>Scientific data 6</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Wilbur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rzhetsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shatkay</surname>
          </string-name>
          ,
          <article-title>New directions in biomedical text annotation: definitions, guidelines and corpus construction</article-title>
          ,
          <source>BMC bioinformatics 7</source>
          (
          <year>2006</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Marimon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vivaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. Bel</given-names>
            <surname>Rafecas</surname>
          </string-name>
          ,
          <article-title>Annotation of negation in the iula spanish clinical record corpus</article-title>
          ,
          <string-name>
            <surname>Blanco</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morante</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saurí</surname>
            <given-names>R</given-names>
          </string-name>
          , editors.
          <source>SemBEaR 2017</source>
          .
          <article-title>Computational Semantics Beyond Events and Roles; 2017 Apr 4; Valencia, Spain</article-title>
          . Stroudsburg (PA): ACL;
          <year>2017</year>
          . p.
          <fpage>43</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lima Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cuadros</surname>
          </string-name>
          , G. Rigau,
          <article-title>NUBes: A corpus of negation and uncertainty in Spanish clinical texts</article-title>
          ,
          <source>in: Proceedings of the 12th Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>5772</fpage>
          -
          <lpage>5781</lpage>
          . URL: https://aclanthology.org/2020.lrec-1.708.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bustos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pertusa</surname>
          </string-name>
          ,
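          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Salinas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de la Iglesia-Vayá</surname>
          </string-name>
          ,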
          <article-title>Padchest: A large chest x-ray image dataset with multi-label annotated reports</article-title>
          ,
          <source>Medical image analysis 66</source>
          (
          <year>2020</year>
          )
          <fpage>101797</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miranda-Escalada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez-Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Armengol-Estapé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          ,
          <article-title>Overview of automatic clinical coding: annotations, guidelines, and solutions for non-english clinical cases at codiesp track of clef ehealth 2020</article-title>
          , in: Working Notes of Conference and Labs of the Evaluation (CLEF) Forum.
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez-Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marimon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Intxaurrondo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Rabal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          , M. Krallinger,
          <article-title>PharmaCoNER: Pharmacological substances, compounds and proteins named entity recognition track</article-title>
          ,
          <source>in: Proceedings of The 5th Workshop on BioNLP Open Shared Tasks</source>
          , Association for Computational Linguistics, Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . URL: https://www.aclweb.org/anthology/D19-5701. doi:10.18653/v1/D19-5701.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Marimon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez-Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Intxaurrondo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          ,
          <article-title>Automatic de-identification of medical texts in spanish: the meddocan track, corpus, guidelines, methods and evaluation of results</article-title>
          , in: IberLEF@SEPLN,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Cámara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Galiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Estévez-Velarde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A. G.</given-names>
            <surname>Cumbreras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Vega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gutiérrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Ráez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montoyo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piad-Morfis</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. V. R</surname>
          </string-name>
          . (eds.),
          <article-title>Overview of tass 2018: Opinions, health and emotions</article-title>
          ,
          <source>in: Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS</source>
          <year>2018</year>
          ),
          <source>Sun SITE Central Europe</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Piad-Morfis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Consuegra-Ayala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Estevez-Velarde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Almeida-Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montoyo</surname>
          </string-name>
          ,
          <article-title>Overview of the ehealth knowledge discovery challenge at iberlef 2019s</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2019</year>
          ),
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Piad-Morfis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gutiérrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Canizares-Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Estevez-Velarde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montoyo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Almeida-Cruz</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the ehealth knowledge discovery challenge at iberlef</article-title>
          <year>2020</year>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Costumero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gonzalo-Martín</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Millan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Menasalvas</surname>
          </string-name>
          ,
          <article-title>An approach to detect negation on medical documents in Spanish</article-title>
          ,
          <source>in: International Conference on Brain Informatics and Health</source>
          , Springer,
          <year>2014</year>
          , pp.
          <fpage>366</fpage>
          -
          <lpage>375</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cotik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stricker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vivaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          ,
          <article-title>Syntactic methods for negation detection in radiology reports in Spanish</article-title>
          ,
          <source>in: ACL - Workshop on Replicability and Reproducibility in Natural Language Processing: adaptative methods, resources and software</source>
          , Buenos Aires, Argentina,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>W.</given-names>
            <surname>Koza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Filippo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cotik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stricker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Godoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martínez-Gamboa</surname>
          </string-name>
          ,
          <article-title>Automatic detection of negated findings in radiological reports for Spanish language: Methodology based on lexicon-grammatical information processing</article-title>
          ,
          <source>Journal of digital imaging</source>
          <volume>32</volume>
          (
          <year>2019</year>
          )
          <fpage>19</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Santiso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Casillas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Oronoz</surname>
          </string-name>
          ,
          <article-title>Neural negated entity recognition in Spanish electronic health records</article-title>
          ,
          <source>Journal of biomedical informatics</source>
          <volume>105</volume>
          (
          <year>2020</year>
          )
          <fpage>103419</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Zavala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <article-title>The impact of pretrained language models on negation and speculation detection in cross-lingual medical text: Comparative study</article-title>
          ,
          <source>JMIR Medical Informatics</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>e18953</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cotik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Filippo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Castaño</surname>
          </string-name>
          ,
          <article-title>An approach for automatic classification of radiology reports in Spanish</article-title>
          ,
          <source>in: MedInfo</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>634</fpage>
          -
          <lpage>638</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cotik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vivaldi</surname>
          </string-name>
          ,
          <article-title>Spanish named entity recognition in the biomedical domain</article-title>
          ,
          <source>in: Annual International Symposium on Information Management and Big Data</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>233</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Casey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Poon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Duma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Grivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Grover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Suárez-Paniagua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tobin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Whiteley</surname>
          </string-name>
          , et al.,
          <article-title>A systematic review of natural language processing applied to radiology reports</article-title>
          ,
          <source>BMC medical informatics and decision making</source>
          <volume>21</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>P.</given-names>
            <surname>Stenetorp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pyysalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Topić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ohta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ananiadou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tsujii</surname>
          </string-name>
          ,
          <article-title>Brat: a web-based tool for NLP-assisted text annotation</article-title>
          ,
          <source>in: Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>102</fpage>
          -
          <lpage>107</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bossy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Golik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ratkovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bessières</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Nédellec</surname>
          </string-name>
          ,
          <article-title>BioNLP shared task 2013 - an overview of the Bacteria Biotope Task</article-title>
          ,
          <source>in: Proceedings of the BioNLP Shared Task 2013 Workshop</source>
          , Association for Computational Linguistics, Sofia, Bulgaria,
          <year>2013</year>
          , pp.
          <fpage>161</fpage>
          -
          <lpage>169</lpage>
          . URL: https://www.aclweb.org/anthology/W13-2024.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>V.</given-names>
            <surname>Suárez-Paniagua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Casey</surname>
          </string-name>
          ,
          <article-title>A multi-BERT hybrid system for Named Entity Recognition in Spanish radiology reports</article-title>
          ,
          <source>in: CLEF eHealth 2021. CLEF 2021 Evaluation Labs and Workshop: Online Working Notes</source>
          , CEUR-WS,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>P.</given-names>
            <surname>López-Úbeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Díaz-Galiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <article-title>Pre-trained language models to extract information from radiological reports</article-title>
          ,
          <source>in: CLEF eHealth 2021. CLEF 2021 Evaluation Labs and Workshop: Online Working Notes</source>
          , CEUR-WS,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cañete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chaperon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fuentes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <article-title>Spanish pre-trained BERT model and evaluation data</article-title>
          ,
          <source>PML4DC at ICLR 2020</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
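          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Gemmis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          ,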
          <article-title>Comparing Transformer-based NER approaches for analysing textual medical diagnoses</article-title>
          ,
          <source>in: CLEF eHealth 2021. CLEF 2021 Evaluation Labs and Workshop: Online Working Notes</source>
          , CEUR-WS,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>É.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>8440</fpage>
          -
          <lpage>8451</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>O.</given-names>
            <surname>Solarte-Pabón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Montenegro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blazquez-Herranz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Saputro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodriguez-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Menasalvas</surname>
          </string-name>
          ,
          <article-title>Information Extraction from Spanish Radiology Reports using multilingual BERT</article-title>
          ,
          <source>in: CLEF eHealth 2021. CLEF 2021 Evaluation Labs and Workshop: Online Working Notes</source>
          , CEUR-WS,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>H.</given-names>
            <surname>Fabregat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Duque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martinez-Romo</surname>
          </string-name>
          ,
          <article-title>LSI_UNED at CLEF eHealth2021: Exploring the effects of transfer learning in negation detection and entity recognition in clinical texts</article-title>
          ,
          <source>in: CLEF eHealth 2021. CLEF 2021 Evaluation Labs and Workshop: Online Working Notes</source>
          , CEUR-WS,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M. Ángel</given-names>
            <surname>Martín-Caro García-Largo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Bedmar</surname>
          </string-name>
          ,
          <article-title>Extracting information from radiology reports by Natural Language Processing and Deep Learning</article-title>
          ,
          <source>in: CLEF eHealth 2021. CLEF 2021 Evaluation Labs and Workshop: Online Working Notes</source>
          , CEUR-WS,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <article-title>IMS-UNIPD @ CLEF eHealth Task 1: A Memory Based Reproducible Baseline</article-title>
          ,
          <source>in: CLEF eHealth 2021. CLEF 2021 Evaluation Labs and Workshop: Online Working Notes</source>
          , CEUR-WS,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>A.</given-names>
            <surname>Akbik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Blythe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rasul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schweter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vollgraf</surname>
          </string-name>
          ,
          <article-title>Flair: An easy-to-use</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>