<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the eHealth Knowledge Discovery Challenge at IberLEF 2019</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alejandro Piad-Morffis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yoan Gutierrez</string-name>
          <email>ygutierrez@dlsi.ua.es</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan Pablo Consuegra-Ayala</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Suilan Estevez-Velarde</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yudivian Almeida-Cruz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Muñoz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andres Montoyo</string-name>
          <email>montoyog@dlsi.ua.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Languages and Computing Systems, University of Alicante</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Math and Computer Science, University of Havana</institution>
          ,
          <country country="CU">Cuba</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University Institute for Computing Research (IUII), University of Alicante</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>The eHealth Knowledge Discovery Challenge, hosted at IberLEF 2019, proposes an evaluation task for the automatic identification of key phrases and the semantic relations between them in health-related documents in Spanish. This paper describes the challenge design, evaluation metrics, participants and main results. The most promising approaches are analyzed and the significant challenges are highlighted and discussed. Analysis of the participant systems shows an overall trend toward sequence-based deep learning architectures coupled with domain-specific or domain-agnostic unsupervised language representations. Successful approaches suggest that modeling the problem as an end-to-end learning task, rather than as two separate subtasks, improves performance. Interesting lines for future development were identified, such as increasing the corpus size with semi-automated approaches and designing more robust evaluation metrics.</p>
      </abstract>
      <kwd-group>
        <kwd>eHealth</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Knowledge Discovery</kwd>
        <kwd>Spanish Language</kwd>
        <kwd>Entity Detection</kwd>
        <kwd>Relation Extraction</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Knowledge-Based Systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Knowledge discovery is a growing field in computer science, with applications in
several domains, from databases [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to images [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and Natural Language
Processing [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] (NLP). NLP methods are increasingly being used to mine knowledge
from unstructured health texts. Recent advances in health text processing
techniques are encouraging researchers and health domain experts to go beyond just
reading the information included in published texts (e.g. academic manuscripts,
clinical reports, etc.) and structured questionnaires, to discover new knowledge
by mining health contents. This has allowed other perspectives to surface that
were not previously available. These NLP tasks are often aided by the use of
domain-specific annotated corpora. However, although these tasks differ, many of them
share common characteristics, such as the detection of relevant entities and
relations. For this reason, domain-independent semantic representations, such as
AMR [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], PropBank [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and FrameNet [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are useful for addressing cross-domain
problems.
      </p>
      <p>
        In the health domain specifically, there is a growing number of scientific
publications that are virtually impossible to analyze manually. This surplus of
data encourages the design of knowledge discovery systems that can leverage
the large amount of information available for building, for example, automated
diagnostic systems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In this context, the eHealth Knowledge Discovery
Challenge (eHealth-KD) seeks to encourage research on a general-purpose knowledge
representation model applied to the health domain. The aim is to bridge the gap
between general-purpose knowledge discovery techniques and domain-specific
techniques, especially in scenarios where domain-specific corpora and
resources are insufficient.
      </p>
      <p>
        The representation model used in eHealth-KD 2019 [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] allows the
representation of concepts and their interrelations, independently of domain-specific semantics.
The domain-specific semantics are in turn captured by the use of actions that
represent how concepts are modified. This model is inspired by research in
Teleologies [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and it is an extension of the representation model used in a previous
TASS challenge [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], named SAT+R (Subject-Action-Target + Relations). The
semantic model presented in this new challenge extends SAT+R [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] with new
entities and relations that provide a better coverage of the semantic content in
natural language sentences. The eHealth-KD Challenge proposes two subtasks
related to capturing the semantic meaning of health-related sentences in the
Spanish language.
      </p>
      <p>This paper describes and evaluates the results of the 10 different systems
designed by the participants in the 2019 edition of the eHealth Knowledge
Discovery Challenge. Additional insights on the most promising lines for future research
are outlined. Section 2 describes the challenge, evaluation criteria and corpora.
Section 3 briefly describes the solutions presented in the challenge. Section 4
presents the main results and additional analysis of the best performing
approaches. Finally, Section 5 discusses the main highlights of the challenge, and
Section 6 concludes and provides ideas for future development.</p>
    </sec>
    <sec id="sec-2">
      <title>Challenge description</title>
      <p>
        Even though this challenge is oriented to the health domain, the structure of the
knowledge to be extracted is general-purpose. The semantic structure proposed
models four types of information units. Each one represents a speci c semantic
interpretation, and they make use of thirteen semantic relations among them.
The following sections provide a detailed presentation of each unit and relation
type. Additional details about the annotation model and the exact semantic
definition of each entity and relation are available in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>Based on previous experience with similar challenges, the process of
identifying the defined entities and relations is divided into two subtasks. The first
subtask deals with identifying the spans of text that define entities, and their
categories (see Section 2.1). The second subtask deals with identifying the
semantic relations that connect the previously identified entities (see Section 2.2).</p>
      <sec id="sec-2-1">
        <title>Subtask A: Key Phrase Extraction and Classification</title>
        <p>Given a list of eHealth documents written in Spanish, the goal of this
subtask is to identify all the key phrases per document and characterise them
with the concepts (i.e. classes) that represent them. These key phrases are all
the relevant terms (single word or multiple words) that represent semantically
important elements in a sentence. Figure 1 shows the relevant key phrases that
appear in an example set of sentences.</p>
        <p>Some key phrases (e.g., “vías respiratorias” and “60 años”) span more than
one word. Key phrases always consist of one or more complete words (i.e., never
a prefix or a suffix of a word), and never include any surrounding punctuation
symbols. There are four categories or classes for key phrases:</p>
        <list list-type="bullet">
          <list-item><p>Concept: a general category that indicates that the key phrase is a relevant term,
concept or idea in the knowledge domain of the sentence.</p></list-item>
          <list-item><p>Action: a concept that indicates a process or modification of other concepts. It
can be indicated by a verb or verbal construction, such as “afecta” (affects),
but also by nouns, such as “exposición” (exposure), where it denotes the
act of being exposed to the Sun, and “daños” (damage), where it denotes
the act of damaging the skin.</p></list-item>
          <list-item><p>Predicate: used to represent a function or filter of another set of elements,
which has a semantic label in the text, such as “mayores” (older), and is
applied to a concept, such as “personas” (people), with some additional
arguments, such as “60 años” (60 years).</p></list-item>
          <list-item><p>Reference: a textual element that refers to a concept, either in the same sentence
or in a different one, which can be indicated by textual clues such as “esta”,
“aquel”, and similar.</p></list-item>
        </list>
        <p>The input for Subtask A is a text document with one sentence per line. All
sentences have been tokenized at the word level (i.e., punctuation signs, parentheses,
etc., are separated from the surrounding text).</p>
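        <p>To make the span constraints concrete, the following is a minimal sketch, not the official data format, that represents a key phrase as a contiguous range of token indices over a tokenized sentence and checks the properties stated above: complete tokens only, no leading or trailing punctuation, and one of the four labels. The names <monospace>KeyPhrase</monospace>, <monospace>validate</monospace> and the punctuation set are hypothetical, and the example sentence is illustrative rather than taken from the corpus. Discontinuous key phrases, which the corpus also contains, are deliberately not modeled here.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass

# Hypothetical set of punctuation tokens that may not start or end a key phrase.
PUNCTUATION = set(".,;:()[]¿?¡!\"'")
LABELS = {"Concept", "Action", "Predicate", "Reference"}


@dataclass(frozen=True)
class KeyPhrase:
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str  # one of LABELS


def text_of(tokens, phrase):
    """Return the surface text covered by a key phrase."""
    return " ".join(tokens[phrase.start:phrase.end])


def validate(tokens, phrase):
    """Check the Subtask A constraints on a single, contiguous key phrase."""
    if phrase.label not in LABELS:
        return False
    # The span must be non-empty and lie within the sentence.
    if phrase.start not in range(len(tokens)):
        return False
    if phrase.end not in range(phrase.start + 1, len(tokens) + 1):
        return False
    # Whole tokens are used by construction; only the boundaries need checking.
    return (tokens[phrase.start] not in PUNCTUATION
            and tokens[phrase.end - 1] not in PUNCTUATION)


tokens = "el asma afecta las vías respiratorias .".split()
print(text_of(tokens, KeyPhrase(4, 6, "Concept")))   # vías respiratorias
print(validate(tokens, KeyPhrase(4, 6, "Concept")))  # True
print(validate(tokens, KeyPhrase(4, 7, "Concept")))  # False: includes "."
```
        </preformat>
        <p>Working with token indices rather than raw strings mirrors the word-level tokenization of the input, so the “complete words” rule holds by construction.</p>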
      </sec>
      <sec id="sec-2-2">
        <title>Subtask B: Relation Extraction</title>
        <p>Subtask B benefits from the output of Subtask A, by linking the key phrases
detected and labeled in each sentence. The purpose of this subtask is to recognize
all relevant semantic relationships between the recognized entities. Eight of the
thirteen semantic relations defined for this challenge can be identified in Figure 2.</p>
        <p>The semantic relations are divided into the following categories:</p>
        <list list-type="bullet">
          <list-item><p>General relations (6): general-purpose relations between two concepts that
have a specific semantics: is-a, same-as, has-property, part-of, causes, and
entails.</p></list-item>
          <list-item><p>Contextual relations (3): allow a concept to be refined by attaching
modifiers: in-time, in-place, and in-context.</p></list-item>
          <list-item><p>Action roles (2): indicate which concepts play a role related to an Action:
subject and target.</p></list-item>
          <list-item><p>Predicate roles (2): indicate which concepts play a role in relation to a Predicate:
domain and argument.</p></list-item>
        </list>
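        <p>The relation inventory above can be written down as a small lookup table. The sketch below assumes, for illustration only, that each relation is stored as a labeled pair of key phrase indices and that role relations (subject, target, domain, argument) originate at the Action or Predicate they describe; the names <monospace>RELATIONS</monospace>, <monospace>Relation</monospace> and <monospace>check_relation</monospace> are hypothetical and not part of the official challenge tooling.</p>
        <preformat preformat-type="code">
```python
from typing import NamedTuple

# The thirteen relation labels, grouped by the categories listed above.
RELATIONS = {
    "general": ["is-a", "same-as", "has-property", "part-of", "causes", "entails"],
    "contextual": ["in-time", "in-place", "in-context"],
    "action_roles": ["subject", "target"],
    "predicate_roles": ["domain", "argument"],
}
ALL_RELATIONS = {label for group in RELATIONS.values() for label in group}


class Relation(NamedTuple):
    label: str
    origin: int       # index of the origin key phrase in the sentence
    destination: int  # index of the destination key phrase


def check_relation(rel, phrase_labels):
    """Reject unknown labels and role relations that start at the wrong key phrase type.

    Assumption: role relations go from the Action/Predicate to its participant.
    """
    if rel.label not in ALL_RELATIONS:
        return False
    if rel.label in RELATIONS["action_roles"]:
        return phrase_labels[rel.origin] == "Action"
    if rel.label in RELATIONS["predicate_roles"]:
        return phrase_labels[rel.origin] == "Predicate"
    return True


labels = ["Concept", "Action", "Concept"]  # e.g. "asma", "afecta", "vías respiratorias"
print(len(ALL_RELATIONS))                                 # 13
print(check_relation(Relation("subject", 1, 0), labels))  # True
print(check_relation(Relation("subject", 0, 1), labels))  # False
```
        </preformat>
        <p>The group sizes (6 + 3 + 2 + 2) add up to the thirteen relations announced at the start of this section.</p>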
      </sec>
      <sec id="sec-2-3">
        <title>Evaluation Metrics</title>
        <p>The challenge proposed a main evaluation scenario (Scenario 1) in which both
subtasks, previously described, are performed in sequence. The submission that
obtained the highest F1 score in Scenario 1 was considered the best overall
performing system of the challenge. Additionally, participants had the
opportunity to address specific subtasks by submitting to two optional scenarios,
one for each subtask. These two additional scenarios measured the performance
in individual subtasks independently of each other.</p>
        <p>Scenario 1 is considered more complex than solving each optional scenario
separately, since errors that systems make in Subtask A propagate to
Subtask B. For this reason it is considered the main evaluation
scenario. This scenario also makes it possible to integrate
end-to-end solutions that solve both subtasks simultaneously. The evaluation metric
is a standard F1 where precision and recall are defined in terms of (C)orrect,
(M)issing, (S)purious, (I)ncorrect and (P)artial matches. Incorrect matches
are reported when key phrases are correctly identified regarding the text span,
but they are not assigned to the correct category. Partial matches are reported
when key phrases overlap but do not match exactly with the correct text span.</p>
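        <p>The following sketch illustrates how predicted key phrases can be sorted into the five match categories for Subtask A. It is a simplified, greedy one-to-one matcher written under stated assumptions: spans are (start, end, label) triples with exclusive ends, exact-span matches are resolved before overlaps, and a partial match ignores the label. The official evaluation script may resolve ties and label agreement differently, so this is not a reimplementation of it.</p>
        <preformat preformat-type="code">
```python
from collections import Counter


def match_keyphrases(gold, predicted):
    """Count (C)orrect, (I)ncorrect, (P)artial, (M)issing and (S)purious key phrases.

    gold, predicted: lists of (start, end, label) tuples with exclusive ends.
    """
    counts = Counter()
    gold_left = list(gold)
    pred_left = []
    # 1) Same span: correct if the label agrees, incorrect otherwise.
    for p in predicted:
        same_span = [g for g in gold_left if g[:2] == p[:2]]
        if same_span:
            # Prefer a gold annotation with the same label, if there is one.
            g = next((cand for cand in same_span if cand[2] == p[2]), same_span[0])
            gold_left.remove(g)
            counts["C" if g[2] == p[2] else "I"] += 1
        else:
            pred_left.append(p)
    # 2) Overlapping but not identical spans count as partial matches.
    unmatched = []
    for p in pred_left:
        overlapping = [g for g in gold_left if min(g[1], p[1]) > max(g[0], p[0])]
        if overlapping:
            gold_left.remove(overlapping[0])
            counts["P"] += 1
        else:
            unmatched.append(p)
    # 3) Leftover gold annotations are missing; leftover predictions are spurious.
    counts["M"] = len(gold_left)
    counts["S"] = len(unmatched)
    return counts


gold = [(0, 2, "Concept"), (3, 4, "Action"), (5, 7, "Concept"), (8, 9, "Predicate")]
pred = [(0, 2, "Concept"), (3, 4, "Concept"), (5, 6, "Concept"), (10, 11, "Action")]
print(dict(match_keyphrases(gold, pred)))
# {'C': 1, 'I': 1, 'P': 1, 'M': 1, 'S': 1}
```
        </preformat>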
        <p>A higher precision means fewer spurious identifications relative to the
number of predicted annotations, and a higher recall means fewer missing
identifications relative to the number of gold annotations. Partial matches are
given half the score of correct matches, while missing and spurious identifications
are given no score. The evaluation formulas for Scenario 1 are defined as follows:</p>
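        <p>
          <disp-formula><label>(1)</label><tex-math>\mathrm{Recall}_{AB} = \frac{C_A + C_B + \frac{1}{2} P_A}{C_A + I_A + C_B + P_A + M_A + M_B}</tex-math></disp-formula>
          <disp-formula><label>(2)</label><tex-math>\mathrm{Precision}_{AB} = \frac{C_A + C_B + \frac{1}{2} P_A}{C_A + I_A + C_B + P_A + S_A + S_B}</tex-math></disp-formula>
          <disp-formula><label>(3)</label><tex-math>F1_{AB} = 2 \cdot \frac{\mathrm{Precision}_{AB} \cdot \mathrm{Recall}_{AB}}{\mathrm{Precision}_{AB} + \mathrm{Recall}_{AB}}</tex-math></disp-formula>
        </p>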
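        <p>As a worked illustration of Equations (1) to (3), the sketch below computes Scenario 1 precision, recall and F1 from match counts. The function name and the dictionary layout are assumptions made for this example, and the counts are invented; they are not results from the challenge.</p>
        <preformat preformat-type="code">
```python
def scenario1_scores(a, b):
    """Precision, recall and F1 for Scenario 1, following Equations (1)-(3).

    a: counts for Subtask A with keys C, I, P, M, S.
    b: counts for Subtask B with keys C, M, S (the formulas use no I or P for B).
    """
    hits = a["C"] + b["C"] + 0.5 * a["P"]  # partial matches earn half credit
    base = a["C"] + a["I"] + b["C"] + a["P"]
    recall_den = base + a["M"] + b["M"]
    precision_den = base + a["S"] + b["S"]
    recall = hits / recall_den if recall_den else 0.0
    precision = hits / precision_den if precision_den else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


a = {"C": 6, "I": 1, "P": 2, "M": 1, "S": 1}  # hypothetical Subtask A counts
b = {"C": 3, "M": 2, "S": 1}                  # hypothetical Subtask B counts
precision, recall, f1 = scenario1_scores(a, b)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.7143 0.6667 0.6897
```
        </preformat>
        <p>Here 10 credited matches (6 + 3 + 0.5 · 2) are divided by 14 system outputs and by 15 gold annotations, giving an F1 of 20/29. Incorrect matches earn no credit but still appear in both denominators.</p>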
        <p>Likewise, similar formulas are defined for Scenarios 2 and 3, using only the
statistics for Subtask A and Subtask B, respectively. Additional details about the evaluation
metrics are available on the eHealth-KD Challenge website (<ext-link ext-link-type="uri" xlink:href="https://knowledge-learning.github.io/ehealthkd-2019">https://knowledge-learning.github.io/ehealthkd-2019</ext-link>).</p>
      </sec>
      <sec id="sec-2-4">
        <title>Corpus Description</title>
        <p>For the purpose of the challenge, a corpus containing 1,045 sentences was
distributed to participants in several collections. A set of 600 sentences for training
and 100 for model validation was distributed in the first stage along with gold
annotations. For the test phase, 300 sentences were distributed, 100 per scenario,
and gold annotations were kept blind until the end of the challenge. An
additional 8,700 unannotated sentences were distributed in the test phase, which can
be used for a semi-automatic extension of the corpus via an ensemble of the best
performing submissions. All 8,800 sentences in Scenario 1 were shuffled; hence,
participants had no information on which were the actual 100 test sentences and which
were the 8,700 additional sentences, and were thus forced to submit responses for all
the sentences. This also had the effect of discouraging manual annotation or other
forms of gaining an unfair advantage on the test set.</p>
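        <p>The blinding protocol described above amounts to hiding 100 evaluated sentences inside a shuffled pool of 8,800. A minimal sketch, assuming a seeded pseudo-random shuffle and a hypothetical function name (the organizers' actual tooling is not described in this paper), is shown below; only the organizers keep the set of hidden positions used for scoring.</p>
        <preformat preformat-type="code">
```python
import random


def build_scenario1_input(test_sentences, unannotated, seed=0):
    """Mix evaluated and unannotated sentences and shuffle them.

    Returns the shuffled sentences (released to participants) and the
    positions of the evaluated sentences (kept private for scoring).
    """
    pool = [(s, True) for s in test_sentences] + [(s, False) for s in unannotated]
    random.Random(seed).shuffle(pool)
    sentences = [s for s, _ in pool]
    hidden = {i for i, (_, evaluated) in enumerate(pool) if evaluated}
    return sentences, hidden


test = [f"test sentence {i}" for i in range(100)]
extra = [f"unannotated sentence {i}" for i in range(8700)]
sentences, hidden = build_scenario1_input(test, extra)
print(len(sentences), len(hidden))  # 8800 100
```
        </preformat>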
        <p>
          The corpus annotation process closely followed the methodology proposed in
the previous edition [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. In contrast with the previous edition, no intentional
effort was made to ensure balance between the training and test collections in
terms of the relative number of each annotation type. Table 1 summarizes the
main statistics of the corpus.
        </p>
        <p>Table 1: Main statistics of the corpus for the Total, Trial, Training and Development collections. Rows: Sentences (1,045 in total) and Key phrases, broken down into Concept, Action, Predicate and Reference.</p>
        <sec id="sec-2-4-4">
          <title>Participant Systems</title>
          <p>
            In the eHealth-KD challenge 2019, 30 teams registered, of which 10
submitted their approaches successfully. The submitted systems use a wide range
of algorithms and techniques. The most common approaches
involve knowledge bases, deep learning and natural language processing
techniques. This section briefly describes each participant system. To simplify the
comparison and better understand the characteristics of each system, we define
several tags to describe the kind of techniques used by each team: (Cr) conditional
random fields; (P) pretrained or (Cu) custom word embeddings; (Ch) character-level
embeddings; (R) hand-crafted rules; (F) natural language processing features;
(O) handling of overlapping entities; (At) attention mechanisms; (Co) convolutional
layers; (Au) dataset augmentation techniques; and (J) solving both subtasks
jointly rather than separately. The 10 systems are described below,
identified by the name of the team responsible for their creation.
          </p>
          <p>
            coinflipper (P-R-F) [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]: Their system is based on ensembles of LSTM
architectures using FastText embeddings and Part-of-Speech tags as main
features. They define a surrogate continuous loss function to approximate the
F1 score during training, and avoid domain-specific NLP tools to promote
cross-domain reusability.
          </p>
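          <p>The coinflipper description mentions a continuous surrogate for F1 without giving its exact form. One common way to build such a surrogate, shown here as an assumption rather than as that team's actual loss, is a soft F1 in which predicted probabilities replace hard counts of true positives, false positives and false negatives; for hard 0/1 predictions it reduces to one minus the ordinary F1. The sketch uses plain Python for readability; in practice it would be written with the tensor operations of an automatic differentiation framework so that it can be minimized by gradient descent.</p>
          <preformat preformat-type="code">
```python
def soft_f1_loss(probs, gold, eps=1e-8):
    """Differentiable surrogate: 1 - soft F1 over binary decisions.

    probs: predicted probabilities in [0, 1]; gold: 0/1 reference labels.
    """
    tp = sum(p * y for p, y in zip(probs, gold))        # expected true positives
    fp = sum(p * (1 - y) for p, y in zip(probs, gold))  # expected false positives
    fn = sum((1 - p) * y for p, y in zip(probs, gold))  # expected false negatives
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1.0 - soft_f1


# With hard predictions the loss equals 1 - F1: here F1 = 2*2 / (2*2 + 1 + 0) = 0.8.
print(round(soft_f1_loss([1, 0, 1, 1], [1, 0, 0, 1]), 4))  # 0.2
```
          </preformat>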
          <p>
            Hulat-TaskA (Cr-P-Ch-Au) [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]: Their system uses a Bi-LSTM architecture
with character-level and word-level embeddings as input features, and a CRF
layer for decoding tags, for Subtask A. The team used the previous year's
challenge dataset to extend the word and character vocabulary with more
vectors.
          </p>
          <p>
            HULAT-TaskAB (Cr-P-Ch-Au) [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]: Their system consists of two Bi-LSTM
layers and a final CRF layer, fed with token-level and character-level
embeddings, for solving Subtask A. The task is encoded using the BIOES entity
tagging scheme.
          </p>
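          <p>For readers unfamiliar with the BIOES scheme mentioned for HULAT-TaskAB, the sketch below shows how labeled token spans map to one tag per token (Begin, Inside, Outside, End, Single) and back. It is an illustrative implementation with hypothetical function names, not code from any participant; it assumes contiguous, non-overlapping spans, which is precisely why overlapping key phrases need special handling in tagging-based systems.</p>
          <preformat preformat-type="code">
```python
def encode_bioes(n_tokens, spans):
    """Turn (start, end, label) spans (end exclusive, non-overlapping) into BIOES tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"  # single-token key phrase
            continue
        tags[start] = f"B-{label}"
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{label}"
        tags[end - 1] = f"E-{label}"
    return tags


def decode_bioes(tags):
    """Lenient decoder: recover spans, skipping E tags that lack an open B."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, label))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            spans.append((start, i + 1, label))
            start = None
        elif prefix == "O":
            start = None
    return spans


tokens = "el asma afecta las vías respiratorias .".split()
spans = [(1, 2, "Concept"), (2, 3, "Action"), (4, 6, "Concept")]
tags = encode_bioes(len(tokens), spans)
print(tags)
# ['O', 'S-Concept', 'S-Action', 'O', 'B-Concept', 'E-Concept', 'O']
print(decode_bioes(tags) == spans)  # True
```
          </preformat>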
          <p>
            IxaMed (Cr-Cu-F-At) [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]: Their system uses a Bi-LSTM with a final CRF
layer in Subtask A. For Subtask B they present three approaches to identify
relations: a Bi-LSTM with a CRF, a Joint AB-LSTM and a dependency
parser. Word embeddings for this speci c domain are learned from Electronic
Health Records.
          </p>
          <p>
            LASTUS-TALN (Cr-Cu-F-At) [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]: Their system uses a Bi-LSTM-CRF and
CNN with ELMo-based representations for Subtask A. For Subtask B, the
model is also based on a Bi-LSTM architecture, following a multi-task
learning approach for relation extraction (selection, classification and orientation
of relations).
          </p>
          <p>
            LSI2 UNED (P-Ch-F-Co) [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]: Their system is based on a hybrid Bi-LSTM
and CNN model with four input layers (PoS, casing types, and character and
word-level representations) that can recognize multi-word entities using the
BIO encoding, for Subtask A. Convolutional layers are used to obtain the
character-level representation of each word. Additionally, Wikidata entities
are used to extend the vocabulary.
          </p>
          <p>
            NLP UNED (P-F-At) [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]: Their system uses a Bi-LSTM architecture with
word embeddings, POS-tag and letter case features, in Subtask A, with
additional post-processing rules to fix systematic errors. For Subtask B, the
Bi-LSTM architecture also considers dependency parsing features, and an
attention layer for merging word-level features into sentence-level feature
vectors.
          </p>
        </sec>
      </sec>
      <sec id="sec-2-5">
        <title>TALP-UPC</title>
        <p>TALP-UPC (Cr-P-F-O-At-Co-J-Au) [<xref ref-type="bibr" rid="ref18">18</xref>]: Their system jointly recognizes
entities and relations using BERT-embedded sentences
combined with GRUs and convolutional architectures. Both subtasks are solved
at the same time, modeling the dependency between entity labels and the
possible relations between them. They reuse the previous challenge data to
improve performance.</p>
      </sec>
      <sec id="sec-2-6">
        <title>UH-Maja-KD</title>
        <p>
          UH-Maja-KD (Cr-Cu-Ch-R-F-O) [<xref ref-type="bibr" rid="ref17">17</xref>]: Their system uses a Bi-LSTM-CRF
architecture, with word embeddings trained on a Wikipedia-based medical
corpus, and additional POS tagging features in Subtask A. For Subtask B,
the model is a Bi-LSTM multiclass classifier that uses the longest path
between key phrases in the dependency tree as phrase-level features.
        </p>
        <p>
          VSP (-) [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]: Their system combines Bi-LSTM cells with a Softmax layer that
classifies all the relation classes in one model, with automatically trained word
embeddings, for Subtask B. Token, entity-type and position embeddings are
learned during training.
        </p>
        <p>Baseline (R): A hand-crafted baseline was built by the challenge organizers to
provide a minimum working solution for participants and a measuring point.
This baseline stores every key phrase and relation tuple seen in the training
set, and outputs the exact label when a 100% match is found in the set.</p>
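        <p>A baseline of this kind can be sketched in a few lines. The version below is an approximation written for illustration, not the organizers' code: it assumes exact, case-sensitive matching of whole-token strings, keeps the most frequent label when a phrase was seen with several labels, and only predicts a stored relation when both of its key phrases were found in the sentence. The training record layout is hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import Counter, defaultdict


def train_baseline(training):
    """Memorize key phrases and relation tuples seen in training.

    training: iterable of (keyphrases, relations), where keyphrases is a list of
    (text, label) and relations is a list of (origin_text, destination_text, label).
    """
    kp_votes, rel_votes = defaultdict(Counter), defaultdict(Counter)
    for keyphrases, relations in training:
        for text, label in keyphrases:
            kp_votes[text][label] += 1
        for origin, destination, label in relations:
            rel_votes[(origin, destination)][label] += 1
    kp_table = {t: c.most_common(1)[0][0] for t, c in kp_votes.items()}
    rel_table = {k: c.most_common(1)[0][0] for k, c in rel_votes.items()}
    return kp_table, rel_table


def predict(sentence, kp_table, rel_table):
    """Output every memorized key phrase that occurs verbatim, plus known relations."""
    padded = f" {sentence} "  # pad so that only whole tokens match
    found = {t: label for t, label in kp_table.items() if f" {t} " in padded}
    relations = [(o, d, label) for (o, d), label in rel_table.items()
                 if o in found and d in found]
    return found, relations


training = [([("asma", "Concept"), ("afecta", "Action"), ("vías respiratorias", "Concept")],
             [("afecta", "asma", "subject"), ("afecta", "vías respiratorias", "target")])]
kp_table, rel_table = train_baseline(training)
print(predict("el asma afecta a los niños", kp_table, rel_table))
# ({'asma': 'Concept', 'afecta': 'Action'}, [('afecta', 'asma', 'subject')])
```
        </preformat>
        <p>Such a lookup has high precision on phrases that recur verbatim but cannot generalize to unseen wording, which is what makes it a useful lower bound.</p>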
        <p>By far the most common approach involves deep learning architectures,
specifically Bi-LSTM layers, which some teams combine with other types of
neural network architectures. This is to be expected, since LSTM architectures
are commonly used for natural language processing given their ability to learn
correlations between elements of a sequence. Several systems use Conditional
Random Fields (CRF) to decode the outputs for Subtask A. In contrast with the
previous edition, there are no pure rule-based or knowledge-based approaches,
although some systems incorporate domain knowledge in the form of custom
embeddings. One team (LSI2 UNED) uses Wikidata entities, which can be
considered a knowledge-based approach combined with a deep learning
architecture. Two teams (IxaMed and UH-Maja-KD) train custom embeddings
on external sources with domain knowledge, which can be considered an
unsupervised approach. All teams except one (TALP-UPC) solve both subtasks
separately, even though some reuse the same architecture in both.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>The results obtained by each team are summarized in Table 2 and are ranked in
order of best performance for Scenario 1. Highlighted in bold are the top three
results per scenario, except for Scenario 3 (Subtask B) where four results are
highlighted because two of them are very similar.</p>
      <p>
        Overall, the best performing system was presented by TALP-UPC [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],
which consists of an end-to-end deep learning solution. This stands in stark
contrast with most of the alternatives that prefer to solve each subtask separately,
even though some systems share the same architecture in both subtasks but train
their models separately. TALP-UPC presents the only approach that actually
solves both subtasks simultaneously. The most significant difference is obtained
in Subtask B, where a large margin of 9.3% separates the top result from the
second best.
      </p>
      <p>In Subtask A, the top three systems obtain very similar results, which can
be explained in part by the similarity of their approaches, i.e., LSTM-based
architectures with different types of embeddings as input features. In Subtask B,
a larger margin exists between the top result and the rest, which is an argument
in favor of end-to-end solutions. However, since the architectures of different
submissions have different characteristics, it is unclear whether this advantage
comes from a better model or actually from the joint training. Further
experimentation is necessary to determine the degree to which end-to-end training
influences the overall performance.</p>
      <sec id="sec-3-1">
        <title>Analysis of Systems Performance</title>
        <p>In this section we present an analysis of the performance of participant systems
with respect to two qualitative criteria. First, we analyze the characteristics (as
defined by the tags in Section 3) that are correlated with higher performance
in each scenario. Next, we analyze the difficulty of recognizing each type of
annotation independently, and the impact of having more annotations.</p>
        <p>To analyze the most significant strategies and approaches, we fit a linear
regression model on the challenge results. For each participant, this model
approximates its score as a weighted sum of the tags that describe the corresponding
system. For example, for the team coinflipper, with description P-R-F and
index 2 in the table, the approximation formula for Scenario 1 is <inline-formula><tex-math>W_P + W_R + W_F + \epsilon_2 = 0.621</tex-math></inline-formula>,
and correspondingly for all teams and scenarios (except the
baseline). The weights that minimize the approximation error <inline-formula><tex-math>\sum_i \epsilon_i^2</tex-math></inline-formula> are thus
considered as the relative impact of a specific tag. The <inline-formula><tex-math>R^2</tex-math></inline-formula> scores for the three
scenarios are 0.773, 0.857 and 0.936, respectively, which indicates that these tags
provide an adequate, if not perfect, description of the evaluated systems. Table 3
shows the weighting adjustment for all tags and all evaluation scenarios.</p>
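        <p>The tag-weight fit follows the form of the coinflipper example: ordinary least squares on binary tag indicators, without an intercept. The sketch below solves the normal equations with Gauss-Jordan elimination in plain Python and reports the coefficient of determination. The toy systems and scores are invented so that the true weights are known; they are not the challenge results. With few systems and many tags the normal equations can be singular, in which case a pseudo-inverse or regularized solver would be needed instead.</p>
        <preformat preformat-type="code">
```python
def fit_tag_weights(systems, scores, tags):
    """Least-squares weights w minimizing sum_i (score_i - sum of w_t over tags of i)^2."""
    X = [[1.0 if t in s else 0.0 for t in tags] for s in systems]
    k = len(tags)
    # Augmented normal equations [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * y for row, y in zip(X, scores))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (fails if X^T X is singular).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return {t: A[i][k] for i, t in enumerate(tags)}


def r_squared(systems, scores, weights):
    """Coefficient of determination of the fitted tag model."""
    preds = [sum(weights[t] for t in s if t in weights) for s in systems]
    mean = sum(scores) / len(scores)
    ss_res = sum((y - p) ** 2 for y, p in zip(scores, preds))
    ss_tot = sum((y - mean) ** 2 for y in scores)
    return 1 - ss_res / ss_tot


# Hypothetical systems whose scores follow W_P = 0.3, W_F = 0.1, W_J = 0.2 exactly.
systems = [{"P"}, {"P", "F"}, {"F", "J"}, {"P", "J"}, {"P", "F", "J"}]
scores = [0.3, 0.4, 0.3, 0.5, 0.6]
weights = fit_tag_weights(systems, scores, ["P", "F", "J"])
print({t: round(w, 3) for t, w in weights.items()})  # {'P': 0.3, 'F': 0.1, 'J': 0.2}
print(round(r_squared(systems, scores, weights), 3))  # 1.0
```
        </preformat>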
        <p>According to these weightings, one of the most significant factors for
increasing performance in Scenario 1 is the use of an end-to-end system that
solves all tasks jointly. This was expected, since the most effective system, created
by TALP-UPC, is the only one that exhibits this feature. Other significant factors
include: using NLP features in addition to word embeddings; employing some
form of dataset augmentation; and adding custom domain rules (e.g., identifying
which tokens to merge into a single key phrase, as done by coinflipper).
The use of custom word embeddings (trained on domain-specific datasets), as
opposed to generic word embeddings, produces a marginally negative effect. This
may be due to the difficulty of training embeddings on domain-specific text,
where it is hard to obtain a sufficiently large corpus.</p>
        <p>In Scenario 2 (Subtask A), solving the overlapping problem provides a marginal
advantage, since it increases the recall of some overlapping key phrases that
would otherwise be missing. The use of customized rules to solve key phrase
discontinuities (e.g., as applied by UH-Maja-KD) is also a relevant strategy,
since key phrases are not always formed by contiguous tokens.
Handling overlapping key phrases is also key to Scenario 3 (Subtask B), presumably
because otherwise all the relations involving unreported overlapping key phrases
would be counted as missing. The next most important feature is the use of
attention mechanisms, which obtain a negative weight in the other scenarios
but appear to be favorable in Subtask B. Attention mechanisms could aid in
identifying complex semantic relations between key phrases that are far apart in the
same sentence, where LSTM networks alone may fail to capture long-range dependencies.</p>
        <p>Table 4 shows the cumulative distribution of correct matches for each type
of annotation. For each instance of each annotation type, we count the number of
systems that output that specific annotation correctly. Then we report the
percentage of each type of annotation (key phrase or relation) that is correctly
identified by at least X systems. Hence, these results are more indicative of
recall than of precision (without considering partial matches). Given that systems
could produce an unlimited number of spurious annotations, measuring a similar
distribution with respect to precision is infeasible.</p>
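        <p>The cumulative distribution reported in Table 4 can be computed as sketched below. The sketch assumes that each gold annotation and each system output are represented as hashable tuples, so that an exact match is a set membership test (partial matches are ignored, as in the table). Real key phrases would also carry their span positions; the text/label tuples, function names and toy data here are hypothetical simplifications.</p>
        <preformat preformat-type="code">
```python
def hits_per_annotation(gold, system_outputs):
    """For each gold annotation, count how many systems produced it exactly."""
    return [sum(1 for output in system_outputs if ann in output) for ann in gold]


def cumulative_recall(hits, n_systems):
    """Percentage of gold annotations found by at least x systems, for x = 1..n."""
    total = len(hits)
    return {x: 100.0 * sum(1 for h in hits if h >= x) / total
            for x in range(1, n_systems + 1)}


gold = [("asma", "Concept"), ("afecta", "Action"), ("mayores", "Predicate"), ("esta", "Reference")]
outputs = [
    {("asma", "Concept"), ("afecta", "Action")},
    {("asma", "Concept"), ("niños", "Concept")},  # the extra item is spurious
    {("asma", "Concept"), ("afecta", "Action"), ("mayores", "Predicate")},
]
hits = hits_per_annotation(gold, outputs)
print(hits)                                   # [3, 2, 1, 0]
print(cumulative_recall(hits, len(outputs)))  # {1: 75.0, 2: 50.0, 3: 25.0}
```
        </preformat>
        <p>Running the same computation separately on the gold annotations of each label yields one row of the table per annotation type. Spurious outputs never affect these numbers, which is why the distribution speaks to recall only.</p>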
        <p>Since several teams did not participate in Subtask B (relation extraction), it
is to be expected that relations have a lower recall than key phrases in general.
Moreover, as shown in Section 4, the best performing systems obtained a lower
score in Subtask B than in Subtask A. Both factors indicate that
Subtask B is considerably more difficult to solve than Subtask A.</p>
        <p>With respect to specific key phrase labels, Concepts appear to be marginally
easier to identify than Actions and the remaining labels. Given that Concepts
are considerably more frequent in the dataset than the remaining labels, a larger
difference would be expected. This may be an indication that low-dimensional
features (such as POS tags) are likely to be sufficient to differentiate key phrases
from non-key phrases, since additional annotations do not produce a similar
improvement in recall.</p>
        <p>Table 4: Cumulative distribution of correct matches per annotation type (rows: Key phrases, Relations, Concept, Action, Predicate, Reference).</p>
        <p>Regarding relations, the distribution shows that the least common types are
also considerably harder to recognize. Given the unbalanced nature of the
corpus, some participants effectively decided not to target all possible labels, and
only considered the most common ones. Increasing the number of output
predictions can lower precision by more than it improves recall, reducing the overall
F1 score, especially when some labels have a marginal impact on the overall score
given their low count. This situation creates a scenario where it is preferable to
simply ignore some of the labels. In future challenges we will reconsider
the scoring metrics to mitigate this effect.</p>
        <p>Key phrases or relations that appear more frequently in the training set are
found to be more easily identifiable from the semantic perspective. Figure 3 shows
a scatter plot of all the annotation types. The horizontal axis measures their
relative rank with respect to the number of instances in the training set, i.e.,
annotation types are ordered from left to right by frequency. The vertical axis
measures the relative rank of each annotation type with respect to the average
number of systems that identify it, i.e., annotation types are ranked according
to their identification complexity (IC). A perfect correlation between the instances
in the training set and their IC would be represented by a diagonal arrangement of
annotation types. Annotations above the diagonal (e.g., Reference) are considerably
easier to identify despite a lower frequency, whereas annotations below the diagonal
(e.g., causes) are more difficult despite a higher frequency.</p>
        <p>Figure 3: Rank of each annotation type by number of instances in the training set (horizontal axis, from less to more) versus rank by number of instances identified by participant systems (vertical axis, from less to more). Plotted types include Concept, Action, Predicate, subject, target, domain, argument, is-a, same-as, has-property, part-of, causes, entails, in-time, in-place and in-context.</p>
        <p>The correlation coefficient between these two magnitudes (i.e., rank by
frequency in the corpus and rank by IC) is 0.811, which, as expected, indicates a strong relation
between the number of annotations of a given type and how easy they are to
identify. However, since the correlation is not perfect, part of the variance
remains to be explained. For example, References are considerably easier to
identify than their frequency would suggest, given that there are only 215 instances
in the training set. In contrast, causes annotations have a higher frequency but
a much lower recall overall. This is to be expected, since Reference annotations
arguably have less syntactic variation than the many patterns in which, for
example, a causal relation can be expressed. These cases are consistent with the general hypothesis
that key phrases are consistently easier to identify correctly than relations.</p>
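<p>The text does not state which correlation coefficient was used. Assuming it is Pearson's coefficient computed on the two rank vectors (which equals Spearman's rank correlation when there are no ties), a minimal Python sketch follows. The function names and the toy rankings are hypothetical and are not meant to reproduce the reported value of 0.811.</p>

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_correlation(freq_rank, ic_rank):
    """Correlate two rankings defined over the same annotation types.

    Both arguments map annotation type -> rank. Pearson's formula applied
    to tie-free ranks is Spearman's rank correlation coefficient.
    """
    types = sorted(freq_rank)
    return pearson([freq_rank[t] for t in types],
                   [ic_rank[t] for t in types])


# Hypothetical rankings of four annotation types (1 = least).
freq_rank = {"a": 1, "b": 2, "c": 3, "d": 4}
ic_rank = {"a": 1, "b": 3, "c": 2, "d": 4}
print(rank_correlation(freq_rank, ic_rank))
```

<p>Swapping a single adjacent pair in one ranking is enough to pull the coefficient below 1, which is why a value such as 0.811 still leaves room for types that deviate from the frequency trend.</p>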
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>The results of the eHealth-KD Challenge 2019 show that knowledge
discovery in Spanish health-related documents is still a challenging task. However, important
advances have taken place since the previous edition, indicating that
research in this area is active and progressing. Most approaches have converged
on a common element, namely Bi-LSTM models, possibly combined with
other, more sophisticated deep learning techniques. Solving both subtasks with an
end-to-end system appears to be a promising approach, although more
experiments are necessary to measure the impact of this design strategy
effectively, independently of other models and training strategies. In contrast with previous
challenges, domain-specific knowledge did not provide a significant advantage
over black-box deep learning methods. However, some domain-specific rules
for handling overlapping and discontinuous key phrases do improve
performance. As indicated earlier, subtask B (relation extraction) is considerably
more difficult to solve than subtask A (key phrase identification), although subtask A is
still not completely solved, given the large number of different annotation types
defined.</p>
      <p>The strong correlation between identified annotations and their relative
frequency in the training set suggests that there is still considerable room for
improvement simply by adding more annotations. Since the corpus was not intentionally
balanced in terms of the different annotation types, the less common patterns
(e.g., part-of) naturally occurred less frequently. One suggestion arising
from this analysis is to oversample the less frequent patterns during
annotation, to ensure a more balanced training set. Likewise, systems that
perform dataset augmentation or transfer learning from similar domains would benefit
from additional training examples. To this end, we will pursue the construction of
a larger, semi-automated corpus by pooling the annotations produced
by participants for the 8,700 raw sentences included in Scenario 1.</p>
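<p>The pooling strategy is not specified further. One simple possibility, sketched below in Python as an assumption rather than the planned method, is to keep every annotation proposed by at least a minimum number of participant systems. The annotation tuple layout, the function name and the min_votes threshold are hypothetical.</p>

```python
from collections import Counter


def pool_annotations(system_outputs, min_votes):
    """Keep annotations proposed by at least `min_votes` systems.

    system_outputs: one iterable per participant system, each holding
        hashable annotations such as (sentence_id, start, end, label) tuples.
    """
    votes = Counter()
    for output in system_outputs:
        # Deduplicate so that one system cannot vote twice for an annotation.
        votes.update(set(output))
    return {ann for ann, n in votes.items() if n >= min_votes}


# Hypothetical predictions of three systems for one sentence.
outputs = [
    {("s1", 0, 5, "Concept"), ("s1", 6, 10, "Action")},
    {("s1", 0, 5, "Concept")},
    {("s1", 0, 5, "Concept"), ("s1", 6, 10, "Action"), ("s1", 11, 15, "Concept")},
]
print(sorted(pool_annotations(outputs, min_votes=2)))
```

<p>Raising min_votes trades recall for precision in the pooled corpus; restricting the pool to the most effective systems, as suggested in the conclusions, is another way to control annotation quality.</p>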
      <p>An interesting issue that emerges from this analysis is the design of a better
evaluation metric. The F1 score as defined, though intuitive, encourages undesirable
behavior when systems attempt to optimize it. For example, since all
annotation types are micro-averaged, the less frequent ones have a much smaller impact
on the overall score. Since adding more output classes to a model usually increases its number of
parameters and can hinder learning, systems optimizing F1 could
potentially ignore the least frequent relation types entirely and still improve their score.
On the other hand, it is still unclear how to balance the relative importance of
subtask A and subtask B in a single metric, especially since mistakes in subtask
A necessarily translate into mistakes in subtask B. Moreover, small mistakes in
subtask A can have a large impact on subtask B, since a single missing or spurious
key phrase can participate in many relations.</p>
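<p>The effect of micro-averaging can be illustrated with hypothetical per-type counts. The Python sketch below computes F1 from true positive, false positive and false negative counts and compares micro- and macro-averaging for two invented systems, one of which ignores the rare part-of type entirely. It uses plain counts only and does not model any partial-match credit the official evaluation may apply.</p>

```python
def f1(tp, fp, fn):
    """F1 from raw counts: 2TP / (2TP + FP + FN), or 0.0 if all counts are 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts):
    """Pool (tp, fp, fn) over all annotation types, then compute one F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts):
    """Compute F1 per annotation type, then average with equal weights."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


# Invented (tp, fp, fn) counts; both systems face 90 gold is-a and 5 gold part-of.
system_a = {"is-a": (80, 10, 10), "part-of": (2, 3, 3)}
# System B never predicts part-of but does slightly better on is-a.
system_b = {"is-a": (84, 6, 6), "part-of": (0, 0, 5)}

for name, counts in [("A", system_a), ("B", system_b)]:
    print(name, round(micro_f1(counts), 3), round(macro_f1(counts), 3))
```

<p>With these invented counts, dropping part-of raises micro-F1 from about 0.86 to about 0.91, while macro-F1 falls from about 0.64 to about 0.47, which is the incentive described above.</p>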
      <p>Finally, the F1 score fails to capture the essence of the problem at hand,
which is extracting the semantic meaning of a sentence. Since the F1 score
measures each decision independently, two systems can obtain the same score even
though one makes a "small" mistake by missing, for example, an argument, while
the other leaves the sentence completely disconnected by failing to recognize
an entailment between two main ideas. This suggests the need for a more
robust metric that rewards systems for solving both subtasks
effectively and that correctly captures the relative importance of the different semantic
elements to be identified.</p>
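<p>The following Python sketch makes the argument concrete with a hypothetical sentence graph: two clusters of key phrases (a1 to a3 and b1 to b3, invented identifiers) linked by a single entails relation. One system misses a subject argument and the other misses the entailment. Both receive the same relation-level F1, but the size of the largest connected component of the predicted graph, used here only as a rough proxy for how much of the sentence stays connected, differs sharply.</p>

```python
from collections import defaultdict


def relation_f1(gold, predicted):
    """F1 over relation triples, each relation counted as an independent decision."""
    tp = len(gold.intersection(predicted))
    fp = len(predicted.difference(gold))
    fn = len(gold.difference(predicted))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def largest_component(nodes, relations):
    """Size of the largest connected component, ignoring relation direction."""
    neighbours = defaultdict(set)
    for head, _label, tail in relations:
        neighbours[head].add(tail)
        neighbours[tail].add(head)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        # Iterative depth-first search collecting one component.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


# Hypothetical sentence: two ideas (a*, b*) joined by one entailment.
nodes = ["a1", "a2", "a3", "b1", "b2", "b3"]
gold = {
    ("a1", "target", "a2"), ("a1", "subject", "a3"),
    ("b1", "target", "b2"), ("b1", "subject", "b3"),
    ("a1", "entails", "b1"),
}
misses_argument = gold - {("b1", "subject", "b3")}
misses_entailment = gold - {("a1", "entails", "b1")}

for name, pred in [("argument", misses_argument), ("entailment", misses_entailment)]:
    print(name, round(relation_f1(gold, pred), 3), largest_component(nodes, pred))
```

<p>Both predictions score F1 = 8/9, yet the first keeps five of the six key phrases connected while the second splits the sentence into two unrelated halves of three.</p>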
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>The eHealth-KD Challenge 2019 presented a problem of key phrase
identification and relation extraction in Spanish health-related texts. A total of 10 teams
presented a variety of approaches, with the common factor being the use of
Bi-LSTM networks and embedding-based representations. An analysis of the
most successful approaches indicates that some domain-specific rules are
helpful, even though most of the progress has been achieved with domain-agnostic
representations and generic NLP features. An interesting open issue is the choice between
end-to-end systems that solve both subtasks simultaneously and a more
classic pipeline with a specific design tailored to each subtask.</p>
      <p>
        The most immediate efforts will focus on using the 8,700 automatically
annotated sentences to build a semi-automatic corpus by pooling the predictions
of the most effective systems. This corpus will then be used to train the most
promising models and confirm the impact of additional data. Given that most
approaches are domain-agnostic, in future challenges we will introduce
cross-domain tasks that require generalizable models. We are also interested in the
design of alternative evaluation metrics that capture the semantic nature of the
task. Finally, given the variety of models proposed, we will investigate the use of
ensembles and Automatic Machine Learning (AutoML) techniques [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to explore
potential Artificial Intelligence architectures.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Funding: This research has been supported by a Carolina Foundation grant in
agreement with the University of Alicante and the University of Havana. It
has also been partially funded by both aforementioned universities, the Spanish
Government (Ministerio de Economía y Competitividad) and the Generalitat
Valenciana (Conselleria d'Educació, Investigació, Cultura i Esport) through the
projects PROMETEU/2018/089, RTI2018-094653-B-C22 and
RTI2018-094649B-I00.</p>
      <p>The authors would like to thank the team of annotators from the School of
Math and Computer Science at the University of Havana.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Baker</surname>, <given-names>C.F.</given-names></string-name>,
          <string-name><surname>Fillmore</surname>, <given-names>C.J.</given-names></string-name>,
          <string-name><surname>Lowe</surname>, <given-names>J.B.</given-names></string-name>:
          <article-title>The Berkeley FrameNet project</article-title>.
          pp. <fpage>86</fpage>-<lpage>90</lpage>.
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (<year>1998</year>). https://doi.org/10.3115/980451.980860
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Banarescu</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Bonial</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Cai</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Georgescu</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Griffitt</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Hermjakob</surname>, <given-names>U.</given-names></string-name>,
          <string-name><surname>Knight</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Koehn</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Palmer</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Schneider</surname>, <given-names>N.</given-names></string-name>:
          <article-title>Abstract meaning representation for sembanking</article-title>.
          pp. <fpage>178</fpage>-<lpage>186</lpage>.
          <publisher-name>Association for Computational Linguistics</publisher-name>,
          <publisher-loc>Sofia, Bulgaria</publisher-loc> (Aug <year>2013</year>),
          https://www.aclweb.org/anthology/W13-2322
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Bravo</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Accuosto</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Saggion</surname>, <given-names>H.</given-names></string-name>:
          <article-title>LaSTUS-TALN at IberLEF 2019 eHealth-KD Challenge: Deep learning approaches to information extraction in biomedical texts</article-title>.
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Byrd</surname>, <given-names>R.J.</given-names></string-name>,
          <string-name><surname>Steinhubl</surname>, <given-names>S.R.</given-names></string-name>,
          <string-name><surname>Sun</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Ebadollahi</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Stewart</surname>, <given-names>W.F.</given-names></string-name>:
          <article-title>Automatic identification of heart failure diagnostic criteria, using text analysis of clinical notes from electronic health records</article-title>.
          <source>International Journal of Medical Informatics</source>
          <volume>83</volume>(<issue>12</issue>), <fpage>983</fpage>-<lpage>992</lpage>
          (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Carlson</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Betteridge</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Kisiel</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Settles</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Hruschka</surname>, <suffix>Jr.</suffix>, <given-names>E.R.</given-names></string-name>,
          <string-name><surname>Mitchell</surname>, <given-names>T.M.</given-names></string-name>:
          <article-title>Toward an architecture for never-ending language learning</article-title>.
          pp. <fpage>1306</fpage>-<lpage>1313</lpage>. AAAI'10,
          <publisher-name>AAAI Press</publisher-name> (<year>2010</year>),
          http://dl.acm.org/citation.cfm?id=2898607.2898816
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Catala</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Martin</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Coinflipper at eHealth-KD Challenge 2019: Voting LSTMs for key phrases and semantic relation identification applied to Spanish eHealth texts</article-title>.
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Colon-Ruiz</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Segura-Bedmar</surname>, <given-names>I.</given-names></string-name>:
          <article-title>HULAT-TaskAB at eHealth-KD Challenge 2019: Knowledge recognition from health documents by BiLSTM-CRF</article-title>.
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Estevez-Velarde</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Gutierrez</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Montoyo</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Almeida-Cruz</surname>, <given-names>Y.</given-names></string-name>:
          <article-title>AutoML strategy based on grammatical evolution: A case study about knowledge discovery from text</article-title>.
          <source>Proceedings of ACL 2019</source>
          (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Fabregat</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Duque</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Martinez-Romo</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Araujo</surname>, <given-names>L.</given-names></string-name>:
          <article-title>NLP UNED at eHealth-KD Challenge 2019: Deep learning for named entity recognition and attentive relation extraction</article-title>.
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><surname>Fayyad</surname>, <given-names>U.</given-names></string-name>,
          <string-name><surname>Piatetsky-Shapiro</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Smyth</surname>, <given-names>P.</given-names></string-name>:
          <article-title>From data mining to knowledge discovery in databases</article-title>.
          <source>AI Magazine</source>
          <volume>17</volume>(<issue>3</issue>), <fpage>37</fpage>
          (<year>1996</year>). https://doi.org/10.1609/aimag.v17i3.1230
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><surname>Giunchiglia</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Fumagalli</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Teleologies: Objects, actions and functions</article-title>.
          pp. <fpage>520</fpage>-<lpage>534</lpage>.
          <publisher-name>Springer</publisher-name> (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><surname>Goenaga</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Santana</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Santiso</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Gojenola</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Perez</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Casillas</surname>, <given-names>A.</given-names></string-name>:
          <article-title>IxaMed at eHealth-KD Challenge 2019: Using different paradigms to solve clinical relation extraction</article-title>.
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><surname>Ruiz-de la Cuadra</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Lopez-Cuadrado</surname>, <given-names>J.L.</given-names></string-name>,
          <string-name><surname>Gonzalez-Carrasco</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Ruiz-Mezcua</surname>, <given-names>B.</given-names></string-name>:
          <article-title>HULAT-TaskA at eHealth-KD Challenge 2019: Sequence key phrases recognition in the Spanish clinical narrative</article-title>.
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name><surname>Lara-Clares</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Garcia-Serrano</surname>, <given-names>A.</given-names></string-name>:
          <article-title>LSI2 UNED at eHealth-KD Challenge 2019: A few-shot learning model for knowledge discovery from eHealth documents</article-title>.
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name><surname>Lu</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Krishna</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Bernstein</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Fei-Fei</surname>, <given-names>L.</given-names></string-name>:
          <article-title>Visual relationship detection with language priors</article-title>.
          pp. <fpage>852</fpage>-<lpage>869</lpage>.
          <publisher-name>Springer International Publishing</publisher-name> (<year>2016</year>).
          https://doi.org/10.1007/978-3-319-46448-0_51
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name><surname>Martínez Camara</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Almeida Cruz</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Díaz Galiano</surname>, <given-names>M.C.</given-names></string-name>,
          <string-name><surname>Estevez-Velarde</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>García Cumbreras</surname>, <given-names>M.A.</given-names></string-name>,
          <string-name><surname>García Vega</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Gutierrez</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Montejo Raez</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Montoyo</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Muñoz</surname>, <given-names>R.</given-names></string-name>, et al.:
          <article-title>Overview of TASS 2018: Opinions, health and emotions</article-title>
          (<year>2018</year>)
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name><surname>Mederos-Alvarado</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Quevedo-Caballero</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Rodríguez-Perez</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Cruz-Linares</surname>, <given-names>R.</given-names></string-name>:
          <article-title>UH-MAJA-KD at eHealth-KD Challenge 2019: Deep learning models for knowledge discovery in Spanish eHealth documents</article-title>.
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name><surname>Medina</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Turmo</surname>, <given-names>J.</given-names></string-name>:
          <article-title>TALP-UPC at eHealth-KD Challenge 2019: A joint model with contextual embeddings for clinical information extraction</article-title>.
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name><surname>Palmer</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Gildea</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Kingsbury</surname>, <given-names>P.</given-names></string-name>:
          <article-title>The Proposition Bank: An annotated corpus of semantic roles</article-title>.
          <source>Computational Linguistics</source>
          <volume>31</volume>(<issue>1</issue>), <fpage>71</fpage>-<lpage>106</lpage>
          (<year>2005</year>). https://doi.org/10.1162/0891201053630264
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name><surname>Piad-Morffis</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Gutierrez</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Estevez-Velarde</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Muñoz</surname>, <given-names>R.</given-names></string-name>:
          <article-title>A general-purpose annotation model for knowledge discovery: Case study in Spanish clinical text</article-title>.
          pp. <fpage>79</fpage>-<lpage>88</lpage>.
          <publisher-name>Association for Computational Linguistics</publisher-name>,
          <publisher-loc>Minneapolis, Minnesota, USA</publisher-loc> (Jun <year>2019</year>),
          https://www.aclweb.org/anthology/W19-1910
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name><surname>Piad-Morffis</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Gutierrez</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Muñoz</surname>, <given-names>R.</given-names></string-name>:
          <article-title>A corpus to support eHealth knowledge discovery technologies</article-title>.
          <source>Journal of Biomedical Informatics</source>
          <volume>94</volume>, <elocation-id>103172</elocation-id>
          (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name><surname>Suarez-Paniagua</surname>, <given-names>V.</given-names></string-name>:
          <article-title>VSP at eHealth-KD Challenge 2019: Recurrent neural networks for relation classification in Spanish eHealth documents</article-title>.
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          (<year>2019</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>