<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEPLN-</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>for Drug-Disease Evidence Search based on Scientific Articles</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elvira Amador-Domínguez</string-name>
          <email>elvira.amador@upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Badenes-Olmedo</string-name>
          <email>carlos.badenes@upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Departamento de Sistemas Informáticos, ETSI Sistemas Informáticos, Universidad Politécnica de Madrid</institution>
          ,
          <addr-line>28031 Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid</institution>
          ,
          <addr-line>28660 Boadilla del Monte</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>40</volume>
      <abstract>
        <p>The COVID-19 pandemic propelled the development of several Natural Language Processing (NLP) resources. These resources, however, are mostly developed exclusively in English and lack accessibility for non-expert users, such as practitioners. This paper presents a platform for the analysis of clinical documents in Spanish, especially those related to COVID-19. The platform takes text in Spanish as input and comprises three different NLP modules: a named entity recognition module capable of detecting diseases and chemicals, a term linking module that links the detected entities with existing medical terminologies, and an evidence search module that retrieves scientific articles evidencing the relationship between pairs of entities, both diseases and medications. A series of experiments using the SPACCC corpus was conducted to assess the efficiency of the platform and the quality of the results.</p>
      </abstract>
      <kwd-group>
        <kwd>Named entity recognition</kwd>
        <kwd>term extraction</kwd>
        <kwd>term linking</kwd>
        <kwd>biomedical text processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The COVID-19 pandemic propelled the release of
considerable amounts of medical documents that, up to that time,
were largely inaccessible to researchers [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. This
phenomenon noticeably impacted the area of Natural Language
Processing (NLP), leading to an increase in the
development of models and datasets specifically targeted
at COVID-19 documents. In 2024, the search “COVID” in
HuggingFace returns over 100 dataset results and over
500 language models. Considering that HuggingFace has
around 500k pre-trained models available, COVID-19
related models represent around 0.1% of the total. This value
is quite remarkable, especially considering that there may
be models related to COVID-19 that do not include the term
“COVID” in their name and are therefore not retrieved by
the search.
      </p>
      <p>
        However, most of the available resources present two
main shortcomings. First, most resources, both datasets and
models, are available only in English [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Most of the data
collected in the available datasets are gathered from two
main sources: research papers and Twitter posts. Research
papers are mostly published in English, and therefore data
extracted from this source will also be in that same language.
This is also the case with Twitter posts: although they cover
a wider variety of languages, most available datasets are
exclusively in English. Consequently, most COVID-19
related NLP models are in English, since they are trained on
datasets in this language, and there is a scarcity of
COVID-19 NLP models available in Spanish. This scarcity
can hinder the exploitation of textual
resources in this language and, as a result, limit the
generation of new models and datasets. Providing tools capable
of processing scientific resources in Spanish may not only
be useful for the NLP community, but may also increase the
impact of those resources.
      </p>
      <p>Second, while these models are intended for use in
clinical practice, their use is not intuitive to practitioners.
For non-expert users, using pre-trained NLP models can
be a difficult task. Equipping NLP models with an intuitive
user interface based on natural language queries may help
to bring these models closer to practitioners, thereby
increasing their use.</p>
      <p>This paper addresses the aforementioned issues by
presenting a biomedical NLP platform that facilitates the
exploration of large collections of scientific articles with clinical
information to extract evidence of drug-disease, drug-drug,
and disease-disease interactions. The platform comprises
several NLP models, including a Named Entity Recognition
(NER) model to automatically detect diseases and
chemicals within the text, as well as a term-linking module to
retrieve additional information on the detected terms. In
addition, the platform includes an evidence module to
retrieve existing research documents that mention or relate
selected terms. Moreover, since the medical field is highly
expert-oriented, the platform follows a human-in-the-loop
approach, enabling users to select and find information on
medical terms that may not be correctly detected by the
NER module.</p>
      <p>The remainder of this paper is structured as follows.
Section 2 presents the related work. Section 3 introduces the
developed platform and describes its component modules.
Section 4 presents the experiments conducted on the platform
to assess its efficiency and results. Finally, Section 5
draws conclusions and outlines future research lines.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Works</title>
      <p>
        In the context of NLP, we can distinguish two types of
resources developed during the COVID-19 pandemic:
language models and datasets. The work of Shuja et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
presents a comprehensive review of the different datasets,
classified by task, that were developed during the pandemic.
This work analyzes not only textual datasets but also speech
and medical image datasets. One of the first corpora available
was CORD-19 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], released in 2020 by the Allen Institute
for Artificial Intelligence. This corpus contained research
articles in English extracted from different open repositories,
such as PubMed or bioRxiv. Its first release had a size of
less than 1 GB; by its last release, in
2022, it contained more than 18 GB of COVID-19
related articles. Similarly, DisGeNet [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provides an extensive
dataset that labels diseases, genes, and variants within
medical articles, providing evidence of their labeling. In addition
to academic sources, other works focused
on the collection of textual data from non-expert users.
Lamsal et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] present the COV19Tweets dataset, which
contains more than 310 million tweets in English, intended
for the sentiment analysis task. A geolocalized version of
the dataset, named GeoCOV19Tweets, is also available [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Similarly, the COVIDSENTI dataset [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] provides more than
90,000 tweets collected in the early stages of the pandemic.
Other works, such as Cheng et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Depoux et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
focused on the collection of rumors, fake news, and general
misinformation for further analysis.
      </p>
      <p>
        Regarding the Spanish language, most of the available
COVID-19 related datasets are extracted directly from social
media sources. This is the case of the work by Yu et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
which provides a dataset of tweets posted by two of the main
Spanish newspapers during the pandemic. Similarly,
Allés-Torrent et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] present DHCovid, a curated Twitter corpus
of digital conversations in the context of the pandemic.
      </p>
      <p>
        Although there is no specific COVID-19 corpus available
in Spanish, several biomedical corpora in this language have
been released over the years. The most important corpus
is SPACCC [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which contains 1,000 examples of clinical
case reports. This corpus covers a variety of medical areas,
such as oncology, urology, or pneumology. This corpus has
served as the base for more targeted datasets, such as
PharmacoNER [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a version of SPACCC specifically targeted
at the identification of pharmacological substances in texts.
Similarly, CANTEMIST [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] presents a set of oncological
clinical reports focused on the identification of concepts
related to tumor morphology. More recently, DISTEMIST [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
released a set of manually annotated clinical cases to
identify clinical terms, including pharmacological substances,
diseases, and symptoms.
      </p>
      <p>
        These Spanish clinical datasets served as a scaffold for
the development of several NLP models, each focused on
different tasks. The most prominent work in this area is that
conducted by the Barcelona Supercomputing Center [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
providing a set of pretrained models in Spanish for different
tasks and datasets. More recently, the BioLORD [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] model
was released, offering a multilingual solution for different
NLP tasks, such as question answering and named entity
recognition.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Description of the platform</title>
      <p>
        This paper presents a platform designed for the extraction of
evidence in Spanish from COVID-19 documents in English.
The proposed platform (https://drugs4covid.oeg.fi.upm.es/platform_es)
addresses the main shortcomings
outlined in Section 1. First, it is devised for use by
non-expert users, especially practitioners, as it allows queries
to be made in natural language. To this end, it presents a
centralized resource that comprises three NLP modules: a
named entity recognition module, a term linking module,
and an evidence retrieval module. The interaction between
these modules is seamless to the user, who is only required
to enter an input text and make simple setting choices
to visualize the output of the models, without changing or
interacting directly with the models. Second, the
platform has been created in Spanish, both for the interface and
for the language models. The platform extends previous
work aimed at facilitating the exploration of large
collections of clinical scientific articles, such as the EBOCA [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
ontology for evidence retrieval or the term linking approach
presented in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>Figure 1 shows the workflow of the proposed platform.
Once the user introduces the input text, the named entity
recognition module finds and highlights all diseases and
chemicals detected within the text. Diseases are highlighted
in purple, while chemicals are highlighted in blue. As stated
above, the user can manually select terms within the text
and include them in the processing pipeline. In the sample,
the user selects the term “respuesta febril” (“febrile response”
in English), highlighted in gray. The platform then asks the
user to denote whether the selected term represents a
disease or a chemical and adds it to the list of detected entities.
The term linking module then provides extended
information on the detected entities, including their identification
codes in different medical taxonomies, such as MeSH or
ATC.</p>
      <p>A notable feature of the proposed platform is that it
enables the inclusion of expert knowledge in real time. In this
case, if there are entities related to diseases or chemicals that
have not been detected by the NER model, they can be
manually identified by the expert. Manually identified entities
are added to the list of recognized terms and subsequently
processed by the remaining modules.</p>
      <p>The platform is devised for use both by non-expert users
and by medical practitioners. In the second case, the
platform can be used as an assistant, since it can easily retrieve
external references of medical terms, which can then be
queried on the corresponding medical taxonomies for
further information. Moreover, thanks to the evidence module,
the tool can be used to assess whether or not a certain
chemical can be used to treat a certain disease. Given a
disease-drug pair, the evidence module returns
scientific evidence relating both elements, which either supports
or discourages the use of that chemical to treat the given
disease.</p>
      <sec id="sec-4-1">
        <title>3.1. Named-Entity Recognition Module</title>
        <p>
          The first module performs the Named Entity Recognition
task. As stated in Section 2, there is a lack of NER models
in Spanish trained specifically to detect COVID-19 related
terms. In particular, there is no benchmark NER model
in Spanish dedicated to the detection of diseases.
To address this issue, a fine-tuned version of the BSC
biomedical model is generated. This model is focused on
disease detection and is trained using the DISTEMIST [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]
corpus. The trained model is available on HuggingFace (https://huggingface.co/oeg/BioNER-es).
        </p>
        <p>For drug detection, the BSC PharmacoNER model is used,
as it provides accurate results. In the first NER strategy,
which will be referred to as the default strategy, these two
models work in parallel on the input Spanish text,
providing a set of detected diseases and chemicals in Spanish
as output. A second NER strategy
follows a translation-based approach. Since most COVID-19
related NER models are in English and offer
remarkable performance, a second processing pipeline is devised
in which the original Spanish text is translated into English
using the GoogleTranslator library and processed by an
existing pre-trained NER model. In this case, the selected
model is the BC5CDR NER model. This model is contained
within the spaCy library, and it accurately detects diseases
and chemicals. The results provided by the NER model
are then translated back into Spanish, so that this process
is hidden from the user.</p>
        <p>By default, the first strategy is followed, but the user
can choose between the default approach and the
translation-based one.</p>
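        <p>As an illustration, the two NER strategies described in this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the platform's implementation: all function names are hypothetical, and the keyword-based stand-ins replace the actual NER models and translator only to make the control flow explicit.</p>
        <preformat><![CDATA[
```python
from typing import Callable, Dict, List, Tuple

Entity = Dict[str, str]  # {"text": ..., "label": "DISEASE" or "CHEMICAL"}


def default_strategy(text: str,
                     disease_ner: Callable[[str], List[str]],
                     chemical_ner: Callable[[str], List[str]]) -> List[Entity]:
    """Run the two Spanish models on the same text and merge their outputs."""
    entities = [{"text": t, "label": "DISEASE"} for t in disease_ner(text)]
    entities += [{"text": t, "label": "CHEMICAL"} for t in chemical_ner(text)]
    return entities


def translation_strategy(text: str,
                         to_english: Callable[[str], str],
                         english_ner: Callable[[str], List[Tuple[str, str]]],
                         to_spanish: Callable[[str], str]) -> List[Entity]:
    """Translate to English, run a single NER model, translate entities back."""
    found = english_ner(to_english(text))
    return [{"text": to_spanish(t), "label": label} for t, label in found]


# Hypothetical keyword-based stand-ins for the real models and translator.
ES_TO_EN = {"neumonía": "pneumonia", "prednisona": "prednisone"}
EN_TO_ES = {en: es for es, en in ES_TO_EN.items()}


def toy_disease_ner(text: str) -> List[str]:
    return [w for w in ["neumonía"] if w in text.lower()]


def toy_chemical_ner(text: str) -> List[str]:
    return [w for w in ["prednisona"] if w in text.lower()]


def toy_to_english(text: str) -> str:
    out = text.lower()
    for es, en in ES_TO_EN.items():
        out = out.replace(es, en)
    return out


def toy_english_ner(text: str) -> List[Tuple[str, str]]:
    lexicon = [("pneumonia", "DISEASE"), ("prednisone", "CHEMICAL")]
    return [(term, label) for term, label in lexicon if term in text]


def toy_to_spanish(term: str) -> str:
    return EN_TO_ES.get(term, term)
```
]]></preformat>
        <p>In this sketch, both strategies return entities in Spanish with the same structure, so the subsequent modules do not need to know which strategy produced them.</p>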
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Term Linking Module</title>
        <p>Once biomedical entities are detected, the term linking
module retrieves additional information related to each of them.
The previous term linking resources developed within the
DRUGS4COVID++ project were developed entirely in
English. Therefore, a mechanism to precisely find the
equivalent English term for a given entity in Spanish is required in
order to recover its additional information from the already
existing repositories. Some chemicals receive different
names according to their country of distribution.
For example, the chemical “acetaminophen” is known as
“paracetamol” in Spain and sold as “Tylenol”
in the USA. These terms refer to the same chemical, and
therefore they should be linked to the same term and return the
same information.</p>
        <p>The same translation strategy used for NER (Section 3.1)
can be followed. However, literal translation may not be
optimal in this case, as context is essential to correctly
identify medical terms. An example of this is the term “callo”,
which refers to a hardening of the skin caused by friction.
Without context, this term literally translates to “callous”,
which is not related to the medical domain.</p>
        <p>Therefore, to avoid mistranslated terms, the medical
terminology SNOMED-CT is used as a pivot
between the two languages. SNOMED-CT provides official
versions in different languages, including English and
Spanish. To maintain consistency between versions, each
concept is assigned a unique identifier, which is identical
across versions. This is exemplified in Figure 2, where the
term “flu” and its Spanish equivalent “gripe” are
searched, and both map to the same SNOMED-CT
identifier.</p>
        <p>
          Accordingly, the Spanish term is queried in the Spanish
version of SNOMED-CT, retrieving its SNOMED ID. This ID
is then used to query the English version and retrieve the
correct translation of the concept. The concept is then queried
in our database [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], which includes the different identifiers
for each medical concept in different terminologies, as well
as the existing synonyms. For diseases, the additional
information includes the CUI, MeSH ID, and ICD-10 code, as well
as the semantic type. For chemicals, the CID, MeSH ID,
ATC code, and ATC levels are provided.
        </p>
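        <p>The pivot lookup described above can be sketched as follows. This is only an illustrative sketch: the dictionaries stand in for the Spanish and English SNOMED-CT versions and for the term database, the function name is hypothetical, and the concept identifiers and codes are placeholders rather than real SNOMED-CT or MeSH values.</p>
        <preformat><![CDATA[
```python
from typing import Dict, Optional

# Placeholder tables: the identifiers below are NOT real SNOMED-CT codes.
SNOMED_ES = {"gripe": "ID-0001", "callo": "ID-0002"}   # Spanish term -> concept ID
SNOMED_EN = {"ID-0001": "flu", "ID-0002": "callus"}    # concept ID -> English term
TERM_DB = {                                            # English term -> extra codes
    "flu": {"mesh_id": "MESH-PLACEHOLDER", "semantic_type": "disease"},
}


def link_term(spanish_term: str) -> Optional[Dict[str, str]]:
    """Resolve a Spanish term through the shared concept ID, then enrich it."""
    concept_id = SNOMED_ES.get(spanish_term.strip().lower())
    if concept_id is None:
        return None  # term not found in the Spanish version
    english_term = SNOMED_EN[concept_id]
    record = {"concept_id": concept_id, "english_term": english_term}
    record.update(TERM_DB.get(english_term, {}))
    return record
```
]]></preformat>
        <p>Because the lookup goes through the concept identifier rather than through a literal translation, a term such as “callo” resolves to the medical concept instead of an unrelated English word.</p>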
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Evidence Retrieval Module</title>
        <p>
          Once the terms have been identified and presented to the
user, the platform provides the option of finding scientific
evidence related to a term. The evidence is modeled in the form
of a knowledge graph (KG), which uses the EBOCA [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]
ontology as its basis. The EBOCA graph has two purposes.
First, it describes the entities in the DISNET database and
relates them with their corresponding scientific resources.
Second, it models metadata on the evidence of
associations between concepts. All evidence is extracted from
curated sources, mostly research papers belonging to the
CORD-19 corpus.
        </p>
        <p>For each identified (or manually specified) entity, based
on its disease/chemical classification, the system
automatically enables a feature for user interaction. To access the
evidence related to an entity, users initiate a process that
explores the EBOCA KG, fetching the top ten results related
to the term. These results are paragraphs retrieved from
scientific resources in which the selected term is mentioned.</p>
        <p>The evidence content is presented in Spanish to improve
comprehension, with occurrences of the relevant term
emphasized to aid in reading.</p>
        <p>In the EBOCA graph, the evidence involving compounds
is organized in pairs, allowing only two terms to be
chosen simultaneously to find related evidence. By selecting
two terms, the system automatically retrieves the scientific
literature that connects both entities from the graph by
performing the appropriate SPARQL query. In these cases, the
retrieved evidence is composed of paragraphs in which both
entities are related.</p>
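        <p>A query of this kind could be assembled as in the following sketch. The namespace and the predicates <monospace>ex:text</monospace> and <monospace>ex:mentions</monospace> are hypothetical placeholders and do not correspond to the actual EBOCA vocabulary; the sketch only illustrates how a single-term query and a pairwise query differ, and how the number of results is limited to ten.</p>
        <preformat><![CDATA[
```python
from typing import Optional

# Placeholder namespace, not the real EBOCA ontology IRI.
PREFIX = "PREFIX ex: <http://example.org/evidence#>"


def _literal(term: str) -> str:
    """Quote a term as a SPARQL string literal, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_evidence_query(term_a: str,
                         term_b: Optional[str] = None,
                         limit: int = 10) -> str:
    """Build a query for paragraphs mentioning one term, or both terms of a pair."""
    patterns = [
        "?paragraph ex:text ?text .",
        f"?paragraph ex:mentions {_literal(term_a)} .",
    ]
    if term_b is not None:
        patterns.append(f"?paragraph ex:mentions {_literal(term_b)} .")
    body = "\n  ".join(patterns)
    return f"{PREFIX}\nSELECT ?paragraph ?text WHERE {{\n  {body}\n}}\nLIMIT {limit}"
```
]]></preformat>
        <p>Under these assumptions, calling <monospace>build_evidence_query</monospace> with one term corresponds to the single-entity search, while passing two terms adds a second triple pattern so that only paragraphs mentioning both entities are returned.</p>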
      </sec>
      <sec id="sec-4-4">
        <title>3.4. Design of the platform</title>
        <p>As previously stated, the platform is intended for use by
individuals in the clinical domain who do not have experience
with natural language processing (NLP). Therefore, it is
essential to maintain simplicity, hiding the internal
workings of the system from the user while maintaining
efficiency.</p>
        <p>Figure 3 provides an overview of the visual aspect of the
platform. Since the platform is self-contained, all modules
are centralized on the same page, appearing dynamically
following the workflow described in Figure 1. Therefore,
the user first introduces a piece of text and selects the NER
strategy to be followed. The system will then output the
results of the NER module, providing a version of the text
in which the detected entities are highlighted. The
corresponding list of entities is also depicted, which can then
be selected for evidence retrieval. Then, below the entities,
the results of the term linking process are depicted, listing
all codes and additional information related to the found
entities. Finally, once the user selects one or more entities
from the list, the corresponding evidence is presented.</p>
        <p>Figure 4: Sample document extracted from the SPACCC corpus.</p>
        <disp-quote>
          <p>Femenino de 42 años. Manifiesta 8 meses con herpes recurrente en boca. Ingresa por padecimiento de 2 meses con tos seca en accesos y
disnea rápidamente progresiva. Además dolor torácico bilateral, fiebre hasta 39oC y pérdida de peso de 14kg. La recurrencia de la lesión
herpética le ocasionaba disfagia y odinofagia. Al examen físico presentaba placas blanquecinas en orofaringe; la exploración del tórax,
con disminución del ruido respiratorio y estertores finos. Al aire ambiente la saturación de oxígeno era del 86%. El reporte de la
gasometría arterial con oxígeno suplementario al 70% fue: pH 7,30, pCO2 40,5mmHg, pO2 132mmHg, HCO3 19,5mmol/l, exceso de base
-5,8mmol/l, saturación de oxígeno al 97,9%. Índice de oxigenación (IO) de 188. Los exámenes de laboratorio al ingreso destacan:
linfopenia de 600células/mm3, Hb 11,8gr/dl, deshidrogenasa láctica de 971UI/l y albúmina 3,3gr/dl. La Rx de tórax presentaba opacidades
bilaterales en parche con vidrio deslustrado y neumomediastino, por lo cual, en el diagnóstico diferencial se incluyó inmunosupresión
asociada a VIH y neumonía por P. jirovecii (PJP). El análisis para VIH por ELISA fue POSITIVO, se confirmó por Western Blot. Se le
realizó broncoscopia con biopsia transbronquial y lavado broncoalveolar (LBA). El estudio histopatológico se reporta en la figura 1.
Recibió tratamiento con Trimetoprim/Sulfametoxasol y Prednisona en dosis de reducción por 21 días.</p>
          <p>En el 7.o día de tratamiento presentó deterioro respiratorio y el IO desciende a 110, por lo cual ingresa a terapia intensiva en estado de
choque y apoyo con ventilación mecánica invasiva. Al ingresar, los exámenes de laboratorio destacan leucocitos de 24,300células/mm3,
Hb 10,8gr/dl, deshidrogenasa láctica 2033UI/l y albúmina de 2,26gr/dl. Se agrega Imipenem por sospecha de neumonía intrahospitalaria
y luego de 12 días mejora la cifra leucocitaria a 5800células/mm3, Hb 8,7gr/dl, deshidrogenasa láctica 879UI/l, albúmina 2.41gr/dl e IO en
243.5, lográndose extubar. En las siguientes 24h, presenta hemoptisis masiva (volumen 250ml). Desciende la cifra de Hb a 6gr/dl, el IO
disminuye a 106 y se apoya con ventilación mecánica invasiva. Por Swan-Ganz se mide una POAP de 10mmHg. La radiografía de tórax
se muestra en la figura 2A. Se somete a LBA cuyo estudio de patología confirma la presencia de hemorragia alveolar reciente y activa.
Además, por rt-PCR se documenta infección por CMV e inicia tratamiento con Ganciclovir 350mg/d durante 14 días.
Tiene buena evolución, mejora el IO hasta 277 retirándose de ventilación mecánica invasiva 9 días posteriores al evento. Luego de 37 días
egresa a su domicilio. En el seguimiento, la cuenta de CD4 es de 109células/µl y la carga viral &lt;40copias/µl.</p>
        </disp-quote>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Experimentation</title>
      <p>According to the design of the workflow, the NER module
is the cornerstone of the system. The
process begins when entities are detected within the text,
triggering the execution of the subsequent modules.
Therefore, ensuring that the NER module provides accurate results
is essential for the correct functioning of the platform.</p>
      <p>As presented in Section 3.1, two strategies are considered
for the detection of named entities within the text: a default
strategy based on Spanish NER models, and a
translation-based strategy. Ideally, both approaches should yield the
same results, identifying the same elements as diseases and
chemicals, respectively. Experiments are conducted to
assess the performance of both strategies. Since the Spanish
NER model is fine-tuned on the DISTEMIST corpus, a
corpus unseen during training is required for the
comparison to be fair. For this purpose, the SPACCC corpus,
described in Section 2, is used. Both efficiency and accuracy
aspects are considered in the evaluation.</p>
      <p>Figure 6: (a) Number of diseases detected. (b) Number of chemicals detected.</p>
      <sec id="sec-5-1">
        <p>From an efficiency perspective, the computation time
per sample is computed for both strategies. The results
are reported in Figure 5. As shown, on average, the
translation-based approach is slightly faster, although the
results suggest that the default approach is more stable in
terms of efficiency. While the difference is almost undetectable for
single-document processing, it is clearly noticeable when
processing large numbers of documents. The total
time required to process the 1,000 documents that
comprise the SPACCC corpus is around 400 seconds when using
the default approach and is reduced by almost half when
using the translation-based approach (≈ 220 seconds). This
may be due to the fact that the processing time required by
the NER model is significantly higher than the time required
for document translation. The default approach comprises
two separate NER models, while the translation approach
employs only one NER model. Consequently, the overall
computation time, on average, is lower in the second
approach.</p>
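        <p>For reference, the reported totals can be converted into per-document figures with the short calculation below. The variable names are illustrative, and the inputs are the approximate values stated in the text, so the results are approximate as well.</p>
        <preformat><![CDATA[
```python
N_DOCS = 1000                 # documents in the SPACCC corpus
DEFAULT_TOTAL_S = 400.0       # "around 400 seconds" (default strategy)
TRANSLATION_TOTAL_S = 220.0   # "≈ 220 seconds" (translation-based strategy)

# Average time per document for each strategy, in seconds.
per_doc_default = DEFAULT_TOTAL_S / N_DOCS
per_doc_translation = TRANSLATION_TOTAL_S / N_DOCS

# Fraction of the total time saved by the translation-based strategy.
relative_saving = 1 - TRANSLATION_TOTAL_S / DEFAULT_TOTAL_S

print(f"default: {per_doc_default:.2f} s/doc")
print(f"translation-based: {per_doc_translation:.2f} s/doc")
print(f"relative saving: {relative_saving:.0%}")
```
]]></preformat>
        <p>With these totals, the average is roughly 0.40 seconds per document for the default strategy and 0.22 seconds for the translation-based one, a reduction of about 45%, which is consistent with the time being reduced by almost half.</p>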
        <p>Next, both NER strategies are evaluated on the SPACCC
corpus. The corpus is not designed for evaluation on the
NER task and therefore does not include a list of the entities
contained within the text. For this reason, a combined quantitative and
manual approach is followed to assess the performance of
both strategies. The SPACCC corpus comprises 1,000
medical report samples. Figure 4 shows a sample document
extracted from the SPACCC corpus. Each report is
processed by both strategies, extracting the list of diseases and
chemicals detected by each strategy. The results are then
filtered to remove stop words or empty characters that may
have been misdetected by the models. Then, the overlap
between the results obtained by both strategies is measured.
Ideally, both approaches should result in the same number
of entities detected, and thus the overlap should be close
to 100%. Figure 6 outlines the results achieved by both
approaches for NER. The default approach identifies more
entities than the translation-based approach. However, as
can be observed, the difference is not large overall. The
number of elements detected by both approaches, while not
identical, is fairly homogeneous. This is especially
noticeable in chemical detection (Figure 6b): although
there are some files with some disparity in the
number of chemicals detected, the results are fairly
homogeneous. In the case of disease detection (Figure 6a), the
disparity between the results is more noticeable. Therefore,
an analysis of the entities detected per strategy is conducted
to compare the results achieved.</p>
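        <p>The filtering and overlap steps can be sketched as follows. The text does not specify how the overlap is computed, so this sketch assumes a Jaccard-style ratio (shared entities over all distinct entities); the stop-word list is an illustrative subset and the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
from typing import Iterable, Set

# Illustrative subset of Spanish stop words, not the list used by the platform.
STOP_WORDS = {"de", "la", "el", "en", "y"}


def clean(entities: Iterable[str]) -> Set[str]:
    """Normalize entities and drop empty strings and stop words."""
    normalized = {e.strip().lower() for e in entities}
    return {e for e in normalized if e and e not in STOP_WORDS}


def overlap(default_entities: Iterable[str],
            translation_entities: Iterable[str]) -> float:
    """Jaccard-style overlap between the entity sets of both strategies."""
    a, b = clean(default_entities), clean(translation_entities)
    if not a and not b:
        return 1.0  # both strategies agree that there is nothing to detect
    return len(a & b) / len(a | b)
```
]]></preformat>
        <p>An overlap of 1.0 corresponds to the ideal case in which both strategies detect exactly the same entities in a document, while lower values indicate that one strategy detects entities the other misses.</p>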
        <p>Table 1 presents an example of the entities recognized
by both strategies in a sample SPACCC document. As can
be observed for the case of chemicals, the default approach,
while detecting a lower number of entities, correctly
classifies all terms detected. In the translation-based strategy, all
entities detected by the default approach are also correctly
detected, but several incorrect terms are detected as well, for
example, the terms “gotas” and “surgir”, which do not
refer to any chemical. A similar phenomenon occurs in the
case of diseases, where all elements detected by the default
approach are detected by the translation-based approach.
However, in this case, the translation-based approach
detects a series of terms that are correct but undetected by the
default approach, such as “linfopenia” or “fiebre”. Nevertheless,
disease detection is not as trivial as chemical detection,
since there are no hard criteria on how to distinguish a disease
from a symptom. For example, in the DISTEMIST corpus,
which was used to train the NER model in the default
approach, entities such as “dolor en el pecho” or “tos seca” are
not labeled as diseases, as they correspond to the symptoms
of the labeled diseases. In the BC5CDR dataset, symptoms
are also considered diseases and thus labeled as such.
This disparity in the labeling criteria of the datasets, paired
with the potential mistakes introduced by translation (for
example, the detection of the term “abucheo”, which is an
incorrect translation from the detected term “  2”), leads
to results that, while different, can both be considered
correct up to a certain degree. This same pattern repeats in
the remainder of the analyzed samples, thus leading to the
conclusion that while the default approach is more precise,
the translation-based approach has a higher recall.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions and Future Work</title>
      <p>This paper presents a self-contained platform that processes
large-scale English biomedical documents to extract
evidence through a natural language-based interface in
Spanish, focusing especially on those related to COVID-19. The
presented platform centralizes and adapts a series of
resources developed within the DRUGS4COVID++ project,
which were initially developed in English, for the Spanish
language. The platform comprises three NLP resources: a
named-entity recognition module, which detects diseases
and chemicals within the text using two diferent strategies;
a term-linking module that ofers extended information
on the detected entities; and an evidence retrieval module,
which provides evidence in the form of scientific reports
on the pairwise interaction between diseases and chemicals.
Moreover, because NER modules can be flawed, the platform
supports the introduction of expert knowledge into the
detection process: users can manually incorporate
diseases and entities in real time during execution.
The NER module is evaluated with the two offered
strategies: a default strategy, which uses two separate Spanish
NER models for the detection of diseases and chemicals,
respectively, and a translation-based strategy, which translates
the text into English and uses existing pre-trained models
to detect diseases and chemicals simultaneously.
The experiments show a slight disparity between the results
of the two approaches: while the results for chemicals
are fairly similar, the results for disease detection
differ, since the models employed in the default approach
only consider diseases, whereas the translation-based approach
also labels symptoms as diseases, leading to a higher number
of detected entities.</p>
      <p>In future work, it would be interesting to evaluate the
performance of the platform with real expert users,
to measure its utility, usability, and reliability. It
would also be valuable to include evidence in Spanish, since
the current evidence repository only considers scientific
articles in English. Incorporating evidence in
Spanish would enrich the evidence repository and reduce
the need to use translation mechanisms, which can result
in the loss of information.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The research work presented in this paper has been funded
by the DRUGS4COVID++ project, with the financial support
of BBVA.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chandrasekhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Reas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Burdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Funk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Katsis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Kinney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Merrill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mooney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Murdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rishi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sheehan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stilson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Wade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. X. R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wilhelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Raymond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kohlmeier</surname>
          </string-name>
          ,
          <article-title>CORD-19: The COVID-19 open research dataset</article-title>
          , in: K. Verspoor,
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dredze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          , J. May,
          <string-name>
            <given-names>R.</given-names>
            <surname>Munro</surname>
          </string-name>
          , C. Paris, B. Wallace (Eds.),
          <source>Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          . URL: https://aclanthology.org/2020.nlpcovid19-acl.1.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Allot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Keep up with the latest coronavirus research</article-title>
          ,
          <source>Nature</source>
          <volume>579</volume>
          (
          <year>2020</year>
          )
          <fpage>193</fpage>
          --
          <lpage>193</lpage>
          . doi:10.1038/d41586-020-00694-1.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Shuja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Alanazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Alasmary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alashaikh</surname>
          </string-name>
          ,
          <article-title>Covid19 open source data sets: a comprehensive survey</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>51</volume>
          (
          <year>2021</year>
          )
          <fpage>1296</fpage>
          -
          <lpage>1325</lpage>
          . doi:10.1007/s10489-020-01862-6.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Lamsal</surname>
          </string-name>
          ,
          <article-title>Design and analysis of a large-scale covid19 tweets dataset</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>51</volume>
          (
          <year>2020</year>
          )
          <fpage>2790</fpage>
          --
          <lpage>2804</lpage>
          . doi:10.1007/s10489-020-02029-z.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>U.</given-names>
            <surname>Naseem</surname>
          </string-name>
          , I. Razzak,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khushi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Eklund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Covidsenti: A large-scale benchmark twitter data set for covid-19 sentiment analysis</article-title>
          ,
          <source>IEEE Transactions on Computational Social Systems</source>
          <volume>8</volume>
          (
          <year>2021</year>
          )
          <fpage>1003</fpage>
          -
          <lpage>1015</lpage>
          . doi:10.1109/TCSS.2021.3051189.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nazarian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bogdan</surname>
          </string-name>
          ,
          <article-title>A covid19 rumor dataset</article-title>
          ,
          <source>Frontiers in Psychology</source>
          <volume>12</volume>
          (
          <year>2021</year>
          ). doi:10.3389/fpsyg.2021.644801.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Depoux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Karafillakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Preet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wilder-Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <article-title>The pandemic of social media panic travels faster than the covid-19 outbreak</article-title>
          ,
          <source>Journal of Travel Medicine</source>
          <volume>27</volume>
          (
          <year>2020</year>
          ). doi:10.1093/jtm/taaa031.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Muñoz-Justicia</surname>
          </string-name>
          ,
          <article-title>Analyzing spanish news frames on twitter during covid-19-a network study of el país and el mundo</article-title>
          ,
          <source>International Journal of Environmental Research and Public Health</source>
          <volume>17</volume>
          (
          <year>2020</year>
          )
          5414. URL: https://www.mdpi.com/1660-4601/17/15/5414. doi:10.3390/ijerph17155414.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Allés-Torrent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>del Rio Riande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bonnell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hernández</surname>
          </string-name>
          ,
          <article-title>Digital narratives of covid-19: A twitter dataset for text analysis in spanish</article-title>
          ,
          <source>Journal of Open Humanities Data</source>
          <volume>7</volume>
          (
          <year>2021</year>
          )
          5. doi:10.5334/johd.28.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Intxaurrondo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          , SPACCC,
          <year>2019</year>
          . URL: https://doi.org/10.5281/zenodo.2560316. doi:10.5281/zenodo.2560316.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez-Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marimon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Intxaurrondo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Rabal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          , M. Krallinger,
          <article-title>PharmaCoNER: Pharmacological substances, compounds and proteins named entity recognition track</article-title>
          , in: K. Jin-Dong, N. Claire, B. Robert, D. Louise (Eds.),
          <source>Proceedings of the 5th Workshop on BioNLP Open Shared Tasks</source>
          , Association for Computational Linguistics, Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . URL: https://aclanthology.org/D19-5701. doi:10.18653/v1/D19-5701.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miranda-Escalada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Farré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          ,
          <article-title>Named entity recognition, concept normalization and clinical coding: Overview of the cantemist track for cancer text mining in spanish, corpus, guidelines, methods and results</article-title>
          ,
          in:
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020)</source>
          , CEUR Workshop Proceedings,
          <year>2020</year>
          , pp.
          <fpage>303</fpage>
          -
          <lpage>323</lpage>
          . URL: http://ceur-ws.org/Vol-2664/cantemist_overview.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miranda-Escalada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lima-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Farré-Maduell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Estrada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nentidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krithara</surname>
          </string-name>
          , G. Katsimpras, G. Paliouras,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          ,
          <article-title>Overview of distemist at bioasq: Automatic detection and normalization of diseases from clinical texts: results, methods, evaluation and multilingual resources</article-title>
          , in:
          <source>Working Notes of Conference and Labs of the Evaluation (CLEF) Forum</source>
          , CEUR Workshop Proceedings,
          <year>2022</year>
          . URL: https://ceur-ws.org/Vol-3180/paper-11.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Carrino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Llop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pàmies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gutiérrez-Fandiño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Armengol-Estapé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Silveira-Ocampo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Valencia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez-Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          ,
          <article-title>Pretrained biomedical language models for clinical NLP in Spanish</article-title>
          ,
          in:
          <source>Proceedings of the 21st Workshop on Biomedical Language Processing</source>
          , Association for Computational Linguistics, Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>193</fpage>
          -
          <lpage>199</lpage>
          . URL: https://aclanthology.org/2022.bionlp-1.19. doi:10.18653/v1/2022.bionlp-1.19.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Remy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Demuynck</surname>
          </string-name>
          , T. Demeester,
          <article-title>BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          (
          <year>2024</year>
          )
          38412333. URL: https://doi.org/10.1093/jamia/ocae029. doi:10.1093/jamia/ocae029.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Á. Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Iglesias-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Santamaría</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda-Villalón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Badenes-Olmedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodríguez-González</surname>
          </string-name>
          ,
          <article-title>EBOCA: Evidences for BiOmedical Concepts Association Ontology</article-title>
          , Springer International Publishing,
          <year>2022</year>
          , pp.
          <fpage>152</fpage>
          -
          <lpage>166</lpage>
          . URL: http://dx.doi.org/10.1007/978-3-031-17105-5_11. doi:10.1007/978-3-031-17105-5_11.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Badenes-Olmedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda-Villalón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Iglesias-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Calleja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bernardos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Chozas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fernández-Izquierdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Amador-Domínguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Espinoza-Arias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pozo-Gilo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ruckhaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>González-Guardia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cedazo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>López-Centeno</surname>
          </string-name>
          , Ó. Corcho,
          <article-title>Drugs4covid: Drug-driven knowledge exploitation based on scientific publications</article-title>
          , CoRR abs/2012.01953 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2012.01953. arXiv:2012.01953.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>