<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MetaMap versus BERT models with explainable active learning: ontology-based experiments with prior knowledge for COVID-19</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>M. Arguello-Casteleiro</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>C. Henson</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>N. Maroto</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Li</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J. Des-Diz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M.J. Fernandez-Prieto</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Peters</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>T. Furmston</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>C. Sevillano Torrado</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D. Maseda Fernandez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Kulshrestha</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J. Keane</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R. Stevens</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>C. Wroe</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hospital do Salnés</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Midcheshire Hospital Foundation Trust</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad Politécnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Manchester</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Salford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of Stirling</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The emergence of Coronavirus Disease 2019 (COVID-19) has further highlighted the need for timely support for clinicians as they manage severely ill patients. We combine Semantic Web technologies with Deep Learning for Natural Language Processing with the aim of converting human-readable best evidence/practice for COVID-19 into a computer-interpretable form. We present the results of experiments with 1212 clinical ideas (medical terms and expressions) from two UK national healthcare service specialty guides for COVID-19 and three versions of two BMJ Best Practice documents for COVID-19. The paper seeks to recognise and categorise clinical ideas, i.e. to perform a Named Entity Recognition (NER) task, with an ontology that provides extra terms as context and describes the intended meaning of categories in a way that clinicians can understand. The paper investigates: 1) the performance of classical NER using MetaMap versus NER with fine-tuned BERT models; 2) the integration of both NER approaches using a lightweight ontology developed in close collaboration with senior doctors; and 3) how easily junior doctors can interpret the main classes of the ontology once populated with NER results. We report the NER performance and the observed agreement in human audits.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontologies</kwd>
        <kwd>Deep Learning for Natural Language Processing</kwd>
        <kwd>static embeddings</kwd>
        <kwd>transformer-based language models</kwd>
        <kwd>COVID-19</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        be reviewed and updated frequently and rapidly. There are datasets for COVID-19
such as CORD-19 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and LitCovid [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] that are updated regularly, including
thousands of articles from PubMed/MEDLINE [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, not all the publications
included in those datasets have the same clinical value as resources for Evidence-Based
Medicine (EBM) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which integrates clinical experience with the best available scientifically
sound research [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        The body of scientific evidence for healthcare is not limited to information in
PubMed/MEDLINE articles, but also includes clinical point-of-care summaries and
clinical practice guidelines from healthcare services. BMJ Best Practice [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and
UpToDate [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] are examples of clinical evidence summaries that aim to bring the latest
evidence from health research into healthcare practice.
      </p>
      <p>
        Biomedical facts and clinical recommendations are expressed as natural language
statements, typically complex sentences that are human-readable and intended for
expert-to-expert communication. Named Entity Recognition (NER) is a
well-known natural language processing (NLP) task that seeks to recognise specific words
or phrases (‘entities’) in natural language statements and categorise them [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In
this study, we adopt a functional perspective on ontologies [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and explore how
ontologies can be used for categorisation in support of the NER task. We consider
ontologies as artifacts that can [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]: a) provide background knowledge about a domain; b)
contain a list of terms associated with the ontology’s classes and relations; and c)
supply formal machine-readable definitions and axioms represented in many forms.
      </p>
      <p>
        This paper addresses three critical questions regarding prior knowledge for
COVID-19, i.e. the best evidence/practice provided by UK clinical practice guidelines
and BMJ Best Practice documents for COVID-19. Firstly, how does
classical NER using MetaMap [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] perform compared with state-of-the-art NER with
transformer-based language models from Deep Learning for NLP [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], such as BERT
(Bidirectional Encoder Representations from Transformers) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]? Secondly, to what extent
does a lightweight ontology facilitate the integration of results from both NER
approaches? Thirdly, to what extent can the main classes of the lightweight ontology,
once populated with NER results, be easily interpreted by junior doctors?
      </p>
      <p>
        The novelty of this paper is three-fold: 1) presenting the Evidence-Based
Recommendation Ontology (EBRO), a lightweight ontology co-created in close
collaboration with senior doctors (medical consultants from the UK and Spain), whose main
classes aim to be easily interpretable by junior doctors; 2) proposing a different
problem formulation for NER when fine-tuning transformer-based
language models, treating NER as a sequence-level task instead of a token-level task
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]; and 3) exploring NER performance on BMJ Best Practice text excerpts for
COVID-19 using biomedical-specific and general-domain transformer models (e.g. BERT)
fine-tuned for NER with titles and available abstracts
from PubMed/MEDLINE articles about COVID-19.
      </p>
      <p>The approach presented follows Semantic Deep Learning [13], combining Semantic
Web technologies and Deep Learning for NLP. The paper falls within the field of explainable
artificial intelligence (XAI) [14]. The fine-tuning of transformer-based language models fits
within the emerging field of explainable active learning (XAL) [15], which differs from traditional
active learning (AL) in providing the model’s prediction together with an explanation.</p>
    </sec>
    <sec id="sec-2">
      <title>Experiments with prior knowledge for COVID-19</title>
      <p>We start by presenting the informal and formal meanings of the main classes in
the EBRO. Next, we illustrate how the outputs of both MetaMap and
transformer-based language models fine-tuned for NER can be incorporated into the EBRO. We
then introduce six principles developed to assess the output of MetaMap and
provide details of the XAL setup for fine-tuning transformer-based language models for
NER. Finally, we describe the experimental design and the measures for
evaluating the performance of the experiments, including human audits.</p>
      <sec id="sec-2-1">
        <title>The Evidence-Based Recommendation Ontology (EBRO)</title>
        <p>This study presents the EBRO, an ontology represented in the W3C Web Ontology
Language (OWL) [16]. We take a pragmatic approach to ontology building and
prioritise reuse over other considerations. The EBRO reuses axioms from several
ontologies, such as the Ontology Lexicon (Ontolex) [17], the Semanticscience
Integrated Ontology (SIO) [18], the Basic Formal Ontology (BFO) [19], and the
Information Artifact Ontology (IAO) [20]. The EBRO can be downloaded from [21].</p>
        <p>Table 1 presents informal descriptions of the main EBRO classes, as provided by
senior doctors, alongside more formal descriptions in the Manchester OWL
syntax [22] that use classes from BFO and IAO. The EBRO reuses some of the Unified
Medical Language System (UMLS) Semantic Types [23], such as ‘T121|Pharmacologic
Substance’. The EBRO also reuses the classes ‘Condition’ and ‘Population’ from the
PICO ontology [24].</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <table>
            <thead>
              <tr>
                <th>EBRO class</th>
                <th>Informal description</th>
                <th>Manchester OWL syntax</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Patient’s healthcare problem</td>
                <td />
                <td>SubClassOf: 'obo:realizable entity'</td>
              </tr>
              <tr>
                <td>Process of care</td>
                <td>“The processes through which patient care is delivered” [25].</td>
                <td>SubClassOf: obo:process</td>
              </tr>
              <tr>
                <td>Patient’s treatment</td>
                <td>“Action taken by a health professional, in the context of contact with a treatment recipient, to alter the functioning of an individual with a disability or at risk of a disability” [26].</td>
                <td>SubClassOf: 'Process of care'</td>
              </tr>
              <tr>
                <td>Patient’s test</td>
                <td>“All types of tests are eligible” [27].</td>
                <td>SubClassOf: 'Process of care'</td>
              </tr>
              <tr>
                <td>Chemicals &amp; Drugs</td>
                <td>Some UMLS Semantic Types, such as ‘Pharmacologic Substance’, are included as subtypes.</td>
                <td>SubClassOf: 'obo:material entity'</td>
              </tr>
              <tr>
                <td>Evidence-based information source</td>
                <td>Examples are: PubMed articles; clinical evidence summaries (e.g. BMJ Best Practice); and clinical practice guidelines.</td>
                <td>SubClassOf: obo:document and ('obo:has evidence' some obo:evidence)</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
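        <p>The SubClassOf axioms in Table 1 form a small class hierarchy. In the study, the children and descendants of the EBRO main classes are obtained with the HermiT reasoner [44]; the Python sketch below is not a reasoner and not the authors' code. It only follows asserted, named SubClassOf links transitively to show which main classes a populated class falls under. It covers a subset of Table 1, keeps only the named superclass of ‘Evidence-based information source’, and uses ‘prone positioning’ as a hypothetical class populated from NER results.</p>
        <preformat preformat-type="code">
```python
# Asserted, named SubClassOf links taken from a subset of Table 1 (EBRO).
# "prone positioning" is a hypothetical class populated from NER results.
SUBCLASS_OF = {
    "Patient's healthcare problem": {"obo:realizable entity"},
    "Process of care": {"obo:process"},
    "Patient's treatment": {"Process of care"},
    "Patient's test": {"Process of care"},
    # Named part only of: obo:document and ('obo:has evidence' some obo:evidence)
    "Evidence-based information source": {"obo:document"},
    "prone positioning": {"Patient's treatment"},  # hypothetical example
}

EBRO_MAIN_CLASSES = [
    "Patient's healthcare problem",
    "Process of care",
    "Patient's treatment",
    "Patient's test",
    "Evidence-based information source",
]


def ancestors(cls):
    """Return all classes reachable from cls by following SubClassOf links."""
    seen = set()
    stack = list(SUBCLASS_OF.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return seen


def main_classes_of(cls):
    """Return, sorted, the EBRO main classes of which cls is a descendant."""
    found = ancestors(cls)
    return sorted(c for c in EBRO_MAIN_CLASSES if c in found)


print(main_classes_of("prone positioning"))
# prints ["Patient's treatment", 'Process of care']
```
        </preformat>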
      </sec>
      <sec id="sec-2-2">
        <title>EBRO and NER: MetaMap versus BERT models</title>
        <p>
          NER with MetaMap. A clinical idea may be expressed by a medical term or, more generally, by a
medical expression. A clinical idea may fully or partially match a
concept from existing clinical/biomedical terminologies. In the UMLS Metathesaurus
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], each concept has a Concept Unique Identifier (CUI) and one or more UMLS
Semantic Types. In UMLS, a Metathesaurus concept is mapped to zero, one, or more
than one concept from the clinical terminology SNOMED CT (Systematized
Nomenclature of Medicine - Clinical Terms) [28]. MetaMap can map a clinical idea to
UMLS Metathesaurus concepts and SNOMED CT concepts. Occasionally,
professional terminologists may suggest a bio-health informatics (BHI) concept that either merges
multiple similar UMLS CUIs into one concept or does not exist in UMLS.
Six principles to assess NER with MetaMap. This study selects the focus concept(s)
among the CUIs provided by MetaMap, although in some cases the mapping was
performed manually. The selection of the focus concept(s) is guided by six principles:
1. The focus concept is interpreted in this study as the CUI that captures the
key and most specific biomedical/clinical meaning (i.e. the governing term).
2. When selecting the focus concept, avoid general biomedical/clinical terms in
favour of more specific terms.
3. When selecting the focus concept, favour CUIs that have wider coverage in
vocabulary sources as well as a wider meaning and, if pertinent, are already
included in SNOMED CT. If a CUI covers the clinical idea “literally”, it
should be selected, even if it is not in SNOMED CT. If a CUI covers the
clinical idea and is mapped to SNOMED CT, it should be selected.
        </p>
        <p>A focus concept can have one or more refinements (i.e. dependent terms).
Negation is interpreted in this study as a refinement.</p>
        <p>Multiple focus concepts should be considered only if there is more than one
governing term and, if possible, the governing terms belong to the same UMLS Semantic
Type. When multiple focus concepts are used, the meaning is their combined
meaning, i.e. the multiple focus concepts are connected with a logical “OR”.</p>
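        <p>The principles on refinements and multiple focus concepts can be pictured as a small record per clinical idea. The following is a minimal Python sketch, assuming a hypothetical FocusMapping class and placeholder identifiers (CUI_FEVER and CUI_COUGH are not real UMLS CUIs): multiple focus concepts are joined with a logical OR, and negation is stored as one more refinement.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass, field


@dataclass
class FocusMapping:
    """Hypothetical record of how one clinical idea maps to UMLS CUIs."""
    clinical_idea: str
    focus_cuis: list  # governing term(s), combined with a logical OR
    refinements: list = field(default_factory=list)  # dependent terms, incl. negation

    def meaning(self):
        """Render the combined meaning of the focus concepts and their refinements."""
        core = " OR ".join(self.focus_cuis)
        if len(self.focus_cuis) > 1:
            core = f"({core})"
        if self.refinements:
            core += " refined by [" + ", ".join(self.refinements) + "]"
        return core


print(FocusMapping("fever or cough", ["CUI_FEVER", "CUI_COUGH"]).meaning())
# prints (CUI_FEVER OR CUI_COUGH)
print(FocusMapping("no fever", ["CUI_FEVER"], ["negation"]).meaning())
# prints CUI_FEVER refined by [negation]
```
        </preformat>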
      </sec>
      <sec id="sec-2-3">
        <title>NER as fine-tuning transformer-based language models</title>
        <p>
          In this work, we use the BERT base model as a baseline [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. For the experiments, we consider the
general-domain pre-trained language models BERT and RoBERTa [29]. We also consider
five biomedical-specific pre-trained language models: BioBERT [30], SciBERT [31],
ClinicalBERT [32], BlueBERT [33], and PubMedBERT [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Table 2 lists
the corpora on which the transformer-based models were pre-trained.
SciBERT was pre-trained using PubMed Central [34] and computer science (CS)
literature. The model names (last column of Table 2) are those used by the Python library
transformers from Hugging Face [35], which is used to fine-tune the transformer-based models.
        </p>
        <p>
          Fine-tuning BERT for a downstream NLP task requires a problem formulation,
with two alternatives: token classification or sequence classification [
          <xref ref-type="bibr" rid="ref11 ref12">11,12</xref>
          ]. The
standard problem formulation for NER is token classification [
          <xref ref-type="bibr" rid="ref11 ref12">11,12</xref>
          ]. A novelty of our work is
to treat NER as sequence classification, as in question answering; thus, the
AutoModelForSequenceClassification class from Hugging Face transformers [35] is used.
NER fine-tuning with XAL setup. Figure 2 outlines the AL cycle. We used
word2phrase from word2vec [36] to obtain n-grams for each PubMed dataset (fourth
column in Table 3). Each PubMed article has a unique identifier (PMID). The
experimental setup for XAL considers 10 iterations, leveraging word2vec models [36]
created with the skip-gram algorithm using titles and available abstracts from
PubMed/MEDLINE articles. In each iteration i (i = 1, ..., 10), the model Mi from Table 3
provided some instances for training the transformer-based language models.
        </p>
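        <p>word2phrase builds n-grams by joining adjacent words whose co-occurrence count is high relative to their individual counts, using the score (count(a b) - δ) / (count(a) × count(b)), where δ is a discounting coefficient that suppresses pairs of rare words. The pure-Python sketch below re-implements one such pass for illustration only: the delta and threshold values and the toy sentences are assumptions, and the C tool [36] additionally multiplies the score by the corpus size, so its threshold values are not comparable. Running the pass again on its own output yields longer n-grams.</p>
        <preformat preformat-type="code">
```python
from collections import Counter


def word2phrase_pass(sentences, delta=1.0, threshold=0.1):
    """One phrase-detection pass over tokenised sentences.

    Adjacent words a, b are joined as "a_b" when
    (count(a b) - delta) / (count(a) * count(b)) exceeds threshold.
    """
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))

    def score(a, b):
        return (bigrams[(a, b)] - delta) / (unigrams[a] * unigrams[b])

    joined = []
    for s in sentences:
        out, i = [], 0
        while len(s) > i:
            # Join the pair starting at i if a next word exists and the score is high.
            if len(s) > i + 1 and score(s[i], s[i + 1]) > threshold:
                out.append(f"{s[i]}_{s[i + 1]}")
                i += 2
            else:
                out.append(s[i])
                i += 1
        joined.append(out)
    return joined


toy = [
    ["acute", "respiratory", "distress", "syndrome"],
    ["acute", "respiratory", "failure"],
    ["severe", "acute", "respiratory", "syndrome"],
]
print(word2phrase_pass(toy))
# [['acute_respiratory', 'distress', 'syndrome'], ['acute_respiratory', 'failure'],
#  ['severe', 'acute_respiratory', 'syndrome']]
```
        </preformat>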
        <p>
          M1 to M6 are created with titles and available abstracts (raw text) from PMIDs that
appear among the bibliographic references of the BMJ Best Practice for COVID-19
[37] released around the date shown in Table 3 (first column). M7 to M10 are created
with titles and available abstracts from files downloaded from PubMed from
December 2019 until the date displayed in Table 3 and having terms such as 'COVID-19',
'SARS-CoV-2', and 'coronaviruses' in the title, abstract, and original subject headings.
The last column in Table 3 has the number of PMIDs from the CORD-19 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] dataset of
31 May 2021. Comparing the last two columns of Table 3 shows that only a few PMIDs included in the
BMJ Best Practice for COVID-19 [37] are not included in CORD-19.
        </p>
        <p>
          The fine-tuning can be construed as reading comprehension [38], with a training
dataset of 3-tuples (label, question, passage). The label is the True/False prediction,
which can also be read as a yes/no answer in a question answering task. The question
consists of an n-gram representing a general medical term, which provides the explicit
meaning and conveys a lexical sense (see Figure 1), together with an input/output
n-gram (appearing as ‘candidate n-gram’ in Figure 1) obtained from vector arithmetic formulas
[39] applied to the word2vec model Mi. The passage is a sentence obtained by tracing
the input/output n-gram back to the PubMed dataset, which was re-organised by date and
source. The sentence acts as a local explanation, justifying only the reason for the
prediction on a specific input instance [40].
        </p>
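        <p>The construction of (label, question, passage) 3-tuples can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the helper names, the question template, and the naive sentence splitting are hypothetical; the abstract text is invented; and the True/False label is supplied by a human rather than computed. The last function lays out a sentence pair in the standard BERT input format, [CLS] question [SEP] passage [SEP] [12], with whitespace tokens standing in for WordPiece tokens; a sequence classification head then predicts one label for the whole pair.</p>
        <preformat preformat-type="code">
```python
def retrace(ngram, abstracts):
    """Return sentences from the abstracts that contain the n-gram ('_' joins words)."""
    phrase = ngram.replace("_", " ").lower()
    hits = []
    for text in abstracts:
        for sentence in text.split(". "):  # naive sentence splitting (assumption)
            if phrase in sentence.lower():
                hits.append(sentence.rstrip(".") + ".")
    return hits


def make_instances(general_term, candidate_ngram, label, abstracts):
    """Build (label, question, passage) 3-tuples for one candidate n-gram."""
    candidate = candidate_ngram.replace("_", " ")
    question = f"Is {candidate} a {general_term}?"  # hypothetical question template
    return [(label, question, p) for p in retrace(candidate_ngram, abstracts)]


def bert_pair(question, passage):
    """Lay out a sentence pair as [CLS] question [SEP] passage [SEP]."""
    return ["[CLS]", *question.split(), "[SEP]", *passage.split(), "[SEP]"]


abstracts = ["Patients received dexamethasone. Fever was the most common symptom."]
instances = make_instances("treatment", "dexamethasone", True, abstracts)
print(instances)
print(bert_pair(instances[0][1], instances[0][2]))
```
        </preformat>
        <p>With the Hugging Face transformers library, calling a BERT tokenizer on a (question, passage) pair produces the same [CLS]/[SEP] layout with WordPiece tokens; the whitespace split here only keeps the sketch dependency-free.</p>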
        <p>The vector arithmetic formulas [39] (see Figure 2) act as the active learning
sampling strategy, i.e. the scoring functions that select the ‘candidate n-gram’ for the
queries used to fine-tune pre-trained transformer-based language models. The total number of
unique n-grams from models M1 to M10 is 98,901, and the vector arithmetic formulas
selected 2060 instances for training: 1575 labelled False and 485 labelled True.</p>
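        <p>Levy and Goldberg [39] compare vector arithmetic objectives for analogy queries of the form ‘a is to a* as b is to ?’. One of them, 3CosAdd, ranks each candidate word b* by cos(b*, b) - cos(b*, a) + cos(b*, a*). The sketch below implements 3CosAdd to illustrate how such a formula can score and rank candidate n-grams; the text does not state which formulas or query words were used in each iteration, and the vectors and words below are invented.</p>
        <preformat preformat-type="code">
```python
import math


def cos(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def three_cos_add(emb, a, a_star, b, top_n=1):
    """Answer 'a is to a_star as b is to ?' with the 3CosAdd objective.

    Every other word w is scored by cos(w, b) - cos(w, a) + cos(w, a_star);
    the top_n highest-scoring words are returned as candidates.
    """
    scores = {
        w: cos(v, emb[b]) - cos(v, emb[a]) + cos(v, emb[a_star])
        for w, v in emb.items()
        if w not in (a, a_star, b)
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Invented 3-dimensional vectors; real word2vec vectors have hundreds of dimensions.
emb = {
    "influenza": [1.0, 0.0, 0.0],
    "oseltamivir": [1.0, 1.0, 0.0],
    "covid-19": [0.0, 0.0, 1.0],
    "remdesivir": [0.0, 1.0, 1.0],
    "fever": [0.5, 0.0, 0.5],
}
print(three_cos_add(emb, "influenza", "oseltamivir", "covid-19", top_n=2))
# prints ['remdesivir', 'fever']
```
        </preformat>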
      </sec>
      <sec id="sec-2-4">
        <title>Experimental design and evaluation metrics</title>
      </sec>
      <sec id="sec-2-5">
        <title>NER experiments with prior knowledge for COVID-19</title>
        <p>For NER with MetaMap, we used 1212 clinical ideas (see ‘clinical idea’ in Figure 1) appearing in textual
excerpts from two UK national healthcare service specialty guides for COVID-19 [41]
and two BMJ Best Practice documents for COVID-19 [37,42]. We considered a
three-month chronology: documents released around the 10th of May, June, and July 2020. For
NER with fine-tuned transformer-based models, we used 259 clinical ideas appearing
in 345 textual excerpts (interpreted here as passages) that are new in the June 2020
version compared with the May 2020 version of the two BMJ Best Practice
documents for COVID-19 [37,42].</p>
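        <p>Selecting the excerpts that are new in the June 2020 version amounts to an order-preserving set difference between two document versions. A minimal sketch, assuming excerpts are compared as exact strings (the text does not say how versions were compared); the excerpt strings are placeholders.</p>
        <preformat preformat-type="code">
```python
def new_excerpts(current, previous):
    """Return excerpts of the current version that are absent from the previous one.

    Duplicates are dropped and the order of first appearance is kept.
    """
    seen = set(previous)
    return [e for e in dict.fromkeys(current) if e not in seen]


may = ["excerpt A", "excerpt B"]
june = ["excerpt A", "excerpt B", "excerpt C", "excerpt D", "excerpt C"]
print(new_excerpts(june, may))  # prints ['excerpt C', 'excerpt D']
```
        </preformat>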
      </sec>
      <sec id="sec-2-6">
        <title>NER performance metrics using the human gold standard labels</title>
        <p>A professional terminologist and a senior doctor, both with many years of experience as clinical
coders, provided the human gold standard labels [38]. We report precision, recall, and
F-measure [38] for NER using the human gold standard labels.</p>
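        <p>With true positives (TP), false positives (FP), and false negatives (FN) counted against the gold standard labels, the standard definitions [38] are precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 = 2 × precision × recall / (precision + recall). A short sketch; as a consistency check, it recomputes F1 from the precision and recall values reported for MetaMap in the Results section.</p>
        <preformat preformat-type="code">
```python
def f1(precision, recall):
    """F-measure with equal weight on precision and recall (their harmonic mean)."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from TP, FP and FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1(precision, recall)


# Recompute F1 from the precision and recall reported for MetaMap.
print(round(100 * f1(0.8425, 0.9883), 2))  # 90.96 (UMLS 2016AA)
print(round(100 * f1(0.8549, 0.9942), 2))  # 91.93 (UMLS 2020AA)
```
        </preformat>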
        <p>For NER with MetaMap, considering the six principles introduced in the previous
subsection, there are two possibilities when mapping the meaning of a clinical idea to
UMLS CUI(s): a) a full match in meaning, e.g. a synonym, expressed as “a clinical
idea is-a focus concept, which ‘refers to (full match)’ CUI”; b) a partial match in
meaning, i.e. part of the meaning is not captured by the CUI(s), expressed as “a clinical idea has at
least one focus concept, which ‘evokes (partial match)’ CUI”. If there are multiple
focus concepts, each single CUI is a partial match in meaning, i.e. it “evokes”.</p>
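        <p>The two cases above reduce to a small decision rule. A minimal sketch; the function name and the boolean captures_full_meaning (in the study, a judgement made by humans applying the six principles, not something computed) are assumptions, and CUI_A and CUI_B are placeholders.</p>
        <preformat preformat-type="code">
```python
FULL = "refers to (full match)"
PARTIAL = "evokes (partial match)"


def match_relation(focus_cuis, captures_full_meaning):
    """Return, for each focus CUI, whether the clinical idea 'refers to' or 'evokes' it.

    Only a single focus CUI that captures the whole meaning is a full match;
    with several focus CUIs, each one is a partial match.
    """
    if len(focus_cuis) == 1 and captures_full_meaning:
        return {focus_cuis[0]: FULL}
    return {cui: PARTIAL for cui in focus_cuis}


print(match_relation(["CUI_A", "CUI_B"], True))
```
        </preformat>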
        <p>For NER with fine-tuned transformer-based language models, a clinical idea may
appear in one or more textual excerpts from BMJ Best Practice. Each textual excerpt
is considered a passage. Every lexical sense from Figure 1 is systematically
considered, i.e. a question is composed from the clinical idea and the n-gram representing each
general medical term. Transformers map sequences of input vectors (x<sub>1</sub>, ..., x<sub>n</sub>) to
sequences of output vectors (y<sub>1</sub>, ..., y<sub>n</sub>) of the same length [38]. The NER result is
interpreted by looking at the output label included in the output vector. The output
label indicates whether the clinical idea belongs (True/False) to the lexical sense.</p>
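        <p>Pairing a clinical idea with every lexical sense and reading one True/False label per (question, passage) pair can be sketched as below. The sense names, the question template, and toy_predict are hypothetical; in practice, predict would wrap a fine-tuned transformer-based sequence classification model.</p>
        <preformat preformat-type="code">
```python
def senses_for(clinical_idea, passage, lexical_senses, predict):
    """Return the lexical senses for which predict(question, passage) is True.

    predict stands in for a fine-tuned sequence classification model that
    outputs one True/False label per (question, passage) pair.
    """
    accepted = []
    for sense in lexical_senses:
        question = f"Is {clinical_idea} a {sense}?"  # hypothetical question template
        if predict(question, passage):
            accepted.append(sense)
    return accepted


def toy_predict(question, passage):
    """Keyword rule used only to make the example runnable; not a real model."""
    return "treatment" in question and "received" in passage


senses = ["sign or symptom", "treatment", "test"]
print(senses_for("dexamethasone", "Patients received dexamethasone.", senses, toy_predict))
# prints ['treatment']
```
        </preformat>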
      </sec>
      <sec id="sec-2-7">
        <title>Human audit: measuring agreement with the human gold standard labels</title>
        <p>We carried out three human audits with domain experts, as indicated in Figure 3. We
report the observed agreement and the kappa coefficient [43]. In the human audit with
clinicians, the clinicians judged the classification of the clinical ideas as: (1) children or descendants of
the EBRO main classes, inferred using the HermiT reasoner [44]; and (2) lexical senses (see
Figure 1) relatable to the output labels included in the output vectors of the BERT models.</p>
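        <p>Observed agreement is the proportion of items on which two raters assign the same label. Cohen's kappa corrects it for chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected from each rater's label proportions [43]. A minimal sketch with invented ratings; the text does not state which kappa variant was used, so Cohen's two-rater kappa is an assumption here.</p>
        <preformat preformat-type="code">
```python
from collections import Counter


def observed_agreement(r1, r2):
    """Proportion of items given the same label by both raters."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(r1)
    p_o = observed_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: sum over labels of the product of the raters' marginal proportions.
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Invented ratings for ten items.
rater_1 = ["yes", "yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no"]
rater_2 = ["yes", "yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes"]
print(observed_agreement(rater_1, rater_2))      # 0.8
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.583
```
        </preformat>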
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results: NER performance and human audit</title>
      <p>For the 1212 clinical ideas, NER with MetaMap using the six principles introduced
earlier obtained F1-measure=90.96%, with Precision=84.25% and Recall=98.83%, for
UMLS version 2016AA. For UMLS version 2020AA, it obtained F1-measure=91.93%, with
Precision=85.49% and Recall=99.42%.</p>
      <p>Terminologist A had an observed agreement of 98.10% for 888 clinical ideas.
Terminologist B had an observed agreement of 97.49% for 314 clinical ideas. The kappa
coefficient for terminologists A and B, K=0.887, is interpreted as “almost perfect agreement” [43].</p>
      <p>For 972 children or descendants of the EBRO main classes, a UK junior doctor had an
observed agreement of 95.27%. For the 259 clinical ideas classified according to
lexical senses, the same UK junior doctor had an observed agreement of 87.64%.</p>
    </sec>
    <sec id="sec-4">
      <title>Concluding remarks</title>
      <p>Whether physicians are ready to use evidence from big data remains unclear.
However, this study suggests a high level of agreement from junior doctors with categories
proposed by senior doctors. From an ontological point of view, the EBRO has many
weaknesses, such as including ambiguous lexical senses that BERT models can exploit.
Indeed, the observed agreement is lower for the lexical senses after being populated
for COVID-19 than for the EBRO main classes after being
populated for COVID-19.</p>
      <p>The six principles introduced to assess NER with MetaMap seem to foster high
agreement with professional terminologists and a performance close to that of NER
with BERT models, among which SciBERT obtained the highest F1-measure (92.87%).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. WHO: COVID-19, https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---11-march-2020</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. CORD-19, https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. LitCovid, https://www.ncbi.nlm.nih.gov/research/coronavirus/</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. PubMed/MEDLINE, https://pubmed.ncbi.nlm.nih.gov/</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Masic, I., Miokovic, M., Muhamedagic, B.: Evidence based medicine - new approaches and challenges. Acta Inform Med. 16(4), pp. 219-25 (2008).</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. BMJ Best Practice, https://bestpractice.bmj.com/info/</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. UpToDate, https://www.uptodate.com/home</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Nadkarni, P.M., Ohno-Machado, L., Chapman, W.W.: Natural language processing: an introduction. J Am Med Inform Assoc. 18(5), pp. 544-551 (2011).</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Hoehndorf, R., Schofield, P.N., Gkoutos, G.V.: The role of ontologies in biological and biomedical research: a functional perspective. Brief Bioinform. 16(6), pp. 1069-80 (2015).</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Aronson, A.R., Lang, F.: An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc. 17(3), pp. 229-236 (2010).</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Gu, Y., Tinn, R., et al.: Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. doi: 10.1145/3458754 (2021).</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: 2019 NAACL, pp. 4171-4186 (2019).</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Semantic Deep Learning, http://www.semantic-web-journal.net/content/special-issuesemantic-deep-learning</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. Gunning, D., et al.: XAI—Explainable artificial intelligence. Science Robotics. doi: 10.1126/scirobotics.aay712 (2019).</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. Ghai, B., et al.: Explainable active learning (XAL): toward AI explanations as interfaces for machine teachers. In: Proceedings of the ACM on Human-Computer Interaction, pp. 1-28 (2021).</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. OWL, https://www.w3.org/TR/owl2-quick-reference/</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. Ontolex, https://www.w3.org/community/ontolex/wiki/Final_Model_Specification</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>18. SIO, https://bioportal.bioontology.org/ontologies/SIO</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>19. BFO, http://www.obofoundry.org/ontology/bfo.html</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>20. IAO, http://www.obofoundry.org/ontology/iao.html</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>21. EBRO, https://github.com/arguellocasteleiro/OWL/</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>22. Manchester OWL syntax, https://www.w3.org/TR/owl2-manchester-syntax/</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>23. UMLS Semantic Types, https://lhncbc.nlm.nih.gov/semanticnetwork/</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>24. PICO ontology, https://linkeddata.cochrane.org/pico-ontology</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>25. Medicare: A Strategy for Quality Assurance. doi: 10.17226/1547 (1990).</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>26. Hart, T., et al.: Toward a theory-driven classification of rehabilitation treatments. doi: 10.1016/j.apmr.2013.05.032 (2014).</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>27. Cochrane Handbook for Systematic Reviews of Interventions, Version 6.2 (2021), https://training.cochrane.org/handbook/current/chapter-i</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>28. SNOMED CT, https://www.snomed.org/snomed-ct/five-step-briefing</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>29. Liu, Y., et al.: RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692 (2019).</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>30. Lee, J., et al.: BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4), pp. 1234-1240 (2020).</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>31. Beltagy, I., Lo, K., Cohan, A.: SciBERT: A pretrained language model for scientific text. In: 2019 EMNLP-IJCNLP, pp. 3615-3620 (2019).</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>32. Alsentzer, E., et al.: Publicly available clinical BERT embeddings. In: NAACL, pp. 72-78 (2019).</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>33. Peng, Y., Yan, S., Lu, Z.: Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. In: 18th BioNLP Workshop and Shared Task, pp. 58-65 (2019).</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>34. PubMed Central (PMC), https://www.ncbi.nlm.nih.gov/pmc/</mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>35. Transformers library, https://huggingface.co/transformers/</mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>36. word2vec, http://code.google.com/p/word2vec/</mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>37. BMJ Best Practice “COVID-19”, https://bestpractice.bmj.com/topics/en-gb/3000168</mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>38. Jurafsky, D., Martin, J.H.: Speech and language processing (3rd ed. draft). December 2020, https://web.stanford.edu/~jurafsky/slp3/</mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>39. Levy, O., Goldberg, Y.: Linguistic Regularities in Sparse and Explicit Word Representations. In: ACL'14. doi: 10.3115/v1/w14-1618 (2014).</mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>40. Guidotti, R., et al.: A survey of methods for explaining black box models. ACM, pp. 1-42. doi: 10.1145/3236009 (2018).</mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>41. COVID-19 specialty guides, https://www.england.nhs.uk/coronavirus/secondarycare/other-resources/specialty-guides/</mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>42. BMJ Best Practice “Management of coexisting conditions in the context of COVID-19”, https://bestpractice.bmj.com/topics/en-gb/3000190</mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>43. Viera, A.J., Garrett, J.M.: Understanding interobserver agreement: the kappa statistic. Fam Med. 37(5), pp. 360-3 (2005).</mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>44. HermiT Reasoner, http://www.hermit-reasoner.com/</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>