<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deeper Clinical Document Understanding Using Relation Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hasham Ul Haq</string-name>
          <email>hasham@johnsnowlabs.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Veysel Kocaman</string-name>
          <email>veysel@johnsnowlabs.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Talby</string-name>
          <email>david@johnsnowlabs.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Figure 1: A Relation Extraction model semantically relating</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>John Snow Labs inc.</institution>
          <addr-line>16192 Coastal Highway, Lewes, DE 19958</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Relation Extraction</institution>
          ,
          <addr-line>Natural Language Understanding, Natural Language Processing, BERT, Spark, Deep Learning, Adverse</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>The surging amount of biomedical literature &amp; digital clinical records presents a growing need for text mining techniques that can not only identify but also semantically relate entities in unstructured data. In this paper we propose a text mining framework comprising Named Entity Recognition (NER) and Relation Extraction (RE) models, which expands on previous work in three main ways. First, we introduce two new RE model architectures: an accuracy-optimized one based on BioBERT and a speed-optimized one that uses crafted features over a Fully Connected Neural Network (FCNN). Second, we evaluate both models on public benchmark datasets and obtain new state-of-the-art F1 scores on the 2012 i2b2 Clinical Temporal Relations challenge (F1 of 73.6, +1.2% over the previous SOTA), the 2010 i2b2 Clinical Relations challenge (F1 of 69.1, +1.2%), the 2019 Phenotype-Gene Relations dataset (F1 of 87.9, +8.5%), the 2012 Adverse Drug Events Drug-Reaction dataset (F1 of 90.0, +6.3%), and the 2018 n2c2 Posology Relations dataset (F1 of 96.7, +0.6%). Third, we show two practical applications of this framework: building a biomedical knowledge graph and improving the accuracy of mapping entities to clinical codes. The system is built using the Spark NLP library, which provides a production-grade, natively scalable, hardware-optimized, trainable &amp; tunable NLP framework.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>Biomedical literature has witnessed an exponential rise in the past decade. MEDLINE currently holds more than … indexed more than 5 million records in the past seven years alone [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Furthermore, public databases like https://clinicaltrials.gov have seen an explosion of trials data in the aftermath of the COVID-19 outbreak.
        </p>
        <p>
          In addition, the widespread adoption of Electronic Health Records (EHRs) has made copious amounts of free-text data available in digital format. This unstructured data, such as clinical notes, discharge summaries, lab reports, and pathology reports, is usually documented by healthcare professionals during the course of patient care [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. While publications and literature are growing rapidly, there is still a lack of structured knowledge that can be easily processed by computer programs. Relation Extraction becomes even more pertinent in biomedical research, as it can provide the critical links required to generate knowledge graphs for better analysis and research, and even for text summarization. Relating entities also helps improve medical coding by enriching vanilla entity chunks with …
        </p>
        <p>While the trend of training large transformer models continues, applying them to large datasets remains a challenge, as they require significant computational resources. Furthermore, long documents containing a high number of entity spans sharply increase the number of candidate entity pairs for RE classification (the number of pairs grows quadratically with the number of entities), requiring significantly more resources and processing time.</p>
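        <p>A quick count makes this growth concrete: n entity spans yield n(n-1)/2 unordered candidate pairs, so ten entities give 45 pairs and a thousand give 499,500. The plain-Python sketch below also shows how a distance cutoff shrinks the candidate set. The token positions and the cutoff of 20 are hypothetical, and a simple token distance stands in for the syntactic (dependency-tree) distance that the RE pipeline uses to eliminate distant pairs.</p>

```python
from itertools import combinations


def candidate_pair_count(n_entities):
    """Number of unordered entity pairs: n * (n - 1) / 2."""
    return n_entities * (n_entities - 1) // 2


def prune_by_distance(positions, max_distance):
    """Keep index pairs whose entities lie at most max_distance tokens apart.

    positions holds one token index per entity (hypothetical input). A plain
    token distance stands in here for a dependency-tree (syntactic) distance.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(positions)), 2)
        if max_distance >= abs(positions[i] - positions[j])
    ]


for n in (10, 100, 1000):
    print(f"{n} entities: {candidate_pair_count(n)} candidate pairs")

# Ten entities spread over a hypothetical 500-token document.
positions = [3, 10, 45, 60, 120, 130, 250, 260, 400, 410]
kept = prune_by_distance(positions, max_distance=20)
print(f"{len(kept)} of {candidate_pair_count(len(positions))} pairs kept")
```

        <p>With these positions only neighbouring entities survive the cutoff, so 5 of the 45 possible pairs are passed on for classification.</p>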
        <p>In this study we focus on three major aspects of RE: the model architectures and their scalability, evaluating the models on benchmark datasets, and training and using RE for general use cases. We also study the application of RE for understanding different aspects of clinical documents, such as extracting and relating dates to place a patient's data on a timeline, or parsing and understanding trial results on large cohorts for analysis.</p>
        <p>The novel contributions of this paper are as follows:</p>
        <list list-type="bullet">
          <list-item>
            <p>Introducing two new RE architectures.</p>
          </list-item>
          <list-item>
            <p>Evaluating and comparing the performance of the proposed models on benchmark datasets.</p>
          </list-item>
          <list-item>
            <p>Training the models on custom datasets and demonstrating how RE can be used to get a structured output for specific use cases.</p>
          </list-item>
          <list-item>
            <p>Studying the use case of placing the history and medical history of patients on a timeline.</p>
          </list-item>
          <list-item>
            <p>Analyzing the benefits of using RE to obtain more precise entity chunks for better performance when mapping them to medical codes.</p>
          </list-item>
        </list>
        <p>Figure 2: Overview of the first RE model. All the features are vertically stacked in a single feature vector. The feature vector is kept dynamic with additional padding for compatibility across different embedding sizes and complex dependency structures.</p>
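        <p>The Figure 2 caption describes stacking all features of an entity pair into a single vector that is padded so it stays compatible across embedding sizes. The sketch below illustrates only that stacking-and-padding idea; the feature values, the target width of 8, and the function name are hypothetical, and real embedding vectors would be far longer.</p>

```python
def stack_features(feature_groups, target_length, pad_value=0.0):
    """Flatten per-pair feature groups into one vector and pad it.

    feature_groups: iterable of numeric sequences (e.g. a similarity score,
    a syntactic distance, embedding vectors). target_length is an assumed,
    fixed input width for the classifier.
    """
    vector = [float(value) for group in feature_groups for value in group]
    if len(vector) > target_length:
        raise ValueError("feature vector longer than target_length")
    # Padding keeps the width constant regardless of embedding size.
    return vector + [pad_value] * (target_length - len(vector))


# Hypothetical features for one entity pair.
similarity = [0.82]
syntactic_distance = [3]
entity1_embedding = [0.1, 0.2]
entity2_embedding = [0.3, 0.4]

features = stack_features(
    [similarity, syntactic_distance, entity1_embedding, entity2_embedding],
    target_length=8,
)
print(features)
```

        <p>Raising an error when the features overflow the fixed width, rather than silently truncating, keeps a mis-sized embedding from quietly corrupting the classifier input.</p>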
        <p>2. Approach</p>
        <p>We treat RE as a classification problem where each example is a pair of biomedical entities appearing in a given context, the entities being NER chunks and the context being the sentence or the entire document. We develop two novel solutions: the first uses a simpler FCNN architecture for speed, and the second is based on the BioBERT [<xref ref-type="bibr" rid="ref5">5</xref>] architecture for accuracy. We experiment with both approaches and compare their results.</p>
        <p>For our first RE solution, we rely on the entity spans and types identified by the NER model to develop distinct features to feed to an FCNN for classification. First, we generate distinct pairs of entities (e.g., symptom-treatment), and then generate custom features for each pair. These features include the semantic similarity of the entities, the syntactic distance between the two entities, the dependency structure of the entire document, the embedding vectors of the entity spans, and the embedding vectors of 100 tokens in the vicinity of each entity. Figure 2 explains our model architecture in detail. We then concatenate these features and feed them to fully connected layers with leaky ReLU activation. We also use batch normalisation after each affine transformation, before feeding the result to the final softmax layer with a cross-entropy loss function. We use softmax cross-entropy instead of binary cross-entropy loss to keep the architecture flexible for scaling to datasets with multiple relation types.</p>
        <p>Our second solution focuses on higher accuracy, as well as on exploring relations across long documents, and is based on [<xref ref-type="bibr" rid="ref6">6</xref>]. We implement the model in Apache Spark for scalability, take checkpoints from the BioBERT model, and train an end-to-end BERT model for RE. Similar to the first solution, this architecture also depends on the entity spans identified by the base NER model, and uses the entire document as the context string while training the model. The original paper used a sequence length of 128 tokens for the context string, which we keep constant; instead, we experiment with the content of the context string, training data augmentation, and fine-tuning techniques.</p>
        <p>We use Spark NLP's [<xref ref-type="bibr" rid="ref7">7</xref>] NER models [8] as the foundation for the RE models, as these NER models provide the entity spans required for performing RE. In a single inference pipeline, the RE models are placed sequentially after the NER model, and are fed the results of the NER model, the context, embeddings, and the dependency tree for feature generation. Apart from feature generation, the dependency tree also helps regularize candidate entity pairs for RE classification, as we can eliminate pairs with a larger syntactic distance. This modular arrangement of components reduces coupling and achieves a higher degree of memory and computational efficiency, as components like sentences, tokens, and embeddings are shared between the NER and RE models and don't need to be executed again. Since the NER model is essentially a token classifier and produces a prediction per token, we convert the tokens to chunks using BIO tags.</p>
        <p>3. Experiments</p>
        <p>We test the models on public datasets, report evaluation metrics, and analyse the results on examples. In addition to public datasets, we explain the process of annotating and training models on new datasets. We then study the utility of applying RE to use cases like knowledge graph generation and improved entity resolution (the process of mapping entity chunks to medical codes).</p>
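        <p>The final step of the pipeline described in Section 2, turning per-token NER predictions into chunks with BIO tags, can be sketched in plain Python as follows. This is a generic illustration, not Spark NLP's implementation; it assumes tags of the form "B-Type", "I-Type" and "O", and the tokens and entity types in the example are hypothetical.</p>

```python
def bio_to_chunks(tokens, tags):
    """Merge per-token BIO tags into (chunk_text, entity_type) tuples.

    tokens: list of token strings; tags: list of tags such as "B-Procedure",
    "I-Procedure" or "O" (a common BIO convention, assumed here).
    """
    chunks = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new chunk starts; an I- tag with a different type is treated as B-.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the chunk that is currently open.
            current_tokens.append(token)
        else:
            # "O" closes any open chunk.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_type))
    return chunks


tokens = ["CT", "scan", "of", "the", "left", "lung", "showed", "a", "lesion"]
tags = ["B-Procedure", "I-Procedure", "O", "O", "B-Direction",
        "B-BodyPart", "O", "O", "B-Symptom"]
print(bio_to_chunks(tokens, tags))
```

        <p>Treating a stray "I-" tag as the start of a new chunk is one common repair for inconsistent token predictions; other systems discard such tokens instead.</p>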
        <p>Table 2: Speed comparison of the FCNN and BERT RE models (columns: Dataset, 1k Notes, 10k Notes; rows: FCNN RE Model, BERT RE Model).</p>
        <sec id="sec-1-1-4">
          <title>3.1. Performance on Public Datasets</title>
          <p>We test both model architectures on seven public datasets, using the official training-test split for training and testing the models, and report macro-averaged F1 scores for each of them in Table 1. These datasets include the 2012 i2b2 challenge for evaluating temporal relations in clinical text [9], the 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text [10], the Drug-Drug-Interaction (DDI) dataset for linking drugs with dispositions and reactions [11], the Chemical–Protein Interaction (CPI) dataset for linking genes/proteins with drug chemicals [12], the Phenotype-Gene Relations (PGR) dataset for relating human phenotypes and genes [13], the adverse drug events dataset for relating drugs with their reactions [14], and the posology relations task based on the 2018 n2c2 task [15]. For the sake of brevity, we don't delve into the details of each dataset; specific details can be found in the cited resources. As shown in Table 1, the BERT model achieves new SOTA metrics on 5 public datasets and outperforms the lighter FCNN model due to better contextual awareness. However, it is more than 3 times slower and has much higher memory requirements. Table 2 compares the speed of the two architectures. Hyperparameter settings and sample Python code for training an RE model from scratch are given in Appendices A and C.</p>
          <table-wrap id="tab1">
            <label>Table 1</label>
            <caption>
              <p>Macro-averaged F1 scores of the FCNN and BioBERT RE models on public datasets, with the current SOTA for comparison.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Dataset</th><th>FCNN</th><th>BioBERT</th><th>Curr-SOTA</th></tr>
              </thead>
              <tbody>
                <tr><td>i2b2-Temporal</td><td>68.7</td><td>73.6</td><td>72.41</td></tr>
                <tr><td>i2b2-Clinical</td><td>60.4</td><td>69.1</td><td>67.97</td></tr>
                <tr><td>DDI</td><td>69.2</td><td>72.1</td><td>84.1</td></tr>
                <tr><td>CPI</td><td>65.8</td><td>74.3</td><td>88.9</td></tr>
                <tr><td>PGR</td><td>81.2</td><td>87.9</td><td>79.4</td></tr>
                <tr><td>ADE Corpus</td><td>89.2</td><td>90.0</td><td>83.7</td></tr>
                <tr><td>Posology</td><td>87.8</td><td>96.7</td><td>96.1</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>… test result) that can complement core entities (e.g., symptom, procedure, test) as the first entity, and disjoint entity types (meaning the entities should not have relations among themselves) from the core entities for the second entity, as explained in Table 3. Since the first entity can relate to multiple entities in the second column, we can define the relation between the two entity types as one-to-many and keep the relation types to a minimum, i.e., whether the two entities are related or not. This approach reduces annotation complexity, resulting in faster annotation times and higher inter-annotator agreement. For annotation purposes, we utilized the publicly available Annotation Lab tool.</p>
          <table-wrap id="tab3">
            <label>Table 3</label>
            <caption>
              <p>First and second entity types for each RE model.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Model</th><th>Entity 1</th><th>Entity 2</th></tr>
              </thead>
              <tbody>
                <tr><td>re_bodypart_procedure_test</td><td>Body Part</td><td>Procedure, Test</td></tr>
                <tr><td>re_test_result_date</td><td>Test</td><td>Test Result, Date</td></tr>
                <tr><td>re_bodypart_problem</td><td>Body Part</td><td>Symptom</td></tr>
                <tr><td>re_test_problem_finding</td><td>Test</td><td>Symptom</td></tr>
                <tr><td>re_bodypart_directions</td><td>Body Part</td><td>Direction</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
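        <p>The one-to-many design behind Table 3 means that candidate pairs can be generated directly from entity types: every chunk of the first-column type is paired with every chunk of an allowed second-column type, and annotators only decide whether each pair is related. The mapping below copies four rows of Table 3 (re_test_result_date is omitted for brevity); the (text, type) chunk format and the function name are assumptions made for this sketch.</p>

```python
from itertools import product

# (first entity type, allowed second entity types), following rows of Table 3.
ALLOWED_TYPES = {
    "re_bodypart_procedure_test": ("Body Part", {"Procedure", "Test"}),
    "re_bodypart_problem": ("Body Part", {"Symptom"}),
    "re_bodypart_directions": ("Body Part", {"Direction"}),
    "re_test_problem_finding": ("Test", {"Symptom"}),
}


def pairs_to_annotate(chunks, model_name):
    """Return the one-to-many (first, second) chunk pairs for one RE model.

    chunks: list of (text, entity_type) tuples from the NER step (assumed format).
    Annotators would only label each returned pair as related or not related.
    """
    first_type, second_types = ALLOWED_TYPES[model_name]
    firsts = [chunk for chunk in chunks if chunk[1] == first_type]
    seconds = [chunk for chunk in chunks if chunk[1] in second_types]
    return list(product(firsts, seconds))


chunks = [
    ("chest", "Body Part"),
    ("CT scan", "Procedure"),
    ("cough", "Symptom"),
    ("left", "Direction"),
]
for name in ALLOWED_TYPES:
    print(name, pairs_to_annotate(chunks, name))
```

        <p>Because each model only sees type-compatible pairs, the binary related/not-related label is enough, which is what keeps the annotation task simple.</p>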
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Practical Applications of Relation Extraction</title>
      <p>4.1. Generating Knowledge Graph with Relations</p>
      <p>The most notable benefit of RE is the ability to generate knowledge graphs from unstructured text. For this experiment, we used pretrained Spark NLP NER models and the general-purpose RE models explained in the previous section to process medical reports, with the primary goal of generating a concise structured output for each report. For instance, we relate procedures with dates and findings to recognize the date of a procedure and its findings, along with any existing condition. We use the relations between body parts and procedures to get more specific details of the location of the procedure. Similarly, relating body parts with findings like test results and measurements can add more detail to the final output in specific use cases. More granularity can be achieved by further subdividing body parts. For instance, in our experiment, we divide the body part into three parts: the primary body part (e.g., lung), a sub-part (e.g., lobe), and the direction/laterality (e.g., left) of the body part. In practice, these specific entities trickle down from the NER model to the RE models. A graph generated from a sample report can be seen in Figure 5.</p>
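        <p>A minimal sketch, assuming the RE output for one report has already been collected as (head, relation, tail) triples: it builds an adjacency map and composes the three-level body-part description (direction, primary body part, sub-part) described above. The triples, relation names, and date are hypothetical, they do not reflect Spark NLP's output format, and plain dictionaries stand in for a graph database.</p>

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples for one report; each entity is a
# (chunk text, entity type) tuple. Relation names are illustrative only.
TRIPLES = [
    (("CT scan", "Procedure"), "date", ("2021-03-02", "Date")),
    (("CT scan", "Procedure"), "body_part", ("lung", "BodyPart")),
    (("CT scan", "Procedure"), "finding", ("nodule", "Finding")),
    (("lung", "BodyPart"), "sub_part", ("lobe", "BodyPart")),
    (("lung", "BodyPart"), "direction", ("left", "Direction")),
]


def build_graph(triples):
    """Adjacency map: entity -> list of (relation, neighbouring entity)."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph.setdefault(tail, [])  # register leaf nodes as well
    return dict(graph)


def location_phrase(graph, body_part):
    """Combine direction, primary body part and sub-part into one phrase."""
    edges = graph[body_part]
    directions = [text for rel, (text, _) in edges if rel == "direction"]
    sub_parts = [text for rel, (text, _) in edges if rel == "sub_part"]
    return " ".join(directions + [body_part[0]] + sub_parts)


graph = build_graph(TRIPLES)
for relation, (text, entity_type) in graph[("CT scan", "Procedure")]:
    print("CT scan", relation, text, entity_type)
print(location_phrase(graph, ("lung", "BodyPart")))
```

        <p>Keeping the entity type in each node key means that two chunks with the same text but different types (for example, a test and a test result) stay separate nodes in the graph.</p>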
      <p>Furthermore, the structured data can help create a patient timeline which can show the progress of a certain …</p>
      <sec id="sec-3-1">
        <title>4.2. Enriching Chunks for More Accurate Coding</title>
        <p>Entity Resolver models map entity chunks to medical codes like CPT [23], ICD [24], SNOMED [25], MeSH [26], RxNorm [27], etc., based on semantic similarity. This task is challenging for two major reasons. First, the inherent noise of the text, such as abbreviations, acronyms, and synonyms, can result in false positives. Second, medical codes are sensitive to variables like severity, location in the human body, administration type, and diagnosis method: for a given condition or treatment, there could be different codes (within the same ontology) depending on these factors. This challenge is more prominent in ontologies with wider vocabularies, like SNOMED.</p>
        <p>RE provides a solution to both problems. First, it intrinsically cleans the input for the resolver models of stop words and noise without additional effort. Second, it adds information from the surrounding context to the core entity chunks: with the help of relations, simple entities can be enriched with precise information to get accurate codes. For example, a CT Scan chunk identified as a procedure can be enriched with the imaging technique to achieve a more accurate CPT/SNOMED code. Enriching it further with the location of the procedure (e.g., chest) results in an even more accurate chunk that can be resolved to a more specific CPT/SNOMED code.</p>
        <p>Table 4 compares base chunks with enriched chunks that include body parts, demonstrating the benefits of enriched entity chunks for improved coding.</p>
        <p>Table 4: Base chunks compared with enriched chunks for coding. Ontologies: CPT, SNOMED CT, ICD-10, SNOMED; base chunks: CT Scan, Lesion.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <sec id="sec-4-1">
        <p>In this paper, we presented two new, scalable model architectures for RE. We then tested the models on public datasets and reported evaluation metrics. The model metrics show that the BioBERT-based model outperforms the lighter FCNN model and obtains new state-of-the-art accuracy on five benchmarks.</p>
        <p>However, for datasets with a small number of relation types, the simpler FCNN model may be a compelling option, not only due to faster run times but also due to much lower memory requirements compared to the BioBERT model, allowing larger datasets to be processed on commodity hardware. We also explain how to train RE models from scratch and describe the design behind the pre-trained models available as part of the Spark NLP library.</p>
        <p>We then study practical use cases where RE plays the salient role of linking entities together to generate knowledge graphs, patient timelines, and structured summaries of medical notes. Relating dates to primary procedures and problems can help create a timeline for each patient.</p>
        <p>Finally, using granular NER models together with discrete RE models to clean and enrich entity chunks enables better entity resolution to clinical codes.</p>
        <p>Given the complex nature of RE and the pivotal role of contextual information, a common approach is to limit relations to within a certain syntactic span, as even BERT models have a token sequence limit. A future research direction could be to improve the contextual representation of large documents to allow relations over lengthy contextual spans. A second future research direction is to test whether auxiliary data, either from medically annotated data or through transfer learning from healthcare-specific language models, can deliver higher-accuracy Relation Extraction on the same neural network architectures.
        </p>
        <p>[8] V. Kocaman, D. Talby, Improving clinical document understanding on COVID-19 research with Spark NLP, CoRR abs/2012.04005 (2020). URL: https://arxiv.org/abs/2012.04005. arXiv:2012.04005.</p>
        <p>[9] W. Sun, A. Rumshisky, O. Uzuner, Evaluating temporal relations in clinical text: 2012 i2b2 challenge, Journal of the American Medical Informatics Association: JAMIA 20 (2013). doi:10.1136/amiajnl-2013-001628.</p>
        <p>[10] Ö. Uzuner, B. R. South, S. Shen, S. L. DuVall, 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text, Journal of the American Medical Informatics Association 18 (2011) 552–556.</p>
        <p>[11] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, T. Declerck, The DDI corpus: An annotated corpus with pharmacological substances and drug–drug interactions, Journal of Biomedical Informatics 46 (2013) 914–920. URL: https://www.sciencedirect.com/science/article/pii/S1532046413001123. doi:10.1016/j.jbi.2013.07.011.</p>
        <p>[12] M. Krallinger, O. Rabal, S. A. Akhondi, M. P. Pérez, J. Santamaría, G. P. Rodríguez, G. Tsatsaronis, A. Intxaurrondo, J. A. B. López, U. K. Nandal, E. M. van Buel, A. Chandrasekhar, M. Rodenburg, A. Laegreid, M. A. Doornenbal, J. Oyarzábal, A. Lourenço, A. Valencia, Overview of the BioCreative VI chemical-protein interaction track, 2017.</p>
        <p>[13] D. Sousa, A. Lamurias, F. M. Couto, A silver standard corpus of human phenotype-gene relations, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 1487–1492. URL: https://aclanthology.org/N19-1152. doi:10.18653/v1/N19-1152.</p>
        <p>[14] H. Gurulingappa, A. M. Rajput, A. Roberts, J. Fluck, M. Hofmann-Apitius, L. Toldo, Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports, Journal of Biomedical Informatics 45 (2012) 885–892. URL: https://www.sciencedirect.com/science/article/pii/S1532046412000615. doi:10.1016/j.jbi.2012.04.008. Text Mining and Natural Language Processing in Pharmacogenomics.</p>
        <p>[15] S. Henry, K. Buchan, M. Filannino, A. Stubbs, O. Uzuner, 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records, Journal of the American Medical Informatics Association 27 (2020) 3–12.</p>
        <p>[16] H. Guan, J. Li, H. Xu, M. V. Devarakonda, Robustly pre-trained neural model for direct temporal relation extraction, CoRR abs/2004.06216 (2020). URL: https://arxiv.org/abs/2004.06216. arXiv:2004.06216.</p>
        <p>[17] D. Ningthoujam, S. Yadav, P. Bhattacharyya, A. Ekbal, Relation extraction between the clinical entities based on the shortest dependency path based LSTM, CoRR abs/1903.09941 (2019). URL: http://arxiv.org/abs/1903.09941. arXiv:1903.09941.</p>
        <p>[18] M. Asada, M. Miwa, Y. Sasaki, Using drug descriptions and molecular structures for drug–drug interaction extraction from literature, Bioinformatics 37 (2020) 1739–1746. URL: https://doi.org/10.1093/bioinformatics/btaa907. doi:10.1093/bioinformatics/btaa907. arXiv:https://academic.oup.com/bioinformatics/article-pdf/37/12/1739/39119268/btaa907.pdf.</p>
        <p>[19] L. N. Phan, J. T. Anibal, H. Tran, S. Chanana, E. Bahadroglu, A. Peltekian, G. Altan-Bonnet, SciFive: a text-to-text transformer model for biomedical literature, CoRR abs/2106.03598 (2021). URL: https://arxiv.org/abs/2106.03598. arXiv:2106.03598.</p>
        <p>[20] D. Sousa, F. M. Couto, BiOnt: Deep learning using multiple biomedical ontologies for relation extraction, CoRR abs/2001.07139 (2020). URL: https://arxiv.org/abs/2001.07139. arXiv:2001.07139.</p>
        <p>[21] P. Crone, Deeper task-specificity improves joint entity and relation extraction, CoRR abs/2002.06424 (2020). URL: https://arxiv.org/abs/2002.06424. arXiv:2002.06424.</p>
        <p>[22] X. Yang, Z. Yu, Y. Guo, J. Bian, Y. Wu, Clinical relation extraction using transformer-based models, CoRR abs/2107.08957 (2021). URL: https://arxiv.org/abs/2107.08957. arXiv:2107.08957.</p>
        <p>[23] AMA, CPT, https://www.ama-assn.org/amaone/cpt-current-procedural-terminology, 2020. Accessed: 2021-12-22.</p>
        <p>[24] WHO, ICD-10, https://www.who.int/standards/classifications/classification-of-diseases, 2019. Accessed: 2021-12-22.</p>
        <p>[25] NLM, SNOMED CT, https://www.nlm.nih.gov/healthit/snomedct/index.html, 2019. Accessed: 2021-12-22.</p>
        <p>[26] NLM, MeSH, https://www.nlm.nih.gov/mesh/meshhome.html, 2021. Accessed: 2021-12-22.</p>
        <p>[27] NLM, RxNorm, https://www.nlm.nih.gov/research/umls/rxnorm/index.html, 2021. Accessed: 2021-12-22.</p>
        <p>[28] JSL, Training code for RE, https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/10.3.Clinical_RE_SparkNLP_Paper_Reproduce.ipynb, 2021. Accessed: 2021-12-23.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>A. Hyperparameter Settings</title>
      <sec id="sec-5-1">
        <p>Since optimal hyperparameter values vary for each dataset, Table 5 lists the range of values that performed best across all the datasets.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>B. Preparing Training Data for the RE Model in Spark NLP</title>
      <p>Since RE is a classification task, the primary inputs are the context string (sentence) and a pair of entities. If there are multiple pairs in a single context string, we treat them as disjoint inputs, as each input encapsulates the required information, such as the entity chunk pair and the context, which is then used to create input features. We can create a CSV-formatted file where each row is a training example for the model and contains the aforementioned inputs. The exact schema of the training file can be found in the training notebook [28].</p>
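      <p>To make the row-per-pair layout concrete, the sketch below writes such a file with Python's standard csv module. The column names, the inclusive end offsets, and the "1"/"0" relation labels are assumptions for illustration only; the authoritative schema is the one defined in the training notebook [28].</p>

```python
import csv
import io

# Hypothetical column layout; the real schema is defined in the training notebook.
FIELDS = ["sentence", "chunk1", "begin1", "end1", "entity1",
          "chunk2", "begin2", "end2", "entity2", "rel"]


def make_rows(sentence, annotated_pairs):
    """Turn one context string and its annotated entity pairs into CSV rows.

    annotated_pairs: list of (chunk1, entity1, chunk2, entity2, rel) tuples.
    Each pair becomes its own, disjoint training example. Character offsets
    come from str.find (first occurrence only; -1 if the chunk is absent),
    and end offsets are inclusive.
    """
    rows = []
    for chunk1, entity1, chunk2, entity2, rel in annotated_pairs:
        begin1, begin2 = sentence.find(chunk1), sentence.find(chunk2)
        rows.append({
            "sentence": sentence,
            "chunk1": chunk1, "begin1": begin1, "end1": begin1 + len(chunk1) - 1,
            "entity1": entity1,
            "chunk2": chunk2, "begin2": begin2, "end2": begin2 + len(chunk2) - 1,
            "entity2": entity2,
            "rel": rel,
        })
    return rows


def to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sentence = "CT scan of the chest showed a nodule in the left lung."
pairs = [
    ("chest", "Body Part", "CT scan", "Procedure", "1"),
    ("lung", "Body Part", "left", "Direction", "1"),
]
print(to_csv(make_rows(sentence, pairs)))
```

      <p>Repeating the full sentence on every row is wasteful on disk but keeps each example self-contained, which matches the disjoint-input treatment described above.</p>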
    </sec>
    <sec id="sec-9">
      <title>C. Training an RE Model in Spark NLP</title>
      <sec id="sec-9-1">
        <p>Code for training an RE model is provided as a Google Colab notebook [28]. As the majority of the public datasets are protected and cannot be shared, they need to be obtained from their official websites and converted to the required format before training.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yadav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekbal</surname>
          </string-name>
          ,
          <article-title>Relation extraction from biomedical and clinical text: Unified multitask learning framework</article-title>
          , CoRR abs/2009.09509 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2009.09509. arXiv:2009.09509.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tiryaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Relation extraction from clinical narratives using pre-trained language models</article-title>
          ,
          <source>in: AMIA Annual Symposium Proceedings</source>
          , volume
          <volume>2019</volume>
          , American Medical Informatics Association,
          <year>2019</year>
          , p.
          <fpage>1236</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          , CoRR abs/1810.04805 (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Two are better than one: Joint entity and relation extraction with table-sequence encoders</article-title>
          , CoRR abs/2010.03851 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2010.03851. arXiv:2010.03851.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>So</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <article-title>Biobert: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          , CoRR abs/1901.08746 (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1901.08746. arXiv:1901.08746.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Soares</surname>
          </string-name>
          , N. FitzGerald, J. Ling, T. Kwiatkowski,
          <article-title>Matching the blanks: Distributional similarity for relation learning</article-title>
          , CoRR abs/1906.03158 (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1906.03158. arXiv:1906.03158.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kocaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Talby</surname>
          </string-name>
          , Spark NLP: Natural language understanding at scale, Software Impacts 8 (
          <year>2021</year>
          ) 100058. URL: https://www.sciencedirect.com/science/article/pii/S2665963821000063. doi:10.1016/j.simpa.2021.100058.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>