<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Relation Extraction in Medical Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leo Xinyue Zhang</string-name>
          <email>leo.xinyue.zhang@kcl.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Angus Roberts</string-name>
          <email>angus.roberts@kcl.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Zeki</string-name>
          <email>sebastian.zeki@gstt.nhs.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>King's College London</institution>
          ,
          <addr-line>Strand, London, WC2R 2LS</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>St Thomas' Hospital</institution>
          ,
          <addr-line>Westminster Bridge Road, London SE1 7EH</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Named Entity Recognition and Relation Extraction are two fundamental tasks in medical information extraction. Typically, these tasks are performed as a pipeline. However, this approach ignores the interactions between the two tasks, and training and deploying two models takes more time. There is research on modelling the two tasks jointly, but some of it considers only entities that take part in some relation, and there is little research on joint modelling in the medical domain. In this paper, we apply a promising generative joint modelling method to a medical dataset. We extend the output format to incorporate non-relation entities as a self-concept relation. As such, we are able to output all entities and relations in one step for medical extraction.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>medical extraction, named entity recognition, relation extraction, joint modelling,</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Named Entity Recognition (NER) and Relation Extraction (RE) are two fundamental tasks in Information Extraction (IE) from free text. NER is the process of identifying entities in free text, and categorising them if needed, while RE is the process of identifying any existing relations between the entities. Typically, these two tasks are done in a sequential manner, i.e. named entities are extracted first before being passed on to relation extraction. However, there are two main drawbacks with this approach:
• This method disregards the interaction between the NER and RE tasks [<xref ref-type="bibr" rid="ref1">1</xref>]. Because the NER and RE modules are two separate modules, information cannot flow between the two tasks. This information can be helpful. Consider the following example, “London is the capital of the United Kingdom”: the relation “capital of” can help the NER task, as that relation indicates that the left-hand entity will be a city and the right-hand entity will be a country, province or equivalent.
• Errors from the NER task will propagate to RE [<xref ref-type="bibr" rid="ref1">1</xref>]. In the previous example, if London is wrongly identified as a person during the NER stage, this error will not be corrected during the RE stage.
      </p>
      <p>
        One way to solve this problem is to model NER and RE as one task, i.e. one model creates a single output containing both entity and relation extractions. For example, sequence-to-sequence (seq2seq) models directly output relation triplets which include the entities and the relations between them [<xref ref-type="bibr" rid="ref2">2</xref>], graph models treat entities as nodes and relations as edges [<xref ref-type="bibr" rid="ref3">3</xref>], and question answering models ask a sequence of questions whose answers contain the entities and relations [<xref ref-type="bibr" rid="ref4">4</xref>]. In this research, we focus on joint modelling, specifically using seq2seq models to output relation triplets. This method has been used before; for example, [<xref ref-type="bibr" rid="ref5">5</xref>] used a bidirectional Long Short-Term Memory (bi-LSTM) encoder and decoder to output strings formatted as relation triplets. Model performance is, however, restricted by the amount of data available. To overcome this restriction, [<xref ref-type="bibr" rid="ref2">2</xref>] proposed a novel way to construct a large dataset from Wikipedia and pre-train a BART-based model [<xref ref-type="bibr" rid="ref6">6</xref>] on it, achieving the best performance on the four tasks reported in [<xref ref-type="bibr" rid="ref2">2</xref>]. However, these end-to-end methods only generate relation triplets, which means that entities that are not part of any relationship will not be extracted. These entities can be important in real-life applications. In this work, we propose a sequence format that incorporates non-relation entities into the relation triplets, and we compare whether incorporating these non-relation entities improves the performance of relation extraction. The contributions of this work are twofold. First, it provides a method that lets researchers benefit from end-to-end relation extraction models such as REBEL without setting up a separate entity model for non-relation entity extraction. Second, it sheds some light on whether non-relation entities can help relation extraction.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <sec id="sec-3-1">
        <title>3.1. Sequence to Sequence Modelling</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>2. Introduction</title>
      <p>Named Entity Recognition (NER) and Relation Extraction
(RE) are two fundamental tasks in Information Extraction
(IE) from free text. NER is the process of identifying
entities from free text, and categorise them if needed, while
RE is the process of identifying any existing relations
between the entities. Typically, these two tasks are done
in a sequential manner, i.e. named entities are extracted
ifrst before passing on to relation extraction. However,
there are two main drawbacks with this approach:
• This method disregards the interaction between</p>
      <p>
        NER and RE tasks[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Because the NER and RE Sequence to sequence (seq2seq) modelling is an
impormodule are two separated modules, the informa- tant task in NLP, generating a target sequence given a
tion cannot flow between the two tasks. These source sentence. Unlike classification tasks, where the
information can be helpful. Consider the follow- model generates a fixed-length output. Seq2seq tasks
ing example, “London is the capital of the United require a flexible length of output. Current seq2seq
modKingdom”, the information for relation extraction elling uses encoder-decoder models. These models have
“capital of” can help the NER task as that rela- two parts; an encoder model to encode the input sentence
tion indicates that the left hand entity will be a into some internal representation, and a decoder model
city and the right hand entity will be a country, to generate the output sentence from this representation.
province or equivalent. Early examples included two Recurrent Neural network
• Errors from the NER task will propagate to NE based models as the encoder and decoder for machine
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In the previous example, if London is wrongly translation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and text summarisation [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. A recent trend
identified as a person during the NER stage, this has switched the focus to attention-based models after the
error will not be corrected during the RE stage. proposal of the transformer model [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Attention-based
models have shown generally better performance, and
      </p>
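        <p>As an illustration of the encoder-decoder interface described above, the following is a minimal sketch of seq2seq generation with a pre-trained BART model via the Hugging Face transformers library. The checkpoint name and generation settings are illustrative assumptions, not choices made in this paper.</p>
        <preformat>
# Minimal seq2seq generation with a pre-trained BART model.
# The checkpoint and generation parameters are illustrative assumptions.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

source = "The patient needs to take paracetamol three times a day for a week."
# The encoder consumes the source sentence ...
inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=256)
# ... and the decoder generates a target sequence of flexible length.
output_ids = model.generate(inputs.input_ids, max_length=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
        </preformat>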
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Seq2seq model for relation extraction</title>
        <p>
          In a recent work [<xref ref-type="bibr" rid="ref2">2</xref>], a pre-trained BART-based model for generating sequences of relation triplets, named REBEL, was proposed. The model output concatenates all the relation triplets in the sentence with the help of special tokens. However, pre-training a BART-based model on a relation extraction task requires a large amount of annotated data, so the authors proposed a method to generate a silver dataset from Wikipedia. Firstly, they extracted all the Wikipedia abstracts, i.e. the section before the table of contents (this was true when REBEL was published; Wikipedia has since moved the table of contents to the left-hand side, so the abstract can now be defined as the piece of text preceding any section title). In Wikipedia, the entities are usually hyperlinks. The authors then mapped these entities to WikiData, a collectively edited knowledge graph of relations between Wikipedia entries, and extracted the relations of these entities. But a relation extracted from WikiData is not necessarily expressed in the selected text. For example, in the sentence Donald Trump visited President of Canada, there is no relation between Donald Trump and President, although this relation exists in WikiData. To alleviate this, the authors adapted an established Natural Language Inference (NLI) model. The NLI model assigns a score indicating how likely it is that the text entails the relation triplet; sequences with a score below 75% are filtered out.
        </p>
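        <p>The following is a minimal sketch of this kind of NLI filtering, assuming an off-the-shelf MNLI checkpoint and a naive verbalisation of the triplet as the hypothesis; it is not the exact pipeline used by REBEL.</p>
        <preformat>
# Sketch of NLI-based filtering of candidate relation triplets.
# The MNLI checkpoint and the triplet verbalisation are illustrative
# assumptions; REBEL's exact pipeline may differ.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def keep_triplet(text, head, relation, tail, threshold=0.75):
    """Keep a triplet only if the text entails its verbalisation."""
    hypothesis = f"{head} {relation} {tail}."
    scores = nli({"text": text, "text_pair": hypothesis}, top_k=None)
    entail = next(s["score"] for s in scores if s["label"] == "ENTAILMENT")
    return entail >= threshold

print(keep_triplet("London is the capital of the United Kingdom.",
                   "London", "capital of", "United Kingdom"))
        </preformat>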
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Methodology</title>
      <sec id="sec-5-1">
        <title>4.1. REBEL Model for relation extraction</title>
        <p>− prefix. In the original REBEL formatting,
ibuproWe use the REBEL model as our base model. In essence, fen would be missed out in the output sequence because
they formatted a triplet with special tokens. &lt;   &gt; it does not exist in any relation triplets.
token marks the start of a relation triplet, tokens E-REBEL can give us the power to extract non-relation
between &lt;   &gt; and &lt;  &gt; are the head entity entities and regular relations in one model. However
in the relation triplets, tokens between &lt;  &gt; and we also ask if the non-relation incorporation could also
&lt;  &gt; are the tail entity in the relation triplets, and enhance the performance of relation extraction and vice
tokens after &lt;  &gt; are what the relation is. If a head versa. We conducted a comparison between the REBEL
entity appears in more than one relation, then the second and E-REBEL models.
tail relation just adds on to the first relation triplets. For
example, in the following sentence The patient needs
to take paracetamol three times a day for a week. The 5. Experiment Design
output sequence will be</p>
        <p>&lt;   &gt;   &lt;  &gt; ℎ     &lt;
 &gt;  −    &lt;  &gt;      &lt;  &gt;
 −  .</p>
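        <p>To make the format concrete, the following is a minimal sketch of a parser for this token format, mirroring the decoding just described. It is a simplified illustration, not REBEL's exact implementation.</p>
        <preformat>
# Parse a REBEL-style output string into (head, relation, tail) triplets.
# Simplified illustration of the token format described above.
def parse_triplets(seq):
    triplets = []
    head, tail, relation, mode = "", "", "", None
    tokens = (seq.replace("&lt;triplet&gt;", " &lt;triplet&gt; ")
                 .replace("&lt;subj&gt;", " &lt;subj&gt; ")
                 .replace("&lt;obj&gt;", " &lt;obj&gt; ")).split()
    for token in tokens:
        if token == "&lt;triplet&gt;":
            if relation:
                triplets.append((head.strip(), relation.strip(), tail.strip()))
            head, tail, relation, mode = "", "", "", "head"
        elif token == "&lt;subj&gt;":
            if relation:  # same head entity, new tail-relation pair
                triplets.append((head.strip(), relation.strip(), tail.strip()))
            tail, relation, mode = "", "", "tail"
        elif token == "&lt;obj&gt;":
            mode = "relation"
        elif mode == "head":
            head += " " + token
        elif mode == "tail":
            tail += " " + token
        elif mode == "relation":
            relation += " " + token
    if relation:
        triplets.append((head.strip(), relation.strip(), tail.strip()))
    return triplets

seq = ("&lt;triplet&gt; paracetamol &lt;subj&gt; three times a day &lt;obj&gt; Drug-Frequency "
       "&lt;subj&gt; a week &lt;obj&gt; Drug-Duration")
print(parse_triplets(seq))
# [('paracetamol', 'Drug-Frequency', 'three times a day'),
#  ('paracetamol', 'Drug-Duration', 'a week')]
        </preformat>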
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Entity-incorporated REBEL Model</title>
        <p>In this work, we proposed a novel way to incorporate
entities into relation triplets, named E-REBEL. The idea
is to treat entities as entity relations to themselves. For
example, the entity paracetamol as medication would be
treated as the following triplets &lt;   &gt;   &lt;
 &gt;   &lt;  &gt;  −  . To put it
in a sentence with other entities and relations, in the
following sentence The patient needs to take paracetamol
three times a day for a week and ibuprofen, the final
relation triplets would be</p>
        <p>&lt;   &gt;   &lt;  &gt; ℎ   &lt;
 &gt;  −    &lt;  &gt;      &lt;
 &gt;  −   &lt;  &gt;   &lt;
 &gt;  −  &lt;   &gt;    &lt;  &gt;
   &lt;  &gt;  − .</p>
        <p>In this way, all the entities will be included in the
output sequence, and they can be easily extracted using the</p>
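        <p>Building on the parser sketched in Section 4.1, the following illustrates how non-relation entities could be recovered from an E-REBEL output by separating self-concept triplets from regular relations. The Self- prefix follows the reconstruction above and is an assumption about the exact label format.</p>
        <preformat>
# Split E-REBEL output into regular relations and non-relation entities.
# Assumes parse_triplets() from the sketch above and a "Self-" prefix on
# self-concept relations (our reconstruction of the label format).
def split_output(seq):
    relations, entities = [], []
    for head, relation, tail in parse_triplets(seq):
        if relation.startswith("Self-"):
            # Self-concept triplet: the tail repeats the head entity and
            # the relation label carries the entity type.
            entities.append((head, relation.removeprefix("Self-")))
        else:
            relations.append((head, relation, tail))
    return relations, entities

seq = ("&lt;triplet&gt; paracetamol &lt;subj&gt; three times a day &lt;obj&gt; Drug-Frequency "
       "&lt;subj&gt; paracetamol &lt;obj&gt; Self-Drug "
       "&lt;triplet&gt; ibuprofen &lt;subj&gt; ibuprofen &lt;obj&gt; Self-Drug")
relations, entities = split_output(seq)
print(entities)  # [('paracetamol', 'Drug'), ('ibuprofen', 'Drug')]
        </preformat>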
      </sec>
    </sec>
    <sec id="sec-5a">
      <title>5. Experiment Design</title>
      <sec id="sec-5-3">
        <title>5.1. Dataset and Evaluation</title>
        <p>
          The data used in this project is from the 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records [<xref ref-type="bibr" rid="ref10">10</xref>] (https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/). The data includes 505 discharge summaries from the MIMIC-III (Medical Information Mart for Intensive Care-III) clinical care database (https://mimic.mit.edu/). The task defines 9 drug-related concepts and 8 drug-related relations. The list of concepts and relations, and the number of samples for each, can be found in Table 1. The challenge in this task is to distinguish whether two entities form a “Reason-Drug” relation or rather a “Drug-ADE” (Adverse Drug Event) relation. The dataset is not balanced in either entity types or relations.
        </p>
        <p>The evaluation metrics used in this experiment are precision, recall and F1 score. They are defined as follows:</p>
        <p>Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 = 2 · Precision · Recall / (Precision + Recall) = 2·TP / (2·TP + FP + FN),</p>
        <p>where TP is the number of true positives, FP is the number of false positives and FN is the number of false negatives. We choose the F1 score over accuracy because the dataset is unbalanced.</p>
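        <p>As a concrete illustration, here is a minimal sketch of computing these micro-averaged scores over sets of predicted and gold relation triplets, assuming exact-match scoring of (head, relation, tail) tuples.</p>
        <preformat>
# Micro-averaged precision, recall and F1 over relation triplets.
# predicted/gold are lists of per-document triplet lists; a triplet
# counts as correct only on exact match (an assumption for this sketch).
def micro_prf(predicted, gold):
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        pred, ref = set(pred), set(ref)
        tp += len(pred.intersection(ref))   # true positives
        fp += len(pred.difference(ref))     # false positives
        fn += len(ref.difference(pred))     # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

pred = [[("paracetamol", "Drug-Duration", "a week")]]
gold = [[("paracetamol", "Drug-Duration", "a week"),
         ("paracetamol", "Drug-Frequency", "three times a day")]]
print(micro_prf(pred, gold))  # (1.0, 0.5, 0.666...)
        </preformat>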
        <sec id="sec-5-3-1">
          <title>2https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ 3https://mimic.mit.edu/</title>
          <p>relations
Drug-Reason
Drug-Form
Drug-Strength
Drug-ADE
Drug-Dosage
Drug-Frequency
Drug-Route
Drug-Duration
Micro
Macro
precision</p>
        </sec>
      </sec>
      <sec id="sec-5-4">
        <title>5.2. REBEL Framework</title>
        <p>
          We adopted the pre-trained REBEL model as described
in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. We fine-tuned the model on our dataset. We
explored the following learning rates:
        </p>
        <p>Learning rate: 1e-5, 2.5e-5, 5e-5, 7.5e-5, 1e-4</p>
        <sec id="sec-5-4-1">
          <title>After this grid search, we found that the REBEL model</title>
          <p>with learning rate 7.5e-5, and E-REBEL model with
learning rate 2.5e-5. We use maximum sequence length of
256 for REBEL with batch size 8, and maximum sequence
length of 1024 for E-REBEL with batch size 2 because
E-REBEL sequences are two to three times longer than
the corresponding REBEL sequences. The batch size is
due to the GPU memory limit give the sequence length.</p>
        </sec>
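        <p>A minimal sketch of this fine-tuning setup with the Hugging Face Seq2SeqTrainer is shown below. The toy dataset, epoch count and output path are illustrative assumptions; the learning rate, sequence length and batch size follow the REBEL values above.</p>
        <preformat>
# Fine-tuning sketch for the REBEL configuration above (learning rate
# 7.5e-5, maximum length 256, batch size 8; E-REBEL would use 2.5e-5,
# 1024 and 2). The toy dataset, epoch count and output path are
# illustrative assumptions, not our exact training script.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
model = AutoModelForSeq2SeqLM.from_pretrained("Babelscape/rebel-large")

def tokenize(batch):
    # Encode source sentences and target triplet sequences.
    enc = tokenizer(batch["text"], max_length=256, truncation=True)
    enc["labels"] = tokenizer(text_target=batch["target"],
                              max_length=256, truncation=True)["input_ids"]
    return enc

train_dataset = Dataset.from_dict({
    "text": ["The patient needs to take paracetamol three times a day."],
    "target": ["&lt;triplet&gt; paracetamol &lt;subj&gt; three times a day "
               "&lt;obj&gt; Drug-Frequency"],
}).map(tokenize, batched=True, remove_columns=["text", "target"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="rebel-n2c2",        # hypothetical output path
        learning_rate=7.5e-5,           # best value from the grid search
        per_device_train_batch_size=8,
        num_train_epochs=10,            # illustrative assumption
    ),
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
        </preformat>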
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Preliminary Results</title>
      <p>The precision, recall and F1 scores of end-to-end relation extraction for the REBEL and E-REBEL models are shown in Tables 2 and 3 respectively. The precision, recall and F1 scores for entity extraction with E-REBEL are shown in Table 4. The F1 scores of end-to-end relation extraction for REBEL, E-REBEL and the top five models on the n2c2 leaderboard are shown in Table 5, including micro and macro F1 scores and the F1 scores for Drug-ADE and Drug-Reason. The F1 scores of concept extraction for E-REBEL and the top five models on the n2c2 leaderboard are shown in Table 6; these five models are not necessarily the same as the top five end-to-end relation models. Micro and macro F1 scores and the F1 scores for ADE and Reason are shown. All rankings are based on micro F1 score across all relations or across all concepts.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Analysis</title>
      <p>From Tables 2 and 5, the REBEL model achieves relatively good performance on end-to-end relation extraction. Its micro and macro F1 scores are on a par with the top models on the leaderboard. Markedly, its performance on the Drug-Reason and Drug-ADE relations is far better than that of other models. The REBEL model also has relatively balanced precision and recall scores.</p>
      <p>The E-REBEL model shows decreased performance compared to the REBEL model. The drop is across all relations, but there are some big drops in the recall of Drug-Reason, Drug-ADE, Drug-Duration and Drug-Frequency. This means the model is more conservative in generating relation triplets, and suggests that integrating entity triplets may not help relation triplet generation. A possible explanation is that the E-REBEL model has to generate sequences that are much longer than those of the REBEL model, and within each output sequence the entity parts are generally longer than the relation parts. This increases the difficulty of generating more, and more accurate, relation triplets. Additionally, we fine-tune from the REBEL pre-trained model, which is not trained on medical-specific data and does not incorporate entities. Lastly, we need to test on more datasets to reach a more convincing conclusion on whether entity incorporation decreases relation extraction performance, especially datasets with a good amount of non-relation entities.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Future work</title>
      <p>This paper is a work in progress. There are four aspects that I am working on.</p>
      <p>Data. The data used in this paper is limited. I plan to use more medical data to gain a better understanding of E-REBEL model performance, especially data that includes entities that are not always in some relation.</p>
      <p>Entity incorporating method. There are other ways to incorporate entities into the output sequence. I am currently working on some possible methods and will compare their performance.</p>
      <p>Entity incorporated retraining. In this work, we only incorporated entities at the fine-tuning stage. The pre-trained REBEL model does not have entity incorporation. This impairs the performance of the E-REBEL model and leads to an unfair comparison between the REBEL and E-REBEL models.</p>
      <p>Medical knowledge integration. The REBEL model is pre-trained on Wikipedia data, which is a collection of general language information. I plan to create a medical REBEL dataset for the model to gain domain knowledge.</p>
      <p>The main real-life application coming out of this work, if it succeeds, is a pre-trained entity-incorporated REBEL model that can serve as a general framework for downstream medical entity and relation extraction tasks, such as extracting information from endoscopy, pathology and radiology reports. For example, this pre-trained model will be used for my PhD project, which involves extracting entities and relations from pathology and endoscopy reports for Barrett’s oesophagus patients.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Miwa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <article-title>End-to-end relation extraction using lstms on sequences and tree structures</article-title>
          ,
          <source>arXiv preprint arXiv:1601.00770</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.-L. H.</given-names>
            <surname>Cabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <article-title>REBEL: Relation extraction by end-to-end language generation</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2021</year>
          ,
          <year>2021</year>
          , pp.
          <fpage>2370</fpage>
          -
          <lpage>2381</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.-J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>GraphRel: Modeling text as relational graphs for joint entity and relation extraction</article-title>
          ,
          <source>in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1409</fpage>
          -
          <lpage>1418</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Entity-relation extraction as multi-turn question answering</article-title>
          ,
          <source>arXiv preprint arXiv:1905.05529</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Nayak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <article-title>Effective modeling of encoder-decoder architecture for joint entity and relation extraction</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>34</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>8528</fpage>
          -
          <lpage>8535</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          ,
          <source>arXiv preprint arXiv:1910.13461</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Van</given-names>
            <surname>Merriënboer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>On the properties of neural machine translation: Encoder-decoder approaches</article-title>
          ,
          <source>arXiv preprint arXiv:1409.1259</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nallapati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Sequence-to-sequence RNNs for text summarization</article-title>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Henry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Buchan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Filannino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stubbs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Uzuner</surname>
          </string-name>
          ,
          <article-title>2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>27</volume>
          (
          <year>2020</year>
          ), pp.
          <fpage>3</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>