<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Comparative Study on Feature Selection in Relation Extraction from Electronic Health Records</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>r Alimov</string-name>
          <email>alimovailseyar@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kazan (Volga Region) Federal University</institution>
          ,
          <addr-line>Kazan</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>34</fpage>
      <lpage>45</lpage>
      <abstract>
        <p>In this paper, we focus on clinical relation extraction; namely, given a medical record with mentions of drugs and their attributes, we identify relations between these entities. We propose a machine learning model with a novel set of knowledge and context embedding features. We systematically investigate the impact of these features together with popular distance and word-based features. Experiments are conducted on a benchmark dataset of clinical texts from the MADE 2018 shared task. We compare the developed feature-based model with BERT and several state-of-the-art models. The obtained results show that distance and word features are significantly beneficial to the classifier. The knowledge-based features increase classification results on particular types of relations only. The context embedding feature gives the highest increase in results among the explored features. The classifier obtains state-of-the-art performance in clinical relation extraction with an F-measure of 92.6%, improving on the previous best F-measure by 3.5% on the MADE corpus.</p>
      </abstract>
      <kwd-group>
        <kwd>relation extraction</kwd>
        <kwd>electronic health records</kwd>
        <kwd>natural language processing</kwd>
        <kwd>machine learning</kwd>
        <kwd>clinical data</kwd>
        <kwd>hand-crafted features</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Electronic health records (EHRs) contain rich information that can be applied
to different research purposes in the field of medicine, such as adverse drug
reaction (ADR) detection, revealing unknown disease correlations, design and
execution of clinical trials for new drugs, clinical decision support, and
evidence-based medicine [
        <xref ref-type="bibr" rid="ref1 ref12 ref14 ref16 ref2 ref9">16, 1, 14, 9, 12, 2</xref>
        ]. Despite the enormous potential of
clinical notes, many technical challenges remain in extracting
the necessary information from EHRs [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. EHRs describing the treatment of
patients represent a massive, underused source of text data. Natural
language processing (NLP) can provide fast, accurate, and
automated information extraction methods that yield substantial cost and logistical
advantages.
      </p>
      <p>Relation extraction, which identifies important links between entities, is
one of the crucial steps of natural language processing (NLP). In this paper,
we treat relation extraction as a binary classification task. The classifier
takes pre-annotated pairs of entities as input and has to decide whether a relation
holds between them. Consider the sentence: “The patient has received 4 cycles
of Ruxience plus Cyclophosphamide in the last day”. In this sentence the entities
Ruxience and 4 cycles are related to each other, while Cyclophosphamide and 4
cycles are not related.</p>
      <p>
        Considerable effort has been devoted to relation extraction research in
the biomedical domain, including the MADE shared-task challenge [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the i2b2
competition [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ], and the BioCreative V chemical-disease relation extraction task [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. The
aim of the MADE competition was to unlock ADR-related information, which
can be further used for pharmacovigilance and drug safety surveillance. The
organizers provided EHR texts annotated with medications and their relations
to corresponding attributes, indications, and adverse events. All participants of the
competition developed systems based on machine learning approaches [
        <xref ref-type="bibr" rid="ref20 ref34 ref4 ref7">4, 7, 20, 34</xref>
        ]. The winning system obtained 86.8% micro-averaged F1, while other participants achieved
comparable results [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. However, for real-world applications that extract
drug-related information, the results need to be further improved. Moreover, the
contribution of different feature types has not yet been extensively investigated.
      </p>
      <p>
        To fill this gap, we systematically evaluate four types of features for
drug-related information extraction from EHRs: distance, word-based, knowledge, and
embedding. In addition to popular features, we propose novel features: (i) the
number of sentences and punctuation characters between entities, (ii) the previous
co-occurrence of entities in biomedical documents from different sources, (iii)
semantic types from Medical Subject Headings (MeSH), and (iv) a context
embedding feature obtained with the sent2vec model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We apply a random forest model
and perform experiments on the MADE corpus. For comparison, we evaluate
a classifier based on Bidirectional Encoder Representations from Transformers
(BERT) and the approaches of teams that participated in the MADE shared task.
      </p>
      <p>
        The classifier with a combination of baseline and context embedding features
obtains the best result, an F-measure of 92.6%, and outperforms the previous
state-of-the-art result [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] by 3.5%. BERT achieves an F-measure of 90.5%. The
obtained results show that distance and word features are significantly beneficial
to the machine learning classifier. The knowledge features increase results
only on particular types of relations. We also found that the context
embedding feature gives the highest increase in results among the explored
features.
      </p>
      <p>The rest of the paper is structured as follows. We discuss related work in
Section 2. Section 3 is devoted to the corpus description. We describe our set of features
in Section 4. Section 5 provides the experimental evaluation and discussion. Section 6
concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The first attempts at relation extraction from EHRs were made in 2010. One of
the challenges of the i2b2 competition was devoted to assigning relation types that
hold between medical problems, tests, and treatments in clinical health records
[
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. This challenge aimed to classify relations between pairs of given reference
standard concepts from a sentence. The system based on maximum entropy with
a set of features from [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], semantic features from Medline abstracts, and
parse tree features achieved the best results among challenge participants [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
This system obtained an F-measure of 73.7%. The model developed by
the team from NRC Canada achieved an F-measure of 73.1% [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This model is
also based on the maximum entropy classification algorithm, with the following
set of features: parse tree features, pointwise mutual information between
two entities calculated on Medline abstracts, word surface, concept mapping and
context, and section-, sentence-, and document-level features. In addition, category balancing
and semi-supervised training were applied. The third-place system is based on a
hybrid approach that combines machine learning techniques with matching of constructed
linguistic patterns [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The authors trained an SVM with three types of
features: surface, lexical, and syntactic. The system obtained an F-measure of 70.9%.
The rest of the participants applied supervised machine learning approaches and
achieved results ranging from 65.6% to 70.2% F-measure [
        <xref ref-type="bibr" rid="ref11 ref17 ref24 ref28 ref8">24, 17, 11, 28, 8</xref>
        ].
One of the main problems faced by participants was the varying number of examples
for each relation type. The developed classifiers could capture the larger classes
accurately using basic textual features. However, to recognize the less
represented relation types, hand-built rules had to be developed.
      </p>
      <p>
        The Natural Language Processing Challenge for Extracting Medication,
Indication, and Adverse Drug Events from Electronic Health Record Notes was
organized in 2018 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The aim of the competition was to extract ADRs and
detect relations between drugs, their attributes, and diseases. In contrast to the
i2b2 competition, only entities are annotated in the corpus. Thus,
it is necessary to generate candidate pairs and then determine whether there is a relation
between them. The first-place system was based on a random forest model
with a set of features including candidate entity types and forms,
the number of entities between the candidates and their types, and tokens and part-of-speech tags
between and neighboring the candidate entities [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. According to the
competition results table, this system obtained 86.8% micro-averaged F1.
Dandala et al. applied a combination of a bidirectional LSTM and an attention
network and achieved second place with 84% micro-averaged F1
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Third place was taken by a system based on a support vector
machine model [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. The classifier uses four types of features (position, distance,
bag of words, and bag of entities) and obtained 83.1% micro-averaged F1.
Magge et al. employed a random forest with entity types, the number of
words in entities, the number of words between entities, averaged word embeddings
of each entity, and an indicator of presence in the same sentence as features [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
This approach obtained 81.6% micro-averaged F1. As can be seen, most
participating teams applied classical machine learning models and only one used
neural networks, while the results were on par.
      </p>
      <p>
        Munkhdalai et al. conducted additional experiments on the MADE corpus and
explored three supervised machine learning systems for relation identification:
(1) a support vector machine (SVM) model, (2) an end-to-end deep neural
network system, and (3) a supervised descriptive rule induction baseline system [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
For the SVM system, entity types, the number of clinical entities, tokens between
entities, n-grams between the two entities and of surrounding tokens, and character
n-grams of named entities were used as features. A combination of BiLSTM
and attention was used as the neural network model. The maximum averaged
F-measure of 89.1% was obtained by the SVM-based approach, while the neural
network achieved an F-measure of only 65.72%.
      </p>
      <p>
        According to the reviewed studies, machine learning approaches have
high potential for the clinical relation extraction task. However, for real-world
biomedical applications, the results need to be improved [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Error analysis
of the systems shows that the most common errors are: (i) related entities are more than
two sentences away from each other, (ii) unrelated entities that occur
close together are marked as related, and (iii) more than one entity is related
to the same entity and only the closest relation is detected. We suppose that
these errors can be eliminated if the context is taken into account. Also, most
of the previous studies devoted to relation extraction from EHRs
largely ignore valuable supportive information, such as the context and
knowledge sources. Therefore, the machine learning approach proposed in this paper
can be viewed as an extension of the previous work on extracting relations from
clinical notes.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Corpus</title>
      <p>
        We evaluated our model on the MADE competition corpus [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The MADE corpus
consists of de-identified electronic health records (EHRs) from 21 cancer
patients. The EHRs include discharge summaries, consultation reports, and other
clinic notes. The overall number of records is 1089, of which 876 were
selected for the training split and the remaining 213 formed the testing split.
Several annotators participated in the annotation process, including physicians,
biologists, linguists, and biomedical database curators. Each document was
annotated by two annotators: the first carried out the initial annotation, and the
second reviewed the annotations and modified them to produce the final version.
      </p>
      <p>
        Each record is annotated with the following types of entities: drug, adverse drug
reaction (ADR), indication, dose, frequency, duration, route, severity, and SSLIF
(other signs/symptoms/illnesses). There are 7 types of relations: drug-ade
(adverse), sslif-severity (severity), drug-route (route), drug-dosage (do),
drug-duration (du), drug-frequency (fr), and drug-indication (reason). Detailed
statistics of the annotated relations are presented in Table 1. According to these statistics, the
most common relation types are drug-dose, drug-indication, and drug-frequency. Two
types of relations (reason and adverse) have a maximum distance between
entities of more than 900 characters, which complicates the identification of
relations between them.
      </p>
      <p>
        We divide the features into four categories: distance, word, embedding, and
knowledge. Distance features are based on counting different metrics between
entities. Word features are derived from various properties of context and
entity words. Embedding features are obtained from word embedding models
pre-trained on a large number of biomedical texts. Knowledge features are
obtained from biomedical resources. Each feature type is described
below.
PubMed, PMC abstracts [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], and BioWordVec created using PubMed
and the clinical notes from MIMIC-III Clinical Database [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. For entities
represented by several words the averaged vector value was applied;
– context embedding (cont emb): vector obtained from pre-trained BioSentVec
model for words between two entities [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. BioSentVec was trained with the
sent2vec library and provides 700-dimensional sentence embeddings;
– similarity (sim): similarity measure between the entities’ embedding vectors.
      </p>
      <p>
        Four types of similarity measures were employed: taxicab, Euclidean,
cosine, coordinate. The vectors were obtained from BioWordVec model
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
4. Knowledge features:
– UMLS concept types (umls): UMLS<sup>1</sup> (Unified Medical Language System)
semantic types of entities represented with a binary vector;
– MeSH concept types (mesh): MeSH<sup>2</sup> (Medical Subject Headings)
categories of entities represented with a binary vector;
– FDA clinical trials occurrence (fda): the number of co-occurrences of both
entities in the approval documents obtained from the FDA<sup>3</sup> for each drug in the
dataset;
– biomedical texts co-occurrence (bio texts): the number of co-occurrences of the
entities in biomedical texts. A detailed description of this feature
is provided below.
      </p>
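      <p>The similarity (sim) feature described above can be illustrated with a minimal sketch. It assumes that entity vectors are plain lists of floats and that multi-word entities are averaged component by component, as stated for the entity embeddings. The function names and the toy vectors are hypothetical, and the “coordinate” measure is left out because the text does not define it.</p>
      <preformat>
```python
import math


def average_vectors(vectors):
    """Average the word vectors of a multi-word entity, component by component."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def taxicab(u, v):
    """Taxicab (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors standing in for real word embeddings (assumption).
drug = average_vectors([[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]])
dose = [2.0, 3.0, -2.0]

sim_features = [taxicab(drug, dose), euclidean(drug, dose), cosine_similarity(drug, dose)]
```
      </preformat>
      <p>In this sketch the three values are simply concatenated into one feature list; how the measures are combined in the actual system is not specified in the text.</p>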
      <p>
        Prior knowledge retrieved from available sources is essential for today’s health
specialists to keep up with and incorporate new health information into their
practices [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. This process of retrieving relevant information is usually carried
out by querying and checking medical articles. We propose a set of features based
on primary sources of information to analyze the influence of this process on
clinical decision making. In particular, we utilize statistics from various resources
using Pharmacognitive<sup>4</sup>. This system provides access to databases of grants,
publications, patents, clinical trials, and others.
      </p>
      <p>For our experiments, we focus on three sources: (i) scientific abstracts from
MEDLINE, (ii) USPTO patents, and (iii) projects from the grant-making
agencies of the USA, Canada, the EU, and Australia. The Pharmacognitive system allows
retrieving statistics such as the number of documents or overall funding per year
matching a query. The queries are generated using terms from entities of three
types: Medication, Indication, and ADR. We extend all queries with the terms’
synonyms provided by the Pharmacognitive tools. We consider the following features
for an individual query (Medication, Condition, or ADR):
– the number of publications/patents/projects published in the particular year
(3 features for each year from 1952 to 2018);
– the number of publications/patents/projects published before the particular
year (3 features for each year from 1953 to 2018);
– the total number of publications/patents/projects published for all time (3
features);
– the average and sum of funding for projects published in the particular year (2
features for each year from 1974 to 2018);
– the average and sum of funding for projects published before the particular year
(3 features for each year from 1975 to 2018);
– the average and sum of funding for projects published for all time (2 features).
We also generate features based on statistics of publications and projects for joint
queries of two terms: Drug and a disease-related entity (ADR or Indication).</p>
      <p>1 https://www.nlm.nih.gov/research/umls/
2 https://www.nlm.nih.gov/mesh/meshhome.html
3 https://www.fda.gov/
4 https://pharmacognitive.com</p>
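      <p>The per-year, cumulative, and total document counts listed above can be sketched as follows. This is only an illustration under stated assumptions: the input is a hypothetical dictionary mapping a year to the number of documents matching one query, the function name is invented for the example, and no call to the Pharmacognitive system is shown because its interface is not described in the text.</p>
      <preformat>
```python
from itertools import accumulate


def yearly_count_features(counts_by_year, first_year, last_year):
    """Build per-year, cumulative ("before the year"), and total count features."""
    years = list(range(first_year, last_year + 1))
    # Documents published in each particular year (0 if the year is missing).
    in_year = [counts_by_year.get(year, 0) for year in years]
    # Documents published strictly before each year: a running sum shifted by one.
    running = list(accumulate(in_year))
    before_year = [0] + running[:-1]
    return {
        "in_year": dict(zip(years, in_year)),
        "before_year": dict(zip(years, before_year)),
        "total": sum(in_year),
    }


# Hypothetical counts for a single query; real values would come from the data source.
toy_counts = {2015: 2, 2017: 5}
features = yearly_count_features(toy_counts, 2015, 2018)
```
      </preformat>
      <p>The "before" value for the first year is always zero in this sketch, which is consistent with the cumulative features in the list starting one year later than the per-year features.</p>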
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>In this section, we describe our classifier model, entity pair generation,
experiments, and results.</p>
      <sec id="sec-4-1">
        <title>Classifier</title>
        <p>
          We treat the task as a set of independent Random Forest
classifiers, one for each relation type. The Random Forest model was
implemented with the Scikit-learn library [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. We tuned the parameters with 5-fold
cross-validation, setting the number of estimators to 100 and the class
weights to 0.7 for the positive class and 0.3 for the negative class to mitigate
class imbalance.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Bidirectional Encoder Representations from Transformers (BERT)</title>
        <p>
          BERT (Bidirectional Encoder Representations from Transformers) is a recent
neural network model for NLP presented by Google [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The model obtained
state-of-the-art results in various NLP tasks, including question answering,
dialog systems, text classification, and sentiment analysis [
          <xref ref-type="bibr" rid="ref18 ref29 ref30 ref35 ref5">18, 35, 30, 5, 29</xref>
          ]. BERT
is based on a bidirectional attention-based transformer architecture
[
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. One of the main advantages of the model is that it accepts raw text as
input. In our experiments, we used the entity texts combined with the context
between them as input.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Entity Pair Generation</title>
        <p>
          For each entity, we obtained a set of candidate entities following the rules from
[
          <xref ref-type="bibr" rid="ref34">34</xref>
          ]: the number of characters between the entities is smaller than 1000, and the
number of other entities that may participate in relations and are located between
the candidate entities is not more than 3. These restrictions reduce the number of
negative pairs and mitigate the class imbalance, while more
than 97% of the positive pairs remain in the dataset.
        </p>
        <p>
          We use the model with distance and word features as a baseline. In
addition, we compare our results with two state-of-the-art approaches, proposed by
Munkhdalai et al. [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] and by Li et al. [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Munkhdalai et al. applied an SVM
with the following features: (i) token distance between the 2 entities, (ii) number of
clinical entities between the 2 entities, (iii) n-grams between the 2 entities, (iv)
n-grams of surrounding tokens of the 2 entities, (v) one-hot encoding of the left
and right entity types, and (vi) character n-grams of the named entities. Li et al.
used capsule networks.
        </p>
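        <p>The two candidate generation rules given at the start of this subsection (fewer than 1000 characters between the entities, and at most 3 other entities between them) can be sketched as below. The sketch makes several assumptions that are not in the text: entities are represented by a hypothetical Entity tuple with character offsets, mentions do not overlap, every annotated entity counts as one that "may participate in relations", and no filtering by entity type is applied.</p>
        <preformat>
```python
from collections import namedtuple
from itertools import combinations

# Hypothetical entity representation: character offsets plus an entity label.
Entity = namedtuple("Entity", ["start", "end", "label"])


def candidate_pairs(entities, max_chars=1000, max_between=3):
    """Return entity pairs that satisfy both distance restrictions."""
    ordered = sorted(entities, key=lambda e: e.start)
    pairs = []
    for (i, left), (j, right) in combinations(enumerate(ordered), 2):
        gap = right.start - left.end   # characters between the two mentions
        between = j - i - 1            # entities located between the two mentions
        if max_chars > gap and max_between >= between:
            pairs.append((left, right))
    return pairs


# Toy document with seven non-overlapping mentions (offsets are made up).
entities = [
    Entity(0, 5, "Drug"), Entity(10, 15, "Dose"), Entity(20, 25, "Route"),
    Entity(30, 35, "Frequency"), Entity(40, 45, "Duration"),
    Entity(50, 55, "Drug"), Entity(2000, 2005, "ADE"),
]
pairs = candidate_pairs(entities)
```
        </preformat>
        <p>In the toy document, the first and sixth mentions are not paired because four entities lie between them, and the last mention is never paired because it is more than 1000 characters away from all others.</p>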
        <p>To evaluate distance and word features, we removed each feature from the
baseline individually and in combination. To determine the most significant features from
the embedding and knowledge feature sets, we added each of these features separately
to the baseline model. The F1-measure for each relation type and the F1 micro-averaged
over all classes were used as evaluation metrics. The evaluation scripts
provided by the competition organizers were applied to compute these values. The
results for each relation type and the micro-averaged F-measure are shown in Table 2.</p>
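        <p>For readers unfamiliar with the two metrics, the following minimal sketch shows how a per-type F1 and a micro-averaged F1 are computed from true positive, false positive, and false negative counts. It is not the organizers' evaluation script; the function names and the counts are hypothetical and serve only to illustrate the arithmetic.</p>
        <preformat>
```python
def f1_score(tp, fp, fn):
    """F1 from raw counts: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_f1(counts):
    """Micro-average: pool the counts of all relation types, then compute F1 once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1_score(tp, fp, fn)


# Hypothetical (TP, FP, FN) counts per relation type; not taken from the paper.
counts = {"route": (8, 2, 0), "adverse": (2, 2, 4)}

per_type = {name: f1_score(*c) for name, c in counts.items()}
overall = micro_f1(counts)
```
        </preformat>
        <p>Because the counts are pooled before the score is computed, frequent relation types weigh more heavily in the micro-average than rare ones.</p>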
        <p>The combination of selected baseline features achieved a micro
F-measure of 86.8%. This result is on par with the best F-measure of 86.84% achieved
in the competition. The combination of baseline and context embedding
features obtained the best result, a micro-averaged F-measure of 92.6%. Thus our
model outperformed the results of Munkhdalai et al. by 3.5%, the approach of Li et al. by
5.4%, and the baseline approach by 6%. All reported improvements of the baseline
model with the context embedding feature over the baseline and both state-of-the-art
approaches are statistically significant with p-value &lt; 0.01 based on the paired
sample t-test. Below, we provide a more detailed analysis of the presented
results.</p>
        <p>According to Table 2, the classifier with distance features achieves a
micro-averaged F-measure of 76.6%. The different types of distance features appear to be
complementary to each other, since removing any one of them leads to
approximately the same loss in results. Removing the distance feature set from the
baseline model (see row ‘word’ in Table 2) decreases the micro F-measure by 19%,
which demonstrates the importance of these features for relation classification.</p>
        <p>The word-based features also improved the performance of the relation
extraction system. The most significant improvement in micro F-measure was obtained
with the bag of words feature (+3.8%), which can be explained by its larger
vector size compared to the rest of the word-based features. The entity type and
bag of entities features increased the results of the baseline by 2.7% and 2.1%,
respectively (see rows ‘baseline-type’ and ‘baseline-boe’).</p>
        <p>The results for embedding features show that the entity embeddings and
similarity features decrease the results regardless of the word embedding model used. The
context embedding feature achieved the most considerable improvement over the
baseline and obtained a micro F-measure of 92.6%. Moreover, the model trained
only with the sent2vec feature outperformed the baseline by 1.8%. This result
leads to the conclusion that the context between candidate entities contains more
useful information for deciding about relations than the candidate entities themselves.</p>
        <p>To evaluate knowledge features, it is better to consider the results for
different relation types separately. Adding the UMLS-based feature to the baseline
model increased the results of the baseline for the severity, reason, and adverse relation
types by 0.3%, 0.9%, and 0.5% of F-measure, respectively. The model with a
combination of baseline and MeSH semantic types features increased the results of the
baseline for the severity and reason types by 0.5% and 0.6% of F-measure,
respectively. The FDA co-occurrence feature increased the results for the frequency type
by 1.3%, while for the rest of the types the results are on par. The number of
co-occurrences in biomedical texts improved the classifier performance
for the adverse relation type by 2%. Thus, the knowledge features improved model
results for selected types of relations.</p>
        <p>The BERT model achieved the best results for the severity, route, dose, duration,
and frequency types of relation. However, for the reason and adverse types, this
model obtained an F-measure approximately 10% lower than the random forest
with baseline and context embedding features. Overall, BERT reached a micro
F-measure of 90.5%, which is the second-best result among all evaluated models. We
suppose that the lower results for the adverse and reason types have
two causes: (i) the same disease can be an adverse drug
reaction in one case and a reason in another, and (ii) the average length of the context for these relation
types is too long to capture the relation between entities.</p>
        <p>A comparison of results for different types of relation shows that the best
result was achieved for route (97.6%). This result is roughly on par with the
best results for the severity, reason, dose, duration, and frequency types, while the
best result for the adverse type is 10.7% lower. This difference could be
due to the greater lexical variety of the adverse drug reaction entity type.</p>
        <p>To sum up this section, three important conclusions can be drawn. First, the
distance and word-based features are beneficial for the relation classifier.
Second, the context embedding has more impact on relation classification than the entity
embeddings. Finally, prior knowledge improves the results on particular
relation types, and the largest improvement is achieved on the adverse relation type with
the biomedical text co-occurrence feature.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this study, we have investigated different types of features for drug-related
information extraction from EHRs. Our evaluation on the MADE competition
corpus shows that context embedding, distance, and word features are the
most beneficial for the relation extraction task. The classifier with a combination of
these sets of features outperformed state-of-the-art results. These facts lead to
the conclusion that the context between entities plays a crucial role in relation
detection. The detailed analysis of results showed that prior knowledge about
entity co-occurrence improved the results for the adverse relation type. Our future
research will focus on the investigation of modern neural networks for relation
extraction from EHRs. We also plan to analyze various context representation
methods and extend the experiments to other biomedical corpora.</p>
      <sec id="sec-5-1">
        <title>Acknowledgments</title>
        <p>This research was supported by the Russian Foundation for Basic Research grant
no. 19-07-01115.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bates</surname>
            ,
            <given-names>D.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cullen</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laird</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petersen</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Small</surname>
            ,
            <given-names>S.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Servi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Laffel, G.,
          <string-name>
            <surname>Sweitzer</surname>
            ,
            <given-names>B.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shea</surname>
            ,
            <given-names>B.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hallisey</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.:
          <article-title>Incidence of adverse drug events and potential adverse drug events: implications for prevention</article-title>
          .
          <source>JAMA</source>
          <volume>274</volume>
          (
          <issue>1</issue>
          ),
          <fpage>29</fpage>
          –
          <lpage>34</lpage>
          (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Batin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turchin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sergey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhila</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denkenberger</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Artificial intelligence in life extension: From deep learning to superintelligence</article-title>
          .
          <source>Informatica</source>
          <volume>41</volume>
          (
          <issue>4</issue>
          ) (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>de Bruijn</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cherry</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiritchenko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>NRC at i2b2: one challenge, three practical tasks, nine statistical systems, hundreds of clinical records, millions of useful features</article-title>
          .
          <source>In: Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data</source>
          . Boston, MA, USA: i2b2 (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peterson</surname>
            ,
            <given-names>K.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alba</surname>
            ,
            <given-names>P.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DuVall</surname>
            ,
            <given-names>S.L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Patterson</surname>
            ,
            <given-names>O.V.</given-names>
          </string-name>
          :
          <article-title>Detecting adverse drug events with rapidly trained classification models</article-title>
          .
          <source>Drug safety</source>
          ,
          <fpage>1</fpage>
          –
          <lpage>10</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhuo</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>BERT for joint intent classification and slot filling</article-title>
          . arXiv preprint arXiv:1902.10909
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Biosentvec: creating sentence embeddings for biomedical texts</article-title>
          . arXiv preprint arXiv:1810.09302
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dandala</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joopudi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Devarakonda</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Adverse drug events detection in clinical notes by jointly modeling entities and relations using neural networks</article-title>
          .
          <source>Drug safety</source>
          ,
          <fpage>1</fpage>
          –
          <lpage>12</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Apostolova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Islamaj Dogan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.:
          <article-title>NLM's system description for the fourth i2b2/VA challenge</article-title>
          .
          <source>In: Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data</source>
          . Boston, MA, USA: i2b2 (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <article-title>What can natural language processing do for clinical decision support?</article-title>
          <source>Journal of Biomedical Informatics</source>
          <volume>42</volume>
          (
          <issue>5</issue>
          ),
          <fpage>760</fpage>
          –
          <lpage>772</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv preprint arXiv:1810.04805
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Divita</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Treitler</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , et al.:
          <article-title>Salt lake city vas challenge submissions</article-title>
          .
          <source>In: Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data</source>
          . Boston, MA, USA: i2b2 (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Frankovich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Longhurst</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutherland</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          :
          <article-title>Evidence-based medicine in the EMR era</article-title>
          .
          <source>N Engl J Med</source>
          <volume>365</volume>
          (
          <issue>19</issue>
          ),
          <fpage>1758</fpage>
          –
          <lpage>1759</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abacha</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernhard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cartoni</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deleger</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ligozat</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minard</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosset</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Caramba: concept, assertion, and relation annotation using machine-learning based approaches</article-title>
          .
          <source>In: i2b2 Medication Extraction Challenge Workshop</source>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Gurwitz</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Field</surname>
            ,
            <given-names>T.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harrold</surname>
            ,
            <given-names>L.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rothschild</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debellis</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seger</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cadoret</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fish</surname>
            ,
            <given-names>L.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garber</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelleher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>Incidence and preventability of adverse drug events among older persons in the ambulatory setting</article-title>
          .
          <source>JAMA</source>
          <volume>289</volume>
          (
          <issue>9</issue>
          ),
          <fpage>1107</fpage>
          –
          <lpage>1116</lpage>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Jagannatha</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Overview of the first natural language processing challenge for extracting medication, indication, and adverse drug events from electronic health record notes (MADE 1.0)</article-title>
          .
          <source>Drug safety</source>
          ,
          <fpage>1</fpage>
          –
          <lpage>13</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>P.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>L.J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Brunak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Mining electronic health records: towards better research applications and clinical care</article-title>
          .
          <source>Nature Reviews Genetics</source>
          <volume>13</volume>
          (
          <issue>6</issue>
          ),
          <fpage>395</fpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Jonnalagadda</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gonzalez</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Can distributional statistics aid clinical concept extraction</article-title>
          .
          <source>In: Proceedings of the 2010 i2b2/VA workshop on challenges in natural language processing for clinical data</source>
          . Boston, MA, USA: i2b2 (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahoo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>End-to-end multimodal dialog systems with hierarchical multimodal attention on video features</article-title>
          .
          <source>In: DSTC7 at AAAI2019 Workshop</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>An investigation of single-domain and multidomain medication and adverse drug event relation extraction from electronic health record notes using advanced deep learning models</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>26</volume>
          (
          <issue>7</issue>
          ),
          <fpage>646</fpage>
          –
          <lpage>654</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Magge</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scotch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gonzalez-Hernandez</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Clinical NER and relation extraction using bi-char-LSTMs and random forest classifiers</article-title>
          .
          <source>In: International Workshop on Medication and Adverse Drug Event Detection</source>
          ,
          <fpage>25</fpage>
          –
          <lpage>30</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Moen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ananiadou</surname>
            ,
            <given-names>T.S.S.</given-names>
          </string-name>
          :
          <article-title>Distributional semantics resources for biomedical text processing</article-title>
          .
          <source>Proceedings of LBM</source>
          ,
          <fpage>39</fpage>
          –
          <lpage>44</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Munkhdalai</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Clinical relation extraction toward drug safety surveillance using electronic health record narratives: classical learning versus deep learning</article-title>
          .
          <source>JMIR public health and surveillance</source>
          <volume>4</volume>
          (
          <issue>2</issue>
          ), (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Pao</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grefsheim</surname>
            ,
            <given-names>S.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barclay</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woolliscroft</surname>
            ,
            <given-names>J.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McQuillan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shipman</surname>
            ,
            <given-names>B.L.</given-names>
          </string-name>
          :
          <article-title>Factors affecting students' use of MEDLINE</article-title>
          .
          <source>Computers and Biomedical Research</source>
          <volume>26</volume>
          (
          <issue>6</issue>
          ),
          <fpage>541</fpage>
          –
          <lpage>555</lpage>
          (
          <year>1993</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Patrick</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>I2b2 challenges in clinical natural language processing 2010</article-title>
          .
          <source>In: Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data</source>
          . Boston, MA, USA: i2b2 (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Patrick</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A cascade approach to extracting medication events</article-title>
          .
          <source>In: Proceedings of the Australasian Language Technology Association Workshop 2009</source>
          ,
          <fpage>99</fpage>
          –
          <lpage>103</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          –
          <lpage>2830</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rink</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Harabagiu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Extraction of medical concepts, assertions, and relations from discharge summaries for the fourth i2b2/va shared task</article-title>
          .
          <source>In: Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data</source>
          . Boston, MA, USA: i2b2 (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Solt</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szidarovszky</surname>
            ,
            <given-names>F.P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tikk</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Concept, assertion and relation extraction at the 2010 i2b2 relation extraction challenge using parsing information and dictionaries</article-title>
          .
          <source>Proc. of i2b2/VA Shared-Task</source>
          . Washington, DC (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Qiu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence</article-title>
          . arXiv preprint arXiv:1903.09588
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Uglow</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zlocha</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zmyslony</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>SemEval 2019 task 6: An exploration of state-of-the-art methods for offensive language detection</article-title>
          . arXiv preprint arXiv:1903.07445
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Uzuner</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>South</surname>
            ,
            <given-names>B.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>DuVall</surname>
            ,
            <given-names>S.L.</given-names>
          </string-name>
          :
          <article-title>2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>18</volume>
          (
          <issue>5</issue>
          ),
          <fpage>552</fpage>
          –
          <lpage>556</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
          .
          In:
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <fpage>5998</fpage>
          –
          <lpage>6008</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leaman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mattingly</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegers</surname>
            ,
            <given-names>T.C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task</article-title>
          .
          <source>Database</source>
          <volume>2016</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yadav</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bethard</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>UArizona at the MADE1.0 NLP challenge</article-title>
          .
          <source>Proceedings of Machine Learning Research</source>
          <volume>90</volume>
          ,
          <fpage>57</fpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>SDNet: Contextualized attention-based deep network for conversational question answering</article-title>
          . arXiv preprint arXiv:1812.03593
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>