=Paper=
{{Paper
|id=Vol-2523/paper06
|storemode=property
|title=
A Comparative Study on Feature Selection in Relation Extraction from Electronic Health Records
|pdfUrl=https://ceur-ws.org/Vol-2523/paper06.pdf
|volume=Vol-2523
|authors=Ilseyar Alimova,Elena Tutubalina
|dblpUrl=https://dblp.org/rec/conf/rcdl/AlimovaT19
}}
==
A Comparative Study on Feature Selection in Relation Extraction from Electronic Health Records
==
A Comparative Study on Feature Selection in Relation Extraction from Electronic Health Records

Ilseyar Alimova[0000−0003−4528−6631] and Elena Tutubalina[0000−0001−7936−0284]

Kazan (Volga Region) Federal University, Kazan, Russia
{alimovailseyar,evtutubalina}@gmail.com

Abstract. In this paper, we focus on clinical relation extraction; namely, given a medical record with mentions of drugs and their attributes, we identify relations between these entities. We propose a machine learning model with a novel set of knowledge and context embedding features. We systematically investigate the impact of these features together with popular distance and word-based features. Experiments are conducted on a benchmark dataset of clinical texts from the MADE 2018 shared task. We compare the developed feature-based model with BERT and several state-of-the-art models. The obtained results show that distance and word features are significantly beneficial to the classifier. The knowledge-based features increase classification results on particular types of relations only. The context embedding feature gives the highest increase in results among the explored features. The classifier obtains state-of-the-art performance in clinical relation extraction with an F-measure of 92.6%, improving the previous best F-measure on the MADE corpus by 3.5%.

Keywords: relation extraction, electronic health records, natural language processing, machine learning, clinical data, hand-crafted features

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Electronic health records (EHRs) contain rich information that can be applied to different research purposes in the field of medicine, such as adverse drug reaction (ADR) detection, revealing unknown disease correlations, design and execution of clinical trials for new drugs, clinical decision support, and evidence-based medicine [16, 1, 14, 9, 12, 2]. Despite the enormous potential contained in clinical notes, there are many technical challenges in extracting the necessary information from EHRs [16]. EHRs describing the treatment of patients represent a massive volume of underused text data. Natural language processing (NLP) can provide fast, accurate, and automated information extraction methods that yield considerable cost and logistical advantages.

Relation extraction, which identifies important links between entities, is one of the crucial steps of NLP. In this paper, we consider the relation extraction task as binary classification. The classifier takes a pre-annotated pair of entities as input and has to identify whether a relation holds between them. Consider the sentence: "The patient has received 4 cycles of Ruxience plus Cyclophosphamide in the last day". In this sentence, the entities Ruxience and 4 cycles are related to each other, while Cyclophosphamide and 4 cycles are not related.

Considerable efforts have been devoted to relation extraction research in the biomedical domain, including the MADE shared-task challenge [15], the i2b2 competition [31], and the BioCreative V chemical-disease relation extraction task [33]. The aim of the MADE competition was to unlock ADR-related information, which can be further used for pharmacovigilance and drug safety surveillance. The organizers provided EHR texts annotated with medications and their relations to corresponding attributes, indications, and adverse events.
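To make this classification framing concrete, here is a minimal sketch in Python (our own illustration, not the MADE annotation format; the entity offsets are computed against the example sentence above):

```python
from dataclasses import dataclass

@dataclass
class Entity:
    text: str    # surface form of the mention
    etype: str   # annotation type, e.g. "drug" or "duration"
    start: int   # character offset in the note
    end: int

# Entities from the example sentence.
cycles = Entity("4 cycles", "duration", 25, 33)
ruxience = Entity("Ruxience", "drug", 37, 45)
cyclo = Entity("Cyclophosphamide", "drug", 51, 67)

# Each candidate pair becomes one binary-classification instance.
candidates = [
    ((ruxience, cycles), 1),  # related: the duration attaches to Ruxience
    ((cyclo, cycles), 0),     # not related
]
```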
All participants of the competition developed systems based on machine learning approaches [4, 7, 20, 34]. The winning system obtained 86.8% of F-measure, while other participants achieved comparable results [15]. However, for real-world applications of extracting drug-related information, the results need to be further improved. Moreover, the contribution of different feature types has not been extensively investigated yet.

To fill this gap, we systematically evaluate four types of features for drug-related information extraction from EHRs: distance, word-based, knowledge, and embedding features. In addition to popular features, we propose novel ones: (i) the number of sentences and punctuation characters between entities, (ii) the previous co-occurrence of entities in biomedical documents from different sources, (iii) semantic types from Medical Subject Headings (MeSH), and (iv) a context embedding feature obtained with a sent2vec model [6]. We apply a random forest model and perform experiments on the MADE corpus. For comparison, we evaluate a classifier based on Bidirectional Encoder Representations from Transformers (BERT) and the approaches of the teams that participated in the MADE shared task.

The classifier with a combination of baseline and context embedding features obtains the best result of 92.6% of F-measure and outperforms the previous state-of-the-art results [22] by 3.5%. BERT achieves 90.5% of F-measure. The obtained results show that distance and word features are significantly beneficial to the machine learning classifier. The knowledge features increase results only on particular types of relations. We also found that the context embedding feature gives the highest increase in results among the explored features.

The rest of the paper is structured as follows. We discuss related work in Section 2. Section 3 is devoted to the corpus description. We describe our set of features in Section 4. Section 5 provides the experimental evaluation and discussion. Section 6 concludes the paper.

2 Related Work

The first attempts at relation extraction from EHRs were made in 2010. One of the tracks of the i2b2 competition was devoted to assigning the relation types that hold between medical problems, tests, and treatments in clinical health records [31]. This track aimed to classify relations between pairs of given reference standard concepts from a sentence. A system based on maximum entropy with the feature set from [25], semantic features from Medline abstracts, and parsing tree features achieved the best results among the challenge participants [27]. The described system obtained 73.7% of F-measure. The model developed by the team from NRC Canada achieved 73.1% of F-measure [3]. This model is also based on the maximum entropy classification algorithm, with the following features: parsing trees, pointwise mutual information between the two entities calculated on Medline abstracts, word surface forms, concept mapping, and context, section, sentence, and document-level features. In addition, category balancing and semi-supervised training were applied. The third-place system is based on a hybrid approach that combines machine learning techniques with hand-constructed linguistic pattern matching [13]. The authors trained an SVM with three types of features: surface, lexical, and syntactic. The system obtained 70.9% of F-measure. The rest of the participants applied supervised machine learning approaches and achieved results ranging from 65.6% to 70.2% of F-measure [24, 17, 11, 28, 8].
One of the main problems faced by the participants was the varying number of examples for each relation type. The developed classifiers could capture the larger classes accurately using basic textual features; however, recognizing the less frequent relation types required hand-built rules.

The Natural Language Processing Challenge for Extracting Medication, Indication, and Adverse Drug Events from Electronic Health Record Notes was organized in 2018 [15]. The aim of the competition was to extract ADRs and detect relations between drugs, their attributes, and diseases. In contrast to the i2b2 competition, only entities are annotated in this corpus. Thus, it is necessary to generate candidate pairs and then determine whether there is a relation between them. The first place was obtained by a system based on a random forest model with a set of features including candidate entity types and forms, the number and types of entities between the candidates, and tokens and part-of-speech tags between and neighboring the candidate entities [4]. According to the competition results table, the described system obtained 86.8% of micro-averaged F1. Dandala et al. applied a combination of a bidirectional LSTM and an attention network and achieved second place with 84% of micro-averaged F1 score [7]. The third place was taken by a system based on a support vector machine model [34]. The classifier used four types of features: position, distance, bag of words, and bag of entities, and obtained 83.1% of micro-averaged F1 measure. Magge et al. employed a random forest with entity types, the number of words in each entity, the number of words between entities, the averaged word embeddings of each entity, and an indicator of presence in the same sentence as features [20]. This approach obtained 81.6% of micro-averaged F1. As can be seen, most participating teams applied classical machine learning models and only one utilized neural networks, while the results were on par.

Munkhdalai et al. conducted additional experiments on the MADE corpus and explored three supervised machine learning systems for relation identification: (1) a support vector machine (SVM) model, (2) an end-to-end deep neural network system, and (3) a supervised descriptive rule induction baseline system [22]. For the SVM system, entity types, the number of clinical entities and tokens between entities, n-grams between the two entities and of surrounding tokens, and character n-grams of the named entities were used as features. A combination of BiLSTM and attention was utilized as the neural network model. The maximum averaged F-measure of 89.1% was obtained by the SVM-based approach, while the neural network achieved only 65.72% of F-measure.

According to the reviewed studies, machine learning approaches have a high potential for the clinical relation extraction task. However, for real-world biomedical applications, the results need to be improved [15]. Error analysis of the systems shows the most common errors: (i) related entities located more than two sentences apart are missed, (ii) unrelated entities occurring within a small distance of each other are marked as related, and (iii) when more than one entity is related to the same entity, only the closest relation is detected. We suppose that these errors can be eliminated if the context is taken into account. Also, most of the previous studies on relation extraction from EHRs largely ignore valuable supporting information, such as the context and knowledge sources.
Therefore, the machine learning approach proposed in this paper can be viewed as an extension of the previous work on extracting relations from clinical notes.

3 Corpus

We evaluated our model on the MADE competition corpus [15]. The MADE corpus consists of de-identified electronic health records from 21 cancer patients. The EHRs include discharge summaries, consultation reports, and other clinic notes. The overall number of records is 1089, of which 876 records were selected for the training split and the remaining 213 notes formed the testing split. Several annotators participated in the annotation process, including physicians, biologists, linguists, and biomedical database curators. Each document was annotated by two annotators: one carried out the initial annotation, and the second reviewed the annotations and modified them to produce the final version.

Each record is annotated with the following types of entities: drug, adverse drug reaction (ADR), indication, dose, frequency, duration, route, severity, and SSLIF (other signs/symptoms/illnesses). There are 7 types of relations: drug−ade (adverse), sslif−severity (severity), drug−route (route), drug−dosage (do), drug−duration (du), drug−frequency (fr), and drug−indication (reason). Detailed statistics of the annotated relations are presented in Table 1. According to the statistics, the most common relation types are drug-dose, drug-indication, and drug-frequency. Two types of relations (reason and adverse) have a maximum distance between entities of more than 900 characters, which complicates the identification of relations between them.

Table 1. The overall statistics for the MADE corpus

| Relation type | Number (train) | Number (test) | Number (all) | Avg. dist. (train) | Avg. dist. (test) | Avg. dist. (all) | Max. dist. (train) | Max. dist. (test) | Max. dist. (all) |
|---|---|---|---|---|---|---|---|---|---|
| do | 5176 | 866 | 6042 | 8.4 | 7.7 | 8.3 | 215 | 143 | 215 |
| reason | 4523 | 870 | 5393 | 89.3 | 63.8 | 85.2 | 981 | 868 | 981 |
| fr | 4417 | 729 | 5146 | 17.7 | 18.6 | 17.8 | 201 | 178 | 201 |
| severity | 3475 | 557 | 4032 | 2.6 | 1.8 | 2.5 | 259 | 188 | 259 |
| adverse | 1989 | 481 | 2470 | 59.4 | 45.6 | 56.7 | 937 | 718 | 937 |
| manner/route | 2550 | 455 | 3005 | 13.5 | 12.9 | 13.4 | 191 | 137 | 191 |
| du | 906 | 147 | 1053 | 18.5 | 15.0 | 18.0 | 272 | 121 | 272 |
| all | 23036 | 4109 | 27145 | 30.6 | 26.0 | 29.9 | 981 | 868 | 981 |

4 Features

We have divided the features into four categories: distance, word, embedding, and knowledge. Distance features are based on counting different metrics between entities. Word features are derived from various properties of the context and entity words. Embedding features are obtained from word embedding models pre-trained on large collections of biomedical texts. Knowledge features are obtained from biomedical resources. Each type of feature is described below; a sketch of the distance features follows the first list.

1. Distance features:
– word distance (word dist): the number of words between entities;
– char distance (char dist): the number of characters between entities;
– sentence distance (sent dist): the number of sentences between entities;
– punctuation (punc dist): the number of punctuation characters between entities;
– position (position): the position of the entity candidate (drug or SSLIF type entity) with respect to the attribute among all entity candidates of that attribute, where the position of the medical attribute is set to 0.
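The distance features can be sketched as follows (our illustration, not the authors' released code; whitespace tokenization and splitting sentences on periods are simplifying assumptions, and the position feature is omitted because it requires the full list of candidates per attribute):

```python
import string

def distance_features(text, e1, e2):
    """Compute distance features for an entity pair.

    `e1` and `e2` are (start, end) character spans in `text`.
    """
    left, right = sorted([e1, e2])
    between = text[left[1]:right[0]]          # raw text between the entities
    return {
        "word_dist": len(between.split()),    # words between entities
        "char_dist": len(between),            # characters between entities
        "sent_dist": between.count("."),      # crude sentence-boundary count
        "punc_dist": sum(ch in string.punctuation for ch in between),
    }
```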
2. Word features:
– bag of words (bow): all words within a 10-word window before and after the entities, plus the entity text. We used as features only the words that appeared in such context windows with frequency ≥500 across the dataset; thus, for each entity pair we generated 847 features;
– bag of entities (boe): the counts of all annotation types between the entities;
– entity types (type): a binary vector whose length equals the number of entity types, with ones at the positions of the pair's entity types.

3. Embedding features (see the sketch after this list):
– entity embeddings (ent emb): the vectors obtained from pre-trained word embedding models for each entity. We explored two word embedding models: one trained on the concatenation of Wikipedia, PubMed, and PMC abstracts [21], and BioWordVec, created using PubMed and the clinical notes from the MIMIC-III Clinical Database [6]. For entities represented by several words, the averaged vector was used;
– context embedding (cont emb): the vector obtained from the pre-trained BioSentVec model for the words between the two entities [6]. BioSentVec was trained with the sent2vec library and produces 700-dimensional sentence embeddings;
– similarity (sim): similarity measures between the entity embedding vectors. Four types of similarity measures were employed: taxicab, Euclidean, cosine, and coordinate. The vectors were obtained from the BioWordVec model [6].
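A minimal sketch of the entity embedding and similarity features, assuming a gensim-style KeyedVectors interface for BioWordVec; since the paper does not define the "coordinate" measure, it is omitted here:

```python
import numpy as np

def entity_vector(tokens, wv):
    """Average pre-trained word vectors over an entity's tokens.
    `wv` is assumed to behave like gensim's KeyedVectors."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

def similarity_features(v1, v2):
    """Taxicab, Euclidean, and cosine similarity between entity vectors."""
    denom = np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9
    return {
        "taxicab": float(np.sum(np.abs(v1 - v2))),
        "euclidean": float(np.linalg.norm(v1 - v2)),
        "cosine": float(np.dot(v1, v2) / denom),
    }
```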
4. Knowledge features:
– UMLS concept types (umls): semantic types of the entities from the Unified Medical Language System (UMLS, https://www.nlm.nih.gov/research/umls/), represented as a binary vector;
– MeSH concept types (mesh): categories of the entities from Medical Subject Headings (MeSH, https://www.nlm.nih.gov/mesh/meshhome.html), represented as a binary vector;
– FDA clinical trials occurrence (fda): the number of co-occurrences of both entities in the approval document received from the FDA (https://www.fda.gov/) for each drug of the dataset;
– biomedical texts co-occurrence (bio texts): the number of co-occurrences of the entities in biomedical texts. A detailed description of this feature is provided below.

Prior knowledge retrieved from available sources is essential for today's health specialists to keep up with and incorporate new health information into their practices [23]. This process of retrieving relevant information is usually carried out by querying and checking medical articles. We propose a set of features based on primary sources of information to analyze the influence of this process on clinical decision making. In particular, we utilize statistics from various resources using Pharmacognitive (https://pharmacognitive.com). This system provides access to databases of grants, publications, patents, clinical trials, and others.

For our experiments, we focus on three sources: (i) scientific abstracts from MEDLINE, (ii) USPTO patents, and (iii) projects from the grant-making agencies of the USA, Canada, the EU, and Australia. The Pharmacognitive system allows retrieving statistics such as the number of documents or the overall funding per year matching a query. The queries are generated using terms from entities of three types: Medication, Indication, and ADR. We extend all queries with the terms' synonyms provided by the Pharmacognitive tools. We consider the following features for an individual query (Medication, Condition, or ADR); a schematic sketch follows the list:

– the number of publications/patents/projects published in a particular year (3 features for each year from 1952 to 2018);
– the number of publications/patents/projects published before a particular year (3 features for each year from 1953 to 2018);
– the total number of publications/patents/projects published over all time (3 features);
– the average and sum of project funding in a particular year (2 features for each year from 1974 to 2018);
– the average and sum of project funding before a particular year (2 features for each year from 1975 to 2018);
– the average and sum of project funding over all time (2 features).

We also generate features based on the statistics of publications and projects for joint queries of two terms: a drug and a disease-related entity (ADR or Indication).
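The expansion of per-year statistics into feature vectors could look roughly like the sketch below; `counts_by_year` is a hypothetical stand-in for the statistics returned by Pharmacognitive, whose API we do not reproduce here. Analogous vectors would be built per source (publications, patents, projects) and for the funding statistics:

```python
def yearly_count_features(counts_by_year, first_year=1952, last_year=2018):
    """Expand per-year document counts into fixed-length features:
    the count in each year, the cumulative count before each year,
    and the all-time total."""
    features = []
    total = 0
    for year in range(first_year, last_year + 1):
        features.append(counts_by_year.get(year, 0))  # published in this year
        if year > first_year:
            features.append(total)                    # published before this year
        total += counts_by_year.get(year, 0)
    features.append(total)                            # total over all time
    return features
```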
5 Experiments

In this section, we describe our classifier model, entity pair generation, experiments, and results.

5.1 Classifier

We build a system that solves the task with a set of independent random forest classifiers, one for each relation type. The random forest model was implemented with the scikit-learn library [26]. We tuned the parameters with 5-fold cross-validation and set the number of estimators to 100 and the class weights to 0.7 for the positive and 0.3 for the negative class to mitigate the class imbalance.
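A minimal sketch of this setup with scikit-learn (feature extraction and the tuning loop are omitted; assigning label 1 to the positive class is our convention):

```python
from sklearn.ensemble import RandomForestClassifier

# One independent classifier per relation type, as described above.
relation_types = ["do", "reason", "fr", "severity", "adverse", "manner/route", "du"]

classifiers = {
    rel: RandomForestClassifier(
        n_estimators=100,               # tuned via 5-fold cross-validation
        class_weight={1: 0.7, 0: 0.3},  # up-weight the positive class
        random_state=42,
    )
    for rel in relation_types
}

# Training would then be, per relation type:
# classifiers[rel].fit(X_train[rel], y_train[rel])
```

Training one classifier per relation type lets the class-weight balance be tuned independently for frequent and rare relations.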
5.2 Bidirectional Encoder Representations from Transformers (BERT)

BERT (Bidirectional Encoder Representations from Transformers) is a neural network model for NLP recently presented by Google [10]. The model obtained state-of-the-art results in various NLP tasks, including question answering, dialog systems, text classification, and sentiment analysis [18, 35, 30, 5, 29]. The BERT neural network is based on the bidirectional attention-based transformer architecture [32]. One of the main advantages of the model is the ability to feed it raw text as input. In our experiments, we used the entity texts combined with the context between them as the input.
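A sketch of this input construction for binary classification; the paper does not specify the implementation, so the HuggingFace-style checkpoint and the 128-token limit below are our assumptions:

```python
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

def build_input(text, e1, e2):
    """Concatenate the two entity texts with the context between them,
    as described above. `e1`/`e2` are (start, end) character spans."""
    left, right = sorted([e1, e2])
    context = text[left[1]:right[0]]
    sequence = f"{text[left[0]:left[1]]} {context} {text[right[0]:right[1]]}"
    return tokenizer(sequence, truncation=True, max_length=128, return_tensors="pt")

note = "The patient has received 4 cycles of Ruxience plus Cyclophosphamide in the last day"
logits = model(**build_input(note, (25, 33), (37, 45))).logits  # pair: 4 cycles, Ruxience
```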
5.3 Entity Pair Generation

For each entity, we obtained a set of candidate entities following the rules from [34]: the number of characters between the entities must be smaller than 1000, and the number of other entities that may participate in relations and are located between the candidate entities must be at most 3. These restrictions reduce the number of infrequent negative pairs and mitigate the class imbalance, while more than 97% of the positive pairs remain in the dataset.
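The filtering rules can be sketched as follows (our illustration; the check that intervening entities "may participate in relations" is simplified here to counting all intervening entities):

```python
def candidate_pairs(entities, max_char_gap=1000, max_between=3):
    """Generate candidate entity pairs under the filtering rules from [34].
    `entities` is a list of (start, end) spans sorted by start offset."""
    pairs = []
    for i, e1 in enumerate(entities):
        for e2 in entities[i + 1:]:
            gap = e2[0] - e1[1]        # characters between the two entities
            if gap >= max_char_gap:
                break                  # sorted by start, so the gap only grows
            between = [e for e in entities
                       if e1[1] <= e[0] and e[1] <= e2[0]]
            if len(between) <= max_between:
                pairs.append((e1, e2))
    return pairs
```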
Table 2. F-measure for each relation type and micro-averaged F-measure over all relation types on the MADE corpus. The distance and word features serve as the baseline; rows of the form "baseline−X" denote the baseline with feature X removed, and "baseline+X" the baseline with feature X added.

| Features | severity | route | reason | do | du | fr | adverse | all |
|---|---|---|---|---|---|---|---|---|
| baseline: distance & word features | .933 | .918 | .806 | .906 | .905 | .896 | .729 | .866 |
| Munkhdalai et al. [22] | .950 | .960 | .750 | .880 | .910 | .950 | .850 | .891 |
| Li et al. [19] | - | - | - | - | - | - | - | .872 |
| baseline−word dist | .923 | .922 | .812 | .900 | .860 | .909 | .716 | .864 |
| baseline−char dist | .929 | .916 | .810 | .908 | .869 | .890 | .731 | .864 |
| baseline−sent dist | .933 | .919 | .807 | .910 | .880 | .906 | .719 | .866 |
| baseline−punc dist | .926 | .912 | .798 | .907 | .836 | .906 | .735 | .863 |
| baseline−position | .931 | .917 | .803 | .897 | .865 | .883 | .723 | .858 |
| distance | .918 | .843 | .683 | .859 | .713 | .780 | .525 | .766 |
| baseline−boe | .932 | .897 | .775 | .888 | .861 | .868 | .715 | .845 |
| baseline−bow | .918 | .906 | .726 | .895 | .810 | .843 | .712 | .828 |
| baseline−type | .934 | .906 | .779 | .899 | .891 | .891 | .562 | .839 |
| word | .542 | .777 | .645 | .662 | .718 | .846 | .511 | .672 |
| baseline+emb pubmed pmc wiki | .927 | .898 | .730 | .887 | .684 | .900 | .605 | .827 |
| baseline+emb bio | .920 | .903 | .772 | .893 | .602 | .908 | .613 | .833 |
| baseline+cont emb | .936 | .954 | .937 | .929 | .854 | .938 | .869 | .926 |
| cont emb | .932 | .935 | .909 | .915 | .854 | .835 | .782 | .884 |
| baseline+sim | .920 | .908 | .796 | .905 | .880 | .902 | .737 | .862 |
| baseline+umls | .936 | .915 | .815 | .922 | .883 | .891 | .734 | .870 |
| baseline+mesh | .938 | .918 | .812 | .910 | .856 | .904 | .730 | .868 |
| baseline+fda | .936 | .912 | .808 | .906 | .895 | .909 | .730 | .868 |
| baseline+bio text | .934 | .918 | .805 | .906 | .905 | .896 | .749 | .866 |
| baseline+knowledge | .936 | .914 | .806 | .916 | .889 | .896 | .736 | .848 |
| BERT | .951 | .976 | .845 | .934 | .946 | .950 | .767 | .905 |

5.4 Experiments and Results

We use the model with distance and word features as the baseline. In addition, we compare our results with two state-of-the-art approaches: the one proposed by Munkhdalai et al. [22] and the one by Li et al. [19]. Munkhdalai et al. applied an SVM with the following features: (i) token distance between the two entities, (ii) the number of clinical entities between the two entities, (iii) n-grams between the two entities, (iv) n-grams of the tokens surrounding the two entities, (v) one-hot encoding of the left and right entity types, and (vi) character n-grams of the named entities. Li et al. utilized modern capsule networks.

For the evaluation of the distance and word features, we removed each of the features individually and in combination. To determine the most significant features from the embedding and knowledge feature sets, we added each of these features separately to the baseline model. The F1-measure for each relation type and the micro-averaged F1 over all classes were used as evaluation metrics. The evaluation scripts provided by the competition organizers were applied to compute these values. The results for each relation type and the micro-averaged F-measure are shown in Table 2.

The combination of baseline features achieved 86.8% of micro F-measure. This result is on par with the best F-measure of 86.84% achieved in the competition. The combination of baseline and context embedding features obtained the best result of 92.6% of micro-averaged F-measure. Thus, our model outperformed the results of Munkhdalai et al. by 3.5%, the approach of Li et al. by 5.4%, and the baseline by 6%. All reported improvements of the baseline model with the context embedding feature over the baseline and over both state-of-the-art approaches are statistically significant with p-value < 0.01 based on the paired-sample t-test. Below we provide a more detailed analysis of the presented results.

According to Table 2, the classifier with distance features alone achieves 76.6% of micro-averaged F-measure. The different types of distance features appear to be complementary to each other, since the absence of any one of them leads to approximately the same loss in results. The baseline model without the distance feature set (see row "word" in Table 2) loses 19% of micro F-measure, which demonstrates the importance of these features for relation classification.

The word-based features also improved the performance of the relation extraction system. The most significant contribution to micro F-measure comes from the bag-of-words feature (+3.8%), which can be explained by its larger vector size compared to the rest of the word-based features. The entity type and bag-of-entities features contribute 2.7% and 2.1% to the baseline, respectively (see rows "baseline−type" and "baseline−boe").

The results for the embedding features show that the entity embeddings and the similarity feature decrease the results regardless of the word embedding model used. The context embedding feature achieved the most considerable improvement over the baseline and obtained 92.6% of micro F-measure. Moreover, the model trained only with the sent2vec feature outperformed the baseline by 1.8%. This leads to the conclusion that the context between candidate entities contains more useful information for deciding about relations than the candidate entities themselves.

To evaluate the knowledge features, it is better to consider the results for different relation types separately. Adding the UMLS-based feature to the baseline model increased the baseline results for the severity, reason, and adverse relation types by 0.3%, 0.9%, and 0.5% of F-measure, respectively. The combination of baseline and MeSH semantic type features increased the baseline results for the severity and reason types by 0.5% and 0.6% of F-measure, respectively. The FDA co-occurrence feature increased the results for the frequency type by 1.3%, while for the rest of the types the results are on par. The biomedical text co-occurrence feature improved the classifier performance for the adverse relation type by 2%. Thus, the knowledge features improved the model results for particular relation types.

The BERT model achieved the best results for the severity, route, dose, duration, and frequency relation types. However, for the reason and adverse types, this model obtained an F-measure approximately 10% lower than the random forest with baseline and context embedding features. Overall, BERT gained 90.5% of micro F-measure, the second-best result among all evaluated models. We suppose that the lower results for the adverse and reason types can be caused by two factors: (i) the same disease can, in different cases, be an adverse drug reaction or a reason, and (ii) the average length of the context for these relation types is too long to capture the relation between the entities.

A comparison of results across relation types shows that the best result was achieved for route (97.6%). This result roughly stays on par with the best results for the severity, reason, dose, duration, and frequency types, while the best result for the adverse type is 10.7% lower. This difference could be due to the greater lexical variety of the adverse drug reaction entity type.

To sum up this section, three important conclusions can be drawn. First, the distance and word-based features are beneficial for the relation classifier. Second, the context embedding has more impact on entity relations than the entity embeddings. Finally, prior knowledge improves the results on particular relation types, with the largest improvement achieved on the adverse relation type with the biomedical text co-occurrence feature.
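For reference, the micro-averaged F1 reported in Table 2 pools true positives, false positives, and false negatives across all relation types before computing precision and recall, as in this sketch (the reported numbers themselves come from the official competition scripts):

```python
def micro_f1(stats):
    """Micro-averaged F1 from per-relation-type counts of
    true positives (tp), false positives (fp), and false negatives (fn).
    `stats` maps relation type -> {"tp": ..., "fp": ..., "fn": ...}."""
    tp = sum(s["tp"] for s in stats.values())
    fp = sum(s["fp"] for s in stats.values())
    fn = sum(s["fn"] for s in stats.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```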
6 Conclusion

In this study, we have investigated different types of features for the drug-related information extraction task from EHRs. Our evaluation on the MADE competition corpus shows that the context embedding, distance, and word features bring the most benefit to the relation extraction task. The classifier with a combination of these feature sets outperformed state-of-the-art results. These facts lead to the conclusion that the context between entities plays a crucial role in relation detection. The detailed analysis of the results showed that prior knowledge about entity co-occurrence improved the results for the adverse relation type. Our future research will focus on the investigation of modern neural networks for relation extraction from EHRs. We also plan to analyze various context representation methods and to extend our experiments to other biomedical corpora.

Acknowledgments

This research was supported by the Russian Foundation for Basic Research grant no. 19-07-01115.

References

1. Bates, D.W., Cullen, D.J., Laird, N., Petersen, L.A., Small, S.D., Servi, D., Laffel, G., Sweitzer, B.J., Shea, B.F., Hallisey, R., et al.: Incidence of adverse drug events and potential adverse drug events: implications for prevention. JAMA 274(1), 29–34 (1995)
2. Batin, M., Turchin, A., Sergey, M., Zhila, A., Denkenberger, D.: Artificial intelligence in life extension: From deep learning to superintelligence. Informatica 41(4) (2017)
3. de Bruijn, B., Cherry, C., Kiritchenko, S., Martin, J., Zhu, X.: NRC at i2b2: one challenge, three practical tasks, nine statistical systems, hundreds of clinical records, millions of useful features. In: Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data. Boston, MA, USA: i2b2 (2010)
4. Chapman, A.B., Peterson, K.S., Alba, P.R., DuVall, S.L., Patterson, O.V.: Detecting adverse drug events with rapidly trained classification models. Drug Safety, 1–10 (2019)
5. Chen, Q., Zhuo, Z., Wang, W.: BERT for joint intent classification and slot filling. arXiv preprint arXiv:1902.10909 (2019)
6. Chen, Q., Peng, Y., Lu, Z.: BioSentVec: creating sentence embeddings for biomedical texts. arXiv preprint arXiv:1810.09302 (2018)
7. Dandala, B., Joopudi, V., Devarakonda, M.: Adverse drug events detection in clinical notes by jointly modeling entities and relations using neural networks. Drug Safety, 1–12 (2019)
8. Demner-Fushman, D., Apostolova, E., Islamaj Dogan, R., et al.: NLM's system description for the fourth i2b2/VA challenge. In: Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data. Boston, MA, USA: i2b2 (2010)
9. Demner-Fushman, D., Chapman, W.W., McDonald, C.J.: What can natural language processing do for clinical decision support? Journal of Biomedical Informatics 42(5), 760–772 (2009)
10. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
11. Divita, G., Treitler, O., Kim, Y., et al.: Salt Lake City VA's challenge submissions. In: Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data. Boston, MA, USA: i2b2 (2010)
12. Frankovich, J., Longhurst, C.A., Sutherland, S.M.: Evidence-based medicine in the EMR era. N Engl J Med 365(19), 1758–1759 (2011)
13. Grouin, C., Abacha, A.B., Bernhard, D., Cartoni, B., Deleger, L., Grau, B., Ligozat, A.L., Minard, A.L., Rosset, S., Zweigenbaum, P.: CARAMBA: concept, assertion, and relation annotation using machine-learning based approaches. In: i2b2 Medication Extraction Challenge Workshop (2010)
14. Gurwitz, J.H., Field, T.S., Harrold, L.R., Rothschild, J., Debellis, K., Seger, A.C., Cadoret, C., Fish, L.S., Garber, L., Kelleher, M., et al.: Incidence and preventability of adverse drug events among older persons in the ambulatory setting. JAMA 289(9), 1107–1116 (2003)
15. Jagannatha, A., Liu, F., Liu, W., Yu, H.: Overview of the first natural language processing challenge for extracting medication, indication, and adverse drug events from electronic health record notes (MADE 1.0). Drug Safety, 1–13 (2018)
16. Jensen, P.B., Jensen, L.J., Brunak, S.: Mining electronic health records: towards better research applications and clinical care. Nature Reviews Genetics 13(6), 395 (2012)
17. Jonnalagadda, S., Gonzalez, G.: Can distributional statistics aid clinical concept extraction. In: Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data. Boston, MA, USA: i2b2 (2010)
18. Le, H., Hoi, S., Sahoo, D., Chen, N.: End-to-end multimodal dialog systems with hierarchical multimodal attention on video features. In: DSTC7 at AAAI 2019 Workshop (2019)
19. Li, F., Yu, H.: An investigation of single-domain and multidomain medication and adverse drug event relation extraction from electronic health record notes using advanced deep learning models. Journal of the American Medical Informatics Association 26(7), 646–654 (2019)
20. Magge, A., Scotch, M., Gonzalez-Hernandez, G.: Clinical NER and relation extraction using bi-char-LSTMs and random forest classifiers. In: International Workshop on Medication and Adverse Drug Event Detection, 25–30 (2018)
21. Moen, S., Ananiadou, T.S.S.: Distributional semantics resources for biomedical text processing. Proceedings of LBM, 39–44 (2013)
22. Munkhdalai, T., Liu, F., Yu, H.: Clinical relation extraction toward drug safety surveillance using electronic health record narratives: classical learning versus deep learning. JMIR Public Health and Surveillance 4(2) (2018)
23. Pao, M.L., Grefsheim, S.F., Barclay, M.L., Woolliscroft, J.O., McQuillan, M., Shipman, B.L.: Factors affecting students' use of MEDLINE. Computers and Biomedical Research 26(6), 541–555 (1993)
24. Patrick, J., Nguyen, D., Wang, Y., Li, M.: I2b2 challenges in clinical natural language processing 2010. In: Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data. Boston, MA, USA: i2b2 (2010)
25. Patrick, J., Li, M.: A cascade approach to extracting medication events. In: Proceedings of the Australasian Language Technology Association Workshop 2009, 99–103 (2009)
26. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
27. Roberts, K., Rink, B., Harabagiu, S.: Extraction of medical concepts, assertions, and relations from discharge summaries for the fourth i2b2/VA shared task. In: Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data. Boston, MA, USA: i2b2 (2010)
28. Solt, I., Szidarovszky, F.P., Tikk, D.: Concept, assertion and relation extraction at the 2010 i2b2 relation extraction challenge using parsing information and dictionaries. In: Proc. of i2b2/VA Shared-Task. Washington, DC (2010)
29. Sun, C., Huang, L., Qiu, X.: Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. arXiv preprint arXiv:1903.09588 (2019)
30. Uglow, H., Zlocha, M., Zmyślony, S.: SemEval 2019 Task 6: An exploration of state-of-the-art methods for offensive language detection. arXiv preprint arXiv:1903.07445 (2019)
31. Uzuner, Ö., South, B.R., Shen, S., DuVall, S.L.: 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association 18(5), 552–556 (2011)
32. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, 5998–6008 (2017)
33. Wei, C.H., Peng, Y., Leaman, R., Davis, A.P., Mattingly, C.J., Li, J., Wiegers, T.C., Lu, Z.: Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task. Database 2016 (2016)
34. Xu, D., Yadav, V., Bethard, S.: UArizona at the MADE 1.0 NLP challenge. Proceedings of Machine Learning Research 90, 57 (2018)
35. Zhu, C., Zeng, M., Huang, X.: SDNet: Contextualized attention-based deep network for conversational question answering. arXiv preprint arXiv:1812.03593 (2018)