=Paper=
{{Paper
|id=Vol-2226/paper4
|storemode=property
|title=Automated Detection of Adverse Drug Reactions in the Biomedical Literature Using Convolutional Neural Networks and Biomedical Word Embeddings
|pdfUrl=https://ceur-ws.org/Vol-2226/paper4.pdf
|volume=Vol-2226
|authors=Diego Saldana Miranda
|dblpUrl=https://dblp.org/rec/conf/swisstext/Miranda18
}}
==Automated Detection of Adverse Drug Reactions in the Biomedical Literature Using Convolutional Neural Networks and Biomedical Word Embeddings==
Diego Saldana Miranda
Novartis Pharma A.G.
Applied Technology Innovation
Novartis Campus
4056 Basel
diego.saldana miranda@novartis.com
Abstract

Monitoring the biomedical literature for cases of Adverse Drug Reactions (ADRs) is a critically important and time-consuming task in pharmacovigilance. The development of computer-assisted approaches to aid this process in different forms has been the subject of many recent works.

One particular area that has shown promise is the use of Deep Neural Networks, in particular Convolutional Neural Networks (CNNs), for the detection of ADR relevant sentences. Using token-level convolutions and general-purpose word embeddings, this architecture has shown good performance relative to more traditional models as well as Long Short-Term Memory (LSTM) models.

In this work, we evaluate and compare two different CNN architectures using the ADE corpus. In addition, we show that by de-duplicating the ADR relevant sentences, we can greatly reduce overoptimism in the classification results. Finally, we evaluate the use of word embeddings specifically developed for biomedical text and show that they lead to better performance on this task.

In: Mark Cieliebak, Don Tuggener and Fernando Benites (eds.): Proceedings of the 3rd Swiss Text Analytics Conference (SwissText 2018), Winterthur, Switzerland, June 2018

1 Introduction

Pharmacovigilance is a crucial component at every stage of the drug development cycle, and regulations require pharmaceutical companies to prepare periodic reports such as Development Safety Update Reports (DSURs) and Periodic Safety Update Reports (PSURs) regarding the safety of their drugs and products (Krishnamurthy et al., 2017).

One of the most important sources of information to be monitored in pharmacovigilance is the biomedical literature (Pontes et al., 2014). To this end, large numbers of scientific abstracts and publications need to be screened and/or read in full in order to collect information relevant to safety, and in particular Adverse Drug Reactions (ADRs) associated with a particular drug.

Screening and reading the biomedical literature is a time-consuming task of critical importance. It requires particular expertise and needs to be performed by well-trained readers. Given this, systems that enable human readers to perform this task faster and more effectively would be of great value.

2 Background

Computer-assisted pharmacovigilance and, more specifically, the automation of the detection of ADR relevant information across various data sources has the potential to have a great positive impact on the pharmaceutical industry. There is a vast array of sources of potential ADR relevant information, including both structured and unstructured data resources.

In many cases, adverse reactions are initially detected through unstructured means of communication, such as a patient speaking to a healthcare professional, or case reports written by physicians and published in biomedical literature sources such as MEDLINE, PubMed and EMBASE (Rison, 2013). Spontaneous reporting can also be made through telephone calls, email communication, and even fax (Vallano et al., 2005). Such information is generally processed through human intervention in order to properly
categorize them and add the necessary metadata.

Other potential sources of safety signals include electronic medical/health records (EMRs/EHRs) (Park et al., 2011). Similarly, omics, chemical, phenotypic and metabolic pathway data can be analyzed using a diverse array of methods to find associations between drugs and specific side effects (Liu et al., 2012; Mizutani et al., 2012; Lee et al., 2011). In recent years, social media websites have also become a potential source of safety signals (Karimi et al., 2015; Sarker and Gonzalez, 2015; Tafti et al., 2017).

Finally, after careful processing, the data is usually aggregated and stored in structured databases for reporting. Many regulatory agencies maintain databases that aggregate information regarding reported adverse events, such as the FDA Adverse Event Reporting System (FAERS) (Fang et al., 2014) in the U.S., EudraVigilance in Europe (Banovac et al., 2017), and the MedEffect Adverse Reaction Online Database in Canada (Barry et al., 2014).

The aim of our work is to contribute towards the development of systems that assist readers in charge of finding ADR signals in the biomedical literature. As such, the ideal system should be able to accurately discriminate between ADR relevant and irrelevant sentences in the documents that it processes. In the following section, we detail some of the past efforts to automate this as well as other tasks related to the extraction of ADR relevant information from the biomedical literature.

3 Related Work

The automation of the detection of ADR relevant information across various data sources has received much attention in recent years. Ho et al. performed a systematic review and summarized their findings on various methods to predict ADEs, ranging from omics to social media (Ho et al., 2016). In addition, the authors presented a list of public and commercial data sources available for the task. Similarly, Tan et al. summarized the available data resources and presented the state of computational decision support systems for ADRs (Tan et al., 2016). Harpaz et al. prepared an overview of the state of the art in text mining for Adverse Drug Events (ADEs) (Harpaz et al., 2014) in various contexts, such as the biomedical literature, product labelling, social media and web search logs.

Xu et al. initially proposed a method based on manually curated lexicons which could be used to build cancer drug-side effect (drug-SE) pair knowledge bases from scientific publications (Xu and Wang, 2014c). The authors also described a method to extract syntactical patterns, via parse trees from the Stanford Parser (Xu and Wang, 2014a), based on known seed cancer drug-SE pairs. The patterns can then be used to extract new cancer drug-SE pairs. They further proposed an approach using SVM classifiers to categorize tables from cancer related literature as either ADR relevant or not (Xu and Wang, 2015a). The authors then extracted cancer drug-SE pairs from the tables using a lexicon-based approach and compared them with data from the FDA label information. Xu et al. also evaluated their method on a large scale, full text corpus of oncological publications (Xu and Wang, 2015b), extracting drug-SE pairs and showing good correlation of the extracted pairs with gene targets and disease indications.

There are a number of available data resources for the purpose of ADR signal detection. Gurulingappa et al. introduced the ADE corpus, a large corpus of MEDLINE sentences annotated as ADR relevant or not (Gurulingappa et al., 2012). Karimi et al. described CADEC, a corpus of social media posts with ADE annotations (Karimi et al., 2015), including mappings to vocabularies such as SNOMED. Further, the annotations include detailed information such as drug-event and drug-dose relationships. Sarker et al. described an approach using SVM classifiers, as well as diverse feature engineering methods, to classify clinical reports and social media posts from multiple corpora as ADR relevant or not (Sarker and Gonzalez, 2015). Odom et al. explored an approach using relational gradient boosting (FRGB) models to combine information learned from labelled data with advice from human readers in the identification of ADRs in the biomedical literature (Odom et al., 2015). Adams et al. proposed an approach using custom PubMed search queries making use of MeSH subheadings to automatically identify ADR related publications. The authors conducted an evaluation by comparing with results manually tagged by investigators, obtaining a precision of 0.90 and a recall of 0.93.

Some researchers have tried to combine information from structured databases with the unstructured data found in the biomedical literature. For example, by combining information from FAERS and MEDLINE using signal boosting and ranking algorithms, it's possible to improve cancer drug-side effect (drug-SE pair) signal detection (Xu and Wang, 2014b).

There have recently been efforts to use neural networks to improve the performance of the ADR sentence detection, entity extraction and relation extraction tasks. Gupta et al. proposed a two-step approach for extracting mentions of adverse events from social media: (1) predicting the drug based on the context, unsupervised; (2) predicting adverse event mentions based on a tweet and the features learned in the previous step, supervised (Gupta et al., 2017). Li et al. proposed approaches combining CNNs and bi-LSTMs to perform named entity recognition as well as relation extraction for ADRs in the annotated sentences in the ADE dataset (Li et al., 2017). More recently, Ramamoorthy et al. described an approach using bi-LSTMs with an attentional mechanism to jointly perform relation extraction as well as visualize the patterns in the sentence.

Huynh proposed using convolutional recurrent neural networks (CRNNs) and convolutional neural networks with attention (CNNAs) to identify ADR related tweets and MEDLINE article sentences (Huynh et al., 2016). The CNNA's attention component has the attractive property that it allows visualization of the influence of each word on the decision of the network.

In this work, we introduce approaches building upon previous results using convolutional neural networks (CNNs) (Huynh et al., 2016) to detect ADR relevant sentences in the biomedical literature. Our key contributions are as follows:

• We compare Huynh's CNN approach, which is based on the architecture proposed by Kim (2014), with a deeper architecture based on the one proposed by Hughes et al. (2017), using the ADE dataset, showing that Kim's architecture performs much better for this task and dataset.

• We apply a de-duplication of the ADR relevant sentences in the ADE dataset (Gurulingappa et al., 2012), which we believe leads to a better estimation of the performance of the algorithm and does not seem to have been applied in some of the previous works.

• We evaluate the use of word embeddings developed specifically for biomedical text introduced by Pyysalo et al. (2013) and show that, by using these embeddings in place of general-purpose GloVe embeddings, it is possible to improve the performance of the algorithm.

4 Dataset

The ADE corpus was introduced by Gurulingappa et al. (2012) in order to provide a benchmark dataset for the development of algorithms for the detection of ADRs in case reports. The original source of the data was 2972 MEDLINE case reports. The data was labelled by three trained annotators and their annotation results were consolidated into a final dataset including 6728 ADE relations (in 4272 sentences), as well as 16688 non-ADR relevant sentences.

The authors calculated Inter-Annotator Agreement (IAA), using F1 scores as a criterion, for adverse event entities of between 0.77 and 0.80 for partial matches and between 0.63 and 0.72 for exact matches. For more detail, the reader can refer to the work of Gurulingappa et al. (Gurulingappa et al., 2012).

4.1 Preprocessing

The dataset is suitable for two types of tasks: (1) categorization of sentences as either relevant for ADRs or not; and (2) extraction of drug-adverse event relations and drug-dose relations. Because there can be more than one relation in the same sentence, the ADR relevant sentences are sometimes duplicated.

The presence of duplicates can lead to situations where the same sentence is present in both the training and test datasets, as well as to an overall distortion of the distribution of the sentences. In order to prevent this, we de-duplicate these sentences, which results in 4272 ADR relevant sentences, as stated in the work of Gurulingappa et al. (Gurulingappa et al., 2012).

5 Methods

In the following sections, we describe (1) the word embeddings used in our learning algorithms; and (2) the two different CNN architectures evaluated in our experiments.

5.1 Embeddings

GloVe 840B

As in Huynh's work (Huynh et al., 2016), we use pre-trained word embeddings. Huynh focused mainly on the general-purpose GloVe Common Crawl 840B, 300-dimensional word embeddings (Pennington et al., 2014).
Pyysalo’s Embeddings
We also evaluate the use of 200 dimensional word2vec
embeddings introduced by Pyysalo et al. (Pyysalo
et al., 2013). These word embeddings were fitted on
a corpus combining PubMed abstracts, PubMed Cen-
tral Open Access (PMC OA) full text articles as well
as Wikipedia articles. We also initialize zero valued
vectors for the unknown word symbol as well as for
the padding symbol.
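As an illustration of how such an embedding matrix can be assembled, the sketch below reads pre-trained vectors in the common plain-text format (one word followed by its values per line, as in the GloVe distribution) and reserves zero-valued rows for the padding and unknown-word symbols. The function and file names are illustrative, not taken from the paper's code:

```python
import numpy as np

def build_embedding_matrix(vector_file, vocab, dim):
    """Assemble an embedding matrix for `vocab`, reserving row 0 for the
    padding symbol and row 1 for the unknown-word symbol (both zero-valued)."""
    # Load pre-trained vectors: one "word v1 v2 ... v_dim" entry per line.
    pretrained = {}
    with open(vector_file, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == dim + 1:
                pretrained[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    # Rows 0 and 1 stay zero (padding and unknown-word symbols);
    # vocabulary words start at index 2.
    matrix = np.zeros((len(vocab) + 2, dim), dtype=np.float32)
    for i, word in enumerate(vocab, start=2):
        if word in pretrained:
            matrix[i] = pretrained[word]
    return matrix
```

In training, such a matrix would be used to initialize the network's embedding layer, whose weights are then further optimized (as described in the Preprocessing subsection below for tokens kept in the vocabulary).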
Preprocessing

As in Huynh's work, no new word vectors are initialized for tokens not present in the pre-trained vocabulary, and only tokens among the 20000 most frequent words in the dataset are included. The remaining tokens are mapped to the unknown word symbol vector. We enable the algorithm to optimize the pre-trained weights after initialization. We follow the preprocessing strategy used by Huynh (Huynh et al., 2016), which is itself based on that of Kim (Kim, 2014), and includes expansion of contractions; additionally, all non-alphabetic characters are replaced with spaces prior to tokenization.

6 Convolutional Neural Network Architectures

In all architectures described below, the sentences are mapped to a vector representation, v. Dropout is applied to v during training with a dropout probability of 0.5. As in usual classification tasks, the predicted probability of a positive outcome, that is, of the sentence being ADR relevant, is given by

ŷ = ρ(vᵀw + b),  (1)

where w is a vector of coefficients, b is the intercept, and ρ is the sigmoid function.

The objective function to be optimized is the cross entropy, which can also be interpreted as an average negative log-likelihood, and is given by

L(Θ) = −(1/N) Σᵢ₌₁ᴺ [yᵢ log ŷᵢ + (1 − yᵢ) log(1 − ŷᵢ)].  (2)

Huynh's CNN architecture

This architecture consists of a 1D-convolution layer with 300 filters and a 5-token window applied on the word vectors. This is followed by a Rectified Linear Unit (ReLU) and a 1D-max pooling over the full axis of the 1D-convolution results. This leads to a 300-dimensional vector representation, v, which is used as input for the classification network described above. Figure 1 shows a diagram of the resulting architecture. Note that M, the number of embedding dimensions, may be equal to either 300 or 200, but is shown as 300 for illustration in the figure.

Figure 1: Diagram of the architecture proposed by Huynh (Huynh et al., 2016).

To reduce overfitting, a constraint is added to ensure that the L2 norms of each one of the 1D-convolution filters are never above a threshold value, s, after each batch. For more detail, the reader can refer to the works of Huynh (Huynh et al., 2016) and Kim (Kim, 2014).

Hughes' CNN architecture

Based on the approach proposed by Hughes (Hughes et al., 2017), we explored a deeper architecture, with multiple successive stages of 1D-convolution, non-linear transformations, and max pooling.
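Both architectures reduce an embedded sentence to a fixed-length vector v through 1D convolutions, ReLU activations and max pooling, and feed v to the classification head of Equations (1) and (2). The NumPy sketch below illustrates the forward pass of the single-convolution, Huynh-style variant (300 filters, 5-token window); it is a simplified illustration under assumed shapes and random weights, not the paper's TensorFlow implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu_maxpool(x, filters):
    """x: (seq_len, emb_dim) embedded sentence; filters: (n_filters, window, emb_dim).
    Returns a fixed-length vector via ReLU and 1D-max pooling over all positions."""
    n_filters, window, _ = filters.shape
    seq_len = x.shape[0]
    out = np.empty((seq_len - window + 1, n_filters))
    for t in range(seq_len - window + 1):
        # Correlate every filter with the `window`-token slice starting at t.
        out[t] = np.tensordot(filters, x[t:t + window], axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0).max(axis=0)  # ReLU, then max pooling over time

def predict_proba(x, filters, w, b):
    """Equation (1): sigmoid classification head on the pooled representation v."""
    v = conv1d_relu_maxpool(x, filters)
    return 1.0 / (1.0 + np.exp(-(v @ w + b)))

def cross_entropy(y, y_hat, eps=1e-12):
    """Equation (2): average negative log-likelihood of the labels."""
    y, y_hat = np.asarray(y), np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# A toy sentence of 12 tokens with 300-dimensional embeddings, and
# 300 filters with a 5-token window, as in Huynh's architecture.
x = rng.standard_normal((12, 300))
filters = 0.01 * rng.standard_normal((300, 5, 300))
w, b = 0.1 * rng.standard_normal(300), 0.0
p = predict_proba(x, filters, w, b)
```

In the paper's setup, the filter weights are learned with the Adam optimizer and clipped so that each filter's L2 norm stays below the threshold s; the sketch shows only the forward computation shared by both architectures.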
Hughes' architecture starts with two successive stages of 1D-convolutions with 256 filters and a 5-token window, each followed by a ReLU transformation. After this, a 1D-max pooling on the axis of the convolutions with a window of length 5 is applied. Finally, another two successive stages of 1D-convolutions with 256 filters and a window of length 5, each followed by a ReLU transformation, are applied, followed by a 1D-max pooling over the full axis of the 1D-convolutions.

Similar to the case of the previous architecture, this leads to a 256-dimensional vector representation, v, and a constraint is used to keep the L2 norms of all 1D-convolution filters under a threshold value s. Figure 2 shows a diagram of the resulting architecture. As previously, note that M may be equal to either 300 or 200, but is shown as 300 for illustration in the figure.

Figure 2: Diagram of an architecture based on the one proposed by Hughes (Hughes et al., 2017).

For further detail, the reader can refer to the work of Hughes (2017).

7 Experimental Setup

Following the approach used by Huynh et al. (2016), we used 10-fold cross validation to evaluate the performance of our classifiers. The normalization threshold used to clip the L2 norms of the filters, s, was set to 9.

The Adam optimizer (Kingma and Ba, 2014) was used to minimize the loss, L(Θ), with 8 epochs and a batch size of 50. To avoid overfitting, early stopping is used based on a development set consisting of 10% of the training data of each fold. For the decision of the classifier, instead of a ŷ threshold of 0.5, we determine the optimum threshold by evaluating all possible thresholds present in the development set of each fold and keeping the threshold that results in the best F1 score.

After every 10 batches, the optimal threshold is determined from the development set and the associated best F1 score is obtained. Optimization is stopped if the F1 score on the development set fails to improve after 6 steps. The set of CNN parameters associated with the best F1 score observed throughout the training process is then kept and used to evaluate the network's performance on the test set of each fold.

We use the architecture originally proposed by Huynh (Huynh et al., 2016) without de-duplication as the baseline to understand the impact of the de-duplication, choice of embeddings, and CNN architecture.

All CNN implementations were done using Python 3.4.5 (Rossum, 1995) and Tensorflow 1.2.0 (Abadi et al., 2015).

8 Results

8.1 Impact of De-duplication on Classification Performance Estimates

Table 1 shows a comparison of the performance metrics of our implementation of Huynh's architecture with GloVe 840B word embeddings, with and without de-duplication of the sentences labelled as ADR relevant. After de-duplication, most of the performance metrics were lower, since the presence of duplicates in the positive samples resulted in overly optimistic results.

The biggest impact was observed on the precision, recall and F1 scores. Overall accuracies and areas under the ROC curve (AUROC) didn't seem to be greatly
affected. Note that the specificity, which is the true negative rate, was higher after de-duplication.

De-duplication   No      Yes
Accuracy         0.919   0.914
Precision        0.858   0.784
Recall           0.860   0.798
F1-score         0.859   0.790
Specificity      0.942   0.943
AUROC            0.966   0.954

Table 1: Performance metrics of Huynh's architecture using GloVe 840B embeddings with and without de-duplication of the ADR relevant sentences.

We initially obtained somewhat lower performances for the baseline model without de-duplication compared to the one reported by Huynh et al. (2016), even though we accurately followed the described architecture. After investigating the differences in the code, we noticed that during pre-processing, characters that are not alphabetic are replaced with spaces prior to tokenization. After incorporating this step into our code, the results matched the previously reported ones much better.

8.2 Impact of Biomedical Word Embeddings

Table 2 shows a comparison of the performance metrics with de-duplication of ADR relevant sentences, using the GloVe 840B word embeddings and the word embeddings fitted for biomedical data proposed by Pyysalo et al. (Pyysalo et al., 2013).

Word Embeddings   GloVe 840B   Pyysalo
Accuracy          0.914        0.918
Precision         0.784        0.800
Recall            0.798        0.797
F1-score          0.790        0.798
Specificity       0.943        0.949
AUROC             0.954        0.958

Table 2: Performance metrics of Huynh's architecture with de-duplication, with GloVe 840B embeddings and Pyysalo's embeddings.

In most cases, the use of biomedical word embeddings was favorable or non-detrimental to the performance metrics. The largest improvement was the increase in average precision from 0.784 with GloVe 840B to 0.800 with the biomedical embeddings. This also led to an increased average F1 score from 0.790 to 0.798. The average AUROC also increased from 0.954 to 0.958. Specificity increased from 0.943 to 0.949, and recall was the only metric that was slightly reduced, from 0.798 to 0.797.

8.3 Comparison With Hughes' CNN Architecture

Table 3 shows a comparison between the performances of our implementations of Huynh's and Hughes' architectures. In both cases, de-duplication of ADR relevant sentences and biomedical embeddings were used. The former outperformed the latter in every performance metric. The biggest improvement was in metrics associated with the positive class, such as precision, recall, and F1 score.

Architecture   Huynh   Hughes
Accuracy       0.918   0.905
Precision      0.800   0.765
Recall         0.797   0.771
F1-score       0.798   0.767
Specificity    0.949   0.939
AUROC          0.958   0.940

Table 3: Performance metrics of Huynh's and Hughes' architectures with de-duplication and Pyysalo's embeddings.

9 Discussion

The purpose of this work was to evaluate the use of convolutional neural network (CNN) architectures and biomedical word embeddings for the automatic categorization of sentences relevant to adverse drug reactions (ADRs) in case reports present in the biomedical literature. For this purpose, we used the ADE corpus, which consists of sentences coming from 2972 MEDLINE case reports labelled by trained annotators. This includes 4272 ADR relevant sentences, as well as 16688 non-ADR relevant sentences.

We showed that, because of duplications present in the ADE corpus, the use of this dataset for sentence classification without performing a de-duplication can lead to overoptimistic performance estimates. In addition, we showed that, by using biomedical word embeddings, as opposed to general purpose word embeddings, it's possible to improve upon the performance
of the algorithm. Finally, we compared the performance of our implementations of two CNN architectures, with the architecture proposed by Huynh outperforming the architecture proposed by Hughes in this task and dataset in every metric.

One important measure of the potential noise in the inputs of human annotators is the Inter-Annotator Agreement (IAA) (Gurulingappa et al., 2012), which in this dataset was measured by its original authors by calculating inter-annotator F1 scores. Although this measure was calculated at the entity (partial and exact) matching level, and although there has been a harmonization process, it is informative of the potential noise in the inputs used to build the dataset. The fact that the IAAs for partial matches of adverse events ranged between 0.77 and 0.80 indicates that aiming for near-perfect predictions may be unrealistic, since there is a considerable degree of disagreement between human annotators.

10 Conclusions and Future Work

Our results highlight the importance of sentence de-duplication, pre-processing, choice of word embeddings, and neural network architectures when applying convolutional neural networks (CNNs) for the detection of adverse drug reaction (ADR) relevant sentences in the biomedical literature using the ADE dataset. We believe that these are only a few of the factors that can greatly influence the performance of the algorithms performing these tasks.

Future work could include the use of grid-based, random, or reinforcement-learning based search for more optimal CNN architectures, as well as the evaluation of architectures other than CNNs. In addition, another very interesting area explored in previous works (Huynh et al., 2016) was the aspect of visualization using CNNs with Attention (CNNAs). However, this algorithm seemed to underperform compared to the normal CNN. Building upon this approach to improve its performance while retaining its attractive visualization properties would be an important step towards the development of systems that assist human readers.

11 Acknowledgements

The author would like to thank Abhimanyu Verma as well as the Technology Architecture & Digital department at Novartis Pharma A.G. for their support in this research.

References

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org. https://www.tensorflow.org/.

Marin Banovac, Gianmario Candore, Jim Slattery, Francois Houez, David Haerry, Georgy Genov, and Peter Arlett. 2017. Patient reporting in the EU: Analysis of EudraVigilance data. Drug Safety 40(7):629–645. https://doi.org/10.1007/s40264-017-0534-1.

Arden R. Barry, Sheri L. Koshman, and Glen J. Pearson. 2014. Adverse drug reactions. Canadian Pharmacists Journal / Revue des Pharmaciens du Canada 147(4):233–238. https://doi.org/10.1177/1715163514536523.

H Fang, Z Su, Y Wang, A Miller, Z Liu, P C Howard, W Tong, and S M Lin. 2014. Exploring the FDA adverse event reporting system to generate hypotheses for monitoring of disease characteristics. Clinical Pharmacology & Therapeutics 95(5):496–498. https://doi.org/10.1038/clpt.2014.17.

Shashank Gupta, Sachin Pawar, Nitin Ramrakhiyani, Girish Keshav Palshikar, and Vasudeva Varma. 2017. Semi-supervised recurrent neural network for adverse drug reaction mention extraction. CoRR abs/1709.01687. http://arxiv.org/abs/1709.01687.

Harsha Gurulingappa, Abdul Mateen Rajput, Angus Roberts, Juliane Fluck, Martin Hofmann-Apitius, and Luca Toldo. 2012. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. Journal of Biomedical Informatics 45(5):885–892. https://doi.org/10.1016/j.jbi.2012.04.008.

Rave Harpaz, Alison Callahan, Suzanne Tamang, Yen Low, David Odgers, Sam Finlayson, Kenneth Jung, Paea LePendu, and Nigam H. Shah. 2014. Text mining for adverse drug events: the promise, challenges, and state of the art. Drug Safety 37(10):777–790. https://doi.org/10.1007/s40264-014-0218-z.
Tu-Bao Ho, Ly Le, Dang Tran Thai, and Siriwon Taewijit. 2016. Data-driven approach to detect and predict adverse drug reactions. Current Pharmaceutical Design 22(23):3498–3526. https://doi.org/10.2174/1381612822666160509125047.

Mark Hughes, Irene Li, Spyros Kotoulas, and Toyotaro Suzumura. 2017. Medical text classification using convolutional neural networks. CoRR abs/1704.06841. http://arxiv.org/abs/1704.06841.

Trung Huynh, Yulan He, Alistair Willis, and Stefan Rüger. 2016. Adverse drug reaction classification with deep learning. In International Conference on Computational Linguistics (COLING).

Sarvnaz Karimi, Alejandro Metke-Jimenez, Madonna Kemp, and Chen Wang. 2015. Cadec: A corpus of adverse drug event annotations. Journal of Biomedical Informatics 55:73–81. https://doi.org/10.1016/j.jbi.2015.03.010.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. CoRR abs/1408.5882. http://arxiv.org/abs/1408.5882.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR abs/1412.6980. http://arxiv.org/abs/1412.6980.

Arun Chander Yadav Krishnamurthy, Jayasudha Dhanasekaran, and Anusha Natarajan. 2017. A succinct medical safety: periodic safety update reports. International Journal of Basic & Clinical Pharmacology 6(7):1545. https://doi.org/10.18203/2319-2003.ijbcp20172714.

Sejoon Lee, Kwang H Lee, Min Song, and Doheon Lee. 2011. Building the process-drug–side effect network to discover the relationship between biological processes and side effects. BMC Bioinformatics 12(Suppl 2):S2. https://doi.org/10.1186/1471-2105-12-s2-s2.

Fei Li, Meishan Zhang, Guohong Fu, and Donghong Ji. 2017. A neural joint model for entity and relation extraction from biomedical text. BMC Bioinformatics 18(1). https://doi.org/10.1186/s12859-017-1609-9.

Mei Liu, Yonghui Wu, Yukun Chen, Jingchun Sun, Zhongming Zhao, Xue wen Chen, Michael Edwin Matheny, and Hua Xu. 2012. Large-scale prediction of adverse drug reactions using chemical, biological, and phenotypic properties of drugs. Journal of the American Medical Informatics Association 19(e1):e28–e35. https://doi.org/10.1136/amiajnl-2011-000699.

S. Mizutani, E. Pauwels, V. Stoven, S. Goto, and Y. Yamanishi. 2012. Relating drug-protein interaction network with drug side effects. Bioinformatics 28(18):i522–i528. https://doi.org/10.1093/bioinformatics/bts383.

Phillip Odom, Vishal Bangera, Tushar Khot, David Page, and Sriraam Natarajan. 2015. Extracting adverse drug events from text using human advice. In Artificial Intelligence in Medicine, Springer International Publishing, pages 195–204. https://doi.org/10.1007/978-3-319-19551-3_26.

Man Young Park, Dukyong Yoon, KiYoung Lee, Seok Yun Kang, Inwhee Park, Suk-Hyang Lee, Woojae Kim, Hye Jin Kam, Young-Ho Lee, Ju Han Kim, and Rae Woong Park. 2011. A novel algorithm for detection of adverse drug reaction signals using a hospital electronic medical record database. Pharmacoepidemiology and Drug Safety 20(6):598–607. https://doi.org/10.1002/pds.2139.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543. http://www.aclweb.org/anthology/D14-1162.

Helena Pontes, Mallorie Clément, and Victoria Rollason. 2014. Safety signal detection: The relevance of literature review. Drug Safety 37(7):471–479. https://doi.org/10.1007/s40264-014-0180-9.

S. Pyysalo, F. Ginter, H. Moen, T. Salakoski, and S. Ananiadou. 2013. Distributional semantics resources for biomedical text processing. In Proceedings of LBM 2013, pages 39–44. http://lbm2013.biopathway.org/lbm2013proceedings.pdf.

Richard A Rison. 2013. A guide to writing case reports for the Journal of Medical Case Reports and BioMed Central Research Notes. Journal of Medical Case Reports 7(1). https://doi.org/10.1186/1752-1947-7-239.

Guido van Rossum. 1995. Python reference manual. Technical report, Amsterdam, The Netherlands.

Abeed Sarker and Graciela Gonzalez. 2015. Portable automatic text classification for adverse drug reaction detection via multi-corpus training. Journal of Biomedical Informatics 53:196–207. https://doi.org/10.1016/j.jbi.2014.11.002.

Ahmad P Tafti, Jonathan Badger, Eric LaRose, Ehsan Shirzadi, Andrea Mahnke, John Mayer, Zhan Ye, David Page, and Peggy Peissig. 2017. Adverse drug event discovery using biomedical literature: A big data neural network adventure. JMIR Medical Informatics 5(4):e51. https://doi.org/10.2196/medinform.9170.

Yuxiang Tan, Yong Hu, Xiaoxiao Liu, Zhinan Yin, Xue wen Chen, and Mei Liu. 2016. Improving drug safety: From adverse drug reaction knowledge discovery to clinical implementation. Methods 110:14–25. https://doi.org/10.1016/j.ymeth.2016.07.023.
A. Vallano, G. Cereza, C. Pedròs, A. Agustí, I. Danés, C. Aguilera, and J. M. Arnau. 2005. Obstacles and solutions for spontaneous reporting of adverse drug reactions in the hospital. British Journal of Clinical Pharmacology 60(6):653–658. https://doi.org/10.1111/j.1365-2125.2005.02504.x.

Rong Xu and QuanQiu Wang. 2014a. Automatic construction of a large-scale and accurate drug-side-effect association knowledge base from biomedical literature. Journal of Biomedical Informatics 51:191–199. https://doi.org/10.1016/j.jbi.2014.05.013.

Rong Xu and QuanQiu Wang. 2014b. Large-scale combining signals from both biomedical literature and the FDA adverse event reporting system (FAERS) to improve post-marketing drug safety signal detection. BMC Bioinformatics 15(1):17. https://doi.org/10.1186/1471-2105-15-17.

Rong Xu and QuanQiu Wang. 2014c. Toward creation of a cancer drug toxicity knowledge base: automatically extracting cancer drug–side effect relationships from the literature. Journal of the American Medical Informatics Association 21(1):90–96. https://doi.org/10.1136/amiajnl-2012-001584.

Rong Xu and QuanQiu Wang. 2015a. Combining automatic table classification and relationship extraction in extracting anticancer drug–side effect pairs from full-text articles. Journal of Biomedical Informatics 53:128–135. https://doi.org/10.1016/j.jbi.2014.10.002.

Rong Xu and QuanQiu Wang. 2015b. Large-scale automatic extraction of side effects associated with targeted anticancer drugs from full-text oncological articles. Journal of Biomedical Informatics 55:64–72. https://doi.org/10.1016/j.jbi.2015.03.009.