Comparative analysis of context representation models in the relation extraction task from biomedical texts*

Ilseyar Alimova, Kazan Federal University, Kazan, Russia, alimovaIlseyar@gmail.com
Elena Tutubalina, Kazan Federal University, Kazan, Russia, elvtutubalina@kpfu.ru

* Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

This paper focuses on the task of extracting relations between entities in biomedical texts. The study aims to identify the most effective method for representing the context between entities. We compare several context representation methods: a bag-of-words representation, averaged word embeddings, sentence embeddings, representations obtained by convolutional and recurrent neural networks, and bidirectional encoder representations from Transformers (BERT). We conduct a set of experiments on two benchmark corpora of patient electronic health records and scientific articles in English. As expected, the highest classification results were obtained with the state-of-the-art neural architecture BERT.

1 Introduction

Relation extraction is one of the crucial problems in the fields of natural language processing and information extraction. It aims to extract semantically connected entities from unstructured text. Relation extraction is a core step in developing various natural language processing systems, including question-answering systems [34], ontology construction [41], and information retrieval [4]. In this paper, we focus on extracting relations from biomedical texts [32]. In biomedical text processing, relation extraction is applied to extracting adverse drug reactions and drug-related information [30], detecting protein-protein interactions [6], and identifying the influence of chemicals on diseases [42].

The context between two entities is essential for relation extraction: whether two entities are related depends on this context. For example, in the prescription passage given to a patient "Lorazepam 1 mg every 6 hours in case of nausea, Omeprazole 20 mg in a day", nausea is the indication for Lorazepam; therefore, the entities Lorazepam and nausea are related to each other. However, in the sentence "Prochlorperazine 10 mg every 6 hours in case of nausea, Valacyclovir 500 mg 2 times a day, Lorazepam 1 mg in case of insomnia", the entities Lorazepam and nausea are not related.

In this paper, we perform an extensive comparison of context representation methods in order to identify the most effective method for relation extraction in the biomedical domain. We consider several methods of context representation: (i) a bag-of-words representation; (ii) averaged word embeddings from a word2vec model [27]; (iii) sentence embeddings from a sent2vec model [7]; (iv) representations obtained by convolutional neural networks (CNN) [22], long short-term memory networks (LSTM) [14], and bidirectional encoder representations from Transformers (BERT) [11]. We conduct experiments on the MADE and CDR corpora, which contain texts from different sources. The Natural Language Processing Challenge for Extracting Medication, Indication, and Adverse Drug Events from Electronic Health Record Notes (MADE) corpus consists of annotated electronic health records [16]. The BioCreative V chemical disease relation (CDR) corpus [42] includes annotations of scientific articles in the biomedical domain. This study examines relations between drugs and their attributes and between chemicals and diseases.
2 Related Work

There are various approaches to the problem of identifying related entities in biomedical texts [3, 5, 15, 18, 24, 28, 38]. Earlier works on relation extraction adopted frequency-based methods. These methods calculate the frequency of co-occurrence of the entities within a given context length. If the resulting number is greater than a specified threshold, the entities are considered related. The advantages of this approach are the simplicity of its implementation and the absence of any need for linguistic analysis or labeled data. However, its significant drawback is that it does not take into account the semantics of the context between the entities in a text.

The template-based approach is grounded in finding matches for linguistic patterns represented as regular expressions. Templates are generated automatically or manually based on context. The advantage of this approach is that there is no need for an annotated corpus. However, the wide variety of contexts generates a large number of templates, which significantly reduces the quality of the system [8, 12, 35].

The growing number of annotated corpora of biomedical texts has led to experiments with machine learning methods for the relation extraction problem [2, 19, 20, 31, 36, 39]. In this approach, the context is encoded as a feature vector. The most common features are:

• bag of words: a feature vector that consists of the words before, after, and between the entities;
• part-of-speech tags: a feature vector consisting of the part-of-speech tags of the words before, after, and between the entities;
• distance between the entities: the number of words between the entities and the number of indicator words between the entities, for example, specific verbs that indicate the existence of a connection;
• shortest syntactic tree path: the encoded shortest path from one entity to the other in the syntactic parse tree.

Recent relation extraction approaches are based on neural networks, where context and entities are encoded with word embeddings as an input [10, 25, 37]. Sahu et al. applied a CNN for extracting relations from patients' electronic health records [37]. The model takes as input the whole sentence encoded with word embeddings. The obtained vectors pass sequentially through convolutional and dense layers. The results show that a CNN can extract global features, which give a good context representation and improve the quality of the system. Lv et al. adopted an autoencoder for context representation [25]. Their experiments indicate that the proposed model is effective and that optimizing functions with a deep learning model has great potential. Dandala et al. employed a bidirectional long short-term memory network with attention for extracting relations from electronic health records [10]. The proposed approach achieved an F-measure of 84%.

A review of the literature shows that machine learning models are the most widely used, and the most common method for representing context is a bag of words. There are no studies that utilize sentence embeddings to solve the problem of context representation.

3 Context Representation Methods

Let the context be the text between two entities. For the evaluation, we select several approaches to context representation, ranging from the simplest methods, such as a bag of words and averaged word embeddings, to more complex ones, such as sentence embeddings and convolutional and recurrent neural networks. A classifier takes the context representation between two entities as input and predicts whether it expresses a relation.
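As a concrete reference point before the detailed descriptions below, the following toy sketch builds the two simplest representations for a short context: a bag-of-words count vector over the training dictionary and an averaged word-embedding vector. It is an illustration under assumed inputs (a hypothetical toy vocabulary and random stand-in embeddings), not the implementation used in our experiments.

```python
# Toy sketch of the two simplest context representations (illustrative only).
import numpy as np

train_contexts = ["1 mg every 6 hours in case of", "500 mg 2 times a day"]
vocab = sorted({w for c in train_contexts for w in c.split()})  # training dictionary

def bag_of_words(context):
    # Count of each dictionary word in the context; word order is discarded.
    counts = np.zeros(len(vocab))
    for w in context.split():
        if w in vocab:
            counts[vocab.index(w)] += 1
    return counts

def averaged_embedding(context, embeddings, dim=200):
    # Mean of the word vectors of the in-vocabulary context words.
    vectors = [embeddings[w] for w in context.split() if w in embeddings]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

toy_embeddings = {w: np.random.rand(200) for w in vocab}  # stand-in for pre-trained vectors
context = "1 mg in case of insomnia"
print(bag_of_words(context).shape, averaged_embedding(context, toy_embeddings).shape)
```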
Bag of words (bow) is one of the first models of text representation, proposed by Zellig Harris in 1954 [13]. Currently, the bag of words is actively used for text classification and information retrieval. According to this model, the number of occurrences of each word from the dictionary is counted for the text, where the dictionary is the set of unique words over all texts of the training dataset. The model does not take into account the word order in the text, which is one of its main disadvantages. Besides, the final text representation vector has a large dimension.

The averaged word embedding (word2vec) representation is calculated by summing the embedding of each word in the context and dividing by the total number of words in the context. Tomas Mikolov proposed the word embedding model in 2013 [27]. It is based on a neural network trained to predict a word from its context on a large text corpus; the hidden states of the network are later used as word vectors. The advantage of this model is its ability to capture the semantic meaning of words: the vectors of words that are close in meaning are close to each other in the vector space. However, this property can be lost at the text level due to the averaging of vectors. This representation has a fixed dimension for all texts, equal to the length of the word embedding vector.

Sentence embeddings (sent2vec) are a variation of the word embedding model [33]. However, the neural network is trained not only on separate words but also on word n-grams and the averaged embeddings of the words in a sentence. Thus, the model can represent the semantic meaning of a sentence better than simple averaging of word embeddings.

Convolutional neural networks (CNN) are widely used for context modeling [17, 21, 22]. The network takes as input a matrix E consisting of the context words encoded with word embeddings. We apply a standard convolutional layer over the matrix E, followed by a global max-pooling layer, to produce the text embedding:

y_j = max_i b_ij,

where k ∈ R^(v×d) is a kernel matrix, v is the width of the kernel, and B ∈ R^((n−v)×d) is the matrix of convolution outputs composed of elements b_ij. The j axis is computed using different parallel kernels, and the max operation is applied along the i axis. Thus, each neuron in the next layer is connected not to all neurons but only to a small localized subset of neurons in the previous layer, which allows the network to identify the most significant features of each fragment of the input matrix. The pooling layer is used to reduce the size of the feature map; most often, max pooling or weighted average pooling is used.

Recurrent neural networks (RNN) are used to process sequential data such as time series or word sequences [26]. The network utilizes information from its previous states, which is one of the critical advantages of this model. The model takes the context words encoded with word embeddings as input. At each step, the network computes its output from the current word embedding vector and the output obtained at the previous step. We use the last cell state as the context representation:

y_s = c_n, h_i = RNN(w_i, c_{i−1}),

where c_n is the RNN memory state after reading the entire input sequence, and h_i is the RNN output produced using w_i (a word embedding) and c_{i−1} (the memory state from the previous time step) as inputs.
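A minimal Keras sketch of the two neural encoders described above is given below. The layer sizes are illustrative placeholders rather than the configuration reported in Section 5, and an LSTM stands in for the generic RNN of the equations.

```python
# Minimal sketch (illustrative sizes, not the paper's exact configuration) of the
# neural context encoders described above: a convolution followed by global max
# pooling (y_j = max_i b_ij) and an LSTM whose last cell state c_n is used.
from tensorflow.keras import layers, models

n, d = 50, 200                                       # context length and embedding size (assumed)
inputs = layers.Input(shape=(n, d))                  # matrix E of context word embeddings

conv = layers.Conv1D(filters=100, kernel_size=5)(inputs)   # matrix B with elements b_ij
cnn_context = layers.GlobalMaxPooling1D()(conv)             # max over the position axis i

_, _, last_cell_state = layers.LSTM(200, return_state=True)(inputs)  # c_n from the equations

cnn_encoder = models.Model(inputs, cnn_context)
rnn_encoder = models.Model(inputs, last_cell_state)
print(cnn_encoder.output_shape, rnn_encoder.output_shape)   # (None, 100) (None, 200)
```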
BERT (Bidirectional Encoder Representations from Transformers) is a recent neural network model for NLP presented by Google [11]. The model obtained state-of-the-art results in various NLP tasks, including question answering, dialog systems, text classification, and sentiment analysis. The BERT neural network is based on the bidirectional attention-based Transformer architecture [40]. One of the main advantages of the model is its ability to take raw text as input. In our experiments, we used the averaged vector of the words in the context. We utilize a biomedical version of BERT called BioBERT [23].

4 Datasets

We conduct experiments on two annotated corpora of biomedical texts: MADE [16] and CDR [42]. The overall corpora statistics are presented in Table 1.

Table 1: The overall statistics of the corpora.

Corpus | # Relations | Avg. context len. | Max. context len. (in characters) | # Unique context words
MADE   | 27 145      | 29.9              | 981                               | 17 443
CDR    | 3 013       | 167.1             | 1 021                             | 16 197

4.1 MADE corpus

The Medication, Indication, and Adverse Drug Events from Electronic Health Record Notes (MADE) corpus consists of 1089 anonymized electronic health records of patients with cancer [16]. The records include extract statements, examination results, and other notes. The corpus contains nine types of entities and seven types of relations. The annotated entities can be divided into two groups: those related to a disease and those related to a drug. Entities of the first group: adverse drug reaction (ADE), a reason to use the drug (Indication), the severity of a disease (Severity), and other symptoms and diseases not included in the previous types (SSD). Entities related to drugs: name (Drug name), dose (Dose), duration of taking a drug (Duration), frequency of taking a drug (Frequency), and route of taking a drug (Route). The corpus includes seven types of relations, four of which hold between the name of a drug and its attributes:

• Drug name – Dose
• Drug name – Route
• Drug name – Frequency
• Drug name – Duration
• Drug name – Indication
• Drug name – ADE
• SSD – Severity, which includes relations between the severity of a disease and all entity types of the disease group: ADE, Indication, and SSD.

Related entities can occur both within one sentence and in different sentences. The corpus is divided into training and test subsets.

4.2 CDR corpus

The CDR corpus was developed for the BioCreative V competition [42]. The corpus consists of abstracts of scientific articles collected from the PubMed resource. The corpus annotations contain entities denoting diseases (Disease) and chemicals (Chemical), as well as the relations between these entities. The corpus is divided into three subsets: training, test, and development. In this work, the training and development subsets are combined into one common training subset; the model is evaluated on the test subset.

4.3 Generation of negative examples for training

Manual annotations in both corpora contain only positive examples denoting related entities. It is therefore necessary to generate negative examples to train models for binary classification. For each entity, we obtained a set of candidate entities following the rules from [16]: the number of characters between the entities is smaller than 1000, and the number of other entities that may participate in relations and are located between the candidate entities is not more than 3. These restrictions reduce unlikely negative pairs and mitigate the class imbalance issue, while more than 97% of the positive pairs remain in the MADE dataset and 100% remain in the CDR corpus.
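The candidate-generation rules above can be sketched as follows. This is an illustration under an assumed data layout (entities with character offsets), not the original preprocessing code, and it counts every annotated entity as potentially intervening.

```python
# Sketch of the candidate-pair generation rules from [16]: keep a pair if the gap
# between the entities is under 1000 characters and at most 3 other entities lie
# between them (assumed data layout, illustrative only).
from dataclasses import dataclass

@dataclass
class Entity:
    start: int   # character offsets of the entity in the document
    end: int
    label: str

def candidate_pairs(entities, max_gap=1000, max_between=3):
    entities = sorted(entities, key=lambda e: e.start)
    pairs = []
    for i, first in enumerate(entities):
        for second in entities[i + 1:]:
            if second.start - first.end >= max_gap:
                break  # entities are sorted, so later ones are even farther away
            between = sum(1 for e in entities
                          if first.end <= e.start and e.end <= second.start)
            if between <= max_between:
                pairs.append((first, second))
    return pairs
```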
5 Experiments and Results

For the averaged word embedding representation, we applied word vectors trained on texts of PubMed and PMC articles and Wikipedia [29]. The length of the vectors is 200. The vocabulary coverage is 93% for the CDR corpus and 89% for the MADE corpus. Sentence embeddings were obtained from the BioSentVec model, pre-trained on a text corpus consisting of articles from the PubMed resource and clinical notes from the MIMIC-III database [7]. The model is trained on bigrams with a window size of 20 words; the length of the resulting vectors is 700. We utilized frozen weights from the last layer of the BioBERT model [23]. BioBERT* was initialized with general-domain BERT and additionally pre-trained on PubMed abstracts (PubMed) and PubMed Central full-text articles (PMC) (version: BioBERT v1.0 (+ PubMed 200K + PMC 270K)).

* This model is available at https://github.com/naver/biobert-pretrained.

Following [21], we trained a convolutional neural network with the following parameters: 3 layers with filter sizes of 5, 4, and 3, 10 epochs, a batch size of 32, and class weights of 0.7 for related entities and 0.3 for unrelated entities. The recurrent neural network was trained with the following parameters: 200 hidden units, a dropout of 0.2, 20 epochs, a batch size of 64, and class weights of 0.75 for related entities and 0.25 for unrelated entities. All implementations are based on the Keras and TensorFlow libraries [1, 9].

We employed a support vector machine (SVM) as the classifier. The classifier takes each of the context representations as input in turn. The classifier was evaluated with standard metrics: precision (P), recall (R), and F-measure (F). The results are presented in Table 2.

Table 2: Results of SVM with different context representations.

          |       MADE        |        CDR
Method    |  P     R     F    |  P     R     F
bow       | .878  .573  .693  | .395  .341  .367
word2vec  | .760  .800  .779  | .557  .312  .400
sent2vec  | .894  .873  .883  | .437  .376  .405
CNN       | .725  .825  .772  | .446  .334  .382
RNN       | .482  .404  .440  | .297  .516  .377
BERT      | .929  .882  .905  | .473  .385  .424

According to the results, all models outperformed the bag-of-words baseline on the CDR corpus, and all except the RNN outperformed it on the MADE corpus; the baseline obtained F-measures of 69.3% and 36.7% on the MADE and CDR corpora, respectively. The best method of context representation for both corpora is BERT, which achieved F-measures of 90.5% and 42.4% on the MADE and CDR corpora, respectively. The sent2vec method achieved the second-best results. For the CDR corpus, the difference between the sent2vec and BERT models is 1.9%, while on the MADE corpus the difference is 2.2%, which is more substantial. The CNN outperformed the RNN on the MADE corpus by 33.2%, while on the CDR corpus their results are on par. For the CDR corpus, the highest precision was achieved by the averaged word embedding method (55.7%), and the highest recall by the recurrent neural network (51.6%). On the MADE corpus, the highest precision and recall were achieved with the BERT model (92.9% and 88.2%, respectively). Overall, the F-measure on the MADE corpus is higher than on the CDR corpus. This difference could be due to the MADE corpus having significantly more examples of relations, which allows the classifier to learn its parameters better.
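The evaluation protocol above can be summarized with the sketch below. The paper does not name a specific SVM implementation, so scikit-learn and random stand-in vectors are used here purely for illustration.

```python
# Illustrative sketch of the evaluation protocol: an SVM over fixed-size context
# vectors, scored with precision, recall, and F-measure. Random vectors stand in
# for the real representations (bow, word2vec, sent2vec, CNN, RNN, BERT).
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 700)), rng.integers(0, 2, 200)  # e.g. sent2vec-sized vectors
X_test, y_test = rng.random((50, 700)), rng.integers(0, 2, 50)

# class_weight="balanced" mitigates class imbalance (an assumption; the paper
# reports explicit class weights only for the CNN and RNN models).
clf = SVC(class_weight="balanced").fit(X_train, y_train)
p, r, f, _ = precision_recall_fscore_support(y_test, clf.predict(X_test), average="binary")
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```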
6 Conclusion

In this paper, we have investigated several methods for representing the context in the task of extracting relations between biomedical entities. The study aims to identify the most effective methods of context representation. The experimental results showed that the BERT model achieved the highest results. In the future, we plan to evaluate the models considered in this article on the protein-protein relation extraction task.

Acknowledgments

This research was supported by the Russian Foundation for Basic Research grant no. 190701115.

References

[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al., TensorFlow: A system for large-scale machine learning, 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[2] Syed Toufeeq Ahmed, Radhika Nair, Chintan Patel, and Hasan Davulcu, BioEve: bio-molecular event extraction from text using semantic classification and dependency parsing, Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, Association for Computational Linguistics, 2009, pp. 99–102.

[3] Antti Airola, Sampo Pyysalo, Jari Björne, Tapio Pahikkala, Filip Ginter, and Tapio Salakoski, All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning, BMC Bioinformatics 9 (2008), no. 11, S2.

[4] Omar Alonso, Jannik Strötgen, Ricardo A Baeza-Yates, and Michael Gertz, Temporal information retrieval: Challenges and opportunities, TWAW 11 (2011), 1–8.

[5] William A Baumgartner, K Bretonnel Cohen, and Lawrence Hunter, An open-source framework for large-scale, flexible evaluation of biomedical text mining systems, Journal of Biomedical Discovery and Collaboration 3 (2008), no. 1, 1.

[6] Christian Blaschke, Miguel A Andrade, Christos A Ouzounis, and Alfonso Valencia, Automatic extraction of biological information from scientific text: protein-protein interactions, ISMB, vol. 7, 1999, pp. 60–67.

[7] Qingyu Chen, Yifan Peng, and Zhiyong Lu, BioSentVec: creating sentence embeddings for biomedical texts, The 7th IEEE International Conference on Healthcare Informatics (2019).

[8] Yong Suk Choi, Tree pattern expression for extracting information from syntactically parsed text corpora, Data Mining and Knowledge Discovery 22 (2011), no. 1-2, 211–231.

[9] François Chollet et al., Keras: The Python deep learning library, Astrophysics Source Code Library (2018).

[10] Bharath Dandala, Venkata Joopudi, and Murthy Devarakonda, Adverse drug events detection in clinical notes by jointly modeling entities and relations using neural networks, Drug Safety 42 (2019), no. 1, 135–146.

[11] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.

[12] Andrew D Fox, William A Baumgartner, Helen L Johnson, Lawrence E Hunter, and Donna K Slonim, Mining protein-protein interactions from GeneRIFs with OpenDMAP, Linking Literature, Information, and Knowledge for Biology, Springer, 2010, pp. 43–52.

[13] Zellig S Harris, Distributional structure, Word 10 (1954), no. 2-3, 146–162.
[14] Sepp Hochreiter and Jürgen Schmidhuber, Long short-term memory, Neural Computation 9 (1997), no. 8, 1735–1780, based on TR FKI-207-95, TUM (1995).

[15] Minlie Huang, Xiaoyan Zhu, and Ming Li, A hybrid method for relation extraction from biomedical literature, International Journal of Medical Informatics 75 (2006), no. 6, 443–455.

[16] Abhyuday Jagannatha, Feifan Liu, Weisong Liu, and Hong Yu, Overview of the first natural language processing challenge for extracting medication, indication, and adverse drug events from electronic health record notes (MADE 1.0), Drug Safety (2018), 1–13.

[17] Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom, A convolutional neural network for modelling sentences, arXiv preprint arXiv:1404.2188 (2014).

[18] Halil Kilicoglu and Sabine Bergler, Adapting a general semantic interpretation approach to biological event extraction, Proceedings of the BioNLP Shared Task 2011 Workshop, Association for Computational Linguistics, 2011, pp. 173–182.

[19] Mi-Young Kim, Detection of gene interactions based on syntactic relations, BioMed Research International 2008 (2008).

[20] Sun Kim, Soo-Yong Shin, In-Hee Lee, Soo-Jin Kim, Ram Sriram, and Byoung-Tak Zhang, PIE: an online prediction system for protein–protein interactions from text, Nucleic Acids Research 36 (2008), no. suppl_2, W411–W415.

[21] Yoon Kim, Convolutional neural networks for sentence classification, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1746–1751.

[22] Yann LeCun, Yoshua Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (1995), no. 10, 1995.

[23] Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang, BioBERT: pre-trained biomedical language representation model for biomedical text mining, arXiv preprint arXiv:1901.08746 (2019).

[24] Florian Leitner, Scott A Mardis, Martin Krallinger, Gianni Cesareni, Lynette A Hirschman, and Alfonso Valencia, An overview of BioCreative II.5, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 7 (2010), no. 3, 385–399.

[25] Xinbo Lv, Yi Guan, Jinfeng Yang, and Jiawei Wu, Clinical relation extraction with deep learning, International Journal of Hybrid Information Technology 9 (2016), no. 7, 237–248.

[26] Larry Medsker and Lakhmi C Jain, Recurrent neural networks: design and applications, CRC Press, 1999.

[27] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.

[28] Makoto Miwa, Rune Sætre, Yusuke Miyao, and Jun'ichi Tsujii, A rich feature vector for protein-protein interaction extraction from multiple corpora, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1, Association for Computational Linguistics, 2009, pp. 121–130.

[29] S.P.F.G.H. Moen, Tapio Salakoski, and Sophia Ananiadou, Distributional semantics resources for biomedical text processing, Proceedings of LBM (2013), 39–44.

[30] Tsendsuren Munkhdalai, Feifan Liu, and Hong Yu, Clinical relation extraction toward drug safety surveillance using electronic health record narratives: classical learning versus deep learning, JMIR Public Health and Surveillance 4 (2018), no. 2, e29.
[31] Yun Niu, David Otasek, and Igor Jurisica, Evaluation of linguistic features useful in extraction of interactions from PubMed; application to annotating known, high-throughput and predicted interactions in I2D, Bioinformatics 26 (2009), no. 1, 111–119.

[32] Stanley Chika Onye, Arif Akkeleş, and Nazife Dimililer, Review of biomedical relation extraction.

[33] Matteo Pagliardini, Prakhar Gupta, and Martin Jaggi, Unsupervised learning of sentence embeddings using compositional n-gram features, arXiv preprint arXiv:1703.02507 (2017).

[34] Deepak Ravichandran and Eduard Hovy, Learning surface text patterns for a question answering system, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 2002, pp. 41–47.

[35] Dietrich Rebholz-Schuhmann, Antonio Jimeno-Yepes, Miguel Arregui, and Harald Kirsch, Measuring prediction capacity of individual verbs for the identification of protein interactions, Journal of Biomedical Informatics 43 (2010), no. 2, 200–207.

[36] Rune Sætre, Kenji Sagae, and Jun'ichi Tsujii, Syntactic features for protein-protein interaction extraction, LBM (Short Papers) 319 (2007).

[37] Sunil Kumar Sahu, Ashish Anand, Krishnadev Oruganty, and Mahanandeeshwar Gattu, Relation extraction from clinical texts using domain invariant convolutional neural network, arXiv preprint arXiv:1606.09370 (2016).

[38] Isabel Segura-Bedmar, Paloma Martínez, and César de Pablo-Sánchez, A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents, BMC Bioinformatics, vol. 12, BioMed Central, 2011, p. S1.

[39] Sofie Van Landeghem, Yvan Saeys, Bernard De Baets, and Yves Van de Peer, Extracting protein-protein interactions from text using rich feature vectors and feature selection, 3rd International Symposium on Semantic Mining in Biomedicine (SMBM 2008), Turku Centre for Computer Science (TUCS), 2008, pp. 77–84.

[40] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.

[41] Dmitriy Yur'yevich Vlasov, Dmitriy Yevgen'yevich Pal'chunov, and Pavel Andreyevich Stepanov, Avtomatizatsiya izvlecheniya otnosheniy mezhdu ponyatiyami iz tekstov yestestvennogo yazyka [Automation of extracting relations between concepts from natural language texts], Vestnik Novosibirskogo gosudarstvennogo universiteta. Seriya: Informatsionnyye tekhnologii 8 (2010), no. 3.

[42] Chih-Hsuan Wei, Yifan Peng, Robert Leaman, Allan Peter Davis, Carolyn J Mattingly, Jiao Li, Thomas C Wiegers, and Zhiyong Lu, Overview of the BioCreative V chemical disease relation (CDR) task, Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, vol. 14, 2015.