=Paper= {{Paper |id=None |storemode=property |title=The 1st DDIExtraction-2011 Challenge Task: Extraction of Drug-Drug Interactions from Biomedical Texts |pdfUrl=https://ceur-ws.org/Vol-761/paper0.pdf |volume=Vol-761 }} ==The 1st DDIExtraction-2011 Challenge Task: Extraction of Drug-Drug Interactions from Biomedical Texts== https://ceur-ws.org/Vol-761/paper0.pdf

The 1st DDIExtraction-2011 challenge task:
Extraction of Drug-Drug Interactions from
biomedical texts

Isabel Segura-Bedmar, Paloma Martínez, and Daniel Sánchez-Cisneros

Universidad Carlos III de Madrid, Computer Science Department,
Avda. Universidad, 30, 28911 Leganés, Madrid, Spain
{isegura,pmf,dscisner}@inf.uc3m.es
http://labda.inf.uc3m.es/

Abstract. We present an evaluation task designed to provide a framework
for comparing different approaches to extracting drug-drug interactions
from biomedical texts. We define the task, describe the training/test
data, list the participating systems and discuss their results. Ten teams
submitted a total of 40 runs.

Keywords: Biomedical Text Mining, Drug-Drug Interaction Extraction

1 Task Description and Related Work
A drug-drug interaction (DDI) occurs when one drug influences the level or
activity of another drug. Since negative DDIs can be very dangerous, DDI
detection is the subject of an important field of research that is crucial for
both patient safety and health care cost control. Although health care
professionals are supported in DDI detection by different databases, those
currently in use are rarely complete, since their update periods can be as long
as three years [12]. Drug interactions are frequently reported in journals of
clinical pharmacology and technical reports, making medical literature the most
effective source for the detection of DDIs. The management of DDIs is therefore
a critical issue, owing to the overwhelming amount of information available [8].
Information extraction (IE) can be of great benefit both to the pharmaceutical
industry, by facilitating the identification and extraction of relevant
information on DDIs, and to health care professionals, by reducing the time
spent reviewing the relevant literature. Moreover, the development of tools for
automatically extracting DDIs is essential for improving and updating the drug
knowledge databases.
Different systems have been developed for the extraction of biomedical
relations, particularly protein-protein interactions (PPIs), from texts.
Nevertheless, few approaches have been proposed for the problem of extracting
DDIs from biomedical texts. We developed two different approaches for DDI
extraction. Since no benchmark corpus was available to evaluate them, we
created the DrugDDI corpus, annotated with 3,160 DDIs. Our first approach is a
hybrid linguistic approach [13] that combines shallow parsing and syntactic
simplification with pattern matching. This system yielded a precision of 48.69%,
a recall of 25.70% and an F-measure of 33.64%. Our second approach [14] is based
on a supervised machine learning technique, more specifically, the shallow
linguistic kernel proposed by Giuliano et al. [7]. It achieved a precision of
51.03%, a recall of 72.82% and an F-measure of 60.01%.

Proceedings of the 1st Challenge task on Drug-Drug Interaction Extraction (DDIExtraction 2011), Huelva, Spain, September 2011.

In order to stimulate research in this direction, we have organized the
challenge task DDIExtraction2011. Just as the BioCreAtIvE (Critical Assessment
of Information Extraction systems in Biology) challenge evaluation has been
devoted to providing a common framework for the evaluation of text mining,
driving progress in text mining techniques applied to the biological domain,
our purpose is to create a benchmark dataset and evaluation task that will
enable researchers to compare their algorithms applied to the extraction of
drug-drug interactions.

2 The DrugDDI corpus
While Natural Language Processing (NLP) techniques are relatively
domain-portable, corpora are not. For this reason, we created the DrugDDI
corpus, the first annotated corpus for studying the phenomenon of interactions
among drugs. We hope that the corpus serves to encourage the NLP community to
conduct further research in the field of pharmacology.
As a source of unstructured textual information on drugs and their
interactions, we used the DrugBank database [17]. This database is a rich
resource combining chemical and pharmaceutical information on approximately
4,900 pharmacological substances. For each drug, DrugBank contains more than
100 data fields including drug synonyms, brand names, chemical formula and
structure, drug categories, ATC and AHFS codes (i.e., codes of standard drug
families), mechanism of action, indication, dosage forms, toxicity, etc. Of
particular interest to this study, DrugBank offered the field 'Interactions'
(no longer available), which contained a link to a document describing DDIs in
unstructured text. DrugBank provides a file with the names of approved drugs¹,
approximately 1,450. We randomly chose 1,000 drug names and used RobotMaker²,
a screen-scraper application, to download the interaction documents for these
drugs. We retrieved only 930 documents, since some drugs did not have any
linked document. Due to the cost-intensive and time-consuming nature of the
annotation process, we decided to reduce the number of documents to be
annotated and considered only 579 documents. We believe that these texts are a
reliable and representative source of data for expressing DDIs, since the
language used is mostly devoted to descriptions of DDIs. Additionally, the
highly specialized pharmacological language is very similar to that found in
Medline pharmacology abstracts.
These documents were then analyzed by the UMLS MetaMap Transfer (MMTx) [2]
tool, which performs sentence splitting, tokenization, POS-tagging, shallow
syntactic parsing (see Figure 1) and linking of phrases with UMLS Metathesaurus
concepts. Drugs are automatically identified by MMTx, since the tool allows for
the recognition and annotation of biomedical entities occurring in texts
according to the UMLS semantic types. An experienced pharmacist reviewed the
UMLS Semantic Network as well as the semantic annotation provided by MMTx and
recommended including the following UMLS semantic types as possible types of
interacting drugs: Clinical Drug (clnd), Pharmacological Substance (phsu),
Antibiotic (antb), Biologically Active Substance (bacs), Chemical Viewed
Structurally (chvs) and Amino Acid, Peptide, or Protein (aapp).

¹ http://www.drugbank.ca/downloads
² http://openkapow.com/

The principal value of the DrugDDI corpus undoubtedly comes from its DDI
annotations. To obtain these annotations, all documents were marked up by a
researcher with a pharmaceutical background. DDIs were annotated at the
sentence level and, thus, any interactions spanning several sentences were not
annotated. Only sentences with two or more drugs were considered, and the
annotation was made sentence by sentence. Figure 1 shows an example of an
annotated sentence that contains three interactions. Each interaction is
represented as a DDI node in which the names of the interacting drugs are
registered in its NAME DRUG 1 and NAME DRUG 2 attributes. The identifiers of
the phrases containing these interacting drugs are also annotated, providing
easy access to the related concepts provided by MMTx. As mentioned, Figure 1
shows three DDIs: the first represents an interaction between aspirin and
probenecid, the second an interaction between aspirin and sulfinpyrazone, and
the last a DDI between aspirin and phenylbutazone.

Fig. 1. Example of DDI annotations.

The DrugDDI corpus is also provided in the unified format for PPI corpora
proposed by Pyysalo et al. [11] (see Figure 2). This shared format could
attract the attention of groups studying PPI extraction, because they could
easily adapt their systems to the problem of DDI extraction. The unified XML
format does not contain any of the linguistic information provided by MMTx; it
only provides the sentences, their drugs and their interactions. Each entity
(drug) includes a reference (origId) to the id of the phrase in the MMTx-format
corpus in which the corresponding drug appears. For each sentence from the
DrugDDI corpus represented in the unified XML format, its DDI candidate pairs
should be generated from the different drugs appearing therein. Each DDI
candidate pair is represented as a pair node in which the ids of the candidate
drugs are registered in its e1 and e2 attributes. If the pair is a DDI, the
interaction attribute must be set to true, and to false otherwise.

Fig. 2. The unified XML format.

Table 1. Basic statistics on the DrugDDI corpus.

                                  Number    Avg. per document
Documents                            579
Sentences                          5,806    10.03
Phrases                           66,021    114.02
Tokens                           127,653    220.47
Sentences with at least one DDI    2,044    3.53
Sentences with no DDI              3,762    6.50
DDIs                               3,160    5.46 (0.54 per sentence)

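The pair nodes of the unified XML format described above can be read with a few lines of Python. This is only a minimal sketch: the node names sentence, entity and pair and the attributes origId, e1, e2 and interaction follow the description in the text, whereas the document root, the id and text attributes, all identifier values and the sample sentence are assumptions made for illustration and are not taken from the corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence in the unified format. Ids, origId values and the
# sentence text are invented for illustration only.
SAMPLE = """
<document id="d0">
  <sentence id="d0.s0" text="Aspirin may decrease the effects of probenecid and sulfinpyrazone.">
    <entity id="d0.s0.e0" origId="s0.p0" text="Aspirin"/>
    <entity id="d0.s0.e1" origId="s0.p5" text="probenecid"/>
    <entity id="d0.s0.e2" origId="s0.p7" text="sulfinpyrazone"/>
    <pair id="d0.s0.p0" e1="d0.s0.e0" e2="d0.s0.e1" interaction="true"/>
    <pair id="d0.s0.p1" e1="d0.s0.e0" e2="d0.s0.e2" interaction="true"/>
    <pair id="d0.s0.p2" e1="d0.s0.e1" e2="d0.s0.e2" interaction="false"/>
  </sentence>
</document>
"""


def read_pairs(xml_text):
    """Return one (drug_1, drug_2, label) tuple per pair node; label is 1 for a DDI."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for sentence in root.iter("sentence"):
        # Map entity ids to their surface text so pairs can be resolved.
        names = {e.get("id"): e.get("text") for e in sentence.iter("entity")}
        for pair in sentence.iter("pair"):
            label = 1 if pair.get("interaction") == "true" else 0
            rows.append((names[pair.get("e1")], names[pair.get("e2")], label))
    return rows


if __name__ == "__main__":
    for row in read_pairs(SAMPLE):
        print(row)
```

Resolving e1 and e2 through a per-sentence dictionary keeps the lookup local to the sentence, which matches the sentence-level annotation of the corpus.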
Table 1 shows basic statistics of the DrugDDI corpus. In general, the size of
biomedical corpora is quite small and usually does not exceed 1,000 sentences.
The average number of sentences per MedLine abstract was estimated at 7.2 ±
1.9 [18]. Our corpus contains 5,806 sentences, with 10.03 sentences per
document on average. MMTx identified a total of 66,021 phrases, of which 12.5%
(8,260) are drugs. The average number of drug mentions per document was 24.9,
and the average number of drug mentions per sentence was 2.4. The corpus
contains a total of 3,775 sentences with two or more drug mentions, although
only 2,044 sentences contain at least one interaction. With the assistance of a
pharmacist, a total of 3,160 DDIs were annotated, with an average of 5.46 DDIs
per document and 0.54 per sentence.
DDI extraction can be formulated as a supervised learning problem, more
particularly, as a drug pair classification task. Therefore, a crucial step is
to generate suitable datasets to train and test a classifier from the DrugDDI
corpus. The simplest way to generate examples to train a classifier for a
specific relation R is to enumerate all possible ordered pairs of sentence
entities. We proceeded in a similar way. Given a sentence S with at least two
drugs, we defined D as the set of drugs in S and N as the number of drugs. The
set of examples generated for S, therefore, was defined as follows:
$\{(D_i, D_j) : D_i, D_j \in D,\ 1 \le i, j \le N,\ i \ne j,\ i < j\}$.
If the interaction existed between the two DDI candidate drugs, then the
example was labeled 1. Otherwise, it was labeled 0. Although some DDIs may be
asymmetrical, the roles of the interacting drugs were not included in the
corpus annotation and are not specifically addressed in this task. As a result,
we enumerate candidate pairs here without taking their order into account, such
that $(D_i, D_j)$ and $(D_j, D_i)$ are considered as a single candidate pair.
Since the order of the drugs in the sentence was not taken into account, each
example is a copy of the original sentence S in which the candidates were
assigned the tag 'DRUG' and the remaining drugs were assigned the tag 'OTHER'.
The set of possible candidate pairs was the set of 2-combinations from the
whole set of drugs appearing in S. Thus, the number of examples was
$C_{N,2} = \binom{N}{2} = \frac{N(N-1)}{2}$.
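The enumeration of unordered candidate pairs can be sketched as follows. The function name, the token-list representation of a sentence and the assumption that every drug mention is a single token are illustrative choices, not details of the corpus tooling.

```python
from itertools import combinations


def candidate_examples(tokens, drug_positions):
    """Yield (i, j, masked_tokens) for every unordered pair of drug positions.

    Assumes each drug mention occupies exactly one token (a simplification).
    """
    for i, j in combinations(sorted(drug_positions), 2):
        masked = list(tokens)  # each example is a copy of the sentence
        for pos in drug_positions:
            # The two candidates are tagged DRUG, every other drug OTHER.
            masked[pos] = "DRUG" if pos in (i, j) else "OTHER"
        yield i, j, masked


if __name__ == "__main__":
    sentence = "Aspirin interacts with probenecid and sulfinpyrazone".split()
    for i, j, example in candidate_examples(sentence, [0, 3, 5]):
        print(i, j, " ".join(example))
```

With N drug positions the generator produces N(N-1)/2 examples, one per 2-combination, so a sentence with three drugs yields three examples.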
Table 2 shows the total number of relation examples or instances generated
from the DrugDDI corpus. Among the 30,757 candidate drug pairs, only 3,160
(10.27%) were marked as positive interactions (i.e., DDIs) while 27,597 (89.73%)
were marked as negative interactions (i.e., non-DDIs).

Table 2. Distribution of positive and negative examples in training and testing
datasets.

Set Documents Examples Positives Negatives
Train 437 (75.5%) 25,209 2,421 (9.6%) 22,788 (90.4%)
Final Test 142 (24.5%) 5,548 739 (13.3%) 4,809 (86.7%)
Total 579 30,757 3,160 (10.27%) 27,597 (89.73%)

Once we had generated the set of relation instances from the DrugDDI corpus,
the set was split in order to build the datasets for the training and
evaluation of the different DDI extraction systems. To build the training
dataset, which was also used for development tests, 75% of the DrugDDI corpus
files (435 files) were randomly selected, and the remaining 25% (144 files)
were used in the final evaluation to determine which model was superior.
Table 3 shows the distribution of the documents, sentences, drugs and DDIs in
each set. Approximately 90% of the instances in the training dataset were
negative examples (i.e., non-DDIs). The distribution between positive and
negative examples in the final test dataset was quite similar (see Table 2).
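A random split at the document level, as described above, might look like the following sketch. The function name, the fixed seed and the rounding rule are assumptions for illustration; the exact procedure used to split the corpus is not specified in the text, so the resulting counts need not match Table 3 exactly.

```python
import random


def split_documents(doc_ids, train_fraction=0.75, seed=0):
    """Randomly split document ids into (train, test) lists.

    Splitting by document keeps all candidate pairs of one document
    in the same set.
    """
    ids = sorted(doc_ids)          # fixed starting order for reproducibility
    rng = random.Random(seed)      # local generator, seed chosen arbitrarily
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


if __name__ == "__main__":
    train, test = split_documents(range(579))
    print(len(train), len(test))
```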

3 The participants
The task of extracting drug-drug interactions from biomedical texts attracted
the participation of 10 teams, who submitted 40 runs. Table 4 lists the teams,
their affiliations, the number of runs submitted and the description of their
systems. Table 5 reports the performance of each run in terms of precision,
recall, F-measure and accuracy.

Table 3. Training and testing datasets.

Set         Documents  Sentences  Drugs   DDIs
Training    435        4,267      11,260  2,402
Final Test  144        1,539      3,689   758
Total       579        5,806      14,949  3,160

Table 4. Short description of the teams.

Team           Institution                                        Runs  Description
WBI            Humboldt-Universität Berlin                        5     combination of several kernels and a case-based reasoning (CBR) system using a voting approach
FBK-HLT        Fondazione Bruno Kessler - HLT                     5     composite kernels using the MEDT, PST and SL kernels
LIMSI-FBK      LIMSI - Fondazione Bruno Kessler                   1     a feature-based method using SVM and a composite kernel-based method
UTurku         University of Turku                                4     machine learning classifiers such as SVM and RLS; DrugBank and MetaMap
LIMSI-CNRS     LIMSI-CNRS                                         5     a feature-based method using libSVM and SVMPerf
bnb nlel       Universidad Politécnica de Valencia                1     a feature-based method using Random Forests
laberinto-uhu  Universidad de Huelva                              5     a feature-based method using classical classifiers such as SVM, Naive Bayes, Decision Trees, Adaboost
DrIF           University of Pavia (Department Mario Stefanelli)  4     two machine learning-based approaches (CRFs and SVMs) and one hybrid approach which combines CRFs and a rule-based technique
ENCU           East China Normal University                       5     a feature-based method using SVM
IUPUITMGroup   Indiana University-Purdue University Indianapolis  5     all paths graph (APG) kernel


Table 5. Precision, recall, F-measure and accuracy of each run.

Team run TP FP FN TN P R F Acc
WBI 5 543 354 212 5917 0.6054 0.7192 0.6574 0.9194
WBI 4 529 332 226 5939 0.6144 0.7007 0.6547 0.9206
WBI 2 568 465 187 5806 0.5499 0.7523 0.6353 0.9072
WBI 1 575 585 180 5686 0.4957 0.7616 0.6005 0.8911
WBI 3 319 362 436 5909 0.4684 0.4225 0.4443 0.8864
LIMSI-FBK 1 532 376 223 5895 0.5859 0.7046 0.6398 0.9147
FBK-HLT 4 529 377 226 5894 0.5839 0.7007 0.6370 0.9142
FBK-HLT 1 513 344 242 5927 0.5986 0.6795 0.6365 0.9166
FBK-HLT 2 560 458 195 5813 0.5501 0.7417 0.6317 0.9071
FBK-HLT 3 534 423 221 5848 0.5580 0.7073 0.6238 0.9083
FBK-HLT 5 544 674 211 5597 0.4466 0.7205 0.5514 0.8740
Uturku 3 520 376 235 5895 0.5804 0.6887 0.6299 0.9130
Uturku 4 370 179 385 6092 0.6740 0.4901 0.5675 0.9197
Uturku 2 368 197 387 6074 0.6513 0.4874 0.5576 0.9169
Uturku 1 350 172 405 6099 0.6705 0.4636 0.5482 0.9179
LIMSI-CNRS 1 490 398 265 5873 0.5518 0.6490 0.5965 0.9056
LIMSI-CNRS 2 491 402 264 5869 0.5498 0.6503 0.5959 0.9052
LIMSI-CNRS 4 462 380 293 5891 0.5487 0.6119 0.5786 0.9042
LIMSI-CNRS 5 373 264 382 6007 0.5856 0.4940 0.5359 0.9081
LIMSI-CNRS 3 388 470 367 5801 0.4522 0.5139 0.4811 0.8809
BNBNLEL 1 420 266 335 6005 0.6122 0.5563 0.5829 0.9145
laberinto-uhu 1 335 335 420 5936 0.5000 0.4437 0.4702 0.8925
laberinto-uhu 2 324 371 431 5900 0.4662 0.4291 0.4469 0.8859
laberinto-uhu 3 368 551 387 5720 0.4004 0.4874 0.4397 0.8665
laberinto-uhu 4 238 153 517 6118 0.6087 0.3152 0.4154 0.9046
laberinto-uhu 5 193 107 562 6164 0.6433 0.2556 0.3659 0.9048
DrIF 1 369 545 386 5726 0.4037 0.4887 0.4422 0.8675
DrIF 4 369 545 386 5726 0.4037 0.4887 0.4422 0.8675
DrIF 3 317 456 438 5815 0.4101 0.4199 0.4149 0.8728
DrIF 2 196 110 559 6161 0.6405 0.2596 0.3695 0.9048
ENCU 5 351 836 404 5435 0.2957 0.4649 0.3615 0.8235
ENCU 3 324 830 431 5441 0.2808 0.4291 0.3394 0.8205
ENCU 1 580 3456 175 2815 0.1437 0.7682 0.2421 0.4832
ENCU 2 713 4781 42 1490 0.1298 0.9444 0.2282 0.3135
ENCU 4 206 424 549 5847 0.3270 0.2728 0.2975 0.8615
IUPUITMGroup 4 193 1457 562 4814 0.1170 0.2556 0.1605 0.7126
IUPUITMGroup 1 237 2005 518 4266 0.1057 0.3139 0.1582 0.6409
IUPUITMGroup 2 127 943 628 5328 0.1187 0.1682 0.1392 0.7764
IUPUITMGroup 3 125 937 630 5334 0.1177 0.1656 0.1376 0.7770
IUPUITMGroup 5 110 770 645 5501 0.1250 0.1457 0.1346 0.7986
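
The P, R, F and Acc columns of Table 5 follow from the four counts in each row through the standard definitions of precision, recall, balanced F-measure and accuracy. The short sketch below recomputes them; the function name is hypothetical, and returning 0.0 when a denominator is zero is an assumption about how degenerate runs would be scored.

```python
def run_scores(tp, fp, fn, tn):
    """Compute precision, recall, F-measure and accuracy from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy


if __name__ == "__main__":
    # Counts of the WBI run 5 row in Table 5.
    p, r, f, acc = run_scores(543, 354, 212, 5917)
    print(f"{p:.4f} {r:.4f} {f:.4f} {acc:.4f}")
```

Because roughly nine out of ten candidate pairs are negative, accuracy stays high even for runs with low recall, which is why the F-measure is the more informative column for ranking.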

4 Discussion
The best performance is achieved by the team WBI [15]. Its system combines
several kernels (APG [1], SL [7], kBSPS [16]) and a case-based reasoning (CBR)
system (called MOARA [10]) using a voting approach. In particular, the
combination of the kernels APG and SL and the MOARA system yields the best
F-measure (0.6574).
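As a rough illustration of what a voting combination of classifiers can look like, the sketch below applies simple majority voting to the binary predictions of several systems for one candidate pair. This is a generic technique shown under stated assumptions, not a description of the WBI system: the function name, the unweighted votes and the tie-breaking rule (ties count as no interaction) are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine 0/1 predictions from several classifiers for one candidate pair.

    Returns 1 only when strictly more classifiers predict 1 than 0.
    """
    counts = Counter(predictions)
    return 1 if counts[1] > counts[0] else 0


if __name__ == "__main__":
    # Three hypothetical systems voting on two candidate pairs.
    print(majority_vote([1, 1, 0]))
    print(majority_vote([0, 1, 0]))
```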
The team FBK-HLT [5] proposes new composite kernels using well-known
kernels such as MEDT [6], PST [9] and SL [7]. Similarly, the team LIMSI-FBK [4]
combines the same kernels (MEDT, PST and SL) and a feature-based method
using SVM. This system achieves an F-measure of 0.6398.
The team Uturku [3] proposes a feature-based method using the classiﬁers
SVM and RLS. Features used by the classiﬁers include syntactic information
(tokens, dependency types, POS tags, text, stems, etc) and semantic knowledge
from DrugBank and MetaMap. This system achieves an F-measure of 0.6299.
In general, approaches based on kernel methods achieved better results than
the classical feature-based methods. Most systems relied primarily on syntactic
information, whereas semantic information was little used.

5 Conclusion

This paper describes a new semantic evaluation task, Extraction of drug-drug
interactions from biomedical texts. We have accomplished our goal of providing
a framework and a benchmark data set to allow for comparisons of methods
for this task. The results that the participating systems have reported show
successful approaches to this difficult task, and the advantages of
kernel-based methods over classical machine learning classifiers.
The success of the task shows that the framework and the data are useful
resources. By making this collection freely accessible, we encourage further
research into this domain. Moreover, the next SemEval-3 (6th International
Workshop on Semantic Evaluations³), to be held in summer 2013, has scheduled
the task "Extraction of drug-drug interactions from biomedical texts"⁴. In
order to accomplish this new task, the current corpus is being extended to
collect new test data.

³ http://www.cs.york.ac.uk/semeval/
⁴ http://www.cs.york.ac.uk/semeval/proposal-16.html

Acknowledgements

This study was funded by the projects MA2VICMR (S2009/TIC-1542) and
MULTIMEDICA (TIN2010-20644-C03-01). The organizers are particularly grateful
to all participants who helped to detect annotation errors in the corpus.

References

1. Airola, A., Pyysalo, S., Björne, J., Pahikkala, T., Ginter, F., Salakoski, T.: All-
paths graph kernel for protein-protein interaction extraction with evaluation of
cross-corpus learning. BMC bioinformatics 9(Suppl 11), S2 (2008)
2. Aronson, A.R.: Eﬀective mapping of biomedical text to the UMLS Metathesaurus:
the MetaMap program. Annual AMIA Symposium pp. 17–21 (Jan 2001)
3. Björne, J., Airola, A., Pahikkala, T., Salakoski, T.: Drug-drug interaction
extraction with RLS and SVM classifiers. In: Proceedings of the First Challenge
task on Drug-Drug Interaction Extraction (DDIExtraction 2011) (2011)
4. Chowdhury, M., Abacha, A., Lavelli, A., Zweigenbaum, P.: Two different machine
learning techniques for drug-drug interaction extraction. In: Proceedings of the
First Challenge task on Drug-Drug Interaction Extraction (DDIExtraction 2011) (2011)
5. Chowdhury, M., Lavelli, A.: Drug-drug interaction extraction using composite ker-
nels. In: Proceedings of the First Challenge task on Drug-Drug Interaction Extrac-
tion (DDIExtraction 2011) (2011)
6. Chowdhury, M., Lavelli, A., Moschitti, A.: A study on dependency tree kernels for
automatic extraction of protein-protein interaction. ACL HLT 2011 p. 124
7. Giuliano, C., Lavelli, A., Romano, L.: Exploiting shallow linguistic information for
relation extraction from biomedical literature. In: Proceedings of the Eleventh Con-
ference of the European Chapter of the Association for Computational Linguistics
(EACL-2006). pp. 401–408 (2006)
8. Hansten, P.D.: Drug interaction management. Pharmacy World & Science 25(3),
94–97 (2003)
9. Moschitti, A.: A study on convolution kernels for shallow semantic parsing. In:
Proceedings of the 42nd Annual Meeting on Association for Computational Lin-
guistics. pp. 335–es. Association for Computational Linguistics (2004)
10. Neves, M., Carazo, J., Pascual-Montano, A.: Extraction of biomedical events using
case-based reasoning. In: Proceedings of the Workshop on BioNLP: Shared Task.
pp. 68–76. Association for Computational Linguistics (2009)
11. Pyysalo, S., Airola, A., Heimonen, J., Björne, J., Ginter, F., Salakoski, T.:
Comparative analysis of five protein-protein interaction corpora. BMC
bioinformatics 9(Suppl 3), S6 (2008)
12. Rodríguez-Terol, A., Camacho, C., Others: Calidad estructural de las bases de
datos de interacciones. Farmacia Hospitalaria 33(03), 134 (2009)
13. Segura-Bedmar, I., Martínez, P., de Pablo-Sánchez, C.: A linguistic rule-based
approach to extract drug-drug interactions from pharmacological documents. BMC
Bioinformatics 12(Suppl 2), S1 (2011)
14. Segura-Bedmar, I., Martínez, P., de Pablo-Sánchez, C.: Using a shallow linguistic
kernel for drug-drug interaction extraction. Journal of Biomedical Informatics In
Press, Corrected Proof (2011)
15. Thomas, P., Neves, M., Solt, I., Tikk, D., Leser, U.: Relation extraction for drug-
drug interactions using ensemble learning. In: Proceedings of the First Challenge
task on Drug-Drug Interaction Extraction (DDIExtraction 2011) (2011)
16. Tikk, D., Thomas, P., Palaga, P., Hakenberg, J., Leser, U.: A comprehensive bench-
mark of kernel methods to extract protein–protein interactions from literature.
PLoS Computational Biology 6(7), e1000837 (2010)
17. Wishart, D.S., Knox, C., Guo, A.C., Cheng, D., Shrivastava, S., Tzur, D., Gautam,
B., Hassanali, M.: DrugBank: a knowledgebase for drugs, drug actions and drug
targets. Nucleic acids research 36(Database issue), D901–6 (Jan 2008)
18. Yu, H.: Towards answering biological questions with experimental evidence: au-
tomatically identifying text that summarize image content in full-text articles.
Annual AMIA Symposium proceedings pp. 834–8 (Jan 2006)