<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Machine Learning Approach to Extract Drug-Drug Interactions in an Unbalanced Dataset</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jacinto</forename><surname>Mata</surname></persName>
							<email>jacinto.mata@dti.uhu.es</email>
							<affiliation key="aff0">
								<orgName type="department">Escuela Técnica Superior de Ingeniería</orgName>
								<orgName type="institution">Universidad de Huelva</orgName>
								<address>
									<addrLine>Ctra. Huelva - Palos de la Frontera s/n</addrLine>
									<postCode>21819</postCode>
									<settlement>La Rábida (Huelva)</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ramón</forename><surname>Santano</surname></persName>
							<email>ramon.santano@alu.uhu.es</email>
							<affiliation key="aff0">
								<orgName type="department">Escuela Técnica Superior de Ingeniería</orgName>
								<orgName type="institution">Universidad de Huelva</orgName>
								<address>
									<addrLine>Ctra. Huelva - Palos de la Frontera s/n</addrLine>
									<postCode>21819</postCode>
									<settlement>La Rábida (Huelva)</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Daniel</forename><surname>Blanco</surname></persName>
							<email>daniel.blanco@alu.uhu.es</email>
							<affiliation key="aff0">
								<orgName type="department">Escuela Técnica Superior de Ingeniería</orgName>
								<orgName type="institution">Universidad de Huelva</orgName>
								<address>
									<addrLine>Ctra. Huelva - Palos de la Frontera s/n</addrLine>
									<postCode>21819</postCode>
									<settlement>La Rábida (Huelva)</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marcos</forename><surname>Lucero</surname></persName>
							<email>marcos.lucero@alu.uhu.es</email>
							<affiliation key="aff0">
								<orgName type="department">Escuela Técnica Superior de Ingeniería</orgName>
								<orgName type="institution">Universidad de Huelva</orgName>
								<address>
									<addrLine>Ctra. Huelva - Palos de la Frontera s/n</addrLine>
									<postCode>21819</postCode>
									<settlement>La Rábida (Huelva)</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Manuel</forename><forename type="middle">J</forename><surname>Maña</surname></persName>
							<email>manuel.mana@dti.uhu.es</email>
							<affiliation key="aff0">
								<orgName type="department">Escuela Técnica Superior de Ingeniería</orgName>
								<orgName type="institution">Universidad de Huelva</orgName>
								<address>
									<addrLine>Ctra. Huelva - Palos de la Frontera s/n</addrLine>
									<postCode>21819</postCode>
									<settlement>La Rábida (Huelva)</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Machine Learning Approach to Extract Drug-Drug Interactions in an Unbalanced Dataset</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F1AA10FAA9718DB2E3524859B7F1C754</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T23:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Drug-drug interaction</term>
					<term>machine learning</term>
					<term>unbalanced classification</term>
					<term>feature selection</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Drug-Drug Interaction (DDI) extraction from the pharmacological literature is an emergent challenge in the text mining area. In this paper we describe a DDI extraction system based on a machine learning approach. We propose distinct solutions to deal with the high dimensionality of the problem and the unbalanced representation of classes in the dataset. On the test dataset, our best run reaches an F-measure of 0.4702.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>One of the most relevant problems in patient safety is the adverse reaction caused by drug interactions. In <ref type="bibr" target="#b2">[3]</ref>, it is claimed that 1.5 million adverse drug events and tens of thousands of hospital admissions take place each year. A Drug-Drug Interaction (DDI) occurs when the effect of a particular drug is altered when it is taken with another drug. The most up-to-date source of DDI knowledge is the specialized pharmacological literature. However, automatically extracting DDI information from this huge document repository is not a trivial problem. In this scenario, text mining techniques are well suited to this kind of problem.</p><p>Different approaches have been used in DDI extraction. In <ref type="bibr" target="#b8">[9]</ref>, the authors propose a hybrid method based on linguistic and pattern rules to detect DDIs in the literature. Linguistic rules capture syntactic structures or semantic meanings that can reveal relations in unstructured texts. Pattern-based rules encode the various forms of expressing a given relationship. As far as we know, few works apply machine learning approaches to this task, owing to the lack of available corpora. In <ref type="bibr" target="#b9">[10]</ref>, an SVM classifier was used to extract DDIs from the DrugDDI corpus. In the similar problem of protein-protein interaction (PPI) extraction, however, machine learning has been widely used with promising effectiveness, as in <ref type="bibr" target="#b6">[7]</ref>. The main advantages of this approach are that it can be easily extended to new sets of data and that the development effort is considerably lower than the manual encoding of rules and patterns.</p><p>In this paper we present a machine learning approach to extract DDIs using the DrugDDI corpus <ref type="bibr" target="#b9">[10]</ref>. Natural Language Processing (NLP) techniques are used to analyze the documents and extract the features that represent them. The unbalanced proportion between positive and negative classes in the corpus suggested the application of sampling techniques. We have experimented with several machine learning algorithms (SVM, Naïve Bayes, Decision Trees, Adaboost) in combination with feature selection techniques in order to reduce the dimensionality of the problem.</p><p>The paper is organized as follows. The system architecture is presented in Section 2. In Section 3 we describe the set of features that represents each pair of drugs appearing in the documents, and we present the feature selection methods used to reduce the initial set of attributes. Next, Section 4 describes the techniques that we have used to deal with this unbalanced classification problem. In Section 5 we evaluate the results obtained with the training corpus. The results on the test corpus are presented in Section 6. Finally, the conclusions are given in Section 7. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">System Architecture</head><p>The organizers provided two different document formats, the Unified format and the MMTx format. We used the latter to develop and test our system.</p><p>The words around the drugs in a sentence were selected as attributes of the database because they can provide clues about the existence of an interaction between two drugs. We experimented with the words as they appear in the documents and, in other cases, with the lemmas provided by the Stanford University morphological parser <ref type="foot" target="#foot_1">1</ref> .</p><p>For each drug pair in a sentence, a set of features was extracted. The main features focus on keywords, distances between drugs and drug semantic types. The next section describes each attribute in more detail.</p><p>To carry out the experiments, the DB of Features was split into two datasets, one for training and one for testing. We used 2/3 of the original DB to train the classifier. The remaining 1/3 was used to test the system during the development phase.</p><p>Before training the classifier, we experimented with two preprocessing techniques. Because this problem is an unbalanced classification task, we applied sampling techniques. In addition, a feature selection technique was applied to reduce the dimensionality of the dataset. To obtain the model, we experimented with several machine learning algorithms (SVM, Naïve Bayes, Decision Trees, Adaboost).</p><p>Each model obtained was evaluated on the test dataset. The results of this evaluation are shown in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Feature Extraction and Selection</head><p>The most important part of this kind of classification problem is choosing the set of features that represents each pair of drugs as well as possible. That is, we need to find the features that provide the information needed to distinguish pairs of drugs that interact from pairs that do not.</p><p>In this section we describe the features we have chosen to build the dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Features</head><p>First, we extracted the drug ID, which indicates the sentence and the phrase of the dataset to which the drug belongs. Second, a feature subset composed of keywords was chosen. Each attribute is a binary value that indicates the presence or absence of the keyword. Three windows of tokens were considered to locate the keywords: between the first and the second drug, before the first drug and after the second drug. In the last two cases, only three tokens were taken into account.</p><p>In this work, a keyword is a word that could provide relevant information about whether a pair of drugs interacts or not. To build the list of keywords, we extracted all the words between each pair of drugs, before the first drug or after the second drug, as appropriate. This set of words was filtered by a short list of stop-words.</p><p>The POS tag of each word was taken into account to make the selection. We considered that verbs carry important semantic content, so we decided to include all of them in the final list. For nouns, we made a manual selection, choosing those that could be semantically related to drug interactions. Finally, in the case of prepositions, adverbs and conjunctions, we selected those that could be related to negation or frequency.</p><p>We experimented with the keywords as they appear in the documents and, in other cases, with the lemmas provided by the Stanford University morphological analyzer. In the latter case, the number of keywords was reduced because distinct verb tenses or plurals of a word were reduced to their lemmas, giving a total of 459 attributes.</p><p>Next, we added to the feature set the distance between the drugs, in number of words and in number of phrases. We also included two features that represent the semantic type of each drug (represented by integer numbers).</p><p>Finally, the feature set is completed with the class, a binary value, where 1 means that the pair interacts and 0 that it does not.</p><p>As Table <ref type="table" target="#tab_0">1</ref> shows, we extracted a total of 600 features from the original dataset to build the development dataset. </p></div>
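<div xmlns="http://www.tei-c.org/ns/1.0"><p>The keyword windows of Section 3.1 can be sketched in Python as follows. This is only an illustrative sketch, not the code of the system: the function name keyword_features, the per-window keyword dictionary and the example sentence are assumptions made for the example, and the positions of the two drugs are assumed to be known token indices.</p><p><code lang="python">
```python
def keyword_features(tokens, first_drug, second_drug, keywords, window=3):
    """Binary keyword features for one drug pair (illustrative sketch).

    tokens      -- lowercased tokens of the sentence
    first_drug  -- token index of the first drug
    second_drug -- token index of the second drug (after first_drug)
    keywords    -- hypothetical dict: window name -> list of keywords
    window      -- number of tokens inspected before and after the pair
    """
    spans = {
        "before": tokens[max(0, first_drug - window):first_drug],
        "between": tokens[first_drug + 1:second_drug],
        "after": tokens[second_drug + 1:second_drug + 1 + window],
    }
    features = {}
    for name, span in spans.items():
        present = set(span)
        for keyword in keywords[name]:
            # 1 if the keyword occurs in this window, 0 otherwise.
            features[name + ":" + keyword] = int(keyword in present)
    # Distance between the two drugs, in number of words.
    features["words_between"] = len(spans["between"])
    return features


# Made-up sentence and keyword lists, for illustration only.
sentence = "concurrent use of aspirin may increase the effect of warfarin in some patients"
tokens = sentence.split()
keywords = {
    "before": ["concurrent", "avoid"],
    "between": ["increase", "not"],
    "after": ["patients"],
}
features = keyword_features(tokens, tokens.index("aspirin"), tokens.index("warfarin"), keywords)
print(features)
```
</code></p><p>In this sketch each keyword yields one binary attribute per window, which mirrors the separate keyword sets counted in Table 1 for the positions before, between and after the drugs.</p></div>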
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Feature selection</head><p>Due to the high dimensionality of the training dataset, we experimented with the chi-squared feature selection method <ref type="bibr" target="#b7">[8]</ref>. This method returns a ranking of the features in decreasing order of the value of the chi-squared statistic with respect to the class. We selected the attributes for which the statistic had a value greater than 0. The resulting dataset, in the case of keywords without lemmatization, had 496 attributes.</p><p>As shown in Table <ref type="table" target="#tab_1">2</ref>, there are 23827 drug pairs in the development dataset and only 2409 are real drug interactions. Therefore, the positive class accounts for about 10% (2409/23827 ≈ 10.11%) of the total number of instances. It is a classification task with unbalanced classes. To deal with this problem we used the SMOTE algorithm <ref type="bibr" target="#b1">[2]</ref> to balance the classes.</p><p>Several classification algorithms were selected in order to obtain the best effectiveness with respect to the F-measure of the positive class. We used the Weka <ref type="bibr" target="#b3">[4]</ref> implementations of the following algorithms: RandomForest <ref type="bibr" target="#b0">[1]</ref>, Naïve Bayes <ref type="bibr" target="#b4">[5]</ref>, SMO <ref type="bibr" target="#b5">[6]</ref> and MultiBoosting <ref type="bibr" target="#b10">[11]</ref>.</p><p>In some cases, to build the classification model, we applied a cost-sensitive matrix in order to penalize false positives.</p></div>
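<div xmlns="http://www.tei-c.org/ns/1.0"><p>The chi-squared filter of Section 3.2 can be illustrated for binary attributes with a small Python sketch. It is not the Weka implementation used in the experiments: the function names and the toy data are assumptions, and the statistic is computed directly from the 2x2 contingency table of one binary attribute against the binary class.</p><p><code lang="python">
```python
def chi_squared(feature, labels):
    """Chi-squared statistic of a binary feature against a binary class."""
    n = len(labels)
    pairs = list(zip(feature, labels))
    a = sum(1 for f, y in pairs if f == 1 and y == 1)  # present, positive
    b = sum(1 for f, y in pairs if f == 1 and y == 0)  # present, negative
    c = sum(1 for f, y in pairs if f == 0 and y == 1)  # absent, positive
    d = n - a - b - c                                   # absent, negative
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A constant feature (or a single class) carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def rank_features(columns, labels):
    """Rank features by decreasing statistic, keeping only values above 0."""
    scores = {name: chi_squared(values, labels) for name, values in columns.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score > 0]


# Toy data, for illustration only: 2 interacting pairs and 4 non-interacting pairs.
labels = [1, 1, 0, 0, 0, 0]
columns = {
    "increase": [1, 1, 0, 0, 0, 0],  # perfectly aligned with the class
    "may":      [1, 0, 1, 0, 0, 0],  # weakly associated with the class
    "the":      [1, 0, 1, 1, 0, 0],  # independent of the class
    "unused":   [0, 0, 0, 0, 0, 0],  # never occurs
}
selected = rank_features(columns, labels)
print(selected)
```
</code></p><p>With the toy data, the attributes whose statistic equals 0 (here "the" and "unused") are dropped, which is the selection criterion stated in Section 3.2.</p></div>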
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experimentation on Training Corpus</head><p>The development corpus contains a collection of pharmacological texts labeled with drug interactions. This collection consists of 4267 sentences extracted from a total of 435 documents, which describe interactions between drugs (Drug-Drug Interactions, or DDIs). From these documents we extracted 23827 drug pairs as possible cases of interaction. In total, there are 2409 instances corresponding to drug interactions and 21418 instances where there is no interaction between the drugs.</p><p>Table <ref type="table" target="#tab_1">2</ref> summarizes the training corpus statistics. In the experiment phase, we divided the dataset into two new datasets for training and testing, respectively. The training dataset consists of 2/3 of the total instances (15885). The test dataset consists of the remaining instances (7942).</p><p>The instances were distributed between the training and test datasets at random, preserving the percentages of instances with and without drug interaction (10% and 90%, respectively).</p><p>Table <ref type="table">3</ref> shows the precision, recall and F-measure on the positive class for the 10 best evaluations. Each row of the table indicates a different </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. System Architecture Diagram.</figDesc><graphic coords="2,162.41,377.83,283.51,251.29" type="bitmap" /></figure>
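<div xmlns="http://www.tei-c.org/ns/1.0"><p>The random split of Section 5, which keeps the class proportions in both parts, can be sketched as a stratified split. The function name stratified_split, the seed and the rounding rule are assumptions made for this sketch; only the class counts (2409 and 21418) come from the corpus statistics.</p><p><code lang="python">
```python
import random


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split instance indices at random while keeping the class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Each class contributes the same fraction to the training part.
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Class counts reported for the corpus: 2409 interactions, 21418 non-interactions.
labels = [1] * 2409 + [0] * 21418
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```
</code></p><p>Under these assumptions the sketch yields 15885 training and 7942 test instances, the sizes reported in Section 5, with roughly 10% positive instances in each part.</p></div>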
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Feature set without lemmatization of the keywords.</figDesc><table><row><cell>Feature</cell><cell>Type</cell><cell>Number of features</cell></row><row><cell>Drugs ID</cell><cell>Integer</cell><cell>2</cell></row><row><cell>Keywords before first drug</cell><cell>Binary</cell><cell>153</cell></row><row><cell>Keywords between drugs</cell><cell>Binary</cell><cell>243</cell></row><row><cell>Keywords after second drug</cell><cell>Binary</cell><cell>197</cell></row><row><cell>Number of words between drugs</cell><cell>Integer</cell><cell>1</cell></row><row><cell>Number of phrases between drugs</cell><cell>Integer</cell><cell>1</cell></row><row><cell>Drug semantic types</cell><cell>Integer</cell><cell>2</cell></row><row><cell>Class</cell><cell>Binary</cell><cell>1</cell></row><row><cell>Total</cell><cell></cell><cell>600</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Training corpus statistics.</figDesc><table><row><cell>Total different documents (files)</cell><cell>435</cell></row><row><cell>Number of documents containing, at least, one drug</cell><cell>412</cell></row><row><cell>Number of documents containing, at least, one drug pair</cell><cell>399</cell></row><row><cell>Total number of sentences</cell><cell>4267</cell></row><row><cell>Total number of drugs</cell><cell>11260</cell></row><row><cell>Total number of drug pairs</cell><cell>23827</cell></row><row><cell>Number of drug interactions</cell><cell>2409</cell></row><row><cell>Total entities that participate in a pair</cell><cell>10374</cell></row><row><cell>Average drug per document (documents and sentences with pairs)</cell><cell>25.88</cell></row><row><cell>Average drug per sentence (sentences with pairs)</cell><cell>4.67</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4 .</head><label>4</label><figDesc>Evaluation on test corpus. The second column is the classification algorithm. For the RandomForest algorithm, the I parameter is the number of trees used to train the model. The CST column indicates whether the model was built using cost-sensitive training. Different cost-sensitive matrices were used in the experimentation phase. The FS column indicates whether feature selection was carried out. The Sampling column indicates whether the SMOTE algorithm was applied. Finally, the KW Lem. column indicates whether keyword lemmatization was performed.</figDesc><table><row><cell>RUN</cell><cell>Classification algorithm</cell><cell>CST</cell><cell>FS</cell><cell>Sampling</cell><cell>KW Lem.</cell><cell>Precision</cell><cell>Recall</cell><cell>F-Measure</cell></row><row><cell>1</cell><cell>RandomForest (I = 50)</cell><cell>X</cell><cell>X</cell><cell>X</cell><cell></cell><cell>0.5000</cell><cell>0.4437</cell><cell>0.4702</cell></row><row><cell>2</cell><cell>RandomForest (I = 50)</cell><cell>X</cell><cell>X</cell><cell>X</cell><cell>X</cell><cell>0.4662</cell><cell>0.4291</cell><cell>0.4669</cell></row><row><cell>3</cell><cell>RandomForest (I = 10)</cell><cell>X</cell><cell></cell><cell>X</cell><cell>X</cell><cell>0.4004</cell><cell>0.4874</cell><cell>0.4397</cell></row><row><cell>4</cell><cell>RandomForest (I = 50)</cell><cell>X</cell><cell></cell><cell></cell><cell>X</cell><cell>0.6087</cell><cell>0.3152</cell><cell>0.4154</cell></row><row><cell>5</cell><cell>MultiBoosting</cell><cell></cell><cell>X</cell><cell>X</cell><cell></cell><cell>0.6433</cell><cell>0.2556</cell><cell>0.3659</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">Proceedings of the 1st Challenge task on Drug-Drug Interaction Extraction (DDIExtraction 2011), Huelva, Spain, September 2011</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1">http://nlp.stanford.edu/index.shtml</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>combination of classification algorithm, cost-sensitive training, feature selection, sampling and keyword lemmatization.</p><p>As can be seen, the best results are obtained with the RandomForest algorithm. Moreover, cost-sensitive training, feature selection, sampling and lemmatization of the keywords all contribute to achieving the best F-measures. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Results on Test Corpus</head><p>In order to submit runs with different characteristics, we did not submit the five runs with the highest F-measure. According to Table <ref type="table">3</ref>, runs 1, 2, 4, 7 and 8 were submitted. We chose this strategy because we did not know the characteristics of the test corpus.</p><p>In Table <ref type="table">4</ref>, we present the results obtained for the five submitted runs. The approaches that obtain the best results on the training dataset are also the ones that obtain the best results on the test dataset. Although there are no significant differences between the precision values on the training and test datasets, a larger drop in recall causes the F-measure to fall by approximately 10%. We think that this drop in effectiveness is due to possible overfitting of the classification models.</p></div>
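<div xmlns="http://www.tei-c.org/ns/1.0"><p>The F-measure reported in Table 4 is assumed here to be the balanced harmonic mean of precision and recall, since the weighting is not stated explicitly. Under that assumption, the value for run 1 can be checked with a short Python sketch (the function name f_measure is illustrative):</p><p><code lang="python">
```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        # No true positives at all: define the F-measure as 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of run 1 in Table 4.
run1 = f_measure(0.5000, 0.4437)
print(round(run1, 4))
```
</code></p><p>The result rounds to 0.4702, the F-measure reported for run 1, which is also the best value quoted in the abstract.</p></div>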
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusions</head><p>In this paper we have presented a DDI extraction system based on a machine learning approach. We have proposed distinct solutions to deal with the high dimensionality of the problem and the unbalanced representation of classes in the dataset. The results obtained on both datasets are promising and we think that this could be a good starting point for future improvements.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Random Forests</title>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="5" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">SMOTE: Synthetic Minority Over-sampling Technique</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Chawla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">W</forename><surname>Bowyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">O</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">P</forename><surname>Kegelmeyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Artificial Intelligence Research</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="321" to="357" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Critical drug-drug interactions for use in electronic health records systems with computerized physician order entry: review of leading approaches</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">C</forename><surname>Classen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Phansalkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">W</forename><surname>Bates</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Patient Safety</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="61" to="65" />
			<date type="published" when="2011-06">June 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">The WEKA Data Mining Software: An Update</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Frank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Holmes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Pfahringer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Reutemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">H</forename><surname>Witten</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explorations</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Estimating Continuous Distributions in Bayesian Classifiers</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">H</forename><surname>John</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Langley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Eleventh Conference on Uncertainty in Artificial Intelligence</title>
				<meeting><address><addrLine>San Mateo</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
			<biblScope unit="page" from="338" to="345" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Improvements to Platt&apos;s SMO Algorithm for SVM Classifier Design</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Keerthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Shevade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bhattacharyya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">R K</forename><surname>Murthy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural Computation</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="637" to="649" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The BioCreative II.5 challenge overview</title>
		<author>
			<persName><forename type="first">M</forename><surname>Krallinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Leitner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Valencia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the BioCreative II.5 Workshop 2009 on Digital Annotations</title>
				<meeting>the BioCreative II.5 Workshop 2009 on Digital Annotations</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page">19</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Chi2: Feature selection and discretization of numeric attributes</title>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Setiono</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE 7th International Conference on Tools with Artificial Intelligence</title>
				<meeting>IEEE 7th International Conference on Tools with Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
			<biblScope unit="page" from="388" to="391" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents</title>
		<author>
			<persName><forename type="first">I</forename><surname>Segura-Bedmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Martínez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>De Pablo-Sánchez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC BioInformatics</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">S1</biblScope>
			<date type="published" when="2011-03">March 2011</date>
		</imprint>
	</monogr>
	<note>Suppl</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Using a shallow linguistic kernel for drug-drug interaction extraction</title>
		<author>
			<persName><forename type="first">I</forename><surname>Segura-Bedmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Martínez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>De Pablo-Sánchez</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.jbi.2011.04.005</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Biomedical Informatics</title>
		<imprint>
			<date type="published" when="2011-04-24">24 April 2011</date>
			<publisher>Press</publisher>
		</imprint>
	</monogr>
	<note>Corrected Proof</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">MultiBoosting: A Technique for Combining Boosting and Wagging</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">I</forename><surname>Webb</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
