<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Drug-Drug Interactions Discovery Based on CRFs, SVMs and Rule-Based Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stefania Rubrichi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Gabetta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Bellazzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristiana Larizza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Silvana Quaglini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Laboratory for Biomedical Informatics “Mario Stefanelli”, Department of Computers and Systems Science, University of Pavia</institution>
          ,
          <addr-line>Pavia</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Information about medications is critical to improving patient safety and quality of care. Most adverse drug events are predictable from the known pharmacology of the drugs; many represent known interactions and are, therefore, likely to be preventable. However, most of this information is locked in free text and, as such, cannot be actively accessed and processed by computerized applications. In this work, we propose three different approaches to the automatic recognition of drug-drug interactions, which we developed within the “First Challenge Task: Drug-Drug Interaction Extraction” competition. Our approaches learn to discriminate between semantically interesting and uninteresting content in a structured prediction framework as well as in a rule-based one. The systems are trained on the DrugDDI corpus provided by the challenge organizers. An empirical analysis of the three approaches on this dataset shows that the inclusion of rule-based methods is indeed advantageous.</p>
      </abstract>
      <kwd-group>
        <kwd>Drug-Drug Interactions</kwd>
        <kwd>Information Extraction</kwd>
        <kwd>Conditional Random Fields</kwd>
        <kwd>Support Vector Machines</kwd>
        <kwd>Adverse Drug Events</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The use of medications plays a central role in health care provision, yet on
occasion it may endanger patients’ safety and increase health care costs
as a result of adverse drug events (ADEs). Many of these injuries are unavoidable,
but at least a quarter may be secondary to medication errors [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which can be
avoided. This is the case for ADEs due to drug-drug interactions (DDIs), since
many of them result from known interactions that were disregarded and are therefore likely
to be preventable. More than 6.5% of drug-related hospital admissions are a
consequence of DDIs.
      </p>
      <p>DDIs are a common problem during drug treatment. Broadly, a drug
interaction is a situation in which a substance affects the activity of an
active ingredient, resulting in various effects such as alterations in absorption,
metabolism, excretion, and pharmacodynamics (i.e., the drug's effects are
decreased or increased, or the drugs together produce a new effect that neither produces on
its own).</p>
      <p>Safe medication use requires that prescribers receive clear information
on the medication itself, including information about any potential interactions.
This information is constantly changing, and while most of the necessary
up-to-date knowledge is available somewhere, it is not always readily accessible. In
particular, most of it is locked in free text and therefore cannot be
actively used by health information systems. Reliable access to this
information through Natural Language Processing (NLP) systems could be a
useful tool for preventing medication errors and, more specifically, DDIs. Over
the last two decades there has been growing interest in applying NLP, and in
particular information extraction (IE) techniques, to biomedical text. Several notable
efforts on IE from textual clinical documents in the medication domain have been documented [
        <xref ref-type="bibr" rid="ref11 ref12 ref14 ref15 ref18 ref4 ref5 ref9">4,5,9,11,12,14,15,18</xref>
        ], together with their subsequent application
to summarization, case finding, decision support, and statistical analysis tasks.
In this context, we took part in the “First Challenge
Task: Drug-Drug Interaction Extraction” competition and developed a system
for the automatic extraction of DDIs from a corpus [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] of documents collected
from the DrugBank database [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which describe, for each drug, its interactions with other drugs.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>In this section we present the proposed system and its components.</p>
      <sec id="sec-2-1">
        <title>System Outline</title>
        <p>
          We exploit three different approaches, which rely upon different methods for
the extraction of such information. The first approach (henceforth referred to as the
hybrid approach) is twofold: it combines a supervised learning technique based on
Conditional Random Fields (CRFs) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] with a rule-based method. We modeled
the problem as follows: first, we employed a CRF classifier
to assign the correct semantic category to each word, or sentence segment, of
the text. We considered the following three semantic categories:
1. DrugNotInteracting: describes a drug entity that is not involved in an
interaction;
2. DrugInteracting: describes a drug entity that is involved in an interaction;
3. None: indicates elements that are not relevant for this task.
        </p>
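        <p>As a minimal sketch of the three token categories used by the hybrid approach, the Python code below derives them from DrugDDI-style annotations. The derivation rule is our assumption rather than something stated in the paper: an entity is treated as DrugInteracting if it appears in at least one pair annotated as interacting, every other entity is DrugNotInteracting, and non-entity tokens are labeled None. The helper names entity_categories and token_labels are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Iterable, List, Optional, Tuple


def entity_categories(
    entity_ids: Iterable[str],
    pairs: Iterable[Tuple[str, str, bool]],
) -> Dict[str, str]:
    """Map each entity id to DrugInteracting or DrugNotInteracting.

    Assumption: an entity is DrugInteracting if it takes part in at least
    one pair annotated as interacting (interaction="true").
    """
    interacting = set()
    for e1, e2, interaction in pairs:
        if interaction:
            interacting.update((e1, e2))
    return {
        eid: "DrugInteracting" if eid in interacting else "DrugNotInteracting"
        for eid in entity_ids
    }


def token_labels(
    token_entities: List[Optional[str]], categories: Dict[str, str]
) -> List[str]:
    """token_entities[i] is the id of the entity covering token i, or None.

    Tokens outside any entity get the category "None" (a label string,
    not Python's None).
    """
    return [categories[eid] if eid is not None else "None" for eid in token_entities]


# Usage on the first tokens of "Itraconazole decreases busulfan clearance ..."
cats = entity_categories(
    ["DrugDDI.d368.s0.e0", "DrugDDI.d368.s0.e1"],
    [("DrugDDI.d368.s0.e0", "DrugDDI.d368.s0.e1", True)],
)
labels = token_labels(["DrugDDI.d368.s0.e0", None, "DrugDDI.d368.s0.e1", None], cats)
print(labels)
```
]]></preformat>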
        <p>Once every potentially interacting entity has been identified by the CRF classifier,
we apply a set of rules that we defined to construct the actual pairs of interacting
entities within each sentence.</p>
        <p>The second approach (henceforth referred to as the pair-centered CRFs approach) and the third
(henceforth referred to as the pair-centered SVMs approach) are very
similar: both are based on supervised learning methods, namely CRFs and Support
Vector Machines (SVMs) [
            <xref ref-type="bibr" rid="ref17 ref2">2, 17</xref>
            ], respectively. In this case we focused on
single pairs of drug entities: for any given pair in a sentence, these techniques predict
the presence or absence of an interaction relation, relying on a set of hundreds of
engineered features that take into account the properties of the text, and
learning the correspondence between semantic categories and features. We considered
only two semantic categories:
1. Interaction: describes a pair of drug entities that interact;
2. NotInteraction: describes a pair of drug entities that do not interact.</p>
        <p>All three methodologies were developed in several steps. We
began with a pre-processing pass over the corpus in order to prepare the dataset
for use by the extraction module. Then, we defined a set of binary features
that express descriptive characteristics of the data, and we converted the
data into the corresponding features. Finally, we processed the data through
the three methodologies described above.</p>
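        <p>For the pair-centered approaches, each candidate pair of drug entities in a sentence becomes one instance to classify. The sketch below assumes the corpus XML layout shown in the pre-processing example and treats every pair without an interaction="true" annotation as NotInteraction; it enumerates all unordered entity pairs with Python's xml.etree.ElementTree and itertools.combinations. The function name candidate_pairs is hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import itertools
import xml.etree.ElementTree as ET
from typing import List, Tuple

# The example sentence from the pre-processing section, in the corpus XML layout.
SENTENCE_XML = """
<sentence id="DrugDDI.d368.s0" origId="s0" text="Itraconazole decreases busulfan clearance by up to 25%, and may produce AUCs &gt; 1500 muMolmin in some patients.">
  <entity id="DrugDDI.d368.s0.e0" origId="s0.p0" charOffset="0-12" type="drug" text="Itraconazole"/>
  <entity id="DrugDDI.d368.s0.e1" origId="s0.p2" charOffset="23-31" type="drug" text="busulfan"/>
  <pair id="DrugDDI.d368.s0.p0" e1="DrugDDI.d368.s0.e0" e2="DrugDDI.d368.s0.e1" interaction="true"/>
</sentence>
"""


def candidate_pairs(sentence_xml: str) -> List[Tuple[str, str, str]]:
    """Enumerate every unordered pair of entities in one sentence and label it.

    Pairs without an interaction="true" annotation are labeled NotInteraction.
    """
    root = ET.fromstring(sentence_xml.strip())
    entity_ids = [e.get("id") for e in root.findall("entity")]
    # frozenset keys make the lookup independent of the e1/e2 order.
    gold = {
        frozenset((p.get("e1"), p.get("e2"))): p.get("interaction") == "true"
        for p in root.findall("pair")
    }
    examples = []
    for e1, e2 in itertools.combinations(entity_ids, 2):
        interacting = gold.get(frozenset((e1, e2)), False)
        examples.append((e1, e2, "Interaction" if interacting else "NotInteraction"))
    return examples


for example in candidate_pairs(SENTENCE_XML):
    print(example)
```
]]></preformat>
        <p>A sentence with m entities yields m(m-1)/2 candidate pairs, which is why the pair-centered classifiers see many more NotInteraction than Interaction instances.</p>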
      </sec>
      <sec id="sec-2-2">
        <title>Supervised Learning Methods: CRFs and SVMs</title>
        <p>Supervised learning approaches have been widely applied to IE
from free text. A typical application of supervised learning classifies a
novel instance x as belonging to a particular category y. Given a predefined set of
categories, such methods use a set of training examples to make decisions about
new examples: they automatically tune their own parameters to maximize
their performance on the training set and then generalize to new samples.
We processed the data through two linear classifiers, CRFs and SVMs: both
algorithms iterate over the tokens in the sentence and label the relevant tokens with
semantic categories. These classifiers discriminate between semantically interesting
and uninteresting content by automatically weighting a large number
of interdependent descriptive characteristics (features) that capture the
properties of the input text. Each token is represented by a set of features;
the classifiers then learn a correspondence between semantic categories and features
by assigning a real-valued weight to each feature.</p>
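        <p>To make the role of the real-valued weights concrete, the sketch below shows how a linear model scores one instance (a token or a candidate pair) described by binary features: each active feature adds its weight for a candidate label, and the highest-scoring label is predicted. The weights are invented for illustration, and the sketch deliberately omits what distinguishes the two models: CRFs also score transitions between adjacent labels and decode the whole sentence jointly, while SVMs learn the weights with a margin-based objective.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Iterable, List

# Hypothetical learned weights: weights[feature][label] -> real value.
Weights = Dict[str, Dict[str, float]]


def score(features: Iterable[str], weights: Weights, label: str) -> float:
    """Sum the weights of the active binary features for one candidate label."""
    return sum(weights.get(f, {}).get(label, 0.0) for f in features)


def predict(features: Iterable[str], weights: Weights, labels: List[str]) -> str:
    """Return the label with the highest linear score."""
    active = list(features)  # materialize once; it is scored for every label
    return max(labels, key=lambda y: score(active, weights, y))


toy_weights: Weights = {
    "word=decreases": {"Interaction": 1.5, "NotInteraction": -0.5},
    "pos=NNS": {"Interaction": 0.2},
}
print(predict(["word=decreases", "pos=NNS"], toy_weights, ["Interaction", "NotInteraction"]))
```
]]></preformat>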
      </sec>
      <sec id="sec-2-3">
        <title>Pre-processing</title>
        <p>The first step of our DDI detection system is a pre-processing pass over the
data provided by the challenge organizers.</p>
        <p>
          We designed two different pre-processing strategies: one for the hybrid approach,
and the other for the pair-centered CRFs and pair-centered SVMs approaches.
The first pre-processing strategy analyzes the training corpus sentence by sentence,
using a fairly standard NLP pipeline built with GATE [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], an open-source
framework for language processing. This pipeline includes:
– Tokenizer: splits the sentence into its atomic parts (tokens) according to the rules of a
specific language (English in our case);
– Part of Speech (POS) Tagger [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]: assigns each token its grammatical
class (e.g., noun, verb, adjective);
– Morphological Analyzer: assigns each token its lexical root;
– UMLS concept finder: a module we developed to discover, within the text, concepts
that can be mapped to the Unified Medical Language System (UMLS) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
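        <p>The paper does not describe the internals of the UMLS concept finder. Purely as an illustration of one common way to build such a module, the sketch below performs greedy longest-match lookup of token sequences against a dictionary that maps lower-cased UMLS strings to semantic groups; the toy lexicon, the limit of five tokens per concept, and the function name find_concepts are all hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon: lower-cased token sequences -> UMLS semantic group.
Lexicon = Dict[Tuple[str, ...], str]


def find_concepts(tokens: List[str], lexicon: Lexicon, max_len: int = 5) -> List[Optional[str]]:
    """Greedy longest-match lookup; returns one semantic group (or None) per token."""
    groups: List[Optional[str]] = [None] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span starting at token i first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in lexicon:
                for j in range(i, i + n):
                    groups[j] = lexicon[key]
                i += n
                break
        else:
            i += 1  # no concept starts at token i
    return groups


toy_lexicon: Lexicon = {
    ("decreases",): "CONC",
    ("clearance",): "PHEN",
    ("renal", "clearance"): "PHEN",
}
print(find_concepts(["Itraconazole", "decreases", "busulfan", "clearance"], toy_lexicon))
```
]]></preformat>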
        <p>The pre-processing system outputs one line per token; each line
contains the token itself together with the additional information needed for the
feature generation task. In particular:
– the semantic category of the token itself;
– the “entity tag”, that is, the entity’s code (e.g. DrugDDI.d385.s4.e0) when
the token is an entity, and null otherwise;
– the “main drug tag”, which is true if the token matches the standard name of
the referential drug (i.e., the drug described in the specific document under examination), and false otherwise;
– the “brand name tag”, which is true if the token matches one of the brand
names of the referential drug, and false otherwise (brand names are taken from
DrugBank);
– the “POS tag”, that is, the grammatical class provided by the POS Tagger
(entities are automatically tagged as proper nouns, NNP);
– the “root tag”, which is the root of the token provided by the Morphological
Analyzer (the entity itself for entities);
– the “semantic group tag”, which, when the token belongs to a UMLS concept, is
the semantic group of the concept itself (e.g. “DISO” for concepts belonging
to the “Disorders” group); it is “ENT” when the token is an entity, and null
otherwise.</p>
        <p>As an example, consider the input sentence:</p>
        <preformat>&lt;sentence id="DrugDDI.d368.s0" origId="s0" text="Itraconazole decreases busulfan clearance by up to 25%, and may produce AUCs &gt; 1500 muMolmin in some patients."&gt;
  &lt;entity id="DrugDDI.d368.s0.e0" origId="s0.p0" charOffset="0-12" type="drug" text="Itraconazole" /&gt;
  &lt;entity id="DrugDDI.d368.s0.e1" origId="s0.p2" charOffset="23-31" type="drug" text="busulfan" /&gt;
  &lt;pair id="DrugDDI.d368.s0.p0" e1="DrugDDI.d368.s0.e0" e2="DrugDDI.d368.s0.e1" interaction="true" /&gt;
&lt;/sentence&gt;</preformat>
        <p>From this input, the first pre-processing strategy generates the following lines (and so on for the remaining tokens):</p>
        <preformat>itraconazole-DrugInteracting-DrugDDI.d368.s0.e0-false-false-NNP-itraconazole-ENT
decreases-None-null-false-false-NNS-decrease-CONC
busulfan-DrugInteracting-DrugDDI.d368.s0.e1-true-false-NNP-busulfan-ENT
clearance-None-null-false-false-NN-clearance-PHEN
...</preformat>
        <p>The second pre-processing strategy evaluates all the pairs within a sentence separately;
it uses the same NLP pipeline described for the first strategy, but formats the output
differently. For each pair, the output consists of a header line containing the
codes of the involved entities and the semantic category of the pair. The header line is
followed by one line for each token lying between the two entities involved in the pair;
each of these lines describes the token with exactly the same elements as in
the first strategy (token, semantic category, entity tag, etc.).</p>
        <p>Given the input sentence from the previous example, the second pre-processing strategy
generates the following lines:</p>
        <preformat>DrugDDI.d368.s0.e0 DrugDDI.d368.s0.e1-Interaction
decreases-None-null-false-false-NNS-decrease-CONC</preformat>
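        <p>The two output formats described above can be sketched as follows. TokenRecord.to_line reproduces the hyphen-separated line of the first strategy, and pair_block builds the header plus the in-between token lines of the second strategy. Both names are hypothetical, and pair_block assumes that each entity occupies a single token, as in the example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TokenRecord:
    token: str
    category: str                  # DrugInteracting, DrugNotInteracting or None
    entity_id: Optional[str]       # e.g. "DrugDDI.d368.s0.e0"; None for plain tokens
    is_main_drug: bool             # matches the referential drug's standard name
    is_brand_name: bool            # matches one of the referential drug's brand names
    pos: str                       # POS tag
    root: str                      # lexical root
    semantic_group: Optional[str]  # UMLS semantic group, "ENT" for entities

    def to_line(self) -> str:
        """First strategy: one hyphen-joined line per token."""
        return "-".join([
            self.token,
            self.category,
            self.entity_id or "null",
            str(self.is_main_drug).lower(),
            str(self.is_brand_name).lower(),
            self.pos,
            self.root,
            self.semantic_group or "null",
        ])


def pair_block(records: List[TokenRecord], e1: Tuple[str, int], e2: Tuple[str, int],
               label: str) -> List[str]:
    """Second strategy: a header line, then one line per token strictly between the entities.

    e1 and e2 are (entity id, token index) pairs; each entity is assumed to be one token.
    """
    (id1, i1), (id2, i2) = sorted([e1, e2], key=lambda e: e[1])
    header = f"{id1} {id2}-{label}"
    return [header] + [r.to_line() for r in records[i1 + 1:i2]]


sentence = [
    TokenRecord("itraconazole", "DrugInteracting", "DrugDDI.d368.s0.e0", False, False, "NNP", "itraconazole", "ENT"),
    TokenRecord("decreases", "None", None, False, False, "NNS", "decrease", "CONC"),
    TokenRecord("busulfan", "DrugInteracting", "DrugDDI.d368.s0.e1", True, False, "NNP", "busulfan", "ENT"),
    TokenRecord("clearance", "None", None, False, False, "NN", "clearance", "PHEN"),
]
for r in sentence:
    print(r.to_line())
print("\n".join(pair_block(sentence, ("DrugDDI.d368.s0.e0", 0), ("DrugDDI.d368.s0.e1", 2), "Interaction")))
```
]]></preformat>
        <p>Because the fields are joined with hyphens, a hyphenated token or root would make a line ambiguous to split back into fields; a tab separator would avoid this if the lines ever need to be parsed.</p>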
      </sec>
      <sec id="sec-2-4">
        <title>Feature Definition and Data Conversion</title>
        <p>The feature construction process aims at capturing the salient characteristics of each
token in order to help the system predict its semantic label. Feature definition
is critical to the success of feature-based statistical models such as
CRFs and SVMs. A careful inspection of the corpus led us to identify
a set of informative binary features that capture salient aspects of the data with
respect to the tagging task. The stream of tokens was then converted into
features. In particular, in the pair-centered CRFs and pair-centered SVMs approaches
we considered only the tokens between the two entities that form each pair. This
means that the features for a drug entity pair E1-E2 contain predicates about the n tokens
between E1 and E2.</p>
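        <p>One simple way to realize the pair-centered representation described above is to pool the binary features of every token strictly between E1 and E2 into a single set. This pooling, which discards token positions, is an assumption made for illustration, since the exact encoding is not specified; pair_features is a hypothetical helper.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Set


def pair_features(token_features: List[List[str]], i1: int, i2: int) -> Set[str]:
    """Binary features of a candidate pair: those active on any token strictly between the two entities."""
    lo, hi = sorted((i1, i2))
    active: Set[str] = set()
    for feats in token_features[lo + 1:hi]:
        active.update(feats)
    return active


per_token = [
    ["word=itraconazole", "pos=NNP"],
    ["word=decreases", "pos=NNS"],
    ["word=busulfan", "pos=NNP"],
]
print(sorted(pair_features(per_token, 0, 2)))
```
]]></preformat>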
        <p>In the following we describe the set of features used in our experiments.</p>
        <p>Orthographical Features. As a starting point, this class consists of
the simplest and most obvious feature set: the word identity feature, whose values form the vocabulary
derived from the training data.</p>
        <p>Part of Speech (POS) Features. We expected lexical information to be
useful for identifying named entities. Thus, we included features that indicate
the lexical category (part of speech) of each token.</p>
        <p>Punctuation Features. We also included punctuation features, which signal the presence of
specific punctuation marks in the sentence. After browsing our corpus we found that the colon might
prove helpful: in the description of a given medication, a colon is usually preceded by the interacting
substance and followed by an explanation of the specific interaction effects.</p>
        <p>Semantic Features. To let these models benefit from domain-specific
knowledge, we added semantic features that rely on external semantic resources. This
class of features includes:
1. root feature: takes into account the root associated with each word;
2. UMLS feature: relies on the UMLS Metathesaurus and, for each word, returns the
corresponding semantic group;
3. brand name feature: recognizes the brand names of the referential drug occurring in the
text. DrugBank drug entries include a “Brand Names” field,
which contains a complete list of brand names from different manufacturers. We
created a binary feature that is active every time a text token coincides with one of these
names, indicating that the token corresponds to a brand name of the
specific referential drug;
4. standard drug name feature: identifies the standard name of the referential drug; for
each token, this feature tests whether it matches that standard name;
5. drug entity feature: allows the models to recognize the drug entities annotated by
the MetaMap tool: it is active for the tokens that have been annotated as drug
entities by MetaMap.</p>
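        <p>The feature classes listed so far can be summarized as a function that returns the names of the binary features active for a token, each name standing for a feature with value 1. The sketch is illustrative only: the dictionary keys, the name=value naming scheme, and the function name token_features are hypothetical, and the token attributes are assumed to come from the pre-processing output (with is_drug_entity standing in for the MetaMap annotation).</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Any, Dict, List


def token_features(tok: Dict[str, Any]) -> List[str]:
    """Names of the binary features active for one token (absent names are 0)."""
    feats = [
        f"word={tok['token'].lower()}",  # orthographical: word identity
        f"pos={tok['pos']}",              # part of speech
        f"root={tok['root']}",            # semantic: lexical root
    ]
    if tok["token"] == ":":
        feats.append("punct=colon")       # punctuation: colon
    if tok.get("semantic_group"):
        feats.append(f"umls={tok['semantic_group']}")  # semantic: UMLS group
    if tok.get("is_brand_name"):
        feats.append("brand_name")        # brand name of the referential drug
    if tok.get("is_main_drug"):
        feats.append("standard_drug_name")  # standard name of the referential drug
    if tok.get("is_drug_entity"):
        feats.append("drug_entity")       # annotated drug entity
    return feats


busulfan = {"token": "busulfan", "pos": "NNP", "root": "busulfan", "semantic_group": "ENT",
            "is_brand_name": False, "is_main_drug": True, "is_drug_entity": True}
print(token_features(busulfan))
```
]]></preformat>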
        <p>
          Context Features. Finally, we extended all the feature classes described above
to a token window of [-k,k]. The descriptive characteristics of the tokens preceding or
following a target token may be useful for modeling the local context. In principle, analyzing
more context words can make the results more precise; however, widening the context window
quickly increases the computational and statistical complexity. For our experiments,
we found a window size of [-3,3] (k = 3) to be suitable.
        </p>
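        <p>Extending the features to a [-k,k] window can be sketched by copying every feature of the neighboring tokens and prefixing it with its relative offset, so that, for example, a word feature of the preceding token becomes a distinct feature of the target token. The offset-prefix naming scheme and the helper window_features are assumptions for illustration; the default k = 3 matches the window size reported above.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import List


def window_features(per_token: List[List[str]], i: int, k: int = 3) -> List[str]:
    """Copy every feature of the tokens in [i-k, i+k], prefixed with its relative offset."""
    feats: List[str] = []
    for offset in range(-k, k + 1):
        j = i + offset
        if 0 <= j < len(per_token):  # skip positions outside the sentence
            feats.extend(f"{offset:+d}:{name}" for name in per_token[j])
    return feats


per_token = [["word=itraconazole"], ["word=decreases"], ["word=busulfan"]]
print(window_features(per_token, 1, k=1))
```
]]></preformat>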
      </sec>
      <sec id="sec-2-5">
        <title>Rule-based Method</title>
        <p>As stated above, while the pair-centered CRFs and pair-centered SVMs
approaches focus on entity pairs and directly predict the presence or absence of an
interaction, the hybrid approach considers one token at a time, so its semantic category predictions
are made on a token-by-token basis. Therefore, a further processing pass is needed in order
to build the interaction pairs from the individual entities. For this purpose, we
employed a rule-based method that relies on a set of rules manually constructed
from an analysis of the training data. In particular, the rules that we built to find the
interacting pairs are the following:
– if a sentence contains fewer than two tokens labeled as DrugInteracting, no
interacting pair is generated;
– an interacting pair must contain two tokens labeled as DrugInteracting;
– exactly one of the two tokens involved in the interacting pair must be the
referential drug or one of its brand names.</p>
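        <p>The three rules can be written as a short filter over the entities of one sentence. In this sketch, which is an illustration rather than the system's actual code, each entity is given as a tuple of its id, the label predicted by the CRF, and a flag that is true when the entity is the referential drug or one of its brand names (as the main drug and brand name tags of the pre-processing output would indicate); build_pairs is a hypothetical name, and the example entities are invented.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations
from typing import List, Tuple


def build_pairs(entities: List[Tuple[str, str, bool]]) -> List[Tuple[str, str]]:
    """Apply the three pair-building rules to the entities of one sentence.

    entities: (entity id, label predicted by the CRF, is_referential), in sentence order,
    where is_referential is true for the referential drug or one of its brand names.
    """
    # Rule 2: only entities labeled DrugInteracting can form a pair.
    interacting = [(eid, is_ref) for eid, label, is_ref in entities if label == "DrugInteracting"]
    # Rule 1: fewer than two DrugInteracting entities -> no pair.
    if len(interacting) < 2:
        return []
    # Rule 3: exactly one member of the pair is the referential drug (exclusive or).
    return [
        (id1, id2)
        for (id1, ref1), (id2, ref2) in combinations(interacting, 2)
        if ref1 != ref2
    ]


sentence_entities = [
    ("e0", "DrugInteracting", False),
    ("e1", "DrugInteracting", True),   # the referential drug
    ("e2", "DrugInteracting", False),
    ("e3", "DrugNotInteracting", False),
]
print(build_pairs(sentence_entities))
```
]]></preformat>
        <p>Here e0 and e2 are both labeled DrugInteracting but are never paired with each other, because neither is the referential drug; this is how the third rule prunes candidate pairs that the token-level classifier alone cannot disambiguate.</p>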
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>
        We used the Unified format of the DrugDDI corpus [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] provided by the competition
organizers.
      </p>
      <p>For the linear SVMs, we found a regularization parameter λ = 1 to work well; the SVM
results were produced using 10 passes through the entire training set. For the CRFs, we set the
variance of the Gaussian regularizer to 0.1.</p>
      <p>We submitted a total of three runs: the first run includes the predictions generated
by the hybrid approach, the second the predictions generated by the
pair-centered CRFs approach, and the third the predictions generated by the
pair-centered SVMs approach.</p>
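      <p>The role of the CRF variance can be made explicit by assuming the usual penalized maximum-likelihood training of CRFs with a Gaussian prior on the weights w (this formulation is our assumption; the paper reports only the value of the variance). With σ² = 0.1, each squared weight is penalized with coefficient 1/(2σ²) = 5, so a small variance corresponds to strong regularization.</p>
      <preformat preformat-type="latex"><![CDATA[
```latex
\ell(\mathbf{w}) = \sum_{i=1}^{N} \log p\!\left(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}\right) - \sum_{k} \frac{w_k^{2}}{2\sigma^{2}},
\qquad \sigma^{2} = 0.1 \;\Rightarrow\; \frac{1}{2\sigma^{2}} = 5
```
]]></preformat>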
    </sec>
    <sec id="sec-4">
      <title>Results and Discussion</title>
      <p>The evaluation process was performed by the challenge organizers.</p>
      <p>The overall results of the three approaches can be found in Table 1. In general, the
hybrid approach outperforms the other two. This performance gain can be attributed
to the additional contribution of the rule-based method, which played an important role in
building the interacting pairs. In particular, it lets the system benefit from additional
knowledge that facilitates pair disambiguation; for example, it specifies
that a pair has to include the referential drug or one of its brand names together with
a different drug entity.</p>
      <p>There is room for improvement, especially for the pair-centered CRFs and pair-centered
SVMs approaches. In these approaches we relied mainly on the tokens occurring between
the two entities that form each pair; the tokens preceding and following the pair
could also be taken into account.</p>
      <p>In this paper we presented three different approaches for the extraction of DDIs that we
developed within the “First Challenge Task: Drug-Drug Interaction Extraction”
competition: two machine learning-based ones
(CRFs and SVMs) and one that combines a machine learning-based method (CRFs) with
a rule-based technique. The latter achieved the best results, with an overall F1 score
of about 44%. This figure is not very encouraging; a comparison with the other
systems that addressed the same problem on the same corpus within this competition
will probably help us to interpret this result and to identify the weaknesses of our approaches.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. http://labda.inf.uc3m.es/ddiextraction2011/dataset.html</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usunier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Sequence labelling SVMs trained in one pass</article-title>
          .
          <source>In: ECML PKDD</source>
          <year>2008</year>
          . pp.
          <fpage>146</fpage>
          -
          <lpage>161</lpage>
          . Springer (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontcheva</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tablan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>GATE: A framework and graphical development environment for robust NLP tools and applications</article-title>
          .
          <source>In: Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02)</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brownlow</surname>
            ,
            <given-names>N.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hersh</surname>
            ,
            <given-names>W.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campbell</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          :
          <article-title>Automating concept identification in the electronic medical record: An experiment in extracting dosage information</article-title>
          .
          <source>In: Proc. AMIA Annu Fall Symp</source>
          . pp.
          <fpage>388</fpage>
          -
          <lpage>392</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gold</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elhadad</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimino</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hripcsak</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Extracting structured medication event information from discharge summaries</article-title>
          .
          <source>In: Proc. AMIA Annu Symp</source>
          . pp.
          <fpage>237</fpage>
          -
          <lpage>241</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hepple</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Independence and commitment: Assumptions for rapid training and execution of rule-based POS taggers</article-title>
          . In:
          <source>Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000)</source>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Institute of Medicine (ed.):
          <article-title>Preventing Medication Errors</article-title>
          . The National Academies Press, Washington (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Knox</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Law</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jewison</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ly</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frolkis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banco</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mak</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neveu</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Djoumbou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eisner</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wishart</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>DrugBank 3.0: a comprehensive resource for 'omics' research on drugs</article-title>
          .
          <source>Nucleic Acids Res</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Levin</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krol</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doshi</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reich</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>Extraction and mapping of drug names from free text to a standardized nomenclature</article-title>
          .
          <source>In: Proc AMIA Annu Symp</source>
          . pp.
          <fpage>438</fpage>
          -
          <lpage>442</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lindberg</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Humphreys</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCray</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The unified medical language system</article-title>
          .
          <source>Methods Inf Med</source>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaisantin</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korchia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rozanes</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serrot</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joubert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darmoni</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          :
          <article-title>Automatic construction of dictionaries, application to product characteristics indexing</article-title>
          .
          <source>In: Proc. Workshop on Advances in Bio Text Mining</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Segura-Bedmar</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martínez</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Pablo-Sánchez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Extracting drug-drug interactions from biomedical texts</article-title>
          .
          <source>In: Proc. Workshop on Advances in Bio Text Mining</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Segura-Bedmar</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martínez</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Pablo-Sánchez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Using a shallow linguistic kernel for drug-drug interaction extraction</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          , In Press (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>A.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>An algorithm to derive a numerical daily dose from unstructured text dosage instructions</article-title>
          .
          <source>Pharmacoepidemiology and Drug Safety</source>
          <volume>15</volume>
          ,
          <fpage>161</fpage>
          -
          <lpage>166</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sirohi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peissig</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Study of effect of drug lexicons on medication extraction from electronic medical records</article-title>
          .
          <source>In: Proc. Pacific Symposium on Biocomputing</source>
          . vol.
          <volume>10</volume>
          , pp.
          <fpage>308</fpage>
          -
          <lpage>318</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Sutton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GRMM: Graphical models in MALLET</article-title>
          . http://mallet.cs.umass.edu/grmm/ (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Tsochantaridis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Altun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Large margin methods for structured and interdependent output variables</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>6</volume>
          ,
          <fpage>1453</fpage>
          -
          <lpage>1484</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stenner</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>K.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waitman</surname>
            ,
            <given-names>L.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denny</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          :
          <article-title>MedEx: a medication information extraction system for clinical narratives</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>17</volume>
          ,
          <fpage>19</fpage>
          -
          <lpage>24</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>