<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A web prototype for detecting chemical compounds and drugs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Sánchez-Cisneros</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sara Lana-Serrano</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isabel Segura-Bedmar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leonardo Campillos</string-name>
          <email>leonardo.campillos@uam.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paloma Martínez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, Universidad Carlos III de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad Autónoma de Madrid</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad Politécnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper introduces a web prototype for named entity recognition of chemical compounds and drugs. The tool is based on a system developed to participate in the ChemDNER task, organized as part of the BioCreative 2013 workshop. The system combines the ChemSpot tool with a set of semantic-based rules, which were defined according to the guidelines provided to task participants. The prototype is available at http://multimedica.uc3m.es:8080/biocreative2013demo/</p>
      </abstract>
      <kwd-group>
        <kwd>Drug named entity recognition</kwd>
        <kwd>information extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Most research on named entity recognition (NER) in the biomedical domain is based
on dictionary-based methods and Supervised Machine Learning (SML) methods. The
main problems with the former approach are its domain dependency and its
inability to recognize terms not included in the dictionaries. Machine learning techniques
build classification models from annotated corpora and produce the best results [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
although they depend on the availability of such corpora.
      </p>
      <p>
        Current trends favor hybrid systems that combine the best of both approaches. In
this work we present a prototype that combines existing systems such as ChemSpot
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and MetaMap [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] with gazetteers extracted from biomedical resources such as
MeSH<sup>1</sup>, DrugBank<sup>2</sup>, Wikipedia<sup>3</sup> and ChEBI [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Finally, based on an error analysis of the
development set, we defined a set of semantic rules to recover false negatives and
discard false positives produced by the previous processes. The web tool presented
in this paper is built on this system: it lets the user enter a text and then
detects the chemical compounds and drugs occurring in it.
      </p>
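      <p>The combination strategy outlined above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (<monospace>find</monospace>, <monospace>merge_annotations</monospace>, <monospace>apply_rules</monospace>), the example sentence, the hard-coded tagger and gazetteer matches, the longest-match policy for overlapping spans and the stop-list rule are all assumptions made for illustration. The actual system relies on ChemSpot, MetaMap and GATE components.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def find(text: str, term: str) -> Span:
    """Return the span of the first occurrence of term (illustrative helper)."""
    start = text.index(term)
    return (start, start + len(term))


def merge_annotations(*span_lists: List[Span]) -> List[Span]:
    """Union the spans proposed by several sources; on overlap keep the longest."""
    candidates = sorted(
        {span for spans in span_lists for span in spans},
        key=lambda s: (-(s[1] - s[0]), s[0]),  # longest first, then leftmost
    )
    kept: List[Span] = []
    for start, end in candidates:
        # Accept a span only if it overlaps none of the spans already kept.
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)


def apply_rules(text: str, spans: List[Span], stop_terms: Set[str]) -> List[Span]:
    """Discard likely false positives whose surface form is in a stop list."""
    return [(s, e) for s, e in spans if text[s:e].lower() not in stop_terms]


text = "Aspirin and acetylsalicylic acid were dissolved in water."

# Hypothetical outputs standing in for an NER tool and a gazetteer lookup.
tagger_spans = [find(text, "Aspirin"), find(text, "acetylsalicylic")]
gazetteer_spans = [find(text, "acetylsalicylic acid"), find(text, "water")]

merged = merge_annotations(tagger_spans, gazetteer_spans)
final = apply_rules(text, merged, stop_terms={"water"})
mentions = [text[s:e] for s, e in final]
```
]]></preformat>
      <p>In this toy run the gazetteer extends the partial match on the multiword term, and the stop-list rule removes a match that the rule treats as a false positive. A real rule set would be derived from error analysis, as described above.</p>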
    </sec>
    <sec id="sec-2">
      <title>Description of the prototype</title>
      <p>
        1 http://www.nlm.nih.gov/mesh/meshhome.html
2 http://www.drugbank.ca
3 http://wikipedia.org
      </p>
      <p>
by recursively traversing the relationships is_a, has_role, is_conjugate_acid_of and
is_conjugate_base_of. In the next phase, a gazetteer tagger implemented in the
GATE environment is used. Based on an error analysis of the development set, a set of
27 gazetteers with more than 340,000 entries has been compiled to process texts in
order to rule out false positive instances and to annotate false negative instances that
were not recognized in the previous steps. The sixth module is the ANNIE PoS tagger
included in GATE. PoS tags are used to discard some instances and to define the
rules applied in the last two steps, which classify the entities according to PoS tagging, affix
processing and multiword processing. More information about the processes and
resources used can be found in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        The system was evaluated on the test dataset provided by the BioCreative IV
CHEMDNER 2013 task. It recognized chemical and drug named entities
with an F-measure of 0.594 in the Chemical Entity Mention (CEM) evaluation. As
future work, we plan to conduct an evaluation with users to measure the usability of
our tool.
      </p>
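      <p>The recursive traversal of ontology relationships mentioned in this section can be sketched in Python as follows. This is a sketch under stated assumptions, not the system's actual code: the toy ontology, the traversal direction (from an entity towards the targets of its relationships) and the function name <monospace>collect_related</monospace> are hypothetical, and the entries are not taken from ChEBI.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Optional, Set, Tuple

# Relationship types followed during the traversal (as named in the text).
RELATIONS = {"is_a", "has_role", "is_conjugate_acid_of", "is_conjugate_base_of"}

# Toy ontology: entity -> list of (relation, target). Hypothetical data.
ONTOLOGY: Dict[str, List[Tuple[str, str]]] = {
    "acetate": [("is_conjugate_base_of", "acetic acid")],
    "acetic acid": [
        ("is_a", "carboxylic acid"),
        ("has_role", "solvent"),
        ("is_conjugate_acid_of", "acetate"),
    ],
    "carboxylic acid": [
        ("is_a", "organic acid"),
        ("has_part", "carboxy group"),  # relation type that is not followed
    ],
    "organic acid": [],
}


def collect_related(
    entity: str,
    ontology: Dict[str, List[Tuple[str, str]]],
    seen: Optional[Set[str]] = None,
) -> Set[str]:
    """Recursively collect every entity reachable through the selected relations."""
    if seen is None:
        seen = set()
    if entity in seen:
        return seen  # already visited: protects against cycles
    seen.add(entity)
    for relation, target in ontology.get(entity, []):
        if relation in RELATIONS:
            collect_related(target, ontology, seen)
    return seen


related = collect_related("acetate", ONTOLOGY)
```
]]></preformat>
      <p>The <monospace>seen</monospace> set matters here because conjugate acid and conjugate base relationships point at each other, so a naive recursion would not terminate. The collected names could then serve as gazetteer entries.</p>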
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>This work was supported by the EU project TrendMiner [FP7-ICT287863], by the
project MultiMedica [TIN 2010-20644-C03-01], and by the research network
MA2VICMR [S2009/TIC-1542].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Rocktäschel</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huber</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weidlich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leser</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>WBI-NER: The impact of domain-specific features on the performance of identifying and classifying mentions of drugs</article-title>
          .
          <source>Proceedings of SemEval</source>
          <year>2013</year>
          , pp.
          <fpage>356</fpage>
          -
          <lpage>363</lpage>
          , (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Rocktäschel</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weidlich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leser</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>ChemSpot: a hybrid system for chemical named entity recognition</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>28</volume>
          (
          <issue>12</issue>
          ), pp.
          <fpage>1633</fpage>
          -
          <lpage>1640</lpage>
          , (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          :
          <article-title>Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program</article-title>
          .
          <source>Proceedings of the AMIA Symposium</source>
          , American Medical Informatics Association, pp.
          <fpage>17</fpage>
          -
          <lpage>21</lpage>
          , (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Degtyarenko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Matos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ennis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hastings</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zbinden</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McNaught</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alcántara</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darsow</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guedj</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ashburner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>ChEBI: a database and ontology for chemical entities of biological interest</article-title>
          .
          <source>Nucleic Acids Research</source>
          ,
          <volume>36</volume>
          , pp.
          <fpage>344</fpage>
          -
          <lpage>350</lpage>
          , (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lana-Serrano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sánchez-Cisneros</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campillos</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Segura-Bedmar</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <article-title>Recognizing chemical compounds and drugs: a rule-based approach using semantic information</article-title>
          .
          <source>Proceedings of the fourth BioCreative challenge evaluation workshop</source>
          , vol
          <volume>2</volume>
          , (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>