<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guocai Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jieyi Zhao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Trevor Cohen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cui Tao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jingchun Sun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hua Xu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elmer V. Bernstam</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrew Lawson</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jia Zeng</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amber M. Johnson</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vijaykumar Holla</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ann M. Bailey</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Funda Meric-Bernstam</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>W. Jim Zheng</string-name>
          <email>Wenjin.j.Zheng@uth.tmc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center</institution>
          ,
          <addr-line>1400 Holcombe Blvd., FC8.3044, Houston, TX 77030</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Public Health Science, Medical University of South Carolina</institution>
          ,
          <addr-line>135 Cannon Street, Suite 300, Charleston, South Carolina, 29425</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <abstract>
        <p>articles were selected and marked by oncologists and research staff from the Institute for Personalized Cancer Therapy at the UT MD Anderson Cancer Center. For the selected genes, we obtained 93.6% precision for gene name disambiguation and 80.4% AUC for gene and article association. For an additional 223 human genes relevant to cancer, we used Ontology Fingerprints generated from publications before December 20, 2009 to predict the association of these genes with papers published after 2009, and achieved a precision of up to 92.7%.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Personalized cancer therapy relies on
extensive knowledge of cancer genes, their
variants, and treatments that target these variants.
While most of this knowledge can be extracted
from the biomedical literature, identifying genes
and their associated publications with high
precision remains a daunting task, often hampered
by ambiguous gene names in the text. One way
to disambiguate gene names is through gene
normalization, the task of mapping a named
entity in text to an identifier in a database.
However, many genes have multiple names or
aliases, and some distinct genes with different
functions share identical names. Developing new
methods to distinguish these ambiguous gene names
will significantly improve the accuracy of
information retrieval and other research-enabling
applications.</p>
      <p>To overcome this hurdle, we developed an
unsupervised approach that creates ontology profiles,
termed Ontology Fingerprints, from the literature for
selected genes that are relevant to personalized
cancer therapy. The Ontology Fingerprint for a
gene consists of a set of associated GO terms
and their ancestors defined by biologists, each with an
enrichment p-value that reflects the significance of
the term. We first used ABGene/GNAT to identify
gene names in PubMed abstracts and matched the
names to the names or aliases of known genes.
Ambiguous names were then resolved by
evaluating the degree to which the abstract
matched the Ontology Fingerprints of the candidate genes.</p>
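      <p>The matching step described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the scoring rule (summing the negative log10 of the enrichment p-value for every fingerprint term found in the abstract), the function names, the gene labels GENE_A and GENE_B, and all p-values are assumptions introduced only for illustration.</p>
      <preformat>
```python
import math

# Hypothetical fingerprints: GO term name -> enrichment p-value.
# Gene labels and values are invented for illustration only.
FINGERPRINTS = {
    "GENE_A": {"cell proliferation": 1e-8, "protein kinase activity": 1e-5},
    "GENE_B": {"lipid transport": 1e-6, "cell proliferation": 1e-2},
}


def score(abstract, fingerprint):
    """Sum -log10(p) over fingerprint terms that occur in the abstract.

    This scoring rule is an assumption, chosen so that more
    significant terms (smaller p-values) contribute more.
    """
    text = abstract.lower()
    return sum(-math.log10(p) for term, p in fingerprint.items() if term in text)


def disambiguate(abstract, candidate_genes, fingerprints=FINGERPRINTS):
    """Return the best-matching candidate gene and the score of every candidate."""
    scores = {gene: score(abstract, fingerprints[gene]) for gene in candidate_genes}
    best = max(scores, key=scores.get)
    return best, scores
```
      </preformat>
      <p>Under these assumptions, an ambiguous name shared by GENE_A and GENE_B would be assigned to whichever gene has the fingerprint that accumulates the larger score for the abstract, for example by calling disambiguate(abstract, ["GENE_A", "GENE_B"]). Simple substring matching of term names stands in here for whatever term recognition a real system would use.</p>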
    </sec>
    <sec id="sec-2">
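      <title>-</title>
      <p>We focused only on genes targeted by therapeutics for personalized cancer therapy. Eleven of these genes and relevant PubMed articles were selected and marked by oncologists and research staff from the Institute for Personalized Cancer Therapy at the UT MD Anderson Cancer Center.</p>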
    </sec>
    <sec id="sec-3">
      <title>-</title>
      <p>The core algorithm was implemented using a
GPU-based MapReduce framework to handle big
data and to improve performance. Compared
with running the program on the Lonestar cluster,
the GPU MapReduce framework achieves speed of
the same order of magnitude. Overall,
the MapReduce framework makes execution of
the program more convenient and affordable,
especially on a workstation with an appropriate
graphics card.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>