=Paper= {{Paper |id=Vol-1327/5 |storemode=property |title=Using Ontology Fingerprints to Disambiguate Gene Name Entities in the Biomedical Literature |pdfUrl=https://ceur-ws.org/Vol-1327/icbo2014_paper_22.pdf |volume=Vol-1327 |dblpUrl=https://dblp.org/rec/conf/icbo/ChenZCTSXBZJHBM14 }} ==Using Ontology Fingerprints to Disambiguate Gene Name Entities in the Biomedical Literature== https://ceur-ws.org/Vol-1327/icbo2014_paper_22.pdf
                                             ICBO 2014 Proceedings

Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature
  
Guocai Chen¹, Jieyi Zhao¹, Trevor Cohen¹, Cui Tao¹, Jingchun Sun¹, Hua Xu¹, Elmer V. Bernstam¹, Andrew Lawson², Jia Zeng³, Amber M. Johnson³, Vijaykumar Holla³, Ann M. Bailey³, Funda Meric-Bernstam³, W. Jim Zheng¹,*

¹ Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston
² Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 300, Charleston, South Carolina, 29425
³ Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030


    Personalized cancer therapy relies on extensive knowledge of cancer genes, their variants, and treatments that target these variants. While most of this knowledge can be extracted from the biomedical literature, identifying genes and their associated publications with high precision is still a daunting task, often hampered by ambiguous gene names in the text. One way to disambiguate gene names is through gene normalization, the task of mapping a named entity in text to an identifier in a database. However, many genes have multiple names or aliases, and some of them share identical names even though they are distinct genes with different functions. Developing new methods to distinguish these ambiguous gene names will significantly improve the accuracy of information retrieval and other research-enabling applications.

    To overcome this hurdle, we developed an unsupervised approach that uses the literature to create ontology profiles, termed Ontology Fingerprints, for selected genes that are relevant to personalized cancer therapy. The Ontology Fingerprint for a gene consists of a set of associated GO terms and their ancestors defined by biologists, with an enrichment p-value mapped to each term to reflect the significance of the term. We first used ABGene/GNAT to identify gene names in PubMed abstracts and matched the names to the names or aliases of known genes. The ambiguous names were then assessed by evaluating the degree to which the abstract matched the Ontology Fingerprints of the genes.

    We focused only on genes targeted by therapeutics for personalized cancer therapy. Eleven of these genes and their relevant PubMed articles were selected and marked by oncologists and research staff from the Institute for Personalized Cancer Therapy at the UT MD Anderson Cancer Center. For the selected genes, we obtained 93.6% precision for gene name disambiguation and 80.4% AUC for gene and article association. For an additional 223 human genes relevant to cancer, we used the Ontology Fingerprints generated from publications before December 20, 2009 to predict the association of these genes with papers published after 2009, and obtained a precision of up to 92.7%.

    We investigated the feasibility of using Ontology Fingerprints to discover associations between genes and PubMed articles, as well as to disambiguate gene name mentions. We obtained reasonable accuracy for both gene name disambiguation and gene and PubMed article association. The Ontology Fingerprint method can improve gene normalization and the analysis of gene and article association. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyze the association between genes and articles.

    The core algorithm was implemented using a GPU-based MapReduce framework to handle big data and to improve performance. Compared with running the program on the Lonestar cluster, the GPU MapReduce framework achieves speed of the same order of magnitude. Overall, the MapReduce framework makes execution of the program more convenient and affordable, especially on a workstation with an appropriate graphics card.
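
    To make the fingerprint-matching idea in this abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state which enrichment test or which matching score was used, so two assumptions are made here: the enrichment p-value is computed with an upper-tail hypergeometric test, and an abstract is scored against a fingerprint by summing -log10(p) over the GO terms they share. All function names, gene labels, GO term choices, and p-values below are hypothetical and serve only as illustration.

```python
import math


def hypergeom_pvalue(total, with_term, gene_abstracts, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    ASSUMPTION: the abstract only says "enrichment p-value"; the
    hypergeometric test is one common choice, not a confirmed detail.
    total          -- abstracts in the corpus
    with_term      -- abstracts that mention the GO term
    gene_abstracts -- abstracts linked to the gene
    overlap        -- abstracts linked to the gene that mention the term
    """
    upper = min(with_term, gene_abstracts)
    denom = math.comb(total, gene_abstracts)
    tail = sum(
        math.comb(with_term, k) * math.comb(total - with_term, gene_abstracts - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


def fingerprint_score(abstract_terms, fingerprint):
    """Sum -log10(p) over GO terms found in both abstract and fingerprint.

    ASSUMPTION: this scoring rule is illustrative only.
    fingerprint maps GO term id -> enrichment p-value.
    """
    floor = 1e-300  # guard against log10(0)
    return sum(
        -math.log10(max(fingerprint[term], floor))
        for term in abstract_terms
        if term in fingerprint
    )


def disambiguate(abstract_terms, candidates):
    """Pick the candidate gene whose fingerprint best matches the abstract."""
    scores = {
        gene: fingerprint_score(abstract_terms, fp)
        for gene, fp in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): two genes that share an ambiguous name.
candidates = {
    "GENE_A": {"GO:0006915": 1e-8, "GO:0007049": 1e-3},
    "GENE_B": {"GO:0006955": 1e-6},
}
abstract_terms = {"GO:0006915", "GO:0007049"}

best, scores = disambiguate(abstract_terms, candidates)
print(best, scores)
```

    In this sketch, an ambiguous mention is resolved to the candidate gene with the highest score. A real pipeline would also need to propagate GO terms to their ancestors and choose a threshold below which no gene is assigned; both are left out here.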

*Corresponding Author: Wenjin.j.Zheng@uth.tmc.edu
