<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluating gene/protein name tagging and mapping for article retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chong Min Lee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manabu Torii</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jinesh Shah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yi-Ting Tsai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhang-Zhi Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hongfang Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lab of Text Intelligence in Biomedicine Georgetown University Medical Center Washington</institution>
          ,
          <addr-line>DC 20007</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>104</fpage>
      <lpage>109</lpage>
      <abstract>
        <p>Background: Tagging gene/protein names in text and mapping them to database entries are critical tasks in biological literature mining. Most of the existing tagging and normalization approaches, however, have not been evaluated for practical use in article retrieval towards efficient biocuration. Results: By utilizing literature cross-reference information provided by the NCBI Entrez Gene database, we found that the coverage of gene/protein databases with respect to gene/protein names found in text is around 94%. The upper bound of recall in retrieving MEDLINE citations by gene/protein names is around 70-80% when citations cross-referred by many genes are disregarded and flexible matching of names is used. Of the genes/proteins that failed to be retrieved by name, over 30% failed because the citations do not discuss the cross-referred genes/proteins in their abstracts, and around 60% failed because of the gene/protein name tagging system, which was trained on the BioCreAtIvE II gene mention corpus. Conclusions: The study demonstrates that existing gene/protein databases have decent coverage of gene/protein names used in MEDLINE abstracts. Approaches and data resources for gene/protein tagging and mapping need to be selected appropriately for individual practical tasks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
Literature mining has become an important part of
modern biomedical research and is involved in
tasks ranging from helping biologists retrieve
research articles to automatically extracting
designated types of information from articles
        <xref ref-type="bibr" rid="ref7 ref8 ref9">(Krallinger and Valencia 2005; Krallinger,
Valencia et al. 2008)</xref>
        . One of the practical
applications of biomedical literature mining is to detect
articles describing a specific gene or protein.
Many existing molecular databases provide
literature cross-reference information. For example,
the National Library of Medicine (NLM) began
an initiative to link scientific publications to
Entrez Gene entries via Gene Reference Into
Function (GeneRIF). Similar to the sequence
submission mechanism in GenBank, GeneRIF records
can be provided by individual researchers. For
protein annotations, the UniProt consortium has
been devoted to providing annotation evidence,
including evidence from the literature, during the curation
of protein records.
      </p>
      <p>
        Given the current level of maturity of
biomedical literature mining applications, it may be
difficult to fully automate knowledge
acquisition at a level comparable to that of expert
curators. However, there is evidence
that literature mining applications can
significantly boost the efficiency and quality of
curators’ work. For example, Textpresso
        <xref ref-type="bibr" rid="ref13">(Muller,
Kenny et al. 2004)</xref>
        is an information retrieval
system that can retrieve sentences from
full-length articles. The system is equipped with a
semantic classification system consisting of 33
term categories. Target documents retrieved and
stored in the system are pre-processed, and
phrases identified in documents are
automatically labeled with semantic categories. Category
annotation allows users to formulate sentence
retrieval queries that consist of term categories as
well as key phrases. Another retrieval system is
PubSearch (http://pubsearch.org/)
        <xref ref-type="bibr" rid="ref3">(Harris, Clark
et al. 2004)</xref>
        , a web-based curation tool for genes,
that allows curators to search for documents
containing designated genes and also to annotate
documents. PreBind
        <xref ref-type="bibr" rid="ref2">(Donaldson, Martin et al.
2003)</xref>
        was designed to support human curation of
BIND, an online database of protein-protein
interactions. In the PreBind system, protein names
and their synonyms were first extracted from
sequence databases, RefSeq and SGD, and
MEDLINE records containing protein names and
their synonyms were retrieved. Then, MEDLINE
citations potentially containing protein
interaction information were identified using a text
categorization system. It reportedly reduced the
duration of the task of extracting protein
interaction information by 70%.
      </p>
      <p>
        Recently, automated gene/protein tagging and
mapping systems have achieved reasonable
performance when species information is provided,
as evidenced in BioCreAtIvE workshops
        <xref ref-type="bibr" rid="ref1 ref11 ref12 ref12 ref4 ref4 ref5 ref5 ref6 ref7 ref7 ref7 ref9 ref9">(Morgan, Hirschman et al. 2004; Hirschman,
Colosimo et al. 2005; Hirschman, Yeh et al.
2005; Krallinger, Leitner et al. 2007; Altman,
Bergman et al. 2008; Krallinger, Morgan et al.
2008; Krallinger, Valencia et al. 2008; Morgan,
Lu et al. 2008)</xref>
        . However, it is not clear how
these systems perform in retrieving articles
relevant to a specific gene or protein. Additionally, it
is not clear how important it is to have a
comprehensive list of gene/protein names and to be able
to handle variant forms of a gene/protein name in
text.
      </p>
      <p>Utilizing literature cross-reference information
provided by Entrez Gene (i.e., GeneRIF), we
designed an experiment to answer the following
questions related to gene/protein tagging
and mapping for gene/protein curation:
        <list list-type="bullet">
          <list-item><p>what is the coverage of an existing gene/protein dictionary assembled from existing databases, BioThesaurus, with respect to gene/protein names mentioned in abstracts;</p></list-item>
          <list-item><p>what is the performance of an existing gene/protein tagging system, BioTagger-GM, when evaluated in a practical curation setting;</p></list-item>
          <list-item><p>how flexible the matching criteria need to be when dictionary lookup is employed for gene/protein name mapping; and</p></list-item>
          <list-item><p>what is the upper bound of recall when using such automated systems to link MEDLINE citations to gene/protein records in databases based on gene/protein names mentioned in abstracts.</p></list-item>
        </list>
      </p>
      <p>In the following, we first describe the
resources and systems used in the study. The study
design and assessment method are presented next.
We then present and discuss the results and
conclude the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>The study was designed to utilize existing
resources and systems.</p>
      <sec id="sec-2-1">
        <title>Resources</title>
        <p>Resources used in the study include text and gene
resources available from the National Center for
Biotechnology Information (NCBI;
http://www.ncbi.nlm.nih.gov) and gene/protein
terminology resources and tagging systems
available at the Lab of Text Intelligence at
Georgetown University
(http://biomine.dbb.georgetown.edu). The
following provides a brief summary of each
of them.</p>
        <p>MEDLINE is the NLM’s premier
bibliographic database. We used the 2010 distribution
of MEDLINE, which contains citation information
for over 18 million articles published in the life
science domain.</p>
        <p>
          Entrez Gene
          <xref ref-type="bibr" rid="ref15">(Wheeler, Church et al. 2004)</xref>
          is
NCBI’s database for gene-specific information.
Each gene is given a unique identifier (GENEID)
in the database. Among the information in the
database, literature cross-reference information
of genes is provided in GeneRIF.
        </p>
        <p>
          BioThesaurus
          <xref ref-type="bibr" rid="ref10">(Liu, Hu et al. 2006)</xref>
          is a
web-based thesaurus designed to map protein and
gene names to protein entries in the UniProt
Knowledgebase (UniProtKB) or gene entries in
Entrez Gene. The latest gene-centric version
(July 1, 2010) contains over 11 million names
extracted from 32 molecular biological databases
according to the cross-references provided by
UniProtKB or Entrez Gene, as well as
cross-reference information provided in each
individual database.
        </p>
        <p>
          BioTagger-GM
          <xref ref-type="bibr" rid="ref14">(Torii, Hu et al. 2009)</xref>
          is a
gene/protein name tagger utilizing BioThesaurus
and Conditional Random Fields (CRFs). The
tagger was trained on the training data of the
BioCreAtIvE II gene mention task. The trained CRF
model together with a post-processing module
yielded an F-score of over 86% on the test data of
the BioCreAtIvE II gene mention task.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Data preparation</title>
        <p>We obtained a collection of paired IDs (PMID,
GENEID) from GeneRIF, where a gene record
GENEID has a literature reference PMID. For
each pair, the gene/protein names associated with
the gene record GENEID were retrieved from
BioThesaurus, and the abstract of the cited
reference PMID was processed by BioTagger-GM
to detect gene/protein names. For example,
we retrieved two pairs, (19570885, 20393) and
(19570885, 20497), from GeneRIF, where two
genes with GENEIDs 20393 and 20497 are
cross-referred with one literature citation with
PMID 19570885. BioTagger-GM identified
several gene/protein names, including one name (i.e.,
“SGK1”) for the gene with GENEID 20393 and
one name (i.e., “NCC”) for the gene with
GENEID 20497.</p>
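        <p>The pairing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name <monospace>group_genes_by_pmid</monospace> is hypothetical, the input is assumed to be an in-memory list of (PMID, GENEID) tuples rather than the actual GeneRIF file, and the calls to BioThesaurus and BioTagger-GM are not shown.</p>
        <preformat>
```python
from collections import defaultdict


def group_genes_by_pmid(pairs):
    """Map each PMID to the sorted list of distinct GENEIDs cross-referred to it.

    `pairs` is an iterable of (PMID, GENEID) tuples (an assumed input format).
    """
    genes = defaultdict(set)
    for pmid, gene_id in pairs:
        genes[pmid].add(gene_id)
    return {pmid: sorted(ids) for pmid, ids in genes.items()}


# The two example pairs quoted in the text above.
pairs = [(19570885, 20393), (19570885, 20497)]
print(group_genes_by_pmid(pairs))  # {19570885: [20393, 20497]}
```
        </preformat>
        <p>Grouping by PMID also gives the number of distinct genes cross-referred to each citation, which is the quantity the assessment uses to split abstracts into groups.</p>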
      </sec>
      <sec id="sec-2-3">
        <title>Name mapping</title>
        <p>For each pair (PMID, GENEID), we used two
approaches to find mappings between names
identified by BioTagger-GM in the abstract of
the cited reference PMID and names of the gene
record GENEID in BioThesaurus. When a
best-scoring mapping between a pair of names
is found, the (PMID, GENEID) pair is considered to be detectable
through automated approaches. The first
approach is a relaxing method: exact
matching is tested first, and then the name without
its first and/or last word is tried for
dictionary lookup. At most two words
may be removed. The number of
removed words is recorded as the penalty score
of the mapping, ranging from 0 to 2. The second
approach uses a similarity measure, the Jaccard
Index (JI), defined as
          <disp-formula><tex-math>\mathrm{JI}(C_1, C_2) = \frac{|C_1 \cap C_2|}{|C_1 \cup C_2|}</tex-math></disp-formula>
where C_i is the set of words in the i-th name. The similarity measured by
JI ranges from 0 to 1: the score is 0 when two names do not
share any word, and 1 when they are
identical. Additionally, a normalization procedure is
used to accommodate variations caused by
lexical variants of words and punctuation marks.
Specifically, punctuation marks are ignored, all
letters are lower-cased, and lexical variants are
normalized based on the Specialist Lexicon provided
in the Unified Medical Language System
(UMLS).</p>
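        <p>A minimal sketch of the two mapping approaches follows, written in Python for illustration only; it is not the implementation used in the study. Several points are assumptions: the function names are hypothetical, punctuation is replaced by spaces as one way of "ignoring" it, the UMLS Specialist Lexicon step is omitted, the dictionary is a plain set of normalized names, and the two removable words may be split in any way between the start and the end of a name.</p>
        <preformat>
```python
import string

# Translation table that turns every ASCII punctuation mark into a space.
_PUNCT = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(name):
    """Lower-case a name, drop punctuation, and return its list of words.

    Lexical-variant normalization (UMLS Specialist Lexicon) is NOT done here.
    """
    return name.lower().translate(_PUNCT).split()


def relaxed_lookup(name, dictionary, max_removed=2):
    """Return the smallest penalty (number of removed leading/trailing words)
    at which the name is found in `dictionary`, or None if it is never found.

    `dictionary` is assumed to be a set of space-joined normalized names.
    """
    words = normalize(name)
    for penalty in range(max_removed + 1):
        for lead in range(penalty + 1):
            trail = penalty - lead
            kept = words[lead:len(words) - trail]
            if kept and " ".join(kept) in dictionary:
                return penalty
    return None


def jaccard(name_a, name_b):
    """Jaccard Index over the word sets of two normalized names."""
    a, b = set(normalize(name_a)), set(normalize(name_b))
    if not a or not b:
        return 0.0
    return len(a.intersection(b)) / len(a.union(b))


dictionary = {"smad2"}  # toy dictionary, not real BioThesaurus content
print(relaxed_lookup("SMAD2", dictionary))             # 0
print(relaxed_lookup("SMAD2 gene", dictionary))        # 1
print(relaxed_lookup("human SMAD2 gene", dictionary))  # 2
print(jaccard("VNR alpha chain", "vitronectin receptor alpha chain"))  # 0.4
```
        </preformat>
        <p>In this sketch the relaxing method returns a penalty, where lower is better, while JI returns a similarity, where higher is better, so a threshold on either score plays the same role of controlling how flexible the match is.</p>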
      </sec>
      <sec id="sec-2-4">
        <title>Assessment</title>
        <p>The statistics and assessment measures are
calculated for each of five groups of abstracts,
grouped according to the number of distinct
genes cross-referred to an abstract (1, 2:4, 5:16,
17:256, and &gt; 256). For each group, we report the coverage of
(PMID, GENEID) pairs, i.e., how many candidate
pairs of detected names and GENEIDs are
generated as the score threshold is
varied.</p>
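        <p>The grouping and the coverage measure can be illustrated with the following Python sketch. The function names and the input format (one best penalty score per pair, or <monospace>None</monospace> when no mapping was found) are assumptions made for this example; only the group boundaries are taken from the text.</p>
        <preformat>
```python
from bisect import bisect_left

# Upper bounds of the first four groups; anything above 256 is the fifth group.
_UPPER_BOUNDS = [1, 4, 16, 256]
_LABELS = ["1", "2:4", "5:16", "17:256", "> 256"]


def gene_count_group(n_genes):
    """Return the group label for an abstract with `n_genes` cross-referred genes."""
    if 1 > n_genes:
        raise ValueError("an abstract must be cross-referred by at least one gene")
    return _LABELS[bisect_left(_UPPER_BOUNDS, n_genes)]


def coverage_by_threshold(penalties, max_penalty=2):
    """Fraction of pairs mapped with a penalty of at most t, for t = 0..max_penalty.

    `penalties` holds the best penalty per (PMID, GENEID) pair, or None if unmapped.
    """
    total = len(penalties)
    if total == 0:
        return [0.0] * (max_penalty + 1)
    return [
        sum(1 for p in penalties if p is not None and t >= p) / total
        for t in range(max_penalty + 1)
    ]


print([gene_count_group(n) for n in (1, 4, 5, 256, 257)])
# ['1', '2:4', '5:16', '17:256', '> 256']
print(coverage_by_threshold([0, 0, 1, None]))  # [0.5, 0.75, 0.75]
```
        </preformat>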
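        <p>There are several causes for pairs that fail to be
mapped:
          <list list-type="bullet">
            <list-item><p>Gene not mentioned in the abstract: no name for the listed gene appears in the abstract, e.g., the gene is mentioned in the full-length article but not in the abstract.</p></list-item>
            <list-item><p>BioTagger-GM failed: a name for the gene appears in the abstract, but BioTagger-GM does not tag it.</p></list-item>
            <list-item><p>Not in BioThesaurus: the name used in the abstract is not covered by BioThesaurus.</p></list-item>
          </list>
        </p>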
        <p>To estimate, for the full data set, the number of (PMID, GENEID)
pairs that fail for each cause, we
sampled 100 pairs from each group that
failed to map before any word was removed. Figure
1 shows the evaluation interface we built to
analyze the results. Note that BioTagger-GM and
BioThesaurus could fail at the same time (the
second and the third causes listed above). For
example, as shown in Figure 1, given the GeneRIF
pair (1703206, 16410), the name to be detected
in the abstract is “VNR alpha chain”;
BioTagger-GM failed to tag it and BioThesaurus did
not cover the name, even though a similar name,
“vitronectin receptor alpha chain”, can be found
in BioThesaurus.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>There are 2,410,237 (PMID, GENEID) pairs
extracted from GeneRIF, associated with 469,273
articles, for an average of 5.14 genes per article.
Over 90% of the articles have fewer than five
genes cross-referred. On average, BioTagger-GM
identified 4.88 gene/protein names per abstract.
Table 1 and Figure 2 show the
statistics and assessment results for abstracts in
the five groups (i.e., 1, 2:4, 5:16, 17:256, and &gt;
256). Among articles with one gene cross-referred,
63.3% are identifiable using exact string
matching (i.e., BN0). Generally, the
measurement increases by around 10 percentage points (e.g., from 63.3% for BN0 to
74.4% for AN0) after string normalization and
by around an additional 9 points (e.g., to 83.02% for AN1) if a
leading or a trailing word in a name is
removed. For some articles with many genes
cross-referred, the chance of finding their names in the
abstract decreases to almost 0%. The results
obtained using JI are similar to those obtained
using the relaxing approach. Figure 2 also shows
that the percentage of mapped pairs decreases as
the number of genes cross-referred by the
abstract increases.</p>
      <p>Table 2 shows the distribution of the causes
of failed mapping of pairs. The two analysts agreed
on the causes most of the time. Note that some
pairs can have two causes of failed mapping:
“BioTagger-GM failed” and “Not in
BioThesaurus” (11 pairs in Group 1 and 5 in Group 2:4).
When the number of genes cross-referred is fewer
than five, around 34% of the failed pairs could not be
identified because the genes are not mentioned in the
abstract, and around 60% failed
because of BioTagger-GM failures. For abstracts
with many genes cross-referred, the dominant
cause of failed mapping is that the genes are not
mentioned in the abstracts.</p>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>We assessed gene/protein entity tagging and
mapping in article retrieval to assist gene/protein
curation by utilizing literature cross-reference
information provided in GeneRIF and publicly
available resources.</p>
      <p>The results suggest that papers linked to many
genes very rarely contain the corresponding
names in their abstracts. In fact, most of the
papers associated with more than 256 genes (the
fifth group) are either about genome sequencing
projects or about databases, which would be
irrelevant to the curation of particular genes or
proteins.</p>
      <p>The manual evaluation indicates that
gene/protein names listed in databases are
comprehensive in including gene/protein names
mentioned in text. Among the pairs that failed to be
mapped in Group 1 and Group 2:4 (about 30% of
the total), fewer than 20% failed because of
names not in BioThesaurus, which indicates over
94% coverage of names by BioThesaurus (i.e.,
1-20%×30%=0.94).</p>
      <p>Relaxing dictionary lookup is important for
article retrieval. As observed in Figure 2, there is
an increase of up to 10% in the coverage of (PMID,
GENEID) pairs after normalization. Also, an
increase of 20% in coverage is observed when
allowing a difference of at most two words. One main
factor behind such a large increase is the presence of additional
modifiers in names detected by BioTagger-GM.
For example, species names frequently occur as
modifiers of gene/protein names in text, while
species names seldom occur in the names in
BioThesaurus. Also, words indicating semantic
categories such as “gene” or “protein” may be present
in the names used in text (e.g., “SMAD2 gene”
vs. “SMAD2”).</p>
      <p>The study demonstrates that among the pairs that
failed to be mapped in Group 1 and Group 2:4,
over 30% failed because names are not mentioned in
the abstract. This indicates that the upper bound of
recall is around 91% (1-30%×30%=0.91) for
article retrieval when using abstracts only.</p>
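      <p>The arithmetic behind this bound can be written out as follows. The symbols are introduced here only for illustration: f is the fraction of pairs in Group 1 and Group 2:4 that failed to be mapped (about 30%), and a is the share of those failures in which the gene is not mentioned in the abstract (rounded to 30%). Pairs of this kind cannot be recovered from abstracts by any tagger or dictionary, so their fraction of all pairs, f times a, is subtracted from 1.</p>
      <p>
        <disp-formula><tex-math>R_{\max} = 1 - f \cdot a \approx 1 - 0.30 \times 0.30 = 0.91</tex-math></disp-formula>
      </p>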
      <p>Among the cases where a name could not be
properly mapped, 60% of them were names not
detected by BioTagger-GM. The tagger failed
even when names were included in BioThesaurus,
although BioThesaurus lookup results are used as
features in the tagger. This might be attributed to
the fact that mere lookup of BioThesaurus yields
a low precision by itself, even though the recall
can be high. Another important consideration is
the definition of “genes/proteins”. The
annotation guidelines of genes/proteins for the
BioCreAtIvE corpus may not conform to the notion
of genes/proteins for the purpose of GeneRIF
annotation.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>We have conducted an assessment of the
coverage of gene/protein names in databases with
respect to gene/protein names in text, and
automated gene/protein tagging and mapping for
retrieving articles relevant to specific genes or
proteins. The study demonstrates that existing
gene/protein databases have a decent coverage of
gene/protein names mentioned in the text. The
study provides an upper bound of recall when
using automated methods to retrieve articles. The
study suggests that the most appropriate
approach and data resources to facilitate
gene/protein tagging and mapping need to be
selected for the specific task at hand.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by NIH
1-R01LM009959-01A1 and NSF CAREER 0845523.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Altman</surname>
            ,
            <given-names>R. B.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Bergman</surname>
          </string-name>
          , et al. (
          <year>2008</year>
          ).
          <article-title>"Text mining for biology--the way forward: opinions from leading scientists."</article-title>
          <source>Genome Biol</source>
          <volume>9</volume>
          <supplement>Suppl 2</supplement>
          :
          <fpage>S7</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Donaldson</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martin</surname>
          </string-name>
          , et al. (
          <year>2003</year>
          ).
          <article-title>"PreBIND and Textomy - mining the biomedical literature for protein-protein interactions using a support vector machine."</article-title>
          <source>BMC Bioinformatics</source>
          <volume>4</volume>
          (
          <issue>11</issue>
          ):
          <fpage>1471</fpage>
          -
          <lpage>2105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          , et al. (
          <year>2004</year>
          ).
          <article-title>"The Gene Ontology (GO) database and informatics resource."</article-title>
          <source>Nucleic Acids Res</source>
          <volume>32</volume>
          (Database issue):
          <fpage>D258</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hirschman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Colosimo</surname>
          </string-name>
          , et al. (
          <year>2005</year>
          ).
          <article-title>"Overview of BioCreAtIvE task 1B: normalized gene lists."</article-title>
          <source>BMC Bioinformatics</source>
          <volume>6</volume>
          <supplement>Suppl 1</supplement>
          :
          <fpage>S11</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hirschman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yeh</surname>
          </string-name>
          , et al. (
          <year>2005</year>
          ).
          <article-title>"Overview of BioCreAtIvE: critical assessment of information extraction for biology."</article-title>
          <source>BMC Bioinformatics</source>
          <volume>6</volume>
          <supplement>Suppl 1</supplement>
          :
          <fpage>S1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Leitner</surname>
          </string-name>
          , et al. (
          <year>2007</year>
          ).
          <article-title>Assessment of the second biocreative PPI task: automatic extraction of protein-protein interactions</article-title>
          .
          <source>Proceedings of the Second Biocreative Challenge Evaluation Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Morgan</surname>
          </string-name>
          , et al. (
          <year>2008</year>
          ).
          <article-title>"Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge."</article-title>
          <source>Genome Biol</source>
          <volume>9</volume>
          <supplement>Suppl 2</supplement>
          :
          <fpage>S1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>A</given-names>
            .
            <surname>Valencia</surname>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>"Text-mining and information-retrieval services for molecular biology."</article-title>
          <source>Genome Biol</source>
          <volume>6</volume>
          (
          <issue>7</issue>
          ):
          <fpage>224</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Valencia</surname>
          </string-name>
          , et al. (
          <year>2008</year>
          ).
          <article-title>"Linking genes to literature: text mining, information extraction, and retrieval applications for biology."</article-title>
          <source>Genome Biol</source>
          <volume>9</volume>
          <supplement>Suppl 2</supplement>
          :
          <fpage>S8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Liu</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            <given-names>ZZ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            <given-names>C.</given-names>
          </string-name>
          <article-title>BioThesaurus: a web-based thesaurus of protein and gene names</article-title>
          .
          <source>Bioinformatics. Jan 1</source>
          <year>2006</year>
          ;
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <fpage>103</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Morgan</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hirschman</surname>
          </string-name>
          , et al. (
          <year>2004</year>
          ).
          <article-title>"Gene name identification and normalization using a model organism database."</article-title>
          <source>J Biomed Inform</source>
          <volume>37</volume>
          (
          <issue>6</issue>
          ):
          <fpage>396</fpage>
          -
          <lpage>410</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Morgan</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          , et al. (
          <year>2008</year>
          ).
          <article-title>"Overview of BioCreative II gene normalization."</article-title>
          <source>Genome Biol</source>
          <volume>9</volume>
          <supplement>Suppl 2</supplement>
          :
          <fpage>S3</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Muller</surname>
            ,
            <given-names>H. M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>E. E.</given-names>
            <surname>Kenny</surname>
          </string-name>
          , et al. (
          <year>2004</year>
          ).
          <article-title>"Textpresso: an ontology-based information retrieval and extraction system for biological literature."</article-title>
          <source>PLoS Biol</source>
          <volume>2</volume>
          (
          <issue>11</issue>
          ):
          <fpage>e309</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Torii</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            <given-names>CH</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>H.</given-names>
          </string-name>
          <article-title>BioTagger-GM: a gene/protein name recognition system</article-title>
          .
          <source>J Am Med Inform Assoc. Mar-Apr</source>
          <year>2009</year>
          ;
          <volume>16</volume>
          (
          <issue>2</issue>
          ):
          <fpage>247</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Wheeler</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Church</surname>
          </string-name>
          , et al. (
          <year>2004</year>
          ).
          <article-title>"Database resources of the National Center for Biotechnology Information: update."</article-title>
          <source>Nucleic Acids Res</source>
          <volume>32</volume>
          (Database issue):
          <fpage>D35</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>