<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Integration of Hybrid Bio-Ontologies using Bayesian Networks for Knowledge Discovery</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ken McGarry</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sheila Garfield</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nick Morris</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Wermter</string-name>
          <email>stefan.wermter@sunderland.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Relative frequencies and probabilities calculations for Bayesian Network CPT's</institution>
          ,
          <addr-line>conditional probability tables</addr-line>
        </aff>
      </contrib-group>
      <kwd-group>
        <kwd>relating to insulin resistance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>The large amounts of genomic and proteomic data generated by
biological experiments are now enabling deeper insights into cellular
and molecular function. New technologies such as microarrays and
electrophoresis gels are providing vast quantities of experimental data
at unprecedented rates. All of this information needs to be stored and
carefully annotated. With each new experiment providing details of new
protein-to-protein interactions, new biological pathways and new genes,
it is essential that these discoveries are made available to the
scientific community. To this end, online scientific databases are now
in place to disseminate these results. These databases, such as the
popular Gene Ontology (GO), are updated at intervals to reflect the
latest developments [Ashburner, 2000].</p>
      <p>The updating is done by experts who manually revise each
entry by reading the research literature and annotating the
database collections accordingly. If necessary, they will
contact the experimenters to resolve any ambiguities or problems.
In terms of data quality, the databases are quite reliable and
robust. Unfortunately, hand annotation is a slow process and
the databases are lagging behind the experimental work by a
considerable margin. This prevents researchers from
immediately accessing the most recent discoveries.</p>
      <p>Unless researchers are familiar with the journals in which new
results are published, they are unlikely to encounter this information.
Given the fragmented and highly specialized nature of biological
research, such encounters may be rare. The need for automated
extraction of knowledge from the literature is therefore well
motivated. Recent advances in text analytics combine techniques from
information retrieval (IR) and information extraction (IE), which allow
researchers to explore the relevant literature more effectively [Mack
and Henenberger, 2002]. However, these techniques require knowledge
discovery methods to uncover the complex embedded structures,
relationships and connections between seemingly unrelated facts that
typically exist in the biomedical literature [Tiffin et al., 2005].</p>
      <p>[Figure 1: Overall methodology. Recoverable stages: PubMed
biomedical text; transfer of knowledge into Bayesian network format
(CPTs); inference with the BN on proteins/genes without annotations;
validation against existing knowledge of pathways and interactions in
biomedical ontologies and knowledge bases (GO, BIND &amp; KEGG); gaps in
knowledge defined and experimental procedures to follow.]</p>
      <p>Our research area is diabetes, in particular the effects of
insulin resistance on protein expression and insulin-regulated protein
trafficking in fat cells. In recent years there has been a dramatic
worldwide increase in the number of people suffering from diabetes. In
the year 2000 there were 171 million cases, and the World Health
Organization (WHO) has predicted that by 2030 there will be 366 million
people suffering from this condition (www.who.int/diabetes/facts/). The
WHO data are for diagnosed cases; the WHO estimates the undiagnosed
cases at 14.6 million for the US alone.</p>
      <p>In this paper we present our results on automatically generating
a viable ontology based on information extraction of keywords from the
research literature. The keywords define the entities and relationships
that describe how important genes, gene relationships and
protein-to-protein interactions operate and coexist in biological
processes related to insulin resistance. Furthermore, the ontology is
cast within a probabilistic framework using Bayesian networks, which
are used for inference and prediction of protein function. Figure 1
gives the overall methodology for the extraction of information and
construction of the ontology.</p>
      <p>The remainder of this paper is structured as follows: section
two outlines our information extraction scheme for identifying the
entities and relationships of interest; section three provides an
overview of biological ontologies and gives details of how we use
Bayesian networks for inference and reasoning; section four discusses
our methodology and experimental results; section five reviews the
related work and our claim for novelty; and finally section six
presents the conclusions.</p>
      <p>The algorithm encodes, through regular expressions, templates
for recognizing the types of “action” words that typically occur in
biological texts. We discuss this process in more detail in section 4.
The main problem that our algorithm addresses is knowing in advance the
kind of information that can be encountered. Rather than attempt to
parse the entire corpus, we exploit certain linguistic regularities and
search for specific semantic relations that need only be defined once.
The algorithm takes into account a variable distance between related
terms, i.e. longer passages of text, and therefore provides a much more
reliable identification of the relationships. Allowing a separation of
up to two words has empirically been shown to be a reasonable trade-off
between accuracy and computational complexity. Examples of
relationships include:
<list list-type="bullet">
<list-item><p>A inhibits B</p></list-item>
<list-item><p>A activates B</p></list-item>
<list-item><p>A interacts with B</p></list-item>
<list-item><p>A suppresses B</p></list-item>
</list>
</p>
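      <p>The template idea described above can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not
the authors' implementation: the action list is taken from the example
relationships, while the filler words, the single-token approximation
of an entity and the name extract_relations are hypothetical.</p>
      <preformat>
```python
import re

# Action phrases from the example relationships in the text (assumed list).
ACTIONS = ["inhibits", "activates", "interacts with", "suppresses"]
# Assumed filler words that may sit between the action and entity B.
FILLERS = ["the", "a", "an"]

# An entity is approximated as a single alphanumeric token (assumption).
TOKEN = r"[A-Za-z0-9-]+"
RELATION = re.compile(
    rf"({TOKEN})\s+({'|'.join(ACTIONS)})\s+"
    rf"(?:(?:{'|'.join(FILLERS)})\s+){{0,2}}({TOKEN})"
)


def extract_relations(sentence):
    """Return (entity_a, action, entity_b) triples found in one sentence."""
    return [match.groups() for match in RELATION.finditer(sentence)]


if __name__ == "__main__":
    print(extract_relations("Syntaxin4 interacts with the GLUT4 vesicle"))
```
      </preformat>
      <p>The pattern skips at most two filler words after the action
phrase, which mirrors the two-word window mentioned in the text; a
wider window would find more candidates at the cost of more spurious
matches.</p>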
    </sec>
    <sec id="sec-2">
      <title>2 Information Extraction</title>
      <p>Unstructured text is a very flexible and powerful means of
communication; it allows us to describe quite complex concepts. The
semantic meaning of a sentence can be expressed in many different ways,
but it is this flexibility which is the cause of difficulty for
algorithmic sentence analysis by computers. One technique for
overcoming this problem is to use information extraction (IE) to seek
out the important entities in the text and the relationships between
them [Hearst, 1992; Rosario and Hearst, 2004]. The IE process can
involve encoding patterns by hand, such as regular expressions, to
search for the required entities and relations, or the use of
semi-automated machine learning techniques [Nahm and Mooney, 2002;
Krauthammer and Nenadic, 2004]. The algorithm we developed is shown in
figure 2.</p>
      <preformat>Inputs: Abstract file A, String str
Outputs: Keyword file B

Load file A
Remove end of line characters
While unprocessed “abstracts” in A
    Read each line into str
    Search string for concept term
    If str contains phrase (the | a | an) + 2words (and | ) + 2words
        write word preceding key phrase and string after key phrase to B
    elseif str contains phrase (the | a | an) + 1word (and | ) + 2words
        write word preceding key phrase and string after key phrase to B
    elseif str contains phrase (the | a | an) + 2words
        write word preceding key phrase and string after key phrase to B
close A and B</preformat>
      <p>Figure 2: Information extraction algorithm</p>
    </sec>
    <sec id="sec-3">
      <title>3 Biological Ontologies and Bayesian Networks</title>
      <p>In this section we briefly motivate the need for ontologies and
define their limitations with respect to the biological field and to
knowledge discovery. Ontologies describe the concepts and relationships
that exist for a particular area of interest. They are very useful for
the semantic labeling of concepts or definitions [Grivell, 2002; Bard
and Rhee, 2004]. This process ensures that entities which are
equivalent to other entities in separate databases are identified as
referring to the same concepts. Even if these entities have different
names or forms, they can still be identified by semantic labeling. The
role of semantics is therefore much deeper than matching the
co-occurrence of a tag or label, since it defines the relationship that
exists between concepts. Figure 3 shows the structure and elements of
the gene ontology that are pertinent to our study. The first entry
refers to GO:0008150 and is one of the three top level structures
(biological process, physiological process and cellular process) in the
gene ontology hierarchy; the last number (GO:0015758) defines the
relationships for the glucose transport pathway. The numbers in
brackets refer to the number of entries at that particular level.</p>
      <p>The use of ontologies in biology for the semantic integration of
heterogeneous data is receiving increased attention; however, problems
occur because of the dynamic, changing nature of biological knowledge
[McGarry et al., 2006]. These difficulties arise from the highly
complex structures that are expensive and problematic to update and
maintain [Blaschke and Valencia, 2002]. Another, related problem is
that current ontologies have a rather limited vocabulary and cannot
express the richness of biological information. Little attention has
been paid to defining the relations; much of the research effort and
complexity of structure has concentrated on defining the terms. Other
important considerations are the spatial and temporal characteristics
of the entities.</p>
      <p>Furthermore, ontology languages such as DAML+OIL, OWL and RDF are
based on crisp logic and have difficulty managing the uncertainty,
incomplete data and noisy information encountered in many domains,
especially bioinformatics. Our research is concerned with Type 2
diabetes; in order to develop a suitable ontology it is necessary to
identify the relevant entities within the domain, their attributes and
the relationships that exist between these entities.</p>
      <sec id="sec-3-1">
        <title>3.1 Bayesian networks for Ontology Inference and Integration</title>
        <p>The integration of sub-symbolic and symbolic computation
has received considerable interest over the years [McGarry et
al., 1999]. Within this framework the Bayesian approach can
be seen as both a learning mechanism and as a knowledge
representation technique.</p>
        <p>Bayes' theorem is shown in equation 1 and gives the probability
of the hypothesis (H) conditioned on the evidence (E):</p>
        <p><disp-formula><tex-math>P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)} \qquad (1)</tex-math></disp-formula></p>
        <p>where P(H | E) is the probability of the hypothesis conditioned
on certain evidence, P(E | H) is the likelihood, P(H) is the
probability of the hypothesis prior to obtaining any evidence, and
P(E), the denominator, is the evidence. Therefore, according to
Bayesian theory, we can update our beliefs regarding the hypothesis
when provided with new evidence; this updating is called
conditionalization.</p>
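        <p>As a small worked illustration of equation 1, the update can
be written as a short Python function. The function name posterior and
the numbers used are hypothetical and chosen only to show the
arithmetic.</p>
        <preformat>
```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem for a binary hypothesis H and observed evidence E."""
    numerator = p_e_given_h * prior_h
    # The evidence term sums over H being true and H being false.
    evidence = numerator + p_e_given_not_h * (1.0 - prior_h)
    return numerator / evidence


if __name__ == "__main__":
    # Hypothetical numbers: prior 0.1, evidence seen in 80% of cases
    # when H holds and in 20% of cases when it does not.
    print(posterior(0.1, 0.8, 0.2))
```
        </preformat>
        <p>With these illustrative values the numerator is 0.08 and the
evidence term is 0.26, so observing E raises the belief in H from 0.1
to roughly 0.31.</p>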
        <p>The conditional probability distributions (CPDs) are described
by P(X<sub>i</sub> | U<sub>i</sub>), where X<sub>i</sub> represents node
i and U<sub>i</sub> are its parent nodes. We must specify the prior
probabilities of the nodes and the conditional probabilities of the
nodes given all the combinations of their parent nodes. For the set of
random variables X = {X<sub>1</sub>, ..., X<sub>n</sub>}, the joint
distribution is obtained from the CPD values and is given by:</p>
        <p><disp-formula><tex-math>P(X_1, \ldots, X_n) = \prod_{i} P(X_i \mid U_i) \qquad (2)</tex-math></disp-formula></p>
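        <p>The factorization in equation 2 can be sketched for a toy
network as follows. This is an illustration only: the two-node
structure (IR as the parent of GLUT4), the probability values and the
function names joint and query are assumptions, not values from our
experiments. The same enumeration answers both a diagnostic query
(evidence on the child) and a predictive query (evidence on the
parent).</p>
        <preformat>
```python
from itertools import product

# Hypothetical two-node network: IR (insulin resistance) is the parent of GLUT4.
# Each CPT maps a tuple of parent values to P(node = True | parents).
PARENTS = {"IR": (), "GLUT4": ("IR",)}
CPT = {
    "IR": {(): 0.3},
    "GLUT4": {(True,): 0.2, (False,): 0.7},
}


def joint(assignment):
    """P(X1, ..., Xn) as the product of P(Xi | parents of Xi)."""
    p = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def query(target, evidence):
    """P(target = True | evidence), summing the joint over all assignments."""
    nodes = list(PARENTS)
    totals = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if all(assignment[k] == v for k, v in evidence.items()):
            totals[assignment[target]] += joint(assignment)
    return totals[True] / (totals[True] + totals[False])


if __name__ == "__main__":
    print(query("IR", {"GLUT4": False}))   # diagnostic: effect observed
    print(query("GLUT4", {"IR": True}))    # predictive: cause observed
```
        </preformat>
        <p>Enumeration is exponential in the number of nodes, so it only
suits very small networks; it is used here because it makes the role of
each CPD entry explicit.</p>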
        <p>The CPD values are easy enough to calculate and to use for
inference, but the number of parameters required depends upon the
number of parent nodes; they are usually represented in table format.
The nodes are assumed to take discrete or categorical values; however,
continuous values may be discretised [Korb and Nicholson, 2004].</p>
        <p><disp-formula><tex-math>P(X_1, \ldots, X_n) = \frac{1}{Z} \prod_{j} \pi_j[C_j] \qquad (3)</tex-math></disp-formula></p>
        <p>[Figure 4: Diagnostic reasoning and predictive reasoning
networks over the nodes ACE, ADRB3, GLUT4, GLUT1 and IR, with nodes
marked as Evidence or Query.]</p>
        <p>In figure 4, the various possibilities for inferencing are
shown within the insulin resistance domain. The first network shows the
diagnostic reasoning approach, which enables the relationships between
symptoms and causes to be evaluated; thus, when given some evidence
regarding the presence of Glut4, we can update our beliefs about the
likelihood of IR being present. When using predictive reasoning we can
derive new information about effects given some new information
regarding the causes.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Methods and Results</title>
      <p>We reviewed the literature associated with Type 2 diabetes,
focusing initially on protein interactions in diabetes, and from this
review identified a list of “events” indicative of protein
interactions, e.g. activate, inhibit and modulate. This list was used
as the starting point to help identify which entities are involved in
each type of action or relation. After identifying the names of
possible event relations, the focus moved to identifying potential
entities involved in these relations. To complete this task a suitable
dataset was required. A search of the PubMed database was conducted and
6113 abstracts related to Type 2 diabetes were used; this dataset is
used throughout each subsequent stage of this work. Initially a count
was made of the number of times each of the action words occurred in
this sample dataset. Some of the words, e.g. “acetylate” and
“destabilize”, did not occur at all, while other words such as
“interaction” and “suppression” occurred more frequently.</p>
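      <p>The counting step described above can be sketched as follows.
This is a minimal illustration, assuming the abstracts are available as
plain strings; the function name count_action_words and the sample
sentences are hypothetical.</p>
      <preformat>
```python
import re
from collections import Counter


def count_action_words(abstracts, action_words):
    """Count whole-word, case-insensitive occurrences of each action word."""
    # Start every word at zero so that absent words are still reported.
    counts = Counter({word: 0 for word in action_words})
    wanted = set(action_words)
    for text in abstracts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in wanted:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Interaction of A with B. Suppression of C.",
        "The interaction was weak.",
    ]
    print(count_action_words(sample, ["interaction", "suppression", "acetylate"]))
```
      </preformat>
      <p>Matching on whole tokens means that each inflected form, such as
“activates” or “activated”, must be listed separately, which is
consistent with a word list that enumerates each form.</p>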
      <p>[Table of action words; only the “Action Word” column is
recoverable: acetylate, acetylated, acetylates, acetylation, activate,
activated, activates, activation, bind, binding, binds, bound,
destabilization, destabilize, destabilized, destabilizes.]</p>
      <p>We now explain how the various parts of our system function
together. The information extraction technique synthesizes the entities
and relationships from the literature abstracts and generates the
structure for a specific ontology on insulin resistance. We then use
the ontology's structure to build a Bayesian network for the purposes
of inference and prediction of new protein-to-protein interactions. The
relative frequencies of the keywords (entities and relationships) are
used to construct the conditional probability tables which define the
parent/child node relationships.</p>
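      <p>One way to turn relative frequencies into a conditional
probability table is sketched below. The representation of the
observations as (parent value, child value) pairs, the sample data and
the function name estimate_cpt are assumptions made for illustration;
the exact mapping from keywords to table entries used in our system is
not reproduced here.</p>
      <preformat>
```python
from collections import Counter, defaultdict


def estimate_cpt(observations):
    """Estimate P(child | parent) from (parent_value, child_value) pairs."""
    pair_counts = Counter(observations)
    parent_counts = Counter(parent for parent, _ in observations)
    cpt = defaultdict(dict)
    for (parent, child), n in pair_counts.items():
        # Relative frequency of the child value among rows with this parent.
        cpt[parent][child] = n / parent_counts[parent]
    return dict(cpt)


if __name__ == "__main__":
    # Hypothetical co-occurrence observations extracted from abstracts.
    pairs = [("activates", "GLUT4")] * 3 + [("activates", "GLUT1")]
    print(estimate_cpt(pairs))
```
      </preformat>
      <p>Each row of the resulting table sums to one. With sparse counts
a smoothing term would normally be added so that unseen combinations do
not receive zero probability.</p>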
      <sec id="sec-4-1">
        <title>4.1 The Extracted Ontology and Bayesian Network Mapping</title>
        <p>Initially, one of these action words, “interaction”, was
selected to identify possible entities involved in a relation. The word
“interaction”, however, generally forms part of a phrase such as
“interaction between”, “interaction of” and “interaction with”, and
therefore each of these phrases was used by the algorithm to search for
potential entities. The first phrase used was “interaction between”.
Examples of the resulting phrases extracted are provided in table
2.</p>
        <p>Ultimately, the successful application of Bayesian techniques
is dependent on the use of prior knowledge to improve the estimation of
the posterior. If a prior belief exists about a situation then we can
use this information to pre-structure our BN. For example, if a
particular gene (IPA) is known to regulate several target genes (GDH,
GL4, HK2), we would assign this relationship within the BN by setting
the edges between these entities and setting the values in the
conditional probability table to define the structural prior
accordingly. This is a powerful strategy, but only when it makes sense
to do so. The application of incorrect beliefs will produce unreliable
estimates of the true posterior regardless of the abundance of the
likelihood evidence. Equation 4 shows how we modify the BN with prior
knowledge (causal intervention) from the extracted ontology [Chrisman
et al., 2003].</p>
        <p><disp-formula><tex-math>P(X_{i,j} = z \mid \mathrm{par}_M(x), M, \theta : X_{i,j} = Z, \ldots) = 1 \qquad (4)</tex-math></disp-formula></p>
        <p>where par<sub>M</sub> are the parameters within the model,
X<sub>i,j</sub> are the known effects of the parents of a given node,
and θ is the conditional probability, which is conditionalized and
represents the causal conditions. The biological knowledge is
incorporated into the BN by specifying the probability for the
existence of each potential connection (edge) between the nodes. We
assume independence between edges, and the variables in the BN are also
assumed to be discrete; this ensures that the calculations are
computationally tractable.</p>
        <p>Figure 5 shows the structure of a section of our ontology.
The nodes are the entities and the arcs determine the
relationships between them. The numbers in brackets preceded by
“GO:” are the probabilities of the term occurring in the GO
ontology, the numbers.</p>
        <p>For example the following abstract fragment captures
knowledge about several proteins and their interactions:
“Overexpression of the cytosolic domain of syntaxin 6
did not affect insulin-stimulated glucose transport, but
increased basal deGlc transport and cell surface Glut4
levels. Moreover, the syntaxin 6 cytosolic domain significantly
reduced the rate of Glut4 reinternalization after insulin
withdrawal and perturbed subendosomal Glut4 sorting; the
corresponding domains of syntaxins 8 and 12 were without
effect.”</p>
        <p>We encountered difficulties with negative implications, i.e.
the “did not” and “without effect” phrases negate the occurrence of the
relationship but would be taken by the information extraction algorithm
as a positive relationship. A more elaborate NLP technique or further
crafting of specific regular expression templates would reduce this
effect.</p>
        <p>[Figure 5: Section of the extracted ontology. Recoverable
labels: node 3, node 4; term identifiers TM:000146, TM:000147,
TM:000148; relations “is_a” and “interacts with”.]</p>
        <p>We determined a baseline accuracy for our system by
“rediscovering” known protein-to-protein interactions from the
literature and validating the relationships against a number of online
database and ontology repositories. Because the most up to date and
complete of these is the Gene Ontology (GO), we compare the
relationships extracted into our ontology with the GO structure. To
determine the accuracy, we apply the well known information retrieval
measures of recall and precision. We define recall as the percentage of
the entity relations represented in GO that are correctly identified by
our system. We define precision as the percentage of the relations
returned by our system that are found in GO.</p>
        <p>The recall and precision are calculated by:</p>
        <p><disp-formula><tex-math>\mathrm{recall} = \frac{TP}{TP + FN}, \qquad \mathrm{precision} = \frac{TP}{TP + FP}</tex-math></disp-formula></p>
        <p>where TP = true positives, FP = false positives, TN = true
negatives and FN = false negatives. We should note that certain errors
in GO have been identified, including inconsistencies and even spelling
mistakes. We have also identified that certain GO terms are too general
and that a more specific term would have been more appropriate. Thus
entries with low semantic similarity but high functional similarity can
be identified. Figure 6 presents a comparison of the semantic richness
of GO and our extracted ontology. We define the semantic richness
measure to be based on the correlations between functional similarity
and semantic content; a detailed description of this approach can be
found in [Lord et al., 2003].</p>
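        <p>The recall and precision measures used in this evaluation can
be sketched with the standard information retrieval definitions, recall
= TP / (TP + FN) and precision = TP / (TP + FP). The representation of
relations as tuples in a set, the sample relations and the function
name precision_recall are assumptions for illustration.</p>
        <preformat>
```python
def precision_recall(extracted, reference):
    """Compare extracted relations with a reference set such as GO."""
    extracted, reference = set(extracted), set(reference)
    tp = len(extracted.intersection(reference))  # returned and in the reference
    fp = len(extracted - reference)              # returned but not in the reference
    fn = len(reference - extracted)              # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical relation pairs, not results from the paper.
    found = {("A", "B"), ("A", "C"), ("D", "E")}
    known = {("A", "B"), ("A", "C"), ("F", "G"), ("H", "I")}
    print(precision_recall(found, known))
```
        </preformat>
        <p>True negatives do not enter either measure, which is why the
two are well suited to extraction tasks where the set of relations that
are correctly not reported is effectively unbounded.</p>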
        <p>The GO ontology structure is extremely limited, with total
reliance on “is a” type links. This means that a large amount of
semantic information that was originally available from the research
articles is missing. We suspect that as ontologies such as GO increase
in the number of entities, the relationships between them will take on
increased value. However, without incorporating the semantic similarity
of the entities, any increase in size will reduce the ontology to free
text.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Related Work</title>
      <p>Research into the automatic generation of ontologies from
textual data has received limited attention to date; a notable
exception is the work of Blaschke and Valencia, which used clustering
techniques at a document level [Blaschke and Valencia, 2002]. The
majority of the research attempts to alleviate partial gaps in the
knowledge or to repair incorrect annotations in existing ontologies
[Missikoff et al., 2003; Wolstencroft et al., 2005]. Using
probabilistic techniques to model ontologies is receiving increased
attention, but this is for manually curated ontologies [Mitra et al.,
2005; Smith et al., 2005]. The modeling of biological networks with
Bayesian networks using genomic data has seen considerable attention in
recent years [Ong et al., 2002]. The initial work on integrating
heterogeneous data within a Bayesian network framework was led by
Friedman and Segal [Friedman et al., 2000; Segal et al., 2001]. This
work demonstrated that Bayesian networks could be trained on genomic
data to reconstruct the relationships between genes. The work by Pan et
al. is the most similar to ours; however, the authors used Bayesian
networks to integrate two ontologies from similar problem domains [Pan
et al., 2005]. Comparisons between the semantic similarity and genetic
sequence similarity of ontologies have been conducted by Lord [Lord et
al., 2003]. We found this work particularly useful as motivation for
the development of a richer vocabulary to define entity
relationships.</p>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusions</title>
      <p>The fusion of low level information from sub-symbolic techniques
with logic or higher order structures is critically dependent on the
level of granularity used. The nodes of our Bayesian networks are
robust to the semantic topic drift or catastrophic interference which
typically occurs when MLPs or other feed-forward neural techniques are
trained in dynamic situations using heterogeneous data. In our
bioinformatics work we use Bayesian networks to learn from data but
also to map existing ontological relations to new Bayesian network
structures. Clearly, further work is needed; however, we have extended
the current knowledge of automatically generating and integrating
ontologies from low level data. The utilization of ontologies as a
framework for guiding the knowledge discovery process has to date
received little attention. The experimental results presented in this
paper lead us to conclude that a principled approach such as the
Bayesian framework can successfully integrate and represent
heterogeneous data and knowledge.</p>
    </sec>
    <sec id="sec-7">
      <title>7 Acknowledgements</title>
      <p>This work was supported in part by a Research Development
Fellowship funded by HEFCE and the Biosystems Informatics Institute
(Bii).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Ashburner</source>
          , 2000]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ashburner</surname>
          </string-name>
          .
          <article-title>Gene ontology: tool for the unification of biology</article-title>
          .
          <source>Nature Genetics</source>
          ,
          <volume>25</volume>
          :
          <fpage>25</fpage>
          -
          <lpage>29</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Bard and Rhee</source>
          , 2004]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bard</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Rhee</surname>
          </string-name>
          .
          <article-title>Ontologies in biology: design applications and future challenges</article-title>
          .
          <source>Nature Reviews Genetics</source>
          ,
          <volume>5</volume>
          :
          <fpage>213</fpage>
          -
          <lpage>222</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[Blaschke and Valencia</source>
          , 2002]
          <string-name>
            <given-names>C.</given-names>
            <surname>Blaschke</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Valencia</surname>
          </string-name>
          .
          <article-title>Automatic ontology construction from the literature</article-title>
          .
          <source>Genome Informatics</source>
          ,
          <volume>13</volume>
          :
          <fpage>201</fpage>
          -
          <lpage>213</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Chrisman et al.,
          <year>2003</year>
          ]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chrisman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Langley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bray</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Pohorille</surname>
          </string-name>
          .
          <article-title>Incorporating biological knowledge into evaluation of causal regulatory hypothesis</article-title>
          .
          <source>In Proceedings of the Pacific Symposium on Biocomputing</source>
          , pages
          <fpage>128</fpage>
          -
          <lpage>139</lpage>
          , Kauai, Hawaii.,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Friedman et al.,
          <year>2000</year>
          ]
          <string-name>
            <given-names>N.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Linial</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Nachman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Pe'er</surname>
          </string-name>
          .
          <article-title>Using Bayesian networks to analyze expression data</article-title>
          .
          <source>Journal of Computational Biology</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          -4):
          <fpage>601</fpage>
          -
          <lpage>620</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Grivell</source>
          , 2002]
          <string-name>
            <given-names>L.</given-names>
            <surname>Grivell</surname>
          </string-name>
          .
          <article-title>Mining the bibliome: searching for a needle in a haystack?: new computing tools are needed to effectively scan the growing amount of scientific literature for useful information</article-title>
          .
          <source>EMBO Reports</source>
          ,
          <volume>3</volume>
          (
          <issue>31</issue>
          ):
          <fpage>200</fpage>
          -
          <lpage>203</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[Hearst</source>
          , 1992]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hearst</surname>
          </string-name>
          .
          <article-title>Automatic acquisition of hyponyms from large text corpora</article-title>
          .
          <source>In Proceedings of the 14th conference on Computational linguistics</source>
          , pages
          <fpage>539</fpage>
          -
          <lpage>545</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>[Korb and Nicholson</source>
          , 2004]
          <string-name>
            <given-names>K.</given-names>
            <surname>Korb</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Nicholson</surname>
          </string-name>
          .
          <source>Bayesian Artificial Intelligence</source>
          . Chapman and Hall/CRC,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>[Krauthammer and Nenadic</source>
          , 2004]
          <string-name>
            <given-names>M.</given-names>
            <surname>Krauthammer</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Nenadic</surname>
          </string-name>
          .
          <article-title>Term identification in the biomedical literature</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          ,
          <volume>37</volume>
          :
          <fpage>512</fpage>
          -
          <lpage>526</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Lord et al.,
          <year>2003</year>
          ]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brass</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
          .
          <article-title>Investigating semantic similarity measures across the gene ontology: the relationship between sequence and annotation</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>19</volume>
          :
          <fpage>1275</fpage>
          -
          <lpage>1283</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Mack and Henenberger,
          <year>2002</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mack</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Henenberger</surname>
          </string-name>
          .
          <article-title>Text-based knowledge discovery: search and mining of life-sciences documents</article-title>
          .
          <source>Drug Discovery Today</source>
          ,
          <volume>7</volume>
          :
          <fpage>11</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [McGarry et al.,
          <year>1999</year>
          ]
          <string-name>
            <given-names>K.</given-names>
            <surname>McGarry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wermter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>MacIntyre</surname>
          </string-name>
          .
          <article-title>Hybrid neural systems: from simple coupling to fully integrated neural networks</article-title>
          .
          <source>Neural Computing Surveys</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>62</fpage>
          -
          <lpage>93</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [McGarry et al.,
          <year>2006</year>
          ]
          <string-name>
            <given-names>K.</given-names>
            <surname>McGarry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Garfield</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Morris</surname>
          </string-name>
          .
          <article-title>Recent trends in knowledge and data integration for the life sciences</article-title>
          .
          <source>Expert Systems: the Journal of Knowledge Engineering</source>
          ,
          <volume>23</volume>
          (
          <issue>5</issue>
          ):
          <fpage>337</fpage>
          -
          <lpage>348</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Missikoff et al.,
          <year>2003</year>
          ]
          <string-name>
            <given-names>M.</given-names>
            <surname>Missikoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Velardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Fabriani</surname>
          </string-name>
          .
          <article-title>Text mining techniques to automatically enrich a domain ontology</article-title>
          .
          <source>Applied Intelligence</source>
          ,
          <volume>18</volume>
          :
          <fpage>323</fpage>
          -
          <lpage>340</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Mitra et al.,
          <year>2005</year>
          ]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Noy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          .
          <article-title>Ontology mapping discovery with uncertainty</article-title>
          .
          In
          <source>Fourth International Semantic Web Conference (ISWC)</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Nahm and Mooney,
          <year>2002</year>
          ]
          <string-name>
            <given-names>U.</given-names>
            <surname>Nahm</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Mooney</surname>
          </string-name>
          .
          <article-title>Text mining with information extraction</article-title>
          .
          In
          <source>Proceedings of the AAAI 2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Ong et al.,
          <year>2002</year>
          ]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Glasner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Page</surname>
          </string-name>
          .
          <article-title>Modelling regulatory pathways in E. coli from time series expression profiles</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>18</volume>
          (
          <issue>1</issue>
          ):
          <fpage>241</fpage>
          -
          <lpage>248</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Pan et al.,
          <year>2005</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          .
          <article-title>A Bayesian network approach to ontology mapping</article-title>
          .
          In
          <source>ISWC 2005 4th International Semantic Web Conference</source>
          , pages
          <fpage>563</fpage>
          -
          <lpage>577</lpage>
          , Galway, Ireland,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Rosario and Hearst,
          <year>2004</year>
          ]
          <string-name>
            <given-names>B.</given-names>
            <surname>Rosario</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hearst</surname>
          </string-name>
          .
          <article-title>Classifying semantic relations in bioscience texts</article-title>
          .
          In
          <source>Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL2004)</source>
          , pages
          <fpage>430</fpage>
          -
          <lpage>437</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Segal et al.,
          <year>2001</year>
          ]
          <string-name>
            <given-names>E.</given-names>
            <surname>Segal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Taskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gasch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Friedman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Koller</surname>
          </string-name>
          .
          <article-title>Rich probabilistic models for gene expression</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>17</volume>
          (
          <issue>1</issue>
          ):
          <fpage>243</fpage>
          -
          <lpage>252</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Smith et al.,
          <year>2005</year>
          ]
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ceusters</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kohler</surname>
          </string-name>
          .
          <article-title>Relations in biomedical ontologies</article-title>
          .
          <source>Genome Biology</source>
          ,
          <volume>6</volume>
          (
          <issue>5</issue>
          ):
          <fpage>46</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Tiffin et al.,
          <year>2005</year>
          ]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tiffin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kelso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Powell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bajic</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Hide</surname>
          </string-name>
          .
          <article-title>Integration of text and data-mining using ontologies successfully selects disease gene candidates</article-title>
          .
          <source>Nucleic Acids Research</source>
          ,
          <volume>33</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1544</fpage>
          -
          <lpage>1552</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Wolstencroft et al.,
          <year>2005</year>
          ]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wolstencroft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>McEntire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tabernero</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Brass</surname>
          </string-name>
          .
          <article-title>Constructing ontology-driven protein family databases</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>21</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1685</fpage>
          -
          <lpage>1692</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>