<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Structuring mined knowledge for the support of hypothesis generation in molecular biology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Roos</string-name>
          <email>roos@science.uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Scott Marshall</string-name>
          <email>marshall@science.uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrew P. Gibson</string-name>
          <email>a.p.gibson@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pieter W. Adriaans</string-name>
          <email>adriaans@science.uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Informatics Institute, University of Amsterdam, Kruislaan 403</institution>
          ,
          <addr-line>1098 SJ Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Hypothesis generation in the life sciences is an empirical process in which obtaining and structuring knowledge from literature plays a significant role. Text mining and Information Extraction techniques are seen as key for programmatically accessing the knowledge captured in the form of free text. We describe progress towards an application that supports the task of generating a hypothesis about biomolecular mechanisms using Semantic Web technologies and a workflow to carry out text mining in a service-oriented architecture. The output is a semantic model with putative biological relationships that have been extracted from literature, with each relationship linked to the corresponding evidence. We present preliminary data that extends a model for chromatin (de)condensation. The methodology can be used to bootstrap the process of human-guided construction of semantically rich biological models using the results of knowledge extraction processes.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge extraction</kwd>
        <kwd>Hypothesis support</kwd>
        <kwd>Molecular biology</kwd>
        <kwd>Chromatin</kwd>
        <kwd>Web service</kwd>
        <kwd>Workflow</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>OWL</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>Conceiving or improving a hypothesis about a biomolecular mechanism usually
implies integration of various types of information and distillation into a
comprehensible model. This includes information from literature, our own
knowledge, and interpretations of experimental data. Many Web resources such as
Entrez PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) provide such information. However, the difficulty of information
retrieval from literature reveals the scale of today’s information overload: over 17
million biomedical documents are now available from PubMed. Support for
extracting information from these resources is therefore a general requirement, with
many scientists finding it increasingly challenging to ensure that all potentially
relevant facts are considered whilst forming a hypothesis. Developments in the area of
information extraction promise to deliver applications that will more directly support
the task of hypothesis generation. The general approach requires retrieving relevant
documents, recognizing named entities (e.g. proteins) and their relationships, and
storing results for later inspection [<xref ref-type="bibr" rid="ref6">6, 10</xref>].</p>
      <p>In this study, we address the question of how the results of a knowledge extraction
procedure should be stored to best support hypothesis conception for experimental
biology. In particular, we focus on epigenetics and chromatin research, where typical
examples are qualitative hypothetical models that attempt to explain the role of
various proteins in changing the level of condensation of DNA as a means to regulate
transcription (see for instance [12]). To support the linking of a knowledge extraction
process to this type of modelling, we present an approach that extracts information
from text and populates an OWL-based knowledge base with the extraction results.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Methods and tools for knowledge extraction</title>
    </sec>
    <sec id="sec-3">
      <p>Knowledge extraction was performed by web services from the Adaptive Information
Discovery Application (AIDA) toolbox, a set of web services and infrastructure being
developed for knowledge extraction and knowledge management in a virtual
laboratory for e-science (http://adaptivedisclosure.org). It contains services for document retrieval based on Lucene (http://lucene.apache.org/)
[<xref ref-type="bibr" rid="ref7">7</xref>], entity and relation recognition applying conditional random fields [<xref ref-type="bibr" rid="ref5">5</xref>], and access
to Sesame [<xref ref-type="bibr" rid="ref1">1</xref>], an RDF repository that serves as our knowledge base. Ontologies were
created in Protégé and conform to the OWL 1.1 specification.</p>
      <p>The general steps of the knowledge extraction process [6, 10] were implemented as a workflow in Taverna [3]. We added steps to provide a likelihood score, cross references to biological databases, and tabular results (Fig. 1). The likelihood of finding a document with query (q) and discovery (d) was calculated by:
        <disp-formula><tex-math>\log\frac{QD}{QD_{\mathrm{exp}}}, \qquad QD_{\mathrm{exp}} = \frac{Q \cdot D}{N},</tex-math></disp-formula>
in which Q, D, and QD are the frequencies of documents containing q, d, and both q and d; QDexp is the
expected frequency of documents containing q and d assuming independence of Q and D; and
N is the total number of documents in MedLine.</p>
      <p>The workflow further contains a web service for adding protein name synonyms to the original query, and
providing UniProt identifiers for human, rat, and mouse that we also used to filter false
positives. This service, kindly provided by Martijn Schuemie, wraps components from
the text analysis tool Anni 2.0 [<xref ref-type="bibr" rid="ref4">4</xref>]. At each
step in the workflow, the results are converted into OWL instance statements in RDF
format in order to populate the ontologies pre-loaded in our knowledge base.</p>
      <p>Fig. 1 - Workflow to extract proteins from literature and store them in a knowledge base.
Workflow steps shown in the figure: Query; Add query to semantic model; Retrieve documents from Medline;
Add documents (IDs) to semantic model; Extract proteins (Homo sapiens); Add proteins to semantic model;
Calculate ranking scores; Add scores to semantic model; Create biological cross references;
Add cross references to semantic model; Convert to table (html).</p>
    </sec>
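    <sec id="sec-example-likelihood-score">
      <p>As an illustration of the likelihood score used in the workflow, the following minimal Python sketch computes the logarithm of the observed co-occurrence frequency QD divided by the expected frequency QDexp, taking QDexp = Q·D/N under independence. It is not the AIDA implementation: the function names and the document counts are hypothetical, and the natural logarithm is an assumption because the base is not stated.</p>
      <preformat>
```python
import math


def expected_cooccurrence(q_docs: int, d_docs: int, n_docs: int) -> float:
    """Expected number of documents containing both terms if they were independent."""
    return q_docs * d_docs / n_docs


def likelihood_score(qd_docs: int, q_docs: int, d_docs: int, n_docs: int) -> float:
    """Return log(QD / QDexp). The natural log is an assumption, not taken from the paper."""
    counts = (qd_docs, q_docs, d_docs, n_docs)
    if not all(count > 0 for count in counts):
        raise ValueError("all document counts must be positive")
    return math.log(qd_docs / expected_cooccurrence(q_docs, d_docs, n_docs))


# Hypothetical counts: 1000 documents mention the query term, 500 mention the
# discovered term, 10 mention both, in a collection of 17 million documents.
score = likelihood_score(qd_docs=10, q_docs=1000, d_docs=500, n_docs=17_000_000)
print(round(score, 2))  # 5.83
```
      </preformat>
      <p>A score of zero means the two terms co-occur exactly as often as expected by chance; positive scores indicate co-occurrence more frequent than independence would predict.</p>
    </sec>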
    <sec id="sec-12">
      <p>References to our scientific research objects (ontologies, workflows, AIDA services) are stored as a pack on myExperiment.org that is available for download upon request (http://www.myexperiment.org/packs/27).</p>
    </sec>
    <sec id="sec-12-1">
      <title>3 Model Representation in OWL</title>
    </sec>
    <sec id="sec-12-1-1">
      <title>3.1 Different types of knowledge</title>
      <p>In order to represent our biological hypothesis, we would like an OWL ontology of
the relevant biological domain entities and their biological relationships. The purpose
of our knowledge extraction procedure is to populate this model with instances. We
would also like to model the evidence that has led to these instances. This leads to a
clash between our intention of enriching a biological model, and representing the
artifacts of a text mining procedure such as ‘term’, ‘interaction assertion’, or ‘term
collocation’. For these artifacts we have concrete instances, but they have no direct meaning in
the biological domain. Within our OWL representation, we purposefully kept five
distinct OWL models in order to avoid the conflation of knowledge from the different
stages of our knowledge extraction process. Our models represent:</p>
      <list list-type="bullet">
        <list-item><p>Biological knowledge for our hypothesis (Protein, Association)</p></list-item>
        <list-item><p>Documents (Terms, PubMed Identifiers)</p></list-item>
        <list-item><p>Knowledge extraction process (Workflows, Processes)</p></list-item>
        <list-item><p>Mined results (Extracted terms, extracted relationships)</p></list-item>
        <list-item><p>Mapping model to integrate the above through references.</p></list-item>
      </list>
      <p>[Fig. 2, diagram labels: Decondensed chromatin; HDAC; HAT; Histone acetylation; Histone methylation at H3K9; DNA methylation; Condensed chromatin.]</p>
      <p>[Fig. 3, diagram labels: Chromatin condensation hypothesis; Protein; Protein Association; HDAC1; PCAF; HDAC1-PCAF interaction; properties hasModelComponent, hasParticipant, and the restriction ‘hasParticipant some’.]</p>
    </sec>
    <sec id="sec-17-1">
      <title>3.1.1 Biological model</title>
      <p>In the context of our example hypothesis (Fig. 2) we start with a minimal set of
classes for a biological model with proteins and protein-protein associations (Fig. 3).
We cannot directly inspect concrete instances of proteins or their interactions. We
regard instances in the biological model as interpretations of certain observations, in
our case, of text mining results. We also do not consider such instances as biological
facts; they are restricted to a hypothetical model. The evidence for the interpretation is
important, but it is not within the scope of this model. In the case of text mining,
evidence is modeled by the document and text mining models.</p>
    </sec>
    <sec id="sec-18-1">
      <title>3.1.2 Document model</title>
      <p>A model of the structure of documents and statements therein is less ambiguous than
the biological model, because we can directly inspect concrete instances such as
(references to) documents or pieces of text (Fig. 4). We can be sure of the scope of
the model and we can be clear about the distinction between classes and instances
because we computationally process the documents. For our knowledge extraction
experiment, we have created classes for documents, protein or gene terms, and
mentions of associations between proteins or genes. Unfortunately, we cannot make a
distinction between proteins and genes at this stage due to the limits of biological text
mining.</p>
    </sec>
    <sec id="sec-19">
      <p>Footnotes: 1, http://www.uniprot.org/uniprot/Q13547 (UniProt entry for HDAC1); 2, http://www.uniprot.org/uniprot/Q92831 (UniProt entry for PCAF).</p>
      <p>[Fig. 4, diagram labels: Protein or gene term; Association term; Protein or gene association assertion; PMID: 15298701; “HDAC1”; “p68”; “associate”; properties isComponentOf, relates, relatesBy.]</p>
    </sec>
    <sec id="sec-20-1">
      <title>3.1.3 Text mining model</title>
      <p>Next, we want to structure what we know of the knowledge extraction process that may serve as evidence for the population of our biological model (Fig. 5). The aim of this step is to create assertions about instances of text mining processes, which
process instances of documents that contain instances of terms. In addition, in this
model we represent information about the likelihood of terms and relationships being
found in the literature. We also gain valuable knowledge provenance that can be used
to track down any conflicting statements later on. This allows us to qualify the
uncertainty of the text mining procedure. For more complete knowledge provenance,
we have also created a semantic model representing the implementation of the text
mining process as a workflow of (AIDA) Web Services (not shown).</p>
      <p>[Fig. 5, caption fragment: [...] text model as text mining discoveries. For clarity, property restrictions between classes and
model components of the text mining process are not shown. Diagram labels: Document search query; “HDAC1 AND chromatin”; Discovery score; 4.78; Text mining process; AIDA based extraction process; Retrieved document; Discovered protein or gene term; Discovered association term; Discovered association assertion; PMID: 15298701; “p68”; “interacts”; properties searchesWith, discoveredBy, hasDiscoveryScore.]</p>
    </sec>
    <sec id="sec-21-1">
      <title>3.1.4 Mapping model</title>
      <p>At this point, we have a clear framework for the description of our biological domain
and the documents and the text mining results as instances in our document and
process ontologies. The next step is to relate the mined information to the biological
domain model. Our strategy is to initially keep the domain model simple at the class
and object property level, and to map sets of instances from our results to the domain
model. For this, we created an additional mapping model that defines reference
properties between the models (Fig. 6). We can now see that an interaction between
the proteins labeled ‘p68’ and ‘HDAC1’ in our hypothetical model is referred to by a
mention of an association between the terms ‘p68’ and ‘HDAC1’, with a likelihood
score for finding this combination in literature.</p>
    </sec>
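    <sec id="sec-example-mapping">
      <p>The reference links of the mapping model can be sketched with plain subject-property-object statements. The minimal Python example below is only an illustration, not the ontologies from the myExperiment pack: the property names follow the figure labels, whereas the instance identifiers, the direction of each property, and the attachment of the score 4.78 to this particular assertion are assumptions made for the example.</p>
      <preformat>
```python
from collections import defaultdict

# (subject, property, object) statements. Property names follow the figure
# labels; instance identifiers and property directions are hypothetical.
TRIPLES = [
    ("bio:HDAC1-p68_association", "hasParticipant", "bio:HDAC1"),
    ("bio:HDAC1-p68_association", "hasParticipant", "bio:p68"),
    ("doc:assertion_1", "relates", "doc:term_HDAC1"),
    ("doc:assertion_1", "relates", "doc:term_p68"),
    ("doc:assertion_1", "isComponentOf", "doc:PMID_15298701"),
    ("doc:assertion_1", "discoveredBy", "tm:AIDA_extraction_process"),
    ("doc:assertion_1", "hasDiscoveryScore", 4.78),
    ("doc:assertion_1", "references", "bio:HDAC1-p68_association"),
    ("doc:term_p68", "references", "bio:p68"),
    ("doc:term_HDAC1", "references", "bio:HDAC1"),
]


def index_by_subject(triples):
    """Group (property, object) pairs under their subject."""
    index = defaultdict(list)
    for subject, prop, obj in triples:
        index[subject].append((prop, obj))
    return index


def evidence_for(bio_instance, triples):
    """Map each mined instance that references bio_instance to its other statements."""
    by_subject = index_by_subject(triples)
    evidence = {}
    for subject, prop, obj in triples:
        if prop == "references" and obj == bio_instance:
            evidence[subject] = [
                (p, o) for p, o in by_subject[subject] if p != "references"
            ]
    return evidence


# Trace the evidence behind the hypothetical HDAC1-p68 association.
for mined, statements in evidence_for("bio:HDAC1-p68_association", TRIPLES).items():
    print(mined)
    for prop, obj in statements:
        print("   ", prop, obj)
```
      </preformat>
      <p>Keeping the ‘references’ statements separate from the biological statements means that the biological model can stay simple, while every instance in it can still be traced back to a document, a process, and a score.</p>
    </sec>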
    <sec id="sec-22">
      <p>The difficulty of distinguishing between genes and proteins during text mining also presents a problem for mapping to the biological model. When the number of proteins is small enough we may choose to initially map the text mining results to proteins, or we could create a perhaps more factual ‘gene or protein’ class in the biological model.</p>
      <p>Fig. 6 - Mined knowledge mapping strategy. Instances from the results set (right) refer to
instances in the domain model (left). Diagram labels: BiologicalModel; Chromatin condensation hypothesis; Protein or gene; Association; HDAC1-p68 association; HDAC1; p68; property references.</p>
    </sec>
    <sec id="sec-22-1">
      <title>4 Preliminary results</title>
      <p>The final result of the knowledge extraction workflow is a knowledge base
extended with text mining results captured in OWL. We performed an example
experiment starting with the query ‘HDAC1 AND chromatin’. As a result we could
query our knowledge base to find an instance of our biological hypothesis model and
its partial representation by the input query and its expanded form (35 synonyms were
added for document retrieval). We could further find 257 proteins linked to this model
as putative components. We could also recover that these links were discovered
through 489 protein terms found in 276 documents, and by what process, Web
Service and workflow. The data are stored per individual: for each individual we stored its specific links
to other individuals within a domain (e.g. the biological domain) and between domains. For
instance, NF-KappaB is linked to our initial hypothesis and ‘HDAC1’ within the
biological model, and to its associated term, which was found in 10 abstracts. As our
knowledge base grows with instances and different types of evidence we can perform
increasingly interesting queries in search of novel relations with respect to our nascent
hypothesis. A prototypical example is the protein referred to by the term ‘p68’ that
was found to be collocated with the query term ‘HDAC1’ and also in a direct mention
of this interaction in an abstract by Wilson et al. [13], suggesting p68 as a candidate
for investigating its role in relation to HDAC1 and chromatin.</p>
    </sec>
    <sec id="sec-22-2">
      <title>5 Conclusion</title>
      <p>We have demonstrated first steps towards automating support for the processes
involved in the formation of scientific hypotheses, particularly in studying
biomolecular mechanisms. Text mining supports a researcher by inspecting more
papers than an individual could and without human bias, while the use of an
OWL-based knowledge base supports exploration of semantic relationships of one or many
experiments. Our focus is on modeling information that is extracted during a
computational experiment, rather than on improving a particular text mining
procedure. The approach is not limited to the modeling of text mining results but
could be applied to the results of other computational experiments. Our method shares
some features with the general task of ontology learning from text [<xref ref-type="bibr" rid="ref2 ref9">2, 9</xref>], and that of
populating a predefined ontology with instances obtained from text mining [14].
However, our aim is to provide a method for improving and reusing a biological
hypothesis. We do not aim to construct a comprehensive hierarchy for a domain, nor
are we specifically interested in recall as long as the text mining is reasonably
unbiased. Semantic Web standards and tools allow us to explicitly represent the
biological knowledge, share it as a resource online, and make it interoperable with
other knowledge resources. Models representing provenance add a layer of trust to
the results because the biological assertions are verifiable. It will be interesting to see
how much our approach can make use of the data provenance in future versions of
Taverna [8]. The rich potential of Semantic Web technologies will support the future extension of the domain model to suit more complex knowledge; its exploration will hopefully be supported by increasingly user-friendly query tools and DL-reasoners [11].</p>
    </sec>
    <sec id="sec-24">
      <p>We thank Edgar Meij, Sophia Katrenko, Willem van Hage, and Martijn Schuemie for providing
Web Services, and the myGrid team and OMII-UK for their support. This work was carried out for the Virtual Laboratory for e-Science project (http://www.vl-e.nl) and BioRange, supported by BSIK grants from the Dutch Ministry of Education, Culture and Science (OC&amp;W). VL-e is part of the ICT innovation program of the Ministry of Economic Affairs (EZ).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Broekstra</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Kampman</surname>, <given-names>A.</given-names></string-name> and
          <string-name><surname>van Harmelen</surname>, <given-names>F.</given-names></string-name>:
          <article-title>Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema</article-title>.
          <source>The Semantic Web - ISWC 2002: First International Semantic Web Conference</source>,
          Vol. <volume>2342</volume>/<year>2002</year>. Springer Berlin / Heidelberg, Sardinia, Italy (<year>2002</year>)
          <fpage>54</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Gomez-Perez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Manzano-Macho</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>An overview of methods and tools for ontology learning from texts</article-title>
          .
          <source>Knowledge Engineering Review</source>
          ,
          <volume>19</volume>
          (
          <issue>3</issue>
          ):
          <fpage>187</fpage>
          -
          <lpage>212</lpage>
          , (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hull</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolstencroft</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevens</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goble</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pocock</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Oinn</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Taverna: a tool for building and running workflows of services</article-title>
          .
          <source>Nucl. Acids Res</source>
          .,
          <volume>34</volume>
          (
          <issue>Web Server issue</issue>
          ):
          <fpage>W729</fpage>
          -
          <lpage>W732</lpage>
          , (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Jelier</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuemie</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veldhoven</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dorssers</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jenster</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kors</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Anni 2.0: a multipurpose text-mining tool for the life sciences</article-title>
          .
          <source>Genome biology</source>
          ,
          <volume>9</volume>
          (
          <issue>6</issue>
          ):
          <fpage>R96</fpage>
          , (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Katrenko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Adriaans</surname>
            ,
            <given-names>P.W.</given-names>
          </string-name>
          :
          <article-title>Using Semi-Supervised Techniques to Detect Gene Mentions</article-title>
          .
          <source>In Proc. Second BioCreative Challenge Workshop</source>
          , Madrid, Spain (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Valencia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Text-mining and information-retrieval services for molecular biology</article-title>
          .
          <source>Genome biology</source>
          ,
          <volume>6</volume>
          (
          <issue>7</issue>
          ):
          <fpage>224</fpage>
          , (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Meij</surname>, <given-names>E.J.</given-names></string-name>,
          <string-name><surname>IJzereef</surname>, <given-names>L.H.L.</given-names></string-name>,
          <string-name><surname>Azzopardi</surname>, <given-names>L.A.</given-names></string-name>,
          <string-name><surname>Kamps</surname>, <given-names>J.</given-names></string-name> and
          <string-name><surname>de Rijke</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Combining Thesauri-based Methods for Biomedical Retrieval</article-title>.
          In:
          <string-name><surname>Voorhees</surname>, <given-names>E.M.</given-names></string-name> and
          <string-name><surname>Buckland</surname>, <given-names>L.P.</given-names></string-name> (eds.):
          <source>The Fourteenth Text REtrieval Conference (TREC 2005)</source>.
          National Institute of Standards and Technology. NIST Special Publication (<year>2006</year>)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Missier</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Belhajjame</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Zhao</surname>, <given-names>J.</given-names></string-name> and
          <string-name><surname>Goble</surname>, <given-names>C.</given-names></string-name>:
          <article-title>Data lineage model for Taverna workflows with lightweight annotation requirements</article-title>.
          <source>IPAW'08</source>, Salt Lake City, Utah (<year>2008</year>)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Missikoff</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Velardi</surname>, <given-names>P.</given-names></string-name> and
          <string-name><surname>Fabriani</surname>, <given-names>P.</given-names></string-name>:
          <article-title>Text mining techniques to automatically enrich a domain ontology</article-title>.
          <source>Applied Intelligence</source>,
          <volume>18</volume>(<issue>3</issue>):<fpage>323</fpage>-<lpage>340</lpage>, (<year>2003</year>)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><surname>Natarajan</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Berrar</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Hack</surname>, <given-names>C.J.</given-names></string-name> and
          <string-name><surname>Dubitzky</surname>, <given-names>W.</given-names></string-name>:
          <article-title>Knowledge discovery in biology and biotechnology texts: a review of techniques, evaluation strategies, and applications</article-title>.
          <source>Critical reviews in biotechnology</source>,
          <volume>25</volume>(<issue>1-2</issue>):<fpage>31</fpage>-<lpage>52</lpage>, (<year>2005</year>)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><surname>Ruttenberg</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Rees</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Samwald</surname>, <given-names>M.</given-names></string-name> and
          <string-name><surname>Marshall</surname>, <given-names>M.S.</given-names></string-name>:
          <article-title>Life Sciences on the Semantic Web: The Neurocommons and Beyond</article-title>.
          <source>Brief Bioinform</source>,
          (invited paper accepted for publication in HCLS special issue), (<year>2008</year>)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><surname>Verschure</surname>, <given-names>P.J.</given-names></string-name>:
          <article-title>Chromosome organization and gene control: it is difficult to see the picture when you are inside the frame</article-title>.
          <source>Journal of cellular biochemistry</source>,
          <volume>99</volume>(<issue>1</issue>):<fpage>23</fpage>-<lpage>34</lpage>, (<year>2006</year>)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><surname>Wilson</surname>, <given-names>B.J.</given-names></string-name>,
          <string-name><surname>Bates</surname>, <given-names>G.J.</given-names></string-name>,
          <string-name><surname>Nicol</surname>, <given-names>S.M.</given-names></string-name>,
          <string-name><surname>Gregory</surname>, <given-names>D.J.</given-names></string-name>,
          <string-name><surname>Perkins</surname>, <given-names>N.D.</given-names></string-name> and
          <string-name><surname>Fuller-Pace</surname>, <given-names>F.V.</given-names></string-name>:
          <article-title>The p68 and p72 DEAD box RNA helicases interact with HDAC1 and repress transcription in a promoter-specific manner</article-title>.
          <source>BMC molecular biology</source>,
          <volume>5</volume>:<fpage>11</fpage>, (<year>2004</year>)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name><surname>Witte</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Kappler</surname>, <given-names>T.</given-names></string-name> and
          <string-name><surname>Baker</surname>, <given-names>C.J.O.</given-names></string-name>:
          <article-title>Ontology Design for Biomedical Text Mining</article-title>.
          In:
          <string-name><surname>Baker</surname>, <given-names>C.J.O.</given-names></string-name> and
          <string-name><surname>Cheung</surname>, <given-names>K.-H.</given-names></string-name> (eds.):
          <source>Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences</source>.
          Springer Science+Business Media, New York (<year>2007</year>)
          <fpage>281</fpage>-<lpage>313</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>