<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Structuring mined knowledge for the support of hypothesis generation in molecular biology</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Marco</forename><surname>Roos</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Informatics Institute</orgName>
								<orgName type="institution">University of Amsterdam</orgName>
								<address>
									<addrLine>Kruislaan 403</addrLine>
									<postCode>1098 SJ</postCode>
									<settlement>Amsterdam</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">M</forename><forename type="middle">Scott</forename><surname>Marshall</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Informatics Institute</orgName>
								<orgName type="institution">University of Amsterdam</orgName>
								<address>
									<addrLine>Kruislaan 403</addrLine>
									<postCode>1098 SJ</postCode>
									<settlement>Amsterdam</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andrew</forename><forename type="middle">P</forename><surname>Gibson</surname></persName>
							<email>a.p.gibson@uva.nl</email>
							<affiliation key="aff1">
								<orgName type="department">Swammerdam Institute for Life Sciences</orgName>
								<orgName type="institution">University of Amsterdam</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pieter</forename><forename type="middle">W</forename><surname>Adriaans</surname></persName>
							<email>adriaans@science.uva.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Informatics Institute</orgName>
								<orgName type="institution">University of Amsterdam</orgName>
								<address>
									<addrLine>Kruislaan 403</addrLine>
									<postCode>1098 SJ</postCode>
									<settlement>Amsterdam</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Structuring mined knowledge for the support of hypothesis generation in molecular biology</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">9666410C08440C56EB476057CE6965F9</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T03:57+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Knowledge extraction</term>
					<term>Hypothesis support</term>
					<term>Molecular biology</term>
					<term>Chromatin</term>
					<term>Web service</term>
					<term>Workflow</term>
					<term>Semantic Web</term>
					<term>OWL</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Hypothesis generation in the life sciences is an empirical process in which obtaining and structuring knowledge from literature plays a significant role. Text mining and Information Extraction techniques are seen as key for programmatically accessing the knowledge captured in the form of free text. We describe progress towards an application that supports the task of generating a hypothesis about biomolecular mechanisms using Semantic Web technologies and a workflow to carry out text mining in a service-oriented architecture. The output is a semantic model with putative biological relationships that have been extracted from literature, with each relationship linked to the corresponding evidence. We present preliminary data that extends a model for chromatin (de)condensation. The methodology can be used to bootstrap the process of human-guided construction of semantically rich biological models using the results of knowledge extraction processes.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Conceiving or improving a hypothesis about a biomolecular mechanism usually implies integration of various types of information and distillation into a comprehensible model. This includes information from literature, our own knowledge, and interpretations of experimental data. Many Web resources such as Entrez PubMed 1 provide such information. However, the difficulty of information retrieval from literature reveals the scale of today's information overload: over 17 million biomedical documents are now available from PubMed. Support for extracting information from these resources is therefore a general requirement, with many scientists finding it increasingly challenging to ensure that all potentially relevant facts are considered whilst forming a hypothesis. Developments in the area of information extraction promise to deliver applications that will more directly support the task of hypothesis generation. The general approach requires retrieving relevant documents, recognizing named entities (e.g. proteins) and their relationships, and storing results for later inspection <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b9">10]</ref>.</p><p>In this study, we address the question of how the results of a knowledge extraction procedure should be stored to best support hypothesis conception for experimental biology. In particular, we focus on epigenetics and chromatin research, where typical examples are qualitative hypothetical models that attempt to explain the role of various proteins in changing the level of condensation of DNA as a means to regulate transcription (see for instance <ref type="bibr" target="#b11">[12]</ref>). To support the linking of a knowledge extraction process to this type of modelling, we present an approach that extracts information from text and populates an OWL-based knowledge base with the extraction results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Methods and tools for knowledge extraction</head><p>Knowledge extraction was performed by web services from the Adaptive Information Discovery Application (AIDA) toolbox, a set of web services and infrastructure being developed for knowledge extraction and knowledge management in a virtual laboratory for e-science 1. It contains services for document retrieval based on Lucene 2 <ref type="bibr" target="#b6">[7]</ref>, entity and relation recognition applying conditional random fields <ref type="bibr" target="#b4">[5]</ref>, and access to Sesame <ref type="bibr" target="#b0">[1]</ref>, an RDF repository that serves as our knowledge base. Ontologies were created in Protégé and conform to the OWL 1.1 specification.</p><p>The general steps of the knowledge extraction process <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b9">10]</ref> were implemented as a workflow in Taverna <ref type="bibr" target="#b2">[3]</ref>. We added steps to provide a likelihood score, cross references to biological databases, and tabular results (Fig. <ref type="figure" target="#fig_0">1</ref>). The likelihood of finding a document with query (q) and discovery (d) was calculated by:</p><formula xml:id="formula_0">\log\left(\frac{QD}{QD_{\mathrm{exp}}}\right), \qquad QD_{\mathrm{exp}} = \frac{Q \cdot D}{N}</formula><p>in which Q, D, and QD are the frequencies of documents containing q, d, and both q and d, respectively; QD_exp is the expected frequency of documents containing q and d assuming independence of Q and D; N is the total number of documents in MedLine. The workflow further contains a web service for adding protein name synonyms to the original query and providing UniProt identifiers for human, rat, and mouse that we also used to filter false positives. This service, kindly provided by Martijn Schuemie, wraps components from the text analysis tool Anni 2.0 <ref type="bibr" target="#b3">[4]</ref>.
At each step in the workflow, the results are converted into OWL instance statements in RDF format in order to populate the ontologies pre-loaded in our knowledge base.</p><p>References to our scientific research objects (ontologies, workflows, AIDA services) are stored as a pack on myExperiment.org that is available for download upon request (http://www.myexperiment.org/packs/27).</p></div>
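<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the likelihood score defined in this section, the following is a minimal sketch in Python, not the implementation used in the workflow. It assumes the natural logarithm (the base is not stated in the text), and the function name, parameter names, and example counts are hypothetical.</p>
<p><code lang="python">
```python
import math


def likelihood_score(q_docs: int, d_docs: int, qd_docs: int, n_docs: int) -> float:
    """Return log(QD / QD_exp), with QD_exp = Q * D / N.

    q_docs  (Q):  number of documents containing the query term q
    d_docs  (D):  number of documents containing the discovered term d
    qd_docs (QD): number of documents containing both q and d
    n_docs  (N):  total number of documents in the collection
    """
    if not all(count > 0 for count in (q_docs, d_docs, qd_docs, n_docs)):
        raise ValueError("all document counts must be positive")
    # Expected co-occurrence frequency if q and d were independent.
    qd_expected = q_docs * d_docs / n_docs
    # Assumption: natural logarithm; the source does not specify the base.
    return math.log(qd_docs / qd_expected)


# Toy counts for illustration only; they are not taken from the study.
score = likelihood_score(q_docs=1000, d_docs=200, qd_docs=20, n_docs=1_000_000)
print(round(score, 3))
```
</code></p>
<p>A positive score indicates that q and d co-occur in more documents than expected under independence; a score of zero indicates that the observed and expected frequencies are equal.</p></div>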
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Model Representation in OWL</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Different types of knowledge</head><p>In order to represent our biological hypothesis, we would like an OWL ontology of the relevant biological domain entities and their biological relationships. The purpose of our knowledge extraction procedure is to populate this model with instances. We would also like to model the evidence that has led to these instances. This leads to a clash between our intention of enriching a biological model and the need to represent the artifacts of a text mining procedure such as 'term', 'interaction assertion', or 'term collocation'. For these, we have concrete instances, but they have no direct meaning in the biological domain. Within our OWL representation, we purposefully kept five distinct OWL models in order to avoid the conflation of knowledge from the different stages of our knowledge extraction process. Our models represent: </p><formula xml:id="formula_1"></formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Histone acetylation</head><p>Fig. <ref type="figure">2</ref> - Example biological model: cartoon representation of a hypothesis for a chromatin (de)condensation mechanism. HDAC and HAT refer to enzymes with histone deacetylase activity and histone acetylase activity, respectively. For more details, see Figure 3 in <ref type="bibr" target="#b11">[12]</ref>, on which this figure is based.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1">Biological model</head><p>In the context of our example hypothesis (Fig. <ref type="figure">2</ref>) we start with a minimal set of classes for a biological model with proteins and protein-protein associations (Fig. <ref type="figure" target="#fig_2">3</ref>). We cannot directly inspect concrete instances of proteins or their interactions. We regard instances in the biological model as interpretations of certain observations, in our case, of text mining results. We also do not consider such instances as biological facts; they are restricted to a hypothetical model. The evidence for the interpretation is important, but it is not within the scope of this model. In the case of text mining, evidence is modeled by the document and text mining models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2">Document model</head><p>A model of the structure of documents and statements therein is less ambiguous than the biological model, because we can directly inspect concrete instances such as (references to) documents or pieces of text (Fig. <ref type="figure">4</ref>). We can be sure of the model, and we can be clear about the distinction between classes and instances, because we computationally process the documents. For our knowledge extraction experiment, we have created classes for documents, protein or gene terms, and mentions of associations between proteins or genes. Unfortunately, we cannot make a distinction between proteins and genes at this stage due to the limits of biological text mining. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.3">Text mining model</head><p>Next, we want to structure what we know of the knowledge extraction process that may serve as evidence for the population of our biological model (Fig. <ref type="figure">5</ref>). The aim of this step is to create assertions about instances of text mining processes, which process instances of documents that contain instances of terms. In addition, in this model we represent information about the likelihood of terms and relationships being found in the literature. We also gain valuable knowledge provenance that can be used to track down any conflicting statements later on. This allows us to qualify the uncertainty of the text mining procedure. For more complete knowledge provenance, we have also created a semantic model representing the implementation of the text mining process as a workflow of (AIDA) Web Services (not shown).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.4">Mapping model</head><p>At this point, we have a clear framework for the description of our biological domain and the documents and the text mining results as instances in our document and process ontologies. The next step is to relate the mined information to the biological domain model. Our strategy is to initially keep the domain model simple at the class and object property level, and to map sets of instances from our results to the domain model. For this, we created an additional mapping model that defines reference properties between the models (Fig. <ref type="figure" target="#fig_4">6</ref>). We can now see that an interaction between the proteins labeled 'p68' and 'HDAC1' in our hypothetical model is referred to by a mention of an association between the terms 'p68' and 'HDAC1', with a likelihood score for finding this combination in literature. The difficulty of distinguishing between genes and proteins during text mining also presents a problem for mapping to the biological model. When the number of proteins is small enough we may choose to initially map the text mining results to proteins, or we could create a perhaps more factual 'gene or protein' class in the biological model. The final result of the knowledge extraction workflow is a knowledge base extended with text mining results captured in OWL. We performed an example experiment starting with the query 'HDAC1 AND chromatin'. As a result we could query our knowledge base to find an instance of our biological hypothesis model and its partial representation by the input query and its expanded form (35 synonyms were added for document retrieval). We could further find 257 proteins linked to this model as putative components. We could also recover that these links were discovered through 489 protein terms found in 276 documents, and by what process, Web Service and workflow. 
Data are stored per individual: for each individual, we stored its specific links to other individuals within a domain (e.g. the biological domain) and between domains. For instance, NF-KappaB is linked to our initial hypothesis and 'HDAC1' within the biological model, and to its associated term, which was found in 10 abstracts. As our knowledge base grows with instances and different types of evidence, we can perform increasingly interesting queries in search of novel relations with respect to our nascent hypothesis. A prototypical example is the protein referred to by the term 'p68', which was found to be collocated with the query term 'HDAC1' and also in a direct mention of this interaction in an abstract by Wilson et al. <ref type="bibr" target="#b12">[13]</ref>, suggesting p68 as a candidate for investigating its role in relation to HDAC1 and chromatin.</p></div>
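<div xmlns="http://www.tei-c.org/ns/1.0"><p>The separation between the biological, document, and mapping models can be sketched with plain subject-predicate-object statements. The following Python sketch is only an illustration under stated assumptions: it does not use an RDF library or the ontologies of this study, and every prefix, class name, property name, and instance identifier in it is hypothetical.</p>
<p><code lang="python">
```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# All prefixes (bio:, doc:, map:), classes and properties are hypothetical.
triples = [
    # Biological model: hypothetical proteins and their association.
    Triple("bio:HDAC1", "rdf:type", "bio:Protein"),
    Triple("bio:p68", "rdf:type", "bio:Protein"),
    Triple("bio:assoc_p68_HDAC1", "rdf:type", "bio:Association"),
    Triple("bio:assoc_p68_HDAC1", "bio:hasParticipant", "bio:p68"),
    Triple("bio:assoc_p68_HDAC1", "bio:hasParticipant", "bio:HDAC1"),
    # Document model: terms and a mention that can be inspected directly.
    Triple("doc:term_p68", "rdf:type", "doc:ProteinOrGeneTerm"),
    Triple("doc:term_HDAC1", "rdf:type", "doc:ProteinOrGeneTerm"),
    Triple("doc:mention_1", "rdf:type", "doc:AssociationMention"),
    Triple("doc:mention_1", "doc:mentionsTerm", "doc:term_p68"),
    Triple("doc:mention_1", "doc:mentionsTerm", "doc:term_HDAC1"),
    # Mapping model: references from mined results to the biological model.
    Triple("doc:term_p68", "map:refersTo", "bio:p68"),
    Triple("doc:term_HDAC1", "map:refersTo", "bio:HDAC1"),
    Triple("doc:mention_1", "map:refersTo", "bio:assoc_p68_HDAC1"),
]


def evidence_for(biological_instance: str, graph: List[Triple]) -> List[str]:
    """Return the mined instances that refer to a biological instance."""
    return [
        t.subject
        for t in graph
        if t.predicate == "map:refersTo" and t.obj == biological_instance
    ]


print(evidence_for("bio:assoc_p68_HDAC1", triples))
```
</code></p>
<p>Keeping the reference statements in a separate group mirrors the design choice described above: the biological model stays simple, while the evidence for each hypothetical instance remains traceable through the mapping.</p></div>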
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>We have demonstrated first steps towards automating support for the processes involved in the formation of scientific hypotheses, particularly in studying biomolecular mechanisms. Text mining supports a researcher by inspecting more papers than an individual could and without human bias, while the use of an OWL-based knowledge base supports exploration of semantic relationships of one or many experiments. Our focus is on modeling information that is extracted during a computational experiment, rather than on improving a particular text mining procedure. The approach is not limited to the modeling of text mining results but could be applied to the results of other computational experiments. Our method shares some features with the general task of ontology learning from text <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b8">9]</ref>, and that of populating a predefined ontology with instances obtained from text mining <ref type="bibr" target="#b13">[14]</ref>. However, our aim is to provide a method for improving and reusing a biological hypothesis. We do not aim to construct a comprehensive hierarchy for a domain, nor are we specifically interested in recall as long as the text mining is reasonably unbiased. Semantic Web standards and tools allow us to explicitly represent the biological knowledge, share it as a resource online, and make it interoperable with other knowledge resources. Models representing provenance add a layer of trust into the results because the biological assertions are verifiable. It will be interesting to see how much our approach can make use of the data provenance in future versions of Taverna <ref type="bibr" target="#b7">[8]</ref>.
The rich potential of Semantic Web technologies will support the future extension of the domain model to suit more complex knowledge; its exploration hopefully supported by increasingly user-friendly query tools and DL-reasoners <ref type="bibr" target="#b10">[11]</ref>.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 -</head><label>1</label><figDesc>Fig. 1 - Workflow to extract proteins from literature and store them in a knowledge base.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>(1) Biological knowledge for our hypothesis (Protein, Association); (2) Documents (Terms, PubMed Identifiers); (3) Knowledge extraction process (Workflows, Processes); (4) Mined results (Extracted terms, extracted relationships); (5) Mapping model to integrate the above through references.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 -</head><label>3</label><figDesc>Fig. 3 - Biological domain model for hypothesis support with example instances. HDAC1 1 and PCAF 2 are examples of proteins implicated in chromatin (de)condensation and known to interact. In this and the following figures, diamonds represent instances, and dashed arrows leaving diamonds represent instance-of relationships. The other dashed arrows represent properties between classes or instances. For clarity, inverse relationships are not shown.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 4 -Fig. 5 -</head><label>45</label><figDesc>Fig. 4 - Basic ontological model that represents the relationship between documents and terms and statements used in the text.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 6 -</head><label>6</label><figDesc>Fig. 6 - Mined knowledge mapping strategy. Instances from the results set (right) refer to instances in the domain model (left).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Workflow steps shown in Fig. 1 and the corresponding updates to the semantic model</head><label></label><figDesc></figDesc><table><row><cell>Workflow step</cell><cell>Semantic model update</cell></row><row><cell>Query</cell><cell>Add query to semantic model</cell></row><row><cell>Retrieve documents from Medline</cell><cell>Add documents (IDs) to semantic model</cell></row><row><cell>Extract proteins (Homo sapiens)</cell><cell>Add proteins to semantic model</cell></row><row><cell>Calculate ranking scores</cell><cell>Add scores to semantic model</cell></row><row><cell>Create biological cross references</cell><cell>Add cross references to semantic model</cell></row><row><cell>Convert to table (html)</cell><cell></cell></row></table><note place="foot" n="1">http://adaptivedisclosure.org</note><note place="foot" n="2">http://lucene.apache.org/</note></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.ncbi.nlm.nih.gov/pubmed/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>We thank Edgar Meij, Sophia Katrenko, Willem van Hage, and Martijn Schuemie for providing Web Services, and the myGrid team and OMII-UK for their support. This work was carried out for the Virtual Laboratory for e-Science project (http://www.vl-e.nl) and BioRange, supported by BSIK grants from the Dutch Ministry of Education, Culture and Science (OC&amp;W). VL-e is part of the ICT innovation program of the Ministry of Economic Affairs (EZ).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema</title>
		<author>
			<persName><forename type="first">J</forename><surname>Broekstra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kampman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Van Harmelen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web -ISWC 2002: First International Semantic Web Conference</title>
				<meeting><address><addrLine>Berlin / Heidelberg, Sardinia, Italy</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="volume">2342</biblScope>
			<biblScope unit="page">54</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">An overview of methods and tools for ontology learning from texts</title>
		<author>
			<persName><forename type="first">A</forename><surname>Gomez-Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Manzano-Macho</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge Engineering Review</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="187" to="212" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Taverna: a tool for building and running workflows of services</title>
		<author>
			<persName><forename type="first">D</forename><surname>Hull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Wolstencroft</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Stevens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Goble</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Pocock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Oinn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucl. Acids Res</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="W729" to="W732" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
	<note>Web Server issue</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Anni 2.0: a multipurpose text-mining tool for the life sciences</title>
		<author>
			<persName><forename type="first">R</forename><surname>Jelier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Schuemie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Veldhoven</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">C</forename><surname>Dorssers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Jenster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Kors</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Genome biology</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page">R96</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Using Semi-Supervised Techniques to Detect Gene Mentions</title>
		<author>
			<persName><forename type="first">S</forename><surname>Katrenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">W</forename><surname>Adriaans</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Second BioCreative Challenge Workshop</title>
				<meeting>Second BioCreative Challenge Workshop<address><addrLine>Madrid, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Text-mining and information-retrieval services for molecular biology</title>
		<author>
			<persName><forename type="first">M</forename><surname>Krallinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Valencia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Genome biology</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page">224</biblScope>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Combining Thesauri-based Methods for Biomedical Retrieval</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Meij</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>IJzereef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Azzopardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kamps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>de Rijke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Fourteenth Text REtrieval Conference (TREC 2005)</title>
		<editor>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Voorhees</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Buckland</surname></persName>
		</editor>
				<imprint>
			<publisher>NIST Special Publication</publisher>
			<date type="published" when="2006">2006</date>
		</imprint>
		<respStmt>
			<orgName>National Institute of Standards and Technology</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Data lineage model for Taverna workflows with lightweight annotation requirements</title>
		<author>
			<persName><forename type="first">P</forename><surname>Missier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Belhajjame</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Goble</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IPAW&apos;08</title>
				<meeting><address><addrLine>Salt Lake City, Utah</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Text mining techniques to automatically enrich a domain ontology</title>
		<author>
			<persName><forename type="first">M</forename><surname>Missikoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Velardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fabriani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Intelligence</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="323" to="340" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Knowledge discovery in biology and biotechnology texts: a review of techniques, evaluation strategies, and applications</title>
		<author>
			<persName><forename type="first">J</forename><surname>Natarajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Berrar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Hack</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Dubitzky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Critical reviews in biotechnology</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">1-2</biblScope>
			<biblScope unit="page" from="31" to="52" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Life Sciences on the Semantic Web: The Neurocommons and Beyond</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ruttenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Rees</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Samwald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Marshall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Brief Bioinform</title>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
	<note>invited paper accepted for publication in HCLS special issue</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Chromosome organization and gene control: it is difficult to see the picture when you are inside the frame</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Verschure</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of cellular biochemistry</title>
		<imprint>
			<biblScope unit="volume">99</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="23" to="34" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">The p68 and p72 DEAD box RNA helicases interact with HDAC1 and repress transcription in a promoter-specific manner</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">J</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J</forename><surname>Bates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Nicol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Gregory</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">D</forename><surname>Perkins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">V</forename><surname>Fuller-Pace</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC molecular biology</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page">11</biblScope>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Ontology Design for Biomedical Text Mining</title>
		<author>
			<persName><forename type="first">R</forename><surname>Witte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kappler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J O</forename><surname>Baker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences</title>
				<editor>
			<persName><forename type="first">C</forename><forename type="middle">J O</forename><surname>Baker</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K.-H</forename><surname>Cheung</surname></persName>
		</editor>
		<meeting><address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Science+Business Media</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page">281</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
