<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Using biomedical databases as knowledge sources for large-scale text mining</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Fabio</forename><surname>Rinaldi</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computational Linguistics</orgName>
								<orgName type="institution">University of Zurich</orgName>
								<address>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Using biomedical databases as knowledge sources for large-scale text mining</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E594F3F2C2D3E40542404A7BF5659788</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T09:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we discuss how terminological knowledge extracted from biomedical databases can be used effectively in large-scale processing of the biomedical literature. We briefly present an integrated information extraction and text mining environment which is capable of reliably identifying and disambiguating several categories of domain entities. These entities can then serve as indexing entries that allow efficient retrieval of relevant documents and passages. Additionally, the system generates ranked lists of candidate interactions among the detected entities, which can be useful for several purposes, from assisted literature curation to question answering systems.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The rapid increase of novel scientific results in the domain of molecular biology makes it necessary to collect this information in structured repositories, so that it becomes easily accessible to end users. Well-known databases such as UniProt, MINT, IntAct and BioGRID collect information about proteins and their interactions. PharmGKB <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b11">12]</ref> curates knowledge about the impact of genetic variation on drug response for clinicians and researchers. The Comparative Toxicogenomics Database (CTD) collects interactions between chemicals and genes in order to support the study of the effects of environmental chemicals on health <ref type="bibr" target="#b4">[5]</ref>. A significant amount of manual effort is needed in order to extract from the literature the information required to accurately fill those databases (a process referred to as "curation"). Text mining solutions are increasingly requested to support the process of curation of biomedical databases.</p><p>The OntoGene project<ref type="foot" target="#foot_0">1</ref> focuses on the improvement of biomedical text mining through the usage of advanced natural language processing techniques. Our approach relies upon information delivered by a pipeline of NLP tools, including sentence splitting, tokenization, part of speech tagging, term recognition, noun and verb phrase chunking, and a dependency-based syntactic analysis of input sentences <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b7">8]</ref>. The results of the entity detection feed directly into the process of identification of interactions.</p><p>Different implementations of the OntoGene system have been used for participation in several well-known text mining shared tasks, such as BioCreative, CALBC and BioNLP, always obtaining competitive results.
For example, in the BioCreative 2009 challenge the OntoGene system obtained the best results for protein-protein interactions <ref type="bibr" target="#b9">[10]</ref>. More recently, within the scope of the SASEBio project (Semi-Automated Semantic Enrichment of the Biomedical Literature), we have developed a user-friendly interface (ODIN: OntoGene Document INspector) which database curators can use to inspect the results of the text mining system. The interface is designed to simplify the interaction of the user with the text mining system, allowing, for example, incorrect results to be corrected. The system can then learn from this interaction.</p><p>In the rest of this short paper we briefly describe the OntoGene pipeline architecture and the ODIN interface for assisted curation. <ref type="foot" target="#foot_1">2</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Information Extraction</head><p>Biomedical terminological resources can be leveraged for the construction of large-scale knowledge bases. One example is KaBOB (Knowledge Base of Biology), a large RDF store based upon 17 prominent biomedical databases. KaBOB contains 5.6 billion RDF triples <ref type="bibr" target="#b0">[1]</ref>. Similar kinds of integrated data networks can be used for knowledge discovery through semantic web technologies (see for example <ref type="bibr" target="#b1">[2]</ref>).</p><p>In our own work we have used such databases as knowledge sources for the process of semi-automated information extraction. In the rest of this section we describe the OntoGene Text Mining pipeline, which is used to (a) provide all basic preprocessing (e.g. tokenization) of the target documents, (b) identify all mentions of domain entities and normalize them to database identifiers, and (c) extract candidate interactions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Preprocessing and Detection of Domain Entities</head><p>Several large-scale terminological resources are used by the OntoGene system in order to detect names of relevant domain entities in biomedical literature (proteins, genes, chemicals, diseases, etc.) and ground them to widely accepted identifiers assigned by the original databases, such as the UniProt Knowledgebase, the National Center for Biotechnology Information (NCBI) Taxonomy, the Proteomics Standards Initiative Molecular Interactions Ontology (PSI-MI), the Cell Line Knowledge Base (CLKB), etc.</p><p>From the original databases we extract preferred names and synonyms for each term, together with its unique identifier. This information is used to annotate the input documents using an efficient lookup procedure. A term normalization step is used to take into account a number of possible surface variations of the terms. The same normalization is applied to the list of known terms at the beginning of the annotation process, when it is read into memory, and to the candidate terms in the input text, so that variants of the same term can be matched despite the differences in their surface strings <ref type="bibr" target="#b7">[8]</ref>. For more technical details of the OntoGene terminology recognition process, see <ref type="bibr" target="#b6">[7]</ref>.</p><p>The terminological resource obtained as described above is used to annotate biomedical text in a relatively straightforward way. First, in a preprocessing stage, the input text is transformed into a custom XML format, and sentence and token boundaries are identified. For this task, we use the LingPipe tokenizer and sentence splitter, which have been trained on biomedical corpora. The tokenizer produces a granular set of tokens, e.g.
words that contain a hyphen (such as 'Pop2p-Cdc18p') are split into several tokens, revealing the inner structure of such constructs, which makes it possible to discover the interaction mention in "Pop2p-Cdc18p interaction". Tagging of terms is performed by sequentially processing each token in a sentence and, if it can start a term, annotating the longest possible match (partial overlaps are excluded). In the case of success, all the possible IDs (as found in the term list) are assigned to the candidate term.</p><p>Ambiguity is a serious problem for several types of entities. In particular, the names of some proteins and genes can refer to several different database identifiers. For example, hemoglobin can refer to human hemoglobin or to mouse hemoglobin (or to that of any other species). Besides, even in humans there are several different types of hemoglobin. Using knowledge about the organisms which are the focus of the experiments described in each paper, we can disambiguate to a large extent entities such as proteins and genes. In the OntoGene pipeline we apply an approach which we first described in <ref type="bibr" target="#b2">[3]</ref>. We first create a ranked list of 'focus' organisms based on all mentions of proteins, genes, cell lines and organisms in the paper. In the disambiguation process we remove all the IDs that do not correspond to an organism present in the list. Additionally, the scores provided for each organism can be used in ranking the candidate IDs for each entity. Such ranking is useful in a semi-automated curation environment where the curator is expected to take the final decision. However, it can also be used in a fully automated environment as a factor in ranking any other derived information, such as interactions in which the given entity participates.</p></div>
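<div xmlns="http://www.tei-c.org/ns/1.0"><p>The lookup and disambiguation steps of this section can be illustrated with a minimal Python sketch. It is not the OntoGene implementation: the normalization rule (lowercasing and dropping non-alphanumeric characters), the regular-expression tokenizer that stands in for LingPipe, the MAX_TERM_TOKENS limit, the function names and the identifiers 'ID:h1', 'ID:m1' and 'ID:x' are all assumptions made for illustration.</p>
<p><code lang="python">
```python
import re
from collections import defaultdict

MAX_TERM_TOKENS = 6  # assumed upper bound on the length of a term, in tokens


def normalize(text):
    """Lowercase and keep only letters and digits (a deliberately crude rule)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def build_index(entries):
    """Map each normalized name to the set of identifiers that carry it."""
    index = defaultdict(set)
    for identifier, name in entries:
        index[normalize(name)].add(identifier)
    return index


def tag_terms(text, index):
    """Scan tokens left to right and annotate the longest match at each start."""
    # Alphanumeric runs and single punctuation marks become separate tokens,
    # so 'Pop2p-Cdc18p' yields 'Pop2p', '-' and 'Cdc18p'.
    tokens = list(re.finditer(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]", text))
    keys = [normalize(t.group()) for t in tokens]
    annotations = []
    i = 0
    while i != len(tokens):
        end = None
        if keys[i]:  # a term may not start on a punctuation token
            for j in range(min(len(tokens), i + MAX_TERM_TOKENS), i, -1):
                if keys[j - 1] and "".join(keys[i:j]) in index:
                    end = j
                    break
        if end is None:
            i += 1
            continue
        surface = text[tokens[i].start():tokens[end - 1].end()]
        annotations.append((surface, sorted(index["".join(keys[i:end])])))
        i = end  # resume after the match, which rules out partial overlaps
    return annotations


def rank_by_focus(ids, organism_of, focus_scores):
    """Drop IDs whose organism is not a focus organism; rank the rest by score."""
    kept = [i for i in ids if organism_of.get(i) in focus_scores]
    return sorted(kept, key=lambda i: -focus_scores[organism_of[i]])


# Hypothetical term list: (identifier, name) pairs as they might be exported
# from a database. The identifiers are invented for this example.
entries = [("ID:h1", "hemoglobin"), ("ID:m1", "Hemoglobin"), ("ID:x", "IL-2")]
index = build_index(entries)
print(tag_terms("Mouse hemoglobin binds IL 2.", index))
print(rank_by_focus(["ID:h1", "ID:m1"],
                    {"ID:h1": "human", "ID:m1": "mouse"},
                    {"mouse": 0.8}))
```
</code></p>
<p>Because the same normalization is applied to the term list and to the text, the surface form 'IL 2' matches the entry 'IL-2'. The ambiguous term 'hemoglobin' first receives both candidate identifiers, and the organism filter then keeps only the one linked to a focus organism.</p></div>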
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Detection of Interactions</head><p>Mentions of relevant domain entities in a given text span are used by the OntoGene system to create candidate interactions. The selected text span can vary from a sentence to a larger observation window. Simple co-occurrence in the selected text span is a low-precision, but high-recall indication of a potential relationship among those entities. In order to obtain better precision the OntoGene system uses the syntactic structure of the sentence, and the global distribution of interactions in the original database. In this section we describe in detail how candidate interactions are ranked by our system, according to their relevance for the original database.</p><p>The OntoGene system creates an initial ranking of the candidate relations from the selected text span using only the frequency of the respective entities with the following formula:</p><p><formula notation="TeX">\mathrm{relscore}(e_1, e_2) = \frac{f(e_1) + f(e_2)}{f(E)}</formula> where f(e₁) and f(e₂) are the number of times the entities e₁ and e₂ are observed in the abstract, while f(E) is the total count of all identifiers in the abstract. An additional zone-based boost might be used in some cases (e.g. for entities mentioned in the title). The OntoGene pipeline makes use of an internally developed dependency parser <ref type="bibr" target="#b12">[13]</ref> in order to parse all sentences in the input documents. The information derived from the dependency analysis is used to improve on the baseline ranking for candidate interactions. Besides, the syntactic analysis provides useful information for the extraction of the interaction type. Given two terms identified in the same sentence, a collector traverses the tree from each of the two terms upwards to the lowest common parent node, recording all intermediate nodes and dependency paths along the route. An example of such a traversal can be seen in Figure <ref type="figure" target="#fig_0">1</ref>.
Such traversals have been used in many protein-protein interaction (PPI) applications; they are commonly called tree walks or paths.</p><p>Each candidate interaction is assigned a score, obtained by combining several features, including: (1) Syntactic path, which encodes the information provided by the dependency structure between the two entities in the candidate interaction; (2) Known interaction: in order to better distinguish between 'novel' interactions (more important for the curation process) and 'older' interactions (already known, thus less important for the curation process), we penalize interactions that are already reported in the reference databases, in proportion to their 'age' (date at which the interaction was first reported); (3) Novelty score: we also use linguistic clues in order to distinguish sentences that report the results detected by the authors (e.g. "Here we report that...") from sentences that report background results. Interactions in 'novelty' sentences are scored higher than interactions in 'background' sentences; (4) Zoning: different structural zones of the paper often have different levels of relevance. We observed that novel interactions are often mentioned in the abstract and the conclusions, while the introduction and methods sections are less likely to mention them and therefore get lower scores; (5) Pair salience: the frequency of mentions in the paper of each of the entities in the candidate pair is an important indicator of the relevance of that interaction in the paper. Scores from each feature are then combined and normalized to the [0,1] range, in order to produce a ranking for the candidate interactions.</p><p>The results of the OntoGene text mining system are made accessible through a curation system called ODIN ("OntoGene Document INspector") which allows a user to dynamically inspect the results of the text mining pipeline.
An experiment in interactive curation has been performed recently in collaboration with the PharmGKB database <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b11">12]</ref>. The results of this experiment are described in <ref type="bibr" target="#b5">[6]</ref>. <ref type="bibr" target="#b8">[9]</ref> provides further details on the architecture of the system. Figure <ref type="figure">2</ref> shows a screenshot of ODIN, with entity annotations and candidate interactions on a sample PubMed abstract.</p></div>
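<div xmlns="http://www.tei-c.org/ns/1.0"><p>The baseline frequency score and the tree walk to the lowest common parent node, both described in this section, can be sketched in Python as follows. This is an illustration under stated assumptions, not the OntoGene code: the function names are hypothetical, f(E) is taken to be the total number of entity mentions in the span, the dependency tree is a hand-made child-to-head mapping rather than real parser output, and dependency labels are omitted.</p>
<p><code lang="python">
```python
from collections import Counter
from itertools import combinations


def relscore_ranking(mentions):
    """Rank entity pairs by (f(e1) + f(e2)) / f(E).

    mentions: one entity identifier per mention found in the text span.
    """
    freq = Counter(mentions)
    total = sum(freq.values())  # f(E), assumed here to count every mention
    scored = [((a, b), (freq[a] + freq[b]) / total)
              for a, b in combinations(sorted(freq), 2)]
    # sorted() is stable, so tied pairs keep their alphabetical order.
    return sorted(scored, key=lambda item: -item[1])


def path_to_root(node, head_of):
    """List the nodes from a token up to the root of the dependency tree."""
    path = [node]
    while node in head_of:
        node = head_of[node]
        path.append(node)
    return path


def tree_walk(a, b, head_of):
    """Return the two upward paths that meet at the lowest common parent.

    Assumes both tokens belong to the same single-rooted tree.
    """
    up_a = path_to_root(a, head_of)
    up_b = path_to_root(b, head_of)
    ancestors_of_b = set(up_b)
    common = next(n for n in up_a if n in ancestors_of_b)
    return up_a[:up_a.index(common) + 1], up_b[:up_b.index(common) + 1]


# Hand-made analysis of 'Pop2p interacts with Cdc18p' (child -> head).
heads = {"Pop2p": "interacts", "with": "interacts", "Cdc18p": "with"}
print(tree_walk("Pop2p", "Cdc18p", heads))
print(relscore_ranking(["A", "B", "A", "C"]))
```
</code></p>
<p>In this toy sentence both paths end at the verb 'interacts', which is the kind of intermediate node a syntactic path feature could inspect. The frequency score alone cannot separate pairs that share the same counts, which is one reason the additional features are combined with it.</p></div>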
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Conclusion</head><p>In this paper we briefly described the OntoGene text mining system, targeted at the extraction of entities and relationships from the biomedical literature. The OntoGene pipeline leverages manually curated resources and is capable of reliably identifying entities and relationships, which can optionally be delivered using standard semantic web formats such as RDF or OWL. The long-term vision of the project is a deeper integration of databases and literature.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Example of sentence analysis and detection of an interaction.</figDesc><graphic coords="4,133.77,124.80,343.69,114.13" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Entity annotations and candidate interactions on a sample PubMed abstract.</figDesc><graphic coords="5,133.77,124.80,343.70,184.77" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.ontogene.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Readers interested in more details are invited to consult the journal publications available from the OntoGene web site.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This research is partially funded by the Swiss National Science Foundation (grant 100014-118396/1) and Novartis Pharma AG, NIBR-IT, Text Mining Services, CH-4002, Basel, Switzerland.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">An ontological representation of biomedical data sources and records</title>
		<author>
			<persName><forename type="first">Michael</forename><surname>Bada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kevin</forename><surname>Livingston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lawrence</forename><surname>Hunter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Bio-Ontologies</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Semantic web for integrated network analysis in biomedicine</title>
		<author>
			<persName><forename type="first">Huajun</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Li</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhaohui</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tong</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lavanya</forename><surname>Dhanapalan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jake</forename><forename type="middle">Y</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Briefings in Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="177" to="192" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">TX Task: Automatic Detection of Focus Organisms in Biomedical Publications</title>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Kappeler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kaarel</forename><surname>Kaljurand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Rinaldi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the BioNLP workshop</title>
				<meeting>the BioNLP workshop<address><addrLine>Boulder, Colorado</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="80" to="88" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Integrating genotype and phenotype information: An overview of the PharmGKB project</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">E</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Easton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergerson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hewett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Oliver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Rubin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Shafa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Stuart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">B</forename><surname>Altman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Pharmacogenomics Journal</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="167" to="170" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The Comparative Toxicogenomics Database (CTD): a resource for comparative toxicological studies</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Mattingly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Rosenstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">T</forename><surname>Colby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">N</forename><surname>Forrest</surname><genName>Jr</genName></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Boyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Experimental Zoology Part A: Comparative Experimental Biology</title>
		<imprint>
			<biblScope unit="volume">305</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="689" to="692" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Using ODIN for a PharmGKB re-validation experiment</title>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Simon</forename><surname>Clematide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yael</forename><surname>Garten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michelle</forename><surname>Whirl-Carrillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Li</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Joan</forename><forename type="middle">M</forename><surname>Hebert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Katrin</forename><surname>Sangkuhl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Caroline</forename><forename type="middle">F</forename><surname>Thorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Teri</forename><forename type="middle">E</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Russ</forename><forename type="middle">B</forename><surname>Altman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Database: The Journal of Biological Databases and Curation</title>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Terminological resources for text mining over biomedical scientific literature</title>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kaarel</forename><surname>Kaljurand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rune</forename><surname>Saetre</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Artificial Intelligence in Medicine</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="107" to="114" />
			<date type="published" when="2011-06">June 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">OntoGene in BioCreative II</title>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Kappeler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kaarel</forename><surname>Kaljurand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerold</forename><surname>Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Manfred</forename><surname>Klenner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Simon</forename><surname>Clematide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Hess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jean-Marc</forename><surname>Von Allmen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pierre</forename><surname>Parisot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Romacker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Therese</forename><surname>Vachon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Genome Biology</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">S13</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
	<note>Suppl</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Relation mining experiments in the pharmacogenomics domain</title>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerold</forename><surname>Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Simon</forename><surname>Clematide</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.jbi.2012.04.014</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Biomedical Informatics</title>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">OntoGene in BioCreative II.5</title>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerold</forename><surname>Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kaarel</forename><surname>Kaljurand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Simon</forename><surname>Clematide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Therese</forename><surname>Vachon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Romacker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE/ACM Transactions on Computational Biology and Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="472" to="480" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">An Environment for Relation Mining over Richly Annotated Corpora: the case of GENIA</title>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerold</forename><surname>Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kaarel</forename><surname>Kaljurand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Hess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Romacker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">S3</biblScope>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
	<note>Suppl</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">PharmGKB: Understanding the effects of individual genetic variants</title>
		<author>
			<persName><forename type="first">Katrin</forename><surname>Sangkuhl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dorit</forename><forename type="middle">S</forename><surname>Berlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Russ</forename><forename type="middle">B</forename><surname>Altman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Teri</forename><forename type="middle">E</forename><surname>Klein</surname></persName>
		</author>
		<idno type="PMID">18949600</idno>
	</analytic>
	<monogr>
		<title level="j">Drug Metabolism Reviews</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="539" to="551" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Hybrid Long-Distance Functional Dependency Parsing</title>
		<author>
			<persName><forename type="first">Gerold</forename><surname>Schneider</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
		<respStmt>
			<orgName>Institute of Computational Linguistics, University of Zurich</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Doctoral Thesis</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
