<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Biological Event Extraction using Subgraph Matching</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Haibin</forename><surname>Liu</surname></persName>
							<email>haibin@cs.dal.ca</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Computer Science</orgName>
								<orgName type="institution">Dalhousie University Halifax</orgName>
								<address>
									<region>NS</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Christian</forename><surname>Blouin</surname></persName>
							<email>cblouin@cs.dal.ca</email>
							<affiliation key="aff1">
								<orgName type="department">Faculty of Computer Science</orgName>
								<orgName type="institution">Dalhousie University Halifax</orgName>
								<address>
									<region>NS</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vlado</forename><surname>Kešelj</surname></persName>
							<affiliation key="aff2">
								<orgName type="department">Faculty of Computer Science</orgName>
								<orgName type="institution">Dalhousie University Halifax</orgName>
								<address>
									<region>NS</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Biological Event Extraction using Subgraph Matching</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">20B949369905477DEB188C127A20DC91</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:35+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>An important task in biological information extraction is to identify descriptions of biological relations and events involving genes or proteins. We propose a graph-based approach to automatically learn rules for detecting biological events in the literature. Detection is performed by searching the dependency graphs of complete sentences for subgraphs that are isomorphic to the graphs of event rules. When applying our approach to the Task 1 datasets of the BioNLP'09 shared task, we achieved a 37.28% F-score in detecting biological events across 9 event types.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Recent research in information extraction in the biological domain has focused on extracting semantic relations between molecular biology concepts <ref type="bibr" target="#b5">(Fundel et al., 2007)</ref>. State-of-the-art protein annotation methods have achieved reasonable success, with an F-score of 88% <ref type="bibr" target="#b15">(Wilbur et al., 2007)</ref>. A task of particular interest is the automatic extraction of protein-protein interactions (PPI). To date, most of the biological knowledge about these interactions is available only as unstructured text in scientific articles <ref type="bibr" target="#b0">(Abulaish and Dey, 2007)</ref>. The best-performing system in the BioCreative II challenge <ref type="bibr" target="#b7">(Hunter et al., 2008)</ref> achieved only a 29% F-score in identifying pairs of proteins in a sentence that have a biologically relevant relationship. This suggests that biological relation extraction is a difficult problem that is far from solved.</p><p>Sentences in the biological literature often contain long-range dependencies. Therefore, shallow analysis of biological texts based on co-occurrence or surface patterns suffers from either low precision or low recall <ref type="bibr" target="#b5">(Fundel et al., 2007;</ref><ref type="bibr" target="#b0">Abulaish and Dey, 2007)</ref>. As a result, full parsing has been explored as the basis for relation extraction, enabling in-depth syntactic and semantic analysis <ref type="bibr" target="#b0">(Abulaish and Dey, 2007;</ref><ref type="bibr" target="#b5">Fundel et al., 2007;</ref><ref type="bibr" target="#b12">Rinaldi et al., 2007)</ref>. In the BioNLP'09 shared task on biological event extraction <ref type="bibr" target="#b9">(Kim et al., 2009)</ref>, 20 of the 24 participating teams adopted a full parsing strategy, including all of the 10 top-performing teams. 
However, most previous work extracts relevant relations using a limited set of manually designed rules that map interpreted syntactic structures onto semantic relations. We propose an approach that automatically learns rules characterizing a wide range of biological relations and events from a syntactically and semantically annotated corpus; like the systems above, our approach is based on full parsing of biological texts.</p><p>More recently, the dependency representation obtained from full parsing, with its ability to reveal long-range dependencies, has shown an advantage over traditional Penn Treebank-style phrase structure trees in biological relation extraction <ref type="bibr" target="#b10">(Miyao et al., 2009)</ref>. Relations are generally extracted from the dependency representation using one of two approaches. In the first, the dependency representation is traversed, and paths containing the relevant terms that describe the relations predefined in the rules are extracted as candidate relations <ref type="bibr" target="#b5">(Fundel et al., 2007;</ref><ref type="bibr" target="#b11">Rinaldi et al., 2004)</ref>. In the second, relations are learned from the dependency representation by supervised machine learning, using specialized feature representations or kernels that encode dependency paths <ref type="bibr" target="#b1">(Airola et al., 2008;</ref><ref type="bibr" target="#b2">Björne et al., 2009)</ref>.</p><p>Graphs provide a powerful primitive for modeling biological data such as pathways and protein interaction networks <ref type="bibr" target="#b13">(Tian et al., 2007;</ref><ref type="bibr" target="#b16">Yan et al., 2006)</ref>. Since the dependency representation maps straightforwardly onto a directed graph <ref type="bibr" target="#b4">(de Marneffe and Manning, 2008)</ref>, the properties and operations of graphs can be naturally applied to biological relation extraction. 
We propose a graph matching-based approach to extract biological events from the scientific literature, addressing the primary task of the BioNLP'09 shared task on biological event extraction. Extraction is performed by matching the dependency representations of automatically learned rules to the dependency representations of biological sentences. This process is treated as a subgraph matching problem: the search for a subgraph of a sentence graph that is isomorphic to a rule graph.</p><p>The rest of the paper is organized as follows. Section 2 introduces the BioNLP'09 shared task on event extraction. Section 3 describes our subgraph matching-based event extraction method. Section 4 presents the implementation details. Performance is evaluated in Section 5. Finally, Section 6 summarizes the paper and outlines future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">BioNLP'09 Shared Task</head><p>The BioNLP'09 shared task <ref type="bibr" target="#b9">(Kim et al., 2009)</ref> focused on the recognition of biological events described in the biological literature. When a biological event is described in text, we can analyze it by recognizing its event type, its event trigger, one or more event arguments, and the source text (ST) in which the event is described. The source text is composed of tokens, which are non-empty finite strings of symbols from a finite alphabet $\Sigma$. Tokens come from $W$, the set of all such strings, i.e., $W = \Sigma^+$. The source text is a finite sequence of tokens, i.e., a member of $W^*$. We define a biological event consistently with the shared task as follows:</p><p>Definition 1. A biological event is a four-tuple $e = (\mathit{Type}, \mathit{Trigger}, \mathit{Arguments}, \mathit{ST})$, where $\mathit{ST} \in W^*$, called the source text, is a sequence of tokens that contains the event; $\mathit{Type} \in T_e$ is an event type from a finite set of event types $T_e$; $\mathit{Trigger}$ is a substring of tokens from $\mathit{ST}$ that signals the event; and $\mathit{Arguments}$ is a non-empty finite set of pairs $(l, a)$, where $l \in L$ is a label from a finite set of semantic role labels $L$, and $a$ is either a token from $\mathit{ST}$ or another biological event.</p><p>For the shared task, $T_e$ consists of the nine event types defined in Table <ref type="table">1</ref>, and $L = \{\mathit{Theme}, \mathit{Cause}\}$. A gold event is a biological event whose information has been entirely annotated by domain experts.</p><p>The primary task of the shared task was to detect biological events such as protein binding and phosphorylation, given only the annotation of protein names. Participants were required to extract the type, trigger, and primary arguments of each event. This task is an example of the extraction of semantically typed, complex events whose arguments can themselves be other events. We focus on the primary task and propose a graph matching-based method to address it.</p></div>
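<div xmlns="http://www.tei-c.org/ns/1.0"><p>Definition 1 can be mirrored directly as a data type. The following is a minimal Python sketch, not part of the original system: the class name Event, its field names, and the validation checks are assumptions chosen to follow the definition, and nested events are allowed as argument values so that regulation events can take sub-events.</p><eg><![CDATA[
```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

# Semantic role labels L used in the shared task.
ROLE_LABELS = frozenset({"Theme", "Cause"})


def _is_substring(part: Tuple[str, ...], whole: Tuple[str, ...]) -> bool:
    """True if `part` occurs as a contiguous run of tokens in `whole`."""
    n = len(part)
    return n > 0 and any(tuple(whole[i:i + n]) == tuple(part)
                         for i in range(len(whole) - n + 1))


@dataclass(frozen=True)
class Event:
    """e = (Type, Trigger, Arguments, ST), following Definition 1."""

    type: str                                # Type, drawn from the set T_e
    trigger: Tuple[str, ...]                 # contiguous tokens of ST
    arguments: FrozenSet[Tuple[str, Union[str, "Event"]]]  # pairs (l, a)
    source_text: Tuple[str, ...]             # ST, a sequence of tokens

    def __post_init__(self) -> None:
        if not self.arguments:
            raise ValueError("Arguments must be a non-empty set")
        if not _is_substring(self.trigger, self.source_text):
            raise ValueError("Trigger must be a substring of ST")
        for label, arg in self.arguments:
            if label not in ROLE_LABELS:
                raise ValueError(f"unknown role label: {label}")
            if isinstance(arg, Event):
                continue  # a nested sub-event, as regulation events allow
            if not isinstance(arg, str) or arg not in self.source_text:
                raise ValueError(f"argument {arg!r} is not a token of ST")


# Usage: a regulation event whose Theme is another (sub-)event.
st = ("IL-2", "expression", "is", "induced", "by", "TRAF2")
expr = Event("Gene expression", ("expression",), frozenset({("Theme", "IL-2")}), st)
reg = Event("Positive regulation", ("induced",),
            frozenset({("Theme", expr), ("Cause", "TRAF2")}), st)
```
]]></eg><p>Because the dataclass is frozen, events are hashable and can appear inside the argument set of another event, which is exactly the nesting that Definition 1 permits.</p></div>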
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Subgraph Matching-based Event Extraction</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Dependency Representation</head><p>The dependency representation is designed to provide a simple description of the grammatical relationships in a sentence that can be used effectively to extract textual relations <ref type="bibr" target="#b4">(de Marneffe and Manning, 2008)</ref>.</p><p>The dependency representation of a sentence consists of the tokens in the sentence and the binary relations between them. A single dependency relation is written as relation(governor, dependent), where governor and dependent are tokens and relation is the type of grammatical dependency between them. A dependency representation is therefore a labeled directed graph, called a dependency graph: the tokens are its nodes, and each relation is an edge directed from the governor to the dependent and labeled with the relation type.</p></div>
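<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the graph view concrete, the sketch below turns dependency relations into a set of labeled, directed edges. It assumes Stanford-style plain-text output of the form relation(governor-index, dependent-index), where each token carries its sentence position so that repeated words remain distinct nodes; the function names parse_dependencies and adjacency are hypothetical.</p><eg><![CDATA[
```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Matches lines such as "agent(induced-4, TRAF2-6)". Greedy groups let tokens
# that contain hyphens (e.g. "IL-2-1") split at the last "-index".
DEP_LINE = re.compile(r"^(\w+)\((.+)-(\d+)'*, (.+)-(\d+)'*\)$")

Node = Tuple[str, int]            # (token, position in the sentence)
Edge = Tuple[Node, Node, str]     # (governor, dependent, relation)


def parse_dependencies(lines: List[str]) -> Set[Edge]:
    """Turn relation(governor, dependent) lines into labeled directed edges."""
    edges: Set[Edge] = set()
    for line in lines:
        m = DEP_LINE.match(line.strip())
        if m is None:
            raise ValueError(f"unparseable dependency: {line!r}")
        rel, gov, gi, dep, di = m.groups()
        edges.add(((gov, int(gi)), (dep, int(di)), rel))
    return edges


def adjacency(edges: Set[Edge]) -> Dict[Node, List[Tuple[Node, str]]]:
    """Outgoing (dependent, relation) pairs for every governor node."""
    out: Dict[Node, List[Tuple[Node, str]]] = defaultdict(list)
    for gov, dep, rel in edges:
        out[gov].append((dep, rel))
    return dict(out)


# Usage: "IL-2 expression is induced by TRAF2" (collapsed dependencies).
lines = ["nsubjpass(induced-4, expression-2)", "nn(expression-2, IL-2-1)",
         "auxpass(induced-4, is-3)", "agent(induced-4, TRAF2-6)"]
edges = parse_dependencies(lines)
graph = adjacency(edges)
```
]]></eg><p>Keying nodes by (token, position) rather than by token alone keeps the graph faithful when a word occurs twice in a sentence.</p></div>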
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Event Rule Induction</head><p>A biological event rule is defined as follows:</p><p>Definition 2. A biological event rule is a pair $r = (e, G_r)$. $G_r = (V_r, E_r)$ is a dependency graph that characterizes the contextual structure of events. $e = (\mathit{Type}, \mathit{Trigger}, \mathit{Arguments})$ encodes a detailed event frame, where $\mathit{Type}$ is the event type; $\mathit{Trigger} = \{(t_1, v_1), (t_2, v_2), \ldots\}$ records the event trigger and is a non-empty finite sequence of tokens associated with nodes in $G_r$, i.e., $\mathit{Trigger} \in (W \times V_r)^+$; and $\mathit{Arguments} = \{(t_1, l_1, v_1), (t_2, l_2, v_2), \ldots\}$ records the event arguments and is a non-empty finite sequence of tokens associated with semantic role labels and nodes in $G_r$, i.e., $\mathit{Arguments} \in (W \times L \times V_r)^+$.</p><p>The biological event rules are learned from labeled training sentences using the following induction method. Starting with the dependency graph of each training sentence, the directions of the edges are first removed, transforming the directed graph into an undirected graph. Because the dependency graph of a sentence is connected, a path exists between any two nodes of this undirected graph. For each gold event, the shortest dependency path in the undirected graph connecting the event trigger nodes to each event argument node is selected. The union of all these shortest dependency paths is then computed for each event, and the original directed dependency representation of the path union is retrieved and used as the graph representation of the event.</p><p>For multi-token event triggers, the shortest dependency path connecting the node of every trigger token to the node of each event argument is selected, and the union of the paths is then computed for each trigger. For regulation events, when a sub-event is used as an argument, only the type and the trigger of the sub-event are preserved as the argument of the main event. The shortest dependency path is then extracted so as to connect the trigger nodes of the main event to the trigger nodes of the sub-event. If more than one shortest path exists, all of them are considered. As a result, each gold event is transformed into a biological event rule. The obtained rules are categorized by the nine event types of the task.</p></div>
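<div xmlns="http://www.tei-c.org/ns/1.0"><p>The induction step can be sketched with breadth-first search, which finds shortest paths when every edge has the same weight (the implementation described in Section 4 uses Dijkstra's algorithm with equal edge weights, which yields the same path lengths). The sketch keeps every edge that lies on any shortest trigger-to-argument path, so ties between several shortest paths are all retained, and then restores the original directed, labeled edges. Graphs as sets of (governor, dependent, relation) triples and the function name induce_rule_graph are assumptions for illustration.</p><eg><![CDATA[
```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, str]  # (governor, dependent, relation)


def _bfs_parents(adj: Dict[Hashable, Set[Hashable]],
                 source: Hashable) -> Dict[Hashable, List[Hashable]]:
    """Unweighted BFS that records every predecessor on some shortest path."""
    dist = {source: 0}
    parents: Dict[Hashable, List[Hashable]] = {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                parents[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                parents[v].append(u)  # a tie: another shortest path to v
    return parents


def _shortest_path_edges(parents, target) -> Set[frozenset]:
    """Undirected edges lying on any shortest path that ends at target."""
    seen, stack, out = {target}, [target], set()
    while stack:
        v = stack.pop()
        for u in parents.get(v, []):
            out.add(frozenset((u, v)))
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return out


def induce_rule_graph(edges: Iterable[Edge], triggers: Iterable[Hashable],
                      arguments: Iterable[Hashable]) -> Set[Edge]:
    """Union of all shortest trigger-to-argument paths, with original directions."""
    edges = list(edges)
    adj: Dict[Hashable, Set[Hashable]] = {}
    for gov, dep, _ in edges:           # drop directions: undirected view
        adj.setdefault(gov, set()).add(dep)
        adj.setdefault(dep, set()).add(gov)
    arguments = list(arguments)
    keep: Set[frozenset] = set()
    for t in triggers:                  # every token of a multi-token trigger
        parents = _bfs_parents(adj, t)
        for a in arguments:
            if a not in parents:
                raise ValueError(f"no path from {t!r} to {a!r}")
            keep |= _shortest_path_edges(parents, a)
    # Retrieve the directed, labeled edges whose endpoints were kept.
    return {e for e in edges if frozenset((e[0], e[1])) in keep}


# Usage: trigger "induced" with arguments "TRAF2" and the sub-event trigger.
sentence = [(("induced", 4), ("expression", 2), "nsubjpass"),
            (("expression", 2), ("IL-2", 1), "nn"),
            (("induced", 4), ("is", 3), "auxpass"),
            (("induced", 4), ("TRAF2", 6), "agent")]
rule = induce_rule_graph(sentence, [("induced", 4)],
                         [("TRAF2", 6), ("expression", 2)])
```
]]></eg><p>For a regulation event with a sub-event argument, the sub-event's trigger nodes would simply be passed as the argument nodes, matching the description above.</p></div>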
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Sentence Matching</head><p>We propose a sentence matching approach that attempts to match event rules to each testing sentence. Since the event rules and the sentences are all represented as dependency graphs, matching is a subgraph matching problem: the search for a subgraph of the testing sentence graph that is isomorphic to an event rule graph. This problem, known as subgraph isomorphism, is defined in this work as follows:</p><formula xml:id="formula_0">Definition 3. An event rule graph $G_r = (V_r, E_r)$ is isomorphic to a subgraph of a sentence graph $G_s = (V_s, E_s)$, denoted by $G_r \cong S_s \subseteq G_s$, if there is an injective mapping $f : V_r \to V_s$ such that, for every directed pair of nodes $v_i, v_j \in V_r$, if $(v_i, v_j) \in E_r$ then $(f(v_i), f(v_j)) \in E_s$, and the edge label of $(v_i, v_j)$ is the same as the edge label of $(f(v_i), f(v_j))$.</formula><p>The subgraph isomorphism problem is NP-complete <ref type="bibr" target="#b3">(Cormen et al., 2001)</ref>. Because the rule and sentence graphs involved in our matching process are small, a simple subgraph matching algorithm based on backtracking is appropriate. We call it the "Injective Graph Embedding Algorithm"; it is based on Huet's graph unification algorithm <ref type="bibr" target="#b6">(Huet, 1975)</ref>. The main part and the recursive part of the algorithm are formalized in Algorithm 1 and Algorithm 2, respectively.</p><p>For each sentence, the algorithm returns all the matched rules together with the corresponding injective mappings from rule nodes to sentence tokens. Biological events are then extracted by transferring the event description of each matched rule, such as its event type and the trigger and argument roles attached to its nodes, to the corresponding tokens of the sentence. In practice, the algorithm takes only a couple of seconds to return its results.</p></div>
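<div xmlns="http://www.tei-c.org/ns/1.0"><p>A plain backtracking search is enough to illustrate Definition 3. The sketch below is not the authors' Injective Graph Embedding Algorithm: it is a generic backtracking matcher written under the assumption that graphs are given as sets of (governor, dependent, label) triples, and it returns the first injective, edge-label-preserving mapping it finds. The names find_embedding and match_rules, and the node_ok hook where node-level criteria such as POS or token equality could be plugged in, are hypothetical.</p><eg><![CDATA[
```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable, str]  # (governor, dependent, relation label)


def find_embedding(rule: Iterable[Edge], sentence: Iterable[Edge],
                   node_ok: Callable[[Hashable, Hashable], bool] = lambda r, s: True
                   ) -> Optional[Dict[Hashable, Hashable]]:
    """Return an injective f: V_r -> V_s preserving labeled edges, or None."""
    rule = list(rule)
    s_edges: Set[Edge] = set(sentence)
    r_nodes = sorted({n for g, d, _ in rule for n in (g, d)}, key=repr)
    s_nodes = sorted({n for g, d, _ in s_edges for n in (g, d)}, key=repr)

    def consistent(f: Dict[Hashable, Hashable]) -> bool:
        # Every rule edge whose endpoints are both mapped must appear in G_s
        # with the same direction and label (Definition 3).
        return all((f[g], f[d], lab) in s_edges
                   for g, d, lab in rule if g in f and d in f)

    def extend(f: Dict[Hashable, Hashable], used: Set[Hashable]):
        if len(f) == len(r_nodes):
            return dict(f)
        v = r_nodes[len(f)]             # next rule node to place
        for s in s_nodes:
            if s in used or not node_ok(v, s):
                continue                # keep f injective; apply node criteria
            f[v] = s
            used.add(s)
            if consistent(f):
                result = extend(f, used)
                if result is not None:
                    return result
            del f[v]                    # backtrack
            used.discard(s)
        return None

    return extend({}, set())


def match_rules(rules, sentence, node_ok=lambda r, s: True):
    """Collect every (rule id, mapping) pair whose rule graph embeds."""
    matched = []
    for rule_id, rule_edges in rules:
        f = find_embedding(rule_edges, sentence, node_ok)
        if f is not None:
            matched.append((rule_id, f))
    return matched


# Usage: a two-edge rule against "IL-2 expression is induced by TRAF2".
sentence = [(("induced", 4), ("expression", 2), "nsubjpass"),
            (("expression", 2), ("IL-2", 1), "nn"),
            (("induced", 4), ("is", 3), "auxpass"),
            (("induced", 4), ("TRAF2", 6), "agent")]
rule = [("T", "A", "agent"), ("T", "B", "nsubjpass")]
mapping = find_embedding(rule, sentence)
```
]]></eg><p>With the default node_ok, which accepts every node pair, the matcher checks edges only; tightening node_ok is one way to express stricter matching criteria. Ordering rule nodes along the graph from a chosen start node, as Algorithm 1 does, would prune the search earlier than the simple ordering used here.</p></div>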
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Implementation</head><p>We assume that the sentence is a suitable level of text granularity for event extraction. The target text is first segmented into sentences. Each sentence is then tokenized, with whitespace separating tokens. We require every protein to be separated from the surrounding text so that it forms one individual token. All protein names are replaced with a unified tag "BIO Entity".</p><p>The GENIA tagger <ref type="bibr" target="#b14">(Tsuruoka et al., 2005)</ref> is used to assign each word in the tokenized sentences its most likely part-of-speech (POS) tag. The POS-tagged sentences are then submitted to the Stanford unlexicalized natural language parser (Klein and Manning, 2003) to analyze their syntactic and semantic structure. The Stanford parser returns a dependency graph for each sentence after parsing.</p><p>For each gold event, the shortest path in the undirected graph connecting the event trigger to each event argument is extracted using Dijkstra's algorithm <ref type="bibr" target="#b3">(Cormen et al., 2001)</ref> with equal edge weights. Sentence matching is performed following the procedure of Algorithm 1 and Algorithm 2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 1 Main algorithm</head><p>Input: the dependency graph of a testing sentence s, $G_s = (V_s, E_s)$, where $V_s$ is the set of nodes and $E_s$ is the set of edges of the graph; and a finite set of biological event rules $R = \{r_1, r_2, \ldots, r_i, \ldots\}$, where $r_i = (e_i, G_{r_i})$ and $G_{r_i} = (V_{r_i}, E_{r_i})$ is the dependency graph of $r_i$.</p></div>
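<div xmlns="http://www.tei-c.org/ns/1.0"><p>The protein-name normalization in the implementation can be sketched as a single pass over the gold protein spans. This sketch assumes the spans arrive as non-overlapping (start, end) character offsets with an exclusive end, and it writes the tag as BIO_Entity: the underscore spelling is an assumption, since a tag containing a space would not survive whitespace tokenization as one token. The function name normalize_proteins is hypothetical.</p><eg><![CDATA[
```python
from typing import List, Tuple

PLACEHOLDER = "BIO_Entity"  # assumed spelling of the unified protein tag


def normalize_proteins(text: str, spans: List[Tuple[int, int]],
                       tag: str = PLACEHOLDER) -> List[str]:
    """Replace each protein span with one tag token, then split on whitespace.

    `spans` are (start, end) character offsets, end exclusive, assumed to come
    from the gold protein annotation and not to overlap.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping protein spans")
        pieces.append(text[cursor:start])
        # Pad with spaces so the protein becomes its own token even when it is
        # glued to surrounding text, as in "IL-2-induced".
        pieces.append(f" {tag} ")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces).split()


# Usage: both proteins become standalone tag tokens.
text = "IL-2-induced expression of TRAF2 was observed"
start = text.index("TRAF2")
tokens = normalize_proteins(text, [(start, start + 5), (0, 4)])
```
]]></eg><p>Note that the residue "-induced" stays a separate token; how such fragments should be handled before tagging and parsing is not specified in the text.</p></div>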
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results and Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Dataset</head><p>We use the BioNLP'09 Shared Task datasets for evaluation. A training set and a development set are provided for developing task solutions. They are prepared from the publicly available portion of the GENIA event corpus <ref type="bibr" target="#b8">(Kim et al., 2008)</ref>, with both the gold protein annotation and the gold event annotation given. A testing set is prepared from a held-out part of the corpus and provided without the gold event annotation.</p><p>Table <ref type="table">1</ref> shows the nine event types considered in the shared task. Since these types are all related to protein biology, they take proteins (P) as their theme. Regulation events always take a theme argument and, when one is expressed, also a cause argument. A unique feature of the shared task is that regulation events may take another event (E), namely a sub-event, as their theme or cause.</p></div>
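<div xmlns="http://www.tei-c.org/ns/1.0"><p>The argument signatures in Table 1 can be read as validity constraints on an extracted event. The check below is an illustrative sketch, not part of the shared-task tooling: the function name valid_arguments and the encoding of arguments as (role, kind) pairs, with kind "P" for a protein and "E" for an event, are assumptions.</p><eg><![CDATA[
```python
from typing import Iterable, Tuple

SIMPLE = {"Gene expression", "Transcription", "Protein catabolism",
          "Phosphorylation", "Localization"}
REGULATION = {"Regulation", "Positive regulation", "Negative regulation"}


def valid_arguments(event_type: str, args: Iterable[Tuple[str, str]]) -> bool:
    """Check (role, kind) pairs against the Table 1 signature of event_type."""
    args = list(args)
    themes = [k for r, k in args if r == "Theme"]
    causes = [k for r, k in args if r == "Cause"]
    if len(themes) + len(causes) != len(args):
        return False                           # only Theme and Cause exist
    if event_type in SIMPLE:
        return themes == ["P"] and not causes  # exactly one protein Theme
    if event_type == "Binding":
        # One or more protein Themes, no Cause.
        return len(themes) >= 1 and all(k == "P" for k in themes) and not causes
    if event_type in REGULATION:
        # One Theme and an optional Cause, each a protein or an event.
        return (len(themes) == 1 and len(causes) <= 1
                and all(k in {"P", "E"} for k in themes + causes))
    raise ValueError(f"unknown event type: {event_type}")


# Usage: a positive regulation whose Theme is a sub-event.
ok = valid_arguments("Positive regulation", [("Theme", "E"), ("Cause", "P")])
```
]]></eg></div>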
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Rule Induction Results</head><p>For training data, only sentences that contain at least one protein and one event are considered candidates for further processing. For testing data, candidate sentences are those that contain at least one protein. Our graph matching-based method focuses on extracting biological events from individual sentences; therefore, only events described within a single sentence are considered in this work. After removing duplicate rules, we obtained 6,435 event rules, which are distributed over the nine event types.</p><p>We observed that some event rules of one event type overlap with rules of other event types. For instance, a Transcription rule is isomorphic to a Gene expression rule in terms of their graph representations, and the two rules also share the same event trigger token. In fact, tokens such as "gene expression" and "induction" are used as event triggers of both Transcription and Gene expression in the training data. Consequently, the detection of some Gene expression events is always accompanied by the detection of certain Transcription events.</p><p>To tackle this problem, we processed the rules and built a non-overlapping rule set. When the dependency graphs of two rules of different event types are isomorphic to each other and the two rules share the same event trigger token, we keep the rule of the event type for which the trigger token occurs more frequently as a trigger in the training data, and remove the rule of the other event type from the set.</p><figure type="table"><head>Table 1:</head><label>1</label><figDesc>Event types and primary arguments. P denotes a protein and E an event; + marks an argument that may occur one or more times, and ? an optional argument.</figDesc><table><row role="label"><cell></cell><cell>Event type</cell><cell>Primary arguments</cell></row><row><cell>1</cell><cell>Gene expression</cell><cell>Theme(P)</cell></row><row><cell>2</cell><cell>Transcription</cell><cell>Theme(P)</cell></row><row><cell>3</cell><cell>Protein catabolism</cell><cell>Theme(P)</cell></row><row><cell>4</cell><cell>Phosphorylation</cell><cell>Theme(P)</cell></row><row><cell>5</cell><cell>Localization</cell><cell>Theme(P)</cell></row><row><cell>6</cell><cell>Binding</cell><cell>(Theme(P))+</cell></row><row><cell>7</cell><cell>Regulation</cell><cell>Theme(P/E), (Cause(P/E))?</cell></row><row><cell>8</cell><cell>Positive regulation</cell><cell>Theme(P/E), (Cause(P/E))?</cell></row><row><cell>9</cell><cell>Negative regulation</cell><cell>Theme(P/E), (Cause(P/E))?</cell></row></table></figure></div>
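<div xmlns="http://www.tei-c.org/ns/1.0"><p>The overlap-removal step can be sketched as a pairwise pass over the rules. In this illustration, rules are (event type, trigger token, graph) triples, trigger_counts maps (event type, token) pairs to how often the token served as a trigger of that type in the training data, and the isomorphism test is passed in as a function; all of these names are assumptions. The tie-breaking choice (keep the earlier rule when counts are equal) is ours, since the text does not specify one.</p><eg><![CDATA[
```python
from typing import Callable, Dict, Hashable, List, Tuple

Rule = Tuple[str, str, Hashable]  # (event type, trigger token, rule graph)


def remove_overlaps(rules: List[Rule],
                    trigger_counts: Dict[Tuple[str, str], int],
                    isomorphic: Callable[[Hashable, Hashable], bool]) -> List[Rule]:
    """Drop cross-type rules that share a trigger and have isomorphic graphs.

    Of each conflicting pair, keep the rule whose event type uses the trigger
    token more often as a trigger in the training data.
    """
    dropped = set()
    for i, (ti, trig_i, gi) in enumerate(rules):
        for j in range(i + 1, len(rules)):
            tj, trig_j, gj = rules[j]
            if ti == tj or trig_i != trig_j or not isomorphic(gi, gj):
                continue
            ci = trigger_counts.get((ti, trig_i), 0)
            cj = trigger_counts.get((tj, trig_j), 0)
            dropped.add(j if ci >= cj else i)
    return [r for k, r in enumerate(rules) if k not in dropped]


# Usage with toy counts; graphs are compared by plain equality here, a
# stand-in for a real isomorphism test.
g = frozenset({("T", "A", "nn")})
rules = [("Transcription", "expression", g),
         ("Gene expression", "expression", g),
         ("Binding", "binding", g)]
counts = {("Gene expression", "expression"): 120, ("Transcription", "expression"): 7}
kept = remove_overlaps(rules, counts, lambda a, b: a == b)
```
]]></eg><p>The pairwise pass is quadratic in the number of rules; grouping rules by trigger token first would restrict comparisons to rules that can actually conflict.</p></div>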
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Event Extraction Results on Development Set</head><p>The non-overlapping rule set is then applied to the 988 candidate development sentences with our graph matching algorithm, using different combinations of matching features. Table <ref type="table" target="#tab_3">2</ref> shows the event extraction results for each feature combination.</p><p>The least specific matching criterion is "E": it checks no node information at all, so the edges and nodes of a rule match those of a sentence as long as the edge directions and labels are the same. It achieves the highest recall of all the runs, capturing more than half of the gold events in the sentences. However, because node information is disregarded, it generates too many false positives, and its very low precision leads to a low F-score.</p><p>The strictest matching criterion, "E+P+A", requires the edges (E), the POS tags (P), and all tokens (A) of a rule and a sentence to be exactly the same for their edges and nodes to match. It achieves the highest precision, 69.72%, and an F-score over 40%. This indicates that a certain number of biological events are described in very similar ways in the literature, involving the same grammatical structures and identical contextual content. Compared with "P+A", adding the edge features improves the overall precision of event extraction by a large margin, nearly 13%. "E+P+T" requires the directions and labels of all edges to be identical, the POS tags of all tokens to be identical, and only the tokens of event triggers (T) to be identical. Relaxing the criterion in this way, from all tokens to only the trigger tokens, gives better performance than "E+P+A". The two best of the first six runs in Table <ref type="table" target="#tab_3">2</ref> are "E+P+T" and "P+A".</p><p>Next, we relaxed the matching criterion of POS tags for nouns and verbs. For nouns, the plural form is allowed to match the singular form, and proper nouns are allowed to match regular nouns. For verbs, the past tense, present tense, and base present forms are allowed to match each other. Further, the event trigger tokens are stemmed to their root forms, allowing trigger tokens derived from the same root word to match. "E+P*+T*" and "P*+A+T*" in Table <ref type="table" target="#tab_3">2</ref> show improved performance relative to the two best runs above. These modifications improve recall but also produce many incorrect events, leading to only a small increase in the overall F-score.</p></div>
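<div xmlns="http://www.tei-c.org/ns/1.0"><p>The node-level criteria can be expressed as predicates over a rule node and a sentence node, with the edge check (E) left to the subgraph matcher. The sketch below is a hedged illustration: the Node record, the Penn Treebank tag classes chosen for the relaxed POS test "P*", and the crude suffix stripper standing in for a real stemmer (such as Porter's) are all assumptions, not the authors' implementation.</p><eg><![CDATA[
```python
from typing import Callable, NamedTuple


class Node(NamedTuple):
    token: str
    pos: str
    is_trigger: bool = False  # marks rule nodes that carry the event trigger


# Relaxed "P*" classes: plural and proper nouns fold into NN; past, present,
# and base verb forms fold into VB. Exact membership is an assumption.
POS_CLASS = {"NNS": "NN", "NNP": "NN", "NNPS": "NN",
             "VBD": "VB", "VBZ": "VB", "VBP": "VB"}


def crude_stem(word: str) -> str:
    """Toy suffix stripper standing in for a real stemmer."""
    w = word.lower()
    for suffix in ("ation", "ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def make_node_ok(criterion: str) -> Callable[[Node, Node], bool]:
    """Node test for one matching criterion; edges are checked elsewhere."""
    def same_pos(r: Node, s: Node, relaxed: bool) -> bool:
        if relaxed:
            return POS_CLASS.get(r.pos, r.pos) == POS_CLASS.get(s.pos, s.pos)
        return r.pos == s.pos

    def same_trigger(r: Node, s: Node, stemmed: bool) -> bool:
        if not r.is_trigger:
            return True             # "T": only trigger tokens must agree
        if stemmed:
            return crude_stem(r.token) == crude_stem(s.token)
        return r.token == s.token

    checks = {
        "E": lambda r, s: True,
        "E+P+A": lambda r, s: same_pos(r, s, False) and r.token == s.token,
        "E+P+T": lambda r, s: same_pos(r, s, False) and same_trigger(r, s, False),
        "E+P*+T*": lambda r, s: same_pos(r, s, True) and same_trigger(r, s, True),
    }
    return checks[criterion]


# Usage: relaxed matching lets "phosphorylated/VBD" match "phosphorylates/VBZ".
relaxed = make_node_ok("E+P*+T*")
strict = make_node_ok("E+P+T")
rule_node = Node("phosphorylated", "VBD", is_trigger=True)
sent_node = Node("phosphorylates", "VBZ")
```
]]></eg><p>A predicate like this would be handed to whatever subgraph matcher performs the edge check. The criteria without edges, "P+A" and "P*+A+T*", would instead need a matcher that skips edge labels, and are not covered here.</p></div>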
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">Event Extraction Results on Testing Set</head><p>We conducted four runs on the testing sentences using four features (Table 3): "E", "E+P+A", "E+P*+T*", and "P*+A+T*". "E" and "E+P+A" were chosen to investigate the highest recall and the highest precision that our method can achieve on the testing sentences. "E+P*+T*" achieves the best overall F-score of all the runs, 37.28%. As on the development set, the highest precision on the testing sentences, 58.64%, is achieved by the strictest matching criterion, "E+P+A". The highest recall, 52.17%, is obtained by the least specific matching criterion, "E", indicating that a large number of biological events are described with quite different grammatical structures in the literature. Although "P*+A+T*" produced the best performance on the development set, it does not perform as well on the testing set. This suggests that when every token must be exactly the same for the nodes of a rule and a sentence to match, the event rules generalize less reliably to the underlying events.</p><p>Table <ref type="table" target="#tab_5">4</ref> compares the performance of our method with that of the top-performing teams in the task. The official evaluation shows that our best result would rank 6th among the 24 participating teams in extracting biological events from the testing data.</p></div>
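<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a quick consistency check on Table 3, each F-score is the harmonic mean of precision and recall, F = 2PR/(P + R). The snippet recomputes the four testing-set F-scores from the reported precision and recall values. It also shows why "E" scores so low despite its recall: the harmonic mean is dominated by the smaller of the two values, here the 0.84% precision.</p><eg><![CDATA[
```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) for the testing-set runs, as reported in Table 3.
runs = {"E": (0.84, 52.17), "E+P+A": (58.64, 26.02),
        "E+P*+T*": (41.77, 33.66), "P*+A+T*": (39.61, 32.18)}
for name, (p, r) in runs.items():
    print(f"{name:9s} F = {f_score(p, r):.2f}")
```
]]></eg><p>The loop prints 1.65, 36.05, 37.28, and 35.51, matching the F-score column of Table 3.</p></div>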
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion and future work</head><p>We use dependency graphs to automatically induce biological event rules from annotated events. These rules are then used to extract biological events from the literature. The extraction process is treated as a subgraph matching problem: searching for the graph of an event rule within the dependency graph of a sentence. We applied the method to the primary task of the BioNLP'09 shared task, where it achieves a 37.28% F-score on the testing data in detecting biological events across nine event types.</p><p>In future work, we would like to experiment with more matching criteria for mapping event rules to sentences. We also plan to expand the coverage of event trigger tokens by using external lexical resources to find new event triggers and synonyms of existing triggers.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Algorithm 1:</head><label>1</label><figDesc>Main algorithm of the Injective Graph Embedding Algorithm, over rules $r_i = (e_i, G_{r_i})$ with dependency graphs $G_{r_i} = (V_{r_i}, E_{r_i})$.</figDesc><table><row><cell cols="3">Output: $M_R$: a set of biological event rules from $R$ matched with s, together with the injective mapping</cell></row><row><cell cols="3">Main algorithm:</cell></row><row><cell>1:</cell><cell>$M_R \leftarrow \emptyset$</cell><cell></cell></row><row><cell>2:</cell><cell>for all $r_i \in R$ do</cell><cell></cell></row><row><cell>3:</cell><cell>  $st_{r_i} \leftarrow \mathrm{StartNode}(G_{r_i})$</cell><cell>// StartNode finds the start node $st_{r_i}$ of the rule graph $G_{r_i}$</cell></row><row><cell>4:</cell><cell>  $ST_s \leftarrow \{st_{s_1}, st_{s_2}, \ldots, st_{s_j}, \ldots\}$</cell><cell>// $ST_s$: the set of start nodes of the sentence graph $G_s$</cell></row><row><cell>5:</cell><cell>  for all $st_{s_j} \in ST_s$ do</cell><cell></cell></row><row><cell>6:</cell><cell>    create an empty stack $\sigma$ and push $(st_{r_i}, st_{s_j})$ onto $\sigma$</cell><cell></cell></row><row><cell>7:</cell><cell>    $IM \leftarrow \emptyset$</cell><cell>// IM: records of injective matches between nodes in $G_{r_i}$ and $G_s$</cell></row><row><cell>8:</cell><cell>    call MatchNode($\sigma$, rIM, $G_{r_i}$, $G_s$)</cell><cell>// rIM: reference of IM</cell></row><row><cell>9:</cell><cell>    if MatchNode returned TRUE then</cell><cell></cell></row><row><cell>10:</cell><cell>      $M_R \leftarrow M_R \cup \{r_i \text{ with } IM\}$</cell><cell></cell></row><row><cell>11:</cell><cell>return $M_R$</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2 :</head><label>2</label><figDesc>Event extraction of non-overlapping set on development set using different features</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3:</head><label>3</label><figDesc>Event extraction results of the non-overlapping rule set on the 1,670 testing sentences using different features</figDesc><table><row role="label"><cell>Feature</cell><cell>Prec.(%)</cell><cell>Recall(%)</cell><cell>F-score(%)</cell></row><row><cell>E</cell><cell>0.84</cell><cell>52.17</cell><cell>1.65</cell></row><row><cell>E+P+A</cell><cell>58.64</cell><cell>26.02</cell><cell>36.05</cell></row><row><cell>E+P*+T*</cell><cell>41.77</cell><cell>33.66</cell><cell>37.28</cell></row><row><cell>P*+A+T*</cell><cell>39.61</cell><cell>32.18</cell><cell>35.51</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 4 :</head><label>4</label><figDesc>Performance comparison with participating teams</figDesc><table><row><cell>Team</cell><cell cols="3">Prec.(%) Recall(%) F-score(%)</cell></row><row><cell>UTurku</cell><cell>58.48</cell><cell>46.73</cell><cell>51.95</cell></row><row><cell>JULIELab</cell><cell>47.52</cell><cell>45.82</cell><cell>46.66</cell></row><row><cell>ConcordU</cell><cell>61.59</cell><cell>34.98</cell><cell>44.62</cell></row><row><cell>UT+DBCLS</cell><cell>55.59</cell><cell>36.90</cell><cell>44.35</cell></row><row><cell>VIBGhent</cell><cell>51.55</cell><cell>33.41</cell><cell>40.54</cell></row><row><cell>DalhousieU</cell><cell>41.77</cell><cell>33.66</cell><cell>37.28</cell></row><row><cell>UTokyo</cell><cell>53.56</cell><cell>28.13</cell><cell>36.88</cell></row><row><cell>UNSW</cell><cell>45.78</cell><cell>28.22</cell><cell>34.92</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Biological relation extraction and query answering from MEDLINE abstracts using ontology-based text mining</title>
		<author>
			<persName><forename type="first">Muhammad</forename><surname>Abulaish</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lipika</forename><surname>Dey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data &amp; Knowledge Engineering</title>
		<imprint>
			<biblScope unit="volume">61</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="228" to="262" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning</title>
		<author>
			<persName><forename type="first">Antti</forename><surname>Airola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sampo</forename><surname>Pyysalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jari</forename><surname>Björne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tapio</forename><surname>Pahikkala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Filip</forename><surname>Ginter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tapio</forename><surname>Salakoski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">2</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
	<note>Suppl</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Extracting complex biological events with rich graph-based feature sets</title>
		<author>
			<persName><forename type="first">Jari</forename><surname>Björne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Juho</forename><surname>Heimonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Filip</forename><surname>Ginter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Antti</forename><surname>Airola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tapio</forename><surname>Pahikkala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tapio</forename><surname>Salakoski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">BioNLP &apos;09: Proceedings of the Workshop on BioNLP</title>
				<meeting><address><addrLine>Morristown, NJ, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="10" to="18" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Introduction to Algorithms</title>
		<author>
			<persName><forename type="first">Thomas</forename><forename type="middle">H</forename><surname>Cormen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Charles</forename><forename type="middle">E</forename><surname>Leiserson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ronald</forename><forename type="middle">L</forename><surname>Rivest</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Clifford</forename><surname>Stein</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
			<publisher>The MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The Stanford typed dependencies representation</title>
		<author>
			<persName><forename type="first">Marie-Catherine</forename><surname>De Marneffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CrossParser &apos;08: Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation</title>
				<meeting><address><addrLine>Morristown, NJ, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">RelEx-relation extraction using dependency parse trees</title>
		<author>
			<persName><forename type="first">Katrin</forename><surname>Fundel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Robert</forename><surname>Küffner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ralf</forename><surname>Zimmer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="365" to="371" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A unification algorithm for typed lambda-calculus</title>
		<author>
			<persName><forename type="first">Gérard</forename><forename type="middle">P</forename><surname>Huet</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Theor. Comput. Sci.</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="27" to="57" />
			<date type="published" when="1975">1975</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression</title>
		<author>
			<persName><forename type="first">Lawrence</forename><surname>Hunter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhiyong</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><surname>Firby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">William</forename><forename type="middle">A</forename><surname>Baumgartner</surname><genName>Jr</genName></persName>
		</author>
		<author>
			<persName><forename type="first">Helen</forename><forename type="middle">L</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Philip</forename><forename type="middle">V</forename><surname>Ogren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Bretonnel Cohen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Corpus annotation for mining biomedical events from literature</title>
		<author>
			<persName><forename type="first">Jin-Dong</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tomoko</forename><surname>Ohta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jun&apos;ichi</forename><surname>Tsujii</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">10</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Overview of BioNLP&apos;09 shared task on event extraction</title>
		<author>
			<persName><forename type="first">Jin-Dong</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshinobu</forename><surname>Kano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tomoko</forename><surname>Ohta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sampo</forename><surname>Pyysalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jun'ichi</forename><surname>Tsujii</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the NAACL-HLT 2009 Workshop on Natural Language Processing in Biomedicine (BioNLP&apos;09)</title>
				<meeting>the NAACL-HLT 2009 Workshop on Natural Language Processing in Biomedicine (BioNLP&apos;09)<address><addrLine>Morristown, NJ, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct>
	<analytic>
		<title level="a" type="main">Accurate unlexicalized parsing</title>
		<author>
			<persName><forename type="first">Dan</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACL &apos;03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics</title>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="423" to="430" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Evaluating contributions of natural language parsers to protein-protein interaction extraction</title>
		<author>
			<persName><forename type="first">Yusuke</forename><surname>Miyao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kenji</forename><surname>Sagae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rune</forename><surname>Saetre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Takuya</forename><surname>Matsuzaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jun'ichi</forename><surname>Tsujii</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="394" to="400" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Mining relations in the GENIA corpus</title>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerold</forename><surname>Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kaarel</forename><surname>Kaljurand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><surname>Dowdall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christos</forename><surname>Andronis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andreas</forename><surname>Persidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ourania</forename><surname>Konstanti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Second European Workshop on Data Mining and Text Mining for Bioinformatics</title>
				<meeting>the Second European Workshop on Data Mining and Text Mining for Bioinformatics<address><addrLine>Pisa, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach</title>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerold</forename><surname>Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kaarel</forename><surname>Kaljurand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Hess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christos</forename><surname>Andronis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ourania</forename><surname>Konstandi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andreas</forename><surname>Persidis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artif. Intell. Med.</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="127" to="136" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">SAGA: a subgraph matching tool for biological graphs</title>
		<author>
			<persName><forename type="first">Yuanyuan</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><forename type="middle">C</forename><surname>McEachin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Carlos</forename><surname>Santos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">J</forename><surname>States</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jignesh</forename><forename type="middle">M</forename><surname>Patel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="232" to="239" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Developing a Robust Part-of-Speech Tagger for Biomedical Text</title>
		<author>
			<persName><forename type="first">Yoshimasa</forename><surname>Tsuruoka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yuka</forename><surname>Tateishi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jin-Dong</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tomoko</forename><surname>Ohta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><surname>McNaught</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sophia</forename><surname>Ananiadou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jun&apos;ichi</forename><surname>Tsujii</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">LNCS</title>
		<imprint>
			<biblScope unit="volume">3746</biblScope>
			<biblScope unit="page" from="382" to="392" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">BioCreative 2. Gene mention task</title>
		<author>
			<persName><forename type="first">John</forename><surname>Wilbur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lawrence</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lorraine</forename><surname>Tanabe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Second BioCreative Challenge Evaluation Workshop</title>
				<meeting>Second BioCreative Challenge Evaluation Workshop</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="7" to="16" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Searching substructures with superimposed distance</title>
		<author>
			<persName><forename type="first">Xifeng</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Feida</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jiawei</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Philip</forename><forename type="middle">S</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICDE &apos;06: Proceedings of the 22nd International Conference on Data Engineering</title>
				<meeting><address><addrLine>Washington, DC, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
