<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Biological Event Extraction using Subgraph Matching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Haibin Liu</string-name>
          <email>haibin@cs.dal.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Blouin</string-name>
          <email>cblouin@cs.dal.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vlado Kesˇelj</string-name>
          <email>vlado@cs.dal.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Computer Science, Dalhousie University</institution>
          ,
          <addr-line>Halifax, NS</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <fpage>110</fpage>
      <lpage>115</lpage>
      <abstract>
        <p>An important task in biological information extraction is to identify descriptions of biological relations and events involving genes or proteins. We propose a graph-based approach to automatically learn rules for detecting biological events in the literature. The detection is performed by searching for isomorphism between event rules and the dependency graphs of complete sentences. When applying our approach to the datasets of the Task 1 of the BioNLP shared task, we achieved an 37.28% F-score in detecting biological events across 9 event types.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Recent research in information extraction in the
biological domain has focused on extracting semantic
relations between molecular biology concepts
        <xref ref-type="bibr" rid="ref6">(Fundel et
al., 2007)</xref>
        . State-of-the-art protein annotation methods
have achieved reasonable success with a performance
of 88% F-score
        <xref ref-type="bibr" rid="ref17">(Wilbur et al., 2007)</xref>
        . A task of
interest is to automatically extract protein-protein
interactions (PPI). To date, most of the biological
knowledge about these interactions is only available in the
form of unstructured text from scientific articles
        <xref ref-type="bibr" rid="ref1 ref17 ref6">(Abulaish and Dey, 2007)</xref>
        . The best-performing system from
the BioCreative II challenge
        <xref ref-type="bibr" rid="ref8">(Hunter et al., 2008)</xref>
        only
achieved a 29% F-score in identifying protein pairs in a
sentence that have a biologically relevant relationship.
This suggests that the problem of biological relation
extraction is difficult and far from solved.
      </p>
      <p>
        Sentences in the biological literature often have
long-range dependencies. Therefore, co-occurrence
based or surface pattern based shallow analysis on
biological texts suffers from either low precision or recall
        <xref ref-type="bibr" rid="ref1 ref17 ref6 ref6">(Fundel et al., 2007; Abulaish and Dey, 2007)</xref>
        . As a
result, full parsing has been explored as the basis for
relation extraction to perform intensive syntactical and
semantic analysis
        <xref ref-type="bibr" rid="ref1 ref14 ref17 ref6 ref6">(Abulaish and Dey, 2007; Fundel et al.,
2007; Rinaldi et al., 2007)</xref>
        . In the BioNLP’09 shared
task on biological event extraction
        <xref ref-type="bibr" rid="ref10">(Kim et al., 2009)</xref>
        ,
20 out of the total 24 participating teams resorted to
a full parsing strategy, including all top 10
performing teams. However, most of previous work extracts
relevant relations based on a limited set of manually
designed rules that map interpreted syntactic structures
into the semantic relations. We propose an approach to
automatically learn rules that characterize a wide range
of biological relations and events from a syntactically
and semantically annotated corpus, and our approach is
also based on full parsing of biological texts.
      </p>
      <p>
        More recently, the dependency representation
obtained from full parsing, with its ability to reveal
longrange dependencies, has shown an advantage in
biological relation extraction over the traditional Penn
Treebank-style phrase structure trees
        <xref ref-type="bibr" rid="ref12">(Miyao et al.,
2009)</xref>
        . Relations are generally extracted from the
dependency representation by two approaches. In one
approach, the dependency representation is traversed
and paths that contain the relevant terms describing the
relations predefined in the rules are extracted as
candidate relations
        <xref ref-type="bibr" rid="ref13 ref6">(Fundel et al., 2007; Rinaldi et al.,
2004)</xref>
        . In the other, relations are learned from the
dependency representation using supervised machine
learning based on specialized feature representations or
kernels, encoded with dependency paths from the
representation
        <xref ref-type="bibr" rid="ref2 ref3">(Airola et al., 2008; Bjo¨rne et al., 2009)</xref>
        .
      </p>
      <p>
        Graphs provide a powerful primitive for modeling
biological data such as pathways and protein
interaction networks
        <xref ref-type="bibr" rid="ref15 ref18">(Tian et al., 2007; Yan et al., 2006)</xref>
        .
Since the dependency representation maps
straightforwardly onto a directed graph
        <xref ref-type="bibr" rid="ref5 ref9">(de Marneffe and
Manning, 2008)</xref>
        , properties and operations of graphs can
be naturally applied to the problem of biological
relation extraction. We propose a graph matching-based
approach to extract biological events from the scientific
literature in tackling the primary task of the BioNLP’09
shared task on biological event extraction. The
extraction is performed by matching the dependency
representation of automatically learned rules to the
dependency representation of biological sentences. This
process is treated as a subgraph matching problem,
which corresponds to the search for a subgraph
isomorphic to a rule graph within a sentence graph.
      </p>
      <p>The rest of the paper is organized as follows: In
Section 2, we introduce the BioNLP’09 shared task
on event extraction. Section 3 describes our subgraph
matching-based event extraction method. Sections 4
elaborates the implementation details. Performance is
evaluated in Section 5. Finally, Section 6 summarizes
the paper and introduces future work.
2</p>
    </sec>
    <sec id="sec-2">
      <title>BioNLP’09 Shared Task</title>
      <p>
        The BioNLP’09 shared task
        <xref ref-type="bibr" rid="ref10">(Kim et al., 2009)</xref>
        focused
on the recognition of biological events that appear in
the biological literature. When a biological event is
described in text, we can analyze it by recognizing an
event type, the event trigger, one or more event
arguments, and the source text (ST ), where the event is
described. The source text is composed of tokens, which
are defined as finite strings of characters from a finite
alphabet. The alphabet is a finite set of symbols Σ.
Tokens come from W , the set of all finite strings of
characters from Σ, i.e., W = Σ+. The source text is a
finite sequence of tokens, i.e., any member of W ∗. We
define a biological event in a way consistent with the
shared task, which is as follows:
      </p>
      <p>Definition 1. A biological event is a four-tuple e =
(Type, Trigger, Arguments, ST). ST ∈ W ∗, called the
source text, is a sequence of tokens that contains the
event; Type ∈ Te is an event type from a finite set of
event types Te; Trigger is a substring of tokens from ST
that signals the event; Arguments is a non-empty, finite
set of pairs (l, a) where l ∈ L is a label from a finite set
of semantic role labels L, and a is a token from ST, or
another biological event.</p>
      <p>For the shared task, Te consists of nine event types
defined in Table 1, and L = {Theme, Cause}. A gold
event denotes a biological event where all the
information has been manually annotated by domain experts.</p>
      <p>The primary task of the shared task was to detect
biological events such as protein binding and
phosphorylation, given only the annotation of protein names. It
was required to extract type, trigger, and primary
arguments of each event. This task is an example of
extraction of semantically typed, complex events for which
the arguments can also be other events. We focus on
the primary task and propose a graph matching-based
method to cope with the problem.</p>
      <p>Subgraph Matching-based Event Extraction
3.1</p>
      <sec id="sec-2-1">
        <title>Dependency Representation</title>
        <p>
          The dependency representation is designed to provide a
simple description of the grammatical relationships in
a sentence that can be effectively used to extract textual
relations
          <xref ref-type="bibr" rid="ref5 ref9">(de Marneffe and Manning, 2008)</xref>
          .
        </p>
        <p>The dependency representation of a sentence is
formed by tokens in the sentence and the binary
relations between them. A single dependency relation
is represented as relation(governor, dependent), where
governor and dependent are tokens, and relation is a
type of the grammatical dependency relation. A
dependency representation is essentially a labeled directed
graph, which is named dependency graph.
3.2</p>
      </sec>
      <sec id="sec-2-2">
        <title>Event Rule Induction</title>
        <p>A biological event rule is defined as follows:</p>
        <p>Definition 2. A biological event rule is a pair
r = (e, Gr). Gr = (Vr, Er) is a dependency graph,
which characterizes the contextual structure of events.
e = (Type, Trigger, Arguments) encodes a detailed
event frame, where Type is the event type, Trigger =
{(t1, v1), (t2, v2), · · ·} records the event trigger and is
a non-empty finite sequence of tokens associated with
nodes in Gr, i.e., Trigger ∈ (W × Vr)+, and
Arguments = {(t1, l1, v1), (t2, l2, v2), · · ·} records the event
arguments and is a non-empty finite sequence of tokens
associated with semantic role labels and nodes in Gr,
i.e., Arguments ∈ (W × L × Vr)+.</p>
        <p>The biological event rules are learned from
labeled training sentences using the following induction
method. Starting with the dependency graph of each
training sentence, the directions of edges are first
removed so that the directed graph is transformed into an
undirected graph, where a path must exist between any
two nodes since the graph is always connected. For
each gold event, the shortest dependency path in the
undirected graph connecting the event trigger nodes to
each event argument node is selected. The union of all
shortest dependency paths is then computed for each
event, and the original directed dependency
representation of the path union is retrieved and used as the graph
representation of the event.</p>
        <p>For multi-token event triggers, the shortest
dependency path connecting the node of every trigger token
to the node of each event argument is selected, and the
union of the paths is then computed for each trigger.
For regulation events, when a sub-event is used as an
argument, only the type and the trigger of the sub-event
are preserved as the argument of the main events. The
shortest dependency path is extracted so as to connect
the trigger nodes of the main event to the trigger nodes
of the sub-event. In case that there exists more than
one shortest path, all of the paths are considered. As a
result, each gold event is transformed into the form of
a biological event rule. The obtained rules are
categorized in terms of the nine event types of the task.</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.3 Sentence Matching</title>
        <p>We propose a sentence matching approach to attempt
to match event rules to each testing sentence. Since
the event rules and the sentences all possess a
dependency graph, the matching process is a subgraph
matching problem, which corresponds to the search for a
subgraph isomorphic to an event rule graph within the
graph of a testing sentence. This problem is also called
subgraph isomorphism, defined in this work as follows:</p>
        <p>Definition 3. An event rule graph Gr = (Vr, Er)
is isomorphic to a subgraph of a sentence graph Gs =
(Vs, Es), denoted by Gr =∼ Ss ⊆ Gs, if there is an
injective mapping f : Vr → Vs such that, for every
directed pair of nodes vi, vj ∈ Vr, if (vi, vj ) ∈ Er then
(f (vi), f (vj )) ∈ Es, and the edge label of (vi, vj ) is
the same as the edge label of (f (vi), f (vj )).</p>
        <p>
          The subgraph isomorphism problem is NP-complete
          <xref ref-type="bibr" rid="ref4">(Cormen et al., 2001)</xref>
          . Considering that the graphs of
rules and sentences involved in our matching process
are small, a simple subgraph matching algorithm
using a backtracking approach is appropriate. It is named
“Injective Graph Embedding Algorithm” and designed
based on the Huet’s graph unification algorithm
          <xref ref-type="bibr" rid="ref7">(Huet,
1975)</xref>
          . The main and the recursive part of the algorithm
are formalized in Algorithm 1 and Algorithm 2.
        </p>
        <p>For each sentence, the algorithm returns all the
matched rules together with the corresponding
injective mappings from rule nodes to sentence tokens.
Biological events are then extracted by applying the event
descriptions of tokens in each matched rule such as the
type to the corresponding tokens of the sentence. In
practice, it only takes the algorithm a couple of seconds
to return the results.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4 Implementation</title>
      <p>We assume that a sentence is a suitable level of text
granularity in event extraction. The target text is first
segmented into sentences. Then, each sentence is
tokenized with whitespace separating tokens. We
require that every protein be separated from surrounding
text and become one individual token. All the protein</p>
      <sec id="sec-3-1">
        <title>Algorithm 1 Main algorithm</title>
        <p>Input: Dependency graph of a testing sentence s, Gs = (Vs, Es)
where Vs is the set of nodes and Es is the set of edges
of the graph; a finite set of biological event rules R =
{r1, r2, · · · , ri, · · ·}, where ri = (ei, Gri ). Gri =
(Vri , Eri ) is the dependency graph of ri.</p>
        <p>Output: MR : a set of biological event rules from R matched with
s together with the injective mapping
Main algorithm:
1: MR ← ∅
2: for all ri ∈ R do
3: stri ← StartNode(Gri ) //StartNode finds the start
4: //node stri of the rule graph Gri
5: STs ← {sts1 , sts2 , · · · , stsj , · · ·}
6: //STs : the set of start nodes of the sentence graph Gs
7: for all stsj ∈ STs do
8: create an empty stack σ and push (stri , stsj ) onto
9: the stack σ
10: IM ← ∅ //IM : records of injective matches
11: //between nodes in Gri and Gs
12: call MatchNode(σ, rIM, Gri , Gs)
13: //rIM : reference of IM
14: if MatchNode returned TRUE then
15: MR ← MR ∪ {ri with IM }
16: return MR
names are replaced with a unified tag “BIO Entity”.</p>
        <p>
          GENIA tagger
          <xref ref-type="bibr" rid="ref16">(Tsuruoka et al., 2005)</xref>
          is used to
associate each word in the tokenized sentences with its
most likely Part-of-Speech tag. The POS-tagged
sentences are submitted to the Stanford unlexicalized
natural language parser
          <xref ref-type="bibr" rid="ref11">(Klein and Manning, 2003)</xref>
          to
analyze the syntactic and semantic structure of the
sentences. The Stanford parser returns a dependency graph
for each sentence after parsing.
        </p>
        <p>
          For each gold event, the shortest path in the
undirected graph connecting the event trigger to each event
argument is extracted using the Dijkstra’s algorithm
          <xref ref-type="bibr" rid="ref4">(Cormen et al., 2001)</xref>
          with equal weight for edges.
Sentence matching is performed following the procedure of
Algorithm 1 and Algorithm 2.
5
5.1
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results and Evaluation</title>
      <sec id="sec-4-1">
        <title>Dataset</title>
        <p>
          We use the BioNLP’09 Shared Task datasets for
evaluation. A training set and a development set are provided
for the purpose of developing task solution. They are
prepared based on the publicly available portion of the
GENIA event corpus
          <xref ref-type="bibr" rid="ref9">(Kim et al., 2008)</xref>
          with the gold
protein annotation and the gold event annotation given.
A testing set is prepared from a held-out part of the
corpus and provided without the gold event annotation.
        </p>
        <p>Table 1 shows the nine event types considered in the</p>
        <sec id="sec-4-1-1">
          <title>Algorithm 2 Recursive subroutine</title>
          <p>Recursive subroutine: MatchNode(σ, rIMparent, Gri , Gs)
1: IMcurrent ← IMparent //assign IMparent from the
2: //parent level to the current IMcurrent
3: while stack σ is not empty do
4: pop node pair (vr, vs) from stack σ
5: if an injective match between vr and vs already exists
6: in IMcurrent then
7: do nothing
8: else if an injective match is possible between vr and
9: vs then
10: IMcurrent ← IMcurrent ∪ { the match between
11: vr and vs }
12: else
13: return FALSE
14: for all edges er adjacent to node vr in Gri do
15: let (vr, nr) be the edge er
16: for all edges es adjacent to node vs in Gs do
17: let (vs, ns) be the edge es
18: if er and es share a same direction and
19: possess identical edge labels then
20: S ← S ∪ ns //S : the set of candidate
21: //nodes for matching nr
22: for all ns ∈ S do
23: if an injective match between nr and ns
24: already exists in IMcurrent then
25: go to Line 14 and proceed with next edge er
26: else if an injective match is possible between
27: nr and ns then
28: σn ← σ //copy σ to a new stack σn
29: push (vr, vs, nr, ns) onto the stack σn
30: call MatchNode(σn, rIMcurrent, Gri , Gs)
31: //rIMcurrent : reference of IMcurrent
32: if MatchNode returned TRUE then
33: IMparent ← IMcurrent
34: //update IMparent using IMcurrent
35: return TRUE
36: return FALSE
37: IMparent ← IMcurrent
38: return TRUE
shared task. Since these types are all related to protein
biology, they take proteins (P) as their theme.
Regulation events always take a theme argument and, when
expressed, also a cause argument. As a unique feature
of the shared task, regulation events may take another
event (E), namely sub-event, as its theme or cause.
5.2</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Rule Induction Results</title>
        <p>For training data, only sentences that contain at least
one protein and one event are considered candidates
for further processing. For testing data, candidate
sentences contain at least one protein. Our proposed graph
matching-based method focuses on extracting
biological events from sentences. Therefore, only
sentencebased events are considered in this work. After
removing duplicate rules, we obtained 6,435 event rules,
which are distributed over nine event types.</p>
        <p>We observed that some event rules of an event
type are overlapped with rules of other event types.
For instance, a Transcription rule is isomorphic to a
Gene expression rule in terms of the graph
representation and they also share a same event trigger token.
In fact, tokens like “gene expression” and “induction”
are used as event trigger of both Transcription and
Gene expression in training data. Therefore, the
detection of some Gene expression events is always
accompanied by certain Transcription events.</p>
        <p>In tackling this problem, we processed the rules and
built a non-overlapping rule set. When the dependency
graphs of two rules across different event types are
isomorphic to each other and two rules share a same event
trigger token, we keep the rule of the event type in
which the trigger token of the rule occurs more frequent
as a trigger in the training data, and remove the rule of
the other event type from the set.
The non-overlapping rule sets in terms of different
combinations of matching features are then applied
to the 988 candidate development sentences using our
graph matching algorithm. Table 2 shows the event
extraction results based on each feature.</p>
        <p>The least specific matching criterion when
matching between rules and sentences is “E”, which assumes
that, without checking any information about nodes, as
long as edge directions and labels are the same, both
edges and nodes of a rule and a sentence can match
with each other. It achieves the highest recall among all
the runs and captures more than half of the gold events
in the sentences. However, the precision is quite low,
leading to a low F-score as too many false positives are
generated due to the disregard of node information.</p>
        <p>As the strictest matching criteria, “E+P+A” requires
that the edges (E), the POS tags (P) and all tokens (A)
be exactly the same for the edges and the nodes of a rule
and a sentence to match with each other. It achieves
the highest precision 69.72% and an F-score over 40%.
This indicates that a certain number of biological events
are described in very similar way in the literature,
involving the same grammatical structures and identical
contextual contents. Comparing to “P+A”, adding the
edge features improves the overall precision of event
extraction by a large margin, nearly 13%. “E+P+T”
requires that edge directions and labels of all edges be
identical, POS tags of all tokens be identical, and
tokens of only event triggers (T) be identical. It achieves
better performance than “E+P+A” when relaxing the
matching criteria from all tokens being the same to only
event trigger tokens having to be identical. The best 2
of the first 6 runs in Table 2 are “E+P+T” and “P+A”.</p>
        <p>Next, we attempted to relax the matching criterion
of POS tags for nouns and verbs. For nouns, the plural
form of nouns is allowed to match with the singular
form, and proper nouns are allowed to match with
regular nouns. For verbs, past tense, present tense and base
present form are allowed to match with each other.
Further, the event trigger tokens are stemmed to their root
forms allowing the trigger tokens derived from a same
root word to match. “E+P*+T*” and “P*+A+T*” in
Table 2 demonstrate the improved performance to the
above best two runs. These modifications improve the
recall but produce many incorrect events, leading to
only a small increase on the overall F-score.</p>
        <sec id="sec-4-2-1">
          <title>Feature</title>
          <p>E
E+P
E+P+A
E+P+T
P+A
P+T
E+P*+T*
P*+A+T*</p>
          <p>Prec.(%)
1.22
2.23
69.72
58.85
57.00
40.65
50.86
51.51</p>
          <p>Recall(%)
52.26
45.33
28.06
31.02
32.53
36.95
34.71
35.22
We decided to conduct four runs on the testing
sentences in terms of 4 features: “E”, “E+P+A”,
“E+P*+T*” and “P*+A+T*”. For “E” and “E+P+A”,
aiming to investigate the highest recall and precision
on the testing sentences that can be achieved by our
method. Table 3 gives the event extraction results on
the 1,670 testing sentences in terms of the 4 features.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>Feature</title>
          <p>E
E+P+A
E+P*+T*
P*+A+T*</p>
          <p>“E+P*+T*” achieves the best overall F-score of
37.28% among all the runs. Similarly to the
development set, the highest precision 58.64% on the testing
sentences is achieved by the strictest matching criteria
“E+P+A”. The highest recall 52.17% is obtained by the
least specific matching criterion “E”, indicating that a
large amount of biological events is described in quite
different grammatical structures in the literature.
Although “P*+A+T*” produced the best performance on
the development set, it does not perform as well on the
testing set. This clearly suggests that when requiring
every token to be exactly the same for matching nodes
of a rule and a sentence, the event rules have less stable
generalization power to capture the underlying events.</p>
          <p>Table 4 gives the performance comparison of our
method with top-performing teams in the task. The
official evaluation shows that our best results would rank
6th in extracting biological events in the testing data
compared to the results of the 24 participating teams.</p>
        </sec>
        <sec id="sec-4-2-3">
          <title>Team</title>
          <p>UTurku
JULIELab
ConcordU
UT+DBCLS
VIBGhent
DalhousieU
UTokyo
UNSW
conducted the experiments to tackle the primary task
of the BioNLP shared task, and our method achieves
an 37.28% F-score on the testing data in detecting
biological events across nine event types.</p>
          <p>In future work, we would like to experiment with
more matching criteria when mapping event rules to
sentences. We also plan to expand the coverage of event
trigger tokens using external lexical resources for new
event triggers and synonyms of existing triggers.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Muhammad</given-names>
            <surname>Abulaish</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lipika</given-names>
            <surname>Dey</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Biological relation extraction and query answering from medline abstracts using ontology-based text mining</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          ,
          <volume>61</volume>
          (
          <issue>2</issue>
          ):
          <fpage>228</fpage>
          -
          <lpage>262</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Antti</given-names>
            <surname>Airola</surname>
          </string-name>
          , Sampo Pyysalo, Jari Bjo¨rne, Tapio Pahikkala,
          <source>Filip Ginter, and Tapio Salakoski1</source>
          .
          <year>2008</year>
          .
          <article-title>All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>9</volume>
          <issue>Suppl 11</issue>
          :
          <fpage>s2</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Jari</given-names>
            <surname>Bjo</surname>
          </string-name>
          ¨rne, Juho Heimonen, Filip Ginter, Antti Airola, Tapio Pahikkala, and
          <string-name>
            <given-names>Tapio</given-names>
            <surname>Salakoski</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Extracting complex biological events with rich graph-based feature sets</article-title>
          .
          <source>In BioNLP '09: Proceedings of the Workshop on BioNLP</source>
          , pages
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          , Morristown, NJ, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Thomas H.</given-names>
            <surname>Cormen</surname>
          </string-name>
          , Charles E. Leiserson, Ronald L.
          <string-name>
            <surname>Rivest</surname>
            , and
            <given-names>Clifford</given-names>
          </string-name>
          <string-name>
            <surname>Stein</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Introduction to Algorithms</article-title>
          . The MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Marie-Catherine de Marneffe and Christopher D. Manning</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>The Stanford typed dependencies representation</article-title>
          .
          <source>In CrossParser '08: Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          , Morristown, NJ, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Katrin</given-names>
            <surname>Fundel</surname>
          </string-name>
          , Robert Ku¨ffner, and
          <string-name>
            <given-names>Ralf</given-names>
            <surname>Zimmer</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Relex-relation extraction using dependency parse trees</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>23</volume>
          (
          <issue>3</issue>
          ):
          <fpage>365</fpage>
          -
          <lpage>371</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Ge´rard P.</given-names>
            <surname>Huet</surname>
          </string-name>
          .
          <year>1975</year>
          .
          <article-title>A unification algorithm for typed lambda-calculus</article-title>
          .
          <source>Theor. Comput. Sci.</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <fpage>27</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Hunter</surname>
          </string-name>
          , Zhiyong Lu, James Firby, William A. Baumgartner Jr., Helen L. Johnson, Philip V.
          <string-name>
            <surname>Ogren</surname>
            , and
            <given-names>K. Bretonnel</given-names>
          </string-name>
          <string-name>
            <surname>Cohen</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Opendmap: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>9</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Jin-Dong</surname>
            <given-names>Kim</given-names>
          </string-name>
          ,
          <source>Tomoko Ohta, and Jun ichi Tsujii</source>
          .
          <year>2008</year>
          .
          <article-title>Corpus annotation for mining biomedical events from literature</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ):
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Jin-Dong</surname>
            <given-names>Kim</given-names>
          </string-name>
          , Yoshinobu Kano Tomoko Ohta, Sampo Pyysalo, and
          <string-name>
            <surname>Jun'ichi Tsujii</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Overview of bionlp'09 shared task on event extraction</article-title>
          .
          <source>In Proceedings of the NAACL-HLT 2009 Workshop on Natural Language Processing in Biomedicine (BioNLP'09)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Dan</given-names>
            <surname>Klein</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Accurate unlexicalized parsing</article-title>
          .
          <source>In ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics</source>
          , pages
          <fpage>423</fpage>
          -
          <lpage>430</lpage>
          , Morristown, NJ, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Yusuke</given-names>
            <surname>Miyao</surname>
          </string-name>
          , Kenji Sagae, Rune Saetre, Takuya Matsuzaki, and
          <string-name>
            <surname>Jun'ichi Tsujii</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Evaluating contributions of natural language parsers to protein-protein interaction extraction</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>25</volume>
          (
          <issue>3</issue>
          ):
          <fpage>394</fpage>
          -
          <lpage>400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Rinaldi</surname>
          </string-name>
          , Gerold Schneider, Kaarel Kaljurand, James Dowdall, Christos Andronis, Andreas Persidis, and
          <string-name>
            <given-names>Ourania</given-names>
            <surname>Konstanti</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Mining relations in the GENIA corpus</article-title>
          .
          <source>In Proceedings of the Second European Workshop on Data Mining and Text Mining for Bioinformatics</source>
          , Pisa, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Rinaldi</surname>
          </string-name>
          , Gerold Schneider,
          <string-name>
            <given-names>Kaarel</given-names>
            <surname>Kaljurand</surname>
          </string-name>
          , Michael Hess, Christos Andronis, Ourania Konstandi, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Persidis</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Mining of relations between proteins over biomedical scientific literature using a deeplinguistic approach</article-title>
          .
          <source>Artif. Intell. Med</source>
          .,
          <volume>39</volume>
          (
          <issue>2</issue>
          ):
          <fpage>127</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Yuanyuan</surname>
            <given-names>Tian</given-names>
          </string-name>
          , Richard C.
          <string-name>
            <surname>Mceachin</surname>
            ,
            <given-names>Carlos</given-names>
          </string-name>
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>David J.</given-names>
          </string-name>
          <string-name>
            <surname>States</surname>
          </string-name>
          , and
          <string-name>
            <surname>Jignesh</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Patel</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Saga: a subgraph matching tool for biological graphs</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>23</volume>
          (
          <issue>2</issue>
          ):
          <fpage>232</fpage>
          -
          <lpage>239</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Yoshimasa</given-names>
            <surname>Tsuruoka</surname>
          </string-name>
          , Yuka Tateishi,
          <string-name>
            <surname>Jin-Dong</surname>
            <given-names>Kim</given-names>
          </string-name>
          , Tomoko Ohta,
          <source>John McNaught, Sophia Ananiadou, and Jun ichi Tsujii</source>
          .
          <year>2005</year>
          .
          <article-title>Developing a Robust Part-ofSpeech Tagger for Biomedical Text</article-title>
          .
          <source>LNCS</source>
          <volume>3746</volume>
          :
          <fpage>382</fpage>
          -
          <lpage>392</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>John Wilbur</surname>
            , Lawrence Smith,
            <given-names>and Lorraine</given-names>
          </string-name>
          <string-name>
            <surname>Tanabe</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Biocreative 2. gene mention task</article-title>
          .
          <source>In Proceedings of Second BioCreative Challenge Evaluation Workshop</source>
          , pages
          <fpage>7</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Xifeng</given-names>
            <surname>Yan</surname>
          </string-name>
          , Feida Zhu, Jiawei Han, and
          <string-name>
            <surname>Philip</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Searching substructures with superimposed distance</article-title>
          .
          <source>In ICDE '06: Proceedings of the 22nd International Conference on Data Engineering</source>
          , page 88, Washington, DC, USA. IEEE Computer Society.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>