=Paper= {{Paper |id=None |storemode=property |title=Biological event extraction using subgraph matching |pdfUrl=https://ceur-ws.org/Vol-714/ShortPaper03_Liu.pdf |volume=Vol-714 |dblpUrl=https://dblp.org/rec/conf/smbm/LiuKB10 }} ==Biological event extraction using subgraph matching== https://ceur-ws.org/Vol-714/ShortPaper03_Liu.pdf
                    Biological Event Extraction using Subgraph Matching

             Haibin Liu                           Christian Blouin                        Vlado Kešelj
    Faculty of Computer Science              Faculty of Computer Science          Faculty of Computer Science
       Dalhousie University                     Dalhousie University                 Dalhousie University
        Halifax, NS, Canada                      Halifax, NS, Canada                  Halifax, NS, Canada
      haibin@cs.dal.ca                        cblouin@cs.dal.ca                      vlado@cs.dal.ca


                        Abstract                               task on biological event extraction (Kim et al., 2009),
                                                               20 out of the total 24 participating teams resorted to
     An important task in biological information ex-
                                                               a full parsing strategy, including all top 10 perform-
     traction is to identify descriptions of biologi-
     cal relations and events involving genes or pro-
                                                               ing teams. However, most of previous work extracts
     teins. We propose a graph-based approach to               relevant relations based on a limited set of manually
     automatically learn rules for detecting biologi-          designed rules that map interpreted syntactic structures
     cal events in the literature. The detection is per-       into the semantic relations. We propose an approach to
     formed by searching for isomorphism between               automatically learn rules that characterize a wide range
     event rules and the dependency graphs of com-             of biological relations and events from a syntactically
     plete sentences. When applying our approach to            and semantically annotated corpus, and our approach is
     the datasets of the Task 1 of the BioNLP shared
                                                               also based on full parsing of biological texts.
     task, we achieved an 37.28% F-score in detecting
     biological events across 9 event types.                      More recently, the dependency representation ob-
                                                               tained from full parsing, with its ability to reveal long-
                                                               range dependencies, has shown an advantage in bi-
1 Introduction
                                                               ological relation extraction over the traditional Penn
Recent research in information extraction in the bio-          Treebank-style phrase structure trees (Miyao et al.,
logical domain has focused on extracting semantic re-          2009). Relations are generally extracted from the de-
lations between molecular biology concepts (Fundel et          pendency representation by two approaches. In one
al., 2007). State-of-the-art protein annotation methods        approach, the dependency representation is traversed
have achieved reasonable success with a performance            and paths that contain the relevant terms describing the
of 88% F-score (Wilbur et al., 2007). A task of in-            relations predefined in the rules are extracted as can-
terest is to automatically extract protein-protein inter-      didate relations (Fundel et al., 2007; Rinaldi et al.,
actions (PPI). To date, most of the biological knowl-          2004). In the other, relations are learned from the
edge about these interactions is only available in the         dependency representation using supervised machine
form of unstructured text from scientific articles (Abu-       learning based on specialized feature representations or
laish and Dey, 2007). The best-performing system from          kernels, encoded with dependency paths from the rep-
the BioCreative II challenge (Hunter et al., 2008) only        resentation (Airola et al., 2008; Björne et al., 2009).
achieved a 29% F-score in identifying protein pairs in a          Graphs provide a powerful primitive for modeling
sentence that have a biologically relevant relationship.       biological data such as pathways and protein interac-
This suggests that the problem of biological relation ex-      tion networks (Tian et al., 2007; Yan et al., 2006).
traction is difficult and far from solved.                     Since the dependency representation maps straightfor-
   Sentences in the biological literature often have           wardly onto a directed graph (de Marneffe and Man-
long-range dependencies. Therefore, co-occurrence              ning, 2008), properties and operations of graphs can
based or surface pattern based shallow analysis on bio-        be naturally applied to the problem of biological rela-
logical texts suffers from either low precision or recall      tion extraction. We propose a graph matching-based
(Fundel et al., 2007; Abulaish and Dey, 2007). As a re-        approach to extract biological events from the scientific
sult, full parsing has been explored as the basis for rela-    literature in tackling the primary task of the BioNLP’09
tion extraction to perform intensive syntactical and se-       shared task on biological event extraction. The ex-
mantic analysis (Abulaish and Dey, 2007; Fundel et al.,        traction is performed by matching the dependency rep-
2007; Rinaldi et al., 2007). In the BioNLP’09 shared           resentation of automatically learned rules to the de-

                                                              110
pendency representation of biological sentences. This           3 Subgraph Matching-based Event Extraction
process is treated as a subgraph matching problem,
which corresponds to the search for a subgraph isomor-          3.1 Dependency Representation
phic to a rule graph within a sentence graph.                   The dependency representation is designed to provide a
   The rest of the paper is organized as follows: In            simple description of the grammatical relationships in
Section 2, we introduce the BioNLP’09 shared task               a sentence that can be effectively used to extract textual
on event extraction. Section 3 describes our subgraph           relations (de Marneffe and Manning, 2008).
matching-based event extraction method. Sections 4                 The dependency representation of a sentence is
elaborates the implementation details. Performance is           formed by tokens in the sentence and the binary re-
evaluated in Section 5. Finally, Section 6 summarizes           lations between them. A single dependency relation
the paper and introduces future work.                           is represented as relation(governor, dependent), where
                                                                governor and dependent are tokens, and relation is a
2 BioNLP’09 Shared Task                                         type of the grammatical dependency relation. A depen-
                                                                dency representation is essentially a labeled directed
The BioNLP’09 shared task (Kim et al., 2009) focused            graph, which is named dependency graph.
on the recognition of biological events that appear in
the biological literature. When a biological event is           3.2 Event Rule Induction
described in text, we can analyze it by recognizing an          A biological event rule is defined as follows:
event type, the event trigger, one or more event argu-             Definition 2. A biological event rule is a pair
ments, and the source text (ST ), where the event is de-        r = (e, Gr ). Gr = (Vr , Er ) is a dependency graph,
scribed. The source text is composed of tokens, which           which characterizes the contextual structure of events.
are defined as finite strings of characters from a finite       e = (Type, Trigger, Arguments) encodes a detailed
alphabet. The alphabet is a finite set of symbols Σ.            event frame, where Type is the event type, Trigger =
Tokens come from W , the set of all finite strings of           {(t1 , v1 ), (t2 , v2 ), · · ·} records the event trigger and is
characters from Σ, i.e., W = Σ+ . The source text is a          a non-empty finite sequence of tokens associated with
finite sequence of tokens, i.e., any member of W ∗ . We         nodes in Gr , i.e., Trigger ∈ (W × Vr )+ , and Argu-
define a biological event in a way consistent with the          ments = {(t1 , l1 , v1 ), (t2 , l2 , v2 ), · · ·} records the event
shared task, which is as follows:                               arguments and is a non-empty finite sequence of tokens
   Definition 1. A biological event is a four-tuple e =         associated with semantic role labels and nodes in Gr ,
(Type, Trigger, Arguments, ST). ST ∈ W ∗ , called the           i.e., Arguments ∈ (W × L × Vr )+ .
source text, is a sequence of tokens that contains the             The biological event rules are learned from la-
event; Type ∈ Te is an event type from a finite set of          beled training sentences using the following induction
event types Te ; Trigger is a substring of tokens from ST       method. Starting with the dependency graph of each
that signals the event; Arguments is a non-empty, finite        training sentence, the directions of edges are first re-
set of pairs (l, a) where l ∈ L is a label from a finite set    moved so that the directed graph is transformed into an
of semantic role labels L, and a is a token from ST, or         undirected graph, where a path must exist between any
another biological event.                                       two nodes since the graph is always connected. For
   For the shared task, Te consists of nine event types         each gold event, the shortest dependency path in the
defined in Table 1, and L = {Theme, Cause}. A gold              undirected graph connecting the event trigger nodes to
event denotes a biological event where all the informa-         each event argument node is selected. The union of all
tion has been manually annotated by domain experts.             shortest dependency paths is then computed for each
   The primary task of the shared task was to detect bi-        event, and the original directed dependency representa-
ological events such as protein binding and phosphory-          tion of the path union is retrieved and used as the graph
lation, given only the annotation of protein names. It          representation of the event.
was required to extract type, trigger, and primary argu-           For multi-token event triggers, the shortest depen-
ments of each event. This task is an example of extrac-         dency path connecting the node of every trigger token
tion of semantically typed, complex events for which            to the node of each event argument is selected, and the
the arguments can also be other events. We focus on             union of the paths is then computed for each trigger.
the primary task and propose a graph matching-based             For regulation events, when a sub-event is used as an
method to cope with the problem.                                argument, only the type and the trigger of the sub-event

                                                               111
are preserved as the argument of the main events. The            Algorithm 1 Main algorithm
shortest dependency path is extracted so as to connect           Input: Dependency graph of a testing sentence s, Gs = (Vs , Es )
the trigger nodes of the main event to the trigger nodes             where Vs is the set of nodes and Es is the set of edges
                                                                     of the graph; a finite set of biological event rules R =
of the sub-event. In case that there exists more than                {r1 , r2 , · · · , ri , · · ·}, where ri = (ei , Gri ). Gri =
one shortest path, all of the paths are considered. As a             (Vri , Eri ) is the dependency graph of ri .
result, each gold event is transformed into the form of          Output: MR : a set of biological event rules from R matched with
a biological event rule. The obtained rules are catego-              s together with the injective mapping
                                                                 Main algorithm:
rized in terms of the nine event types of the task.
                                                                  1: MR ← ∅
                                                                  2: for all ri ∈ R do
3.3 Sentence Matching                                             3:    stri ← StartNode(Gri )                    //StartNode finds the start
We propose a sentence matching approach to attempt                4:    //node stri of the rule graph Gri
                                                                  5:    STs ← {sts1 , sts2 , · · · , stsj , · · ·}
to match event rules to each testing sentence. Since              6:    //STs : the set of start nodes of the sentence graph Gs
the event rules and the sentences all possess a depen-            7:    for all stsj ∈ STs do
dency graph, the matching process is a subgraph match-            8:         create an empty stack σ and push (stri , stsj ) onto
ing problem, which corresponds to the search for a                9:         the stack σ
                                                                 10:          IM ← ∅                     //IM : records of injective matches
subgraph isomorphic to an event rule graph within the
                                                                 11:          //between nodes in Gri and Gs
graph of a testing sentence. This problem is also called         12:          call MatchNode(σ, rIM, Gri , Gs )
subgraph isomorphism, defined in this work as follows:           13:          //rIM : reference of IM
   Definition 3. An event rule graph Gr = (Vr , Er )             14:          if MatchNode returned TRUE then
                                                                 15:                 MR ← MR ∪ {ri with IM }
is isomorphic to a subgraph of a sentence graph Gs =
                                                                 16: return MR
(Vs , Es ), denoted by Gr ∼   = Ss ⊆ Gs , if there is an
injective mapping f : Vr → Vs such that, for every
directed pair of nodes vi , vj ∈ Vr , if (vi , vj ) ∈ Er then    names are replaced with a unified tag “BIO Entity”.
(f (vi ), f (vj )) ∈ Es , and the edge label of (vi , vj ) is       GENIA tagger (Tsuruoka et al., 2005) is used to as-
the same as the edge label of (f (vi ), f (vj )).                sociate each word in the tokenized sentences with its
   The subgraph isomorphism problem is NP-complete               most likely Part-of-Speech tag. The POS-tagged sen-
(Cormen et al., 2001). Considering that the graphs of            tences are submitted to the Stanford unlexicalized nat-
rules and sentences involved in our matching process             ural language parser (Klein and Manning, 2003) to an-
are small, a simple subgraph matching algorithm us-              alyze the syntactic and semantic structure of the sen-
ing a backtracking approach is appropriate. It is named          tences. The Stanford parser returns a dependency graph
“Injective Graph Embedding Algorithm” and designed               for each sentence after parsing.
based on the Huet’s graph unification algorithm (Huet,              For each gold event, the shortest path in the undi-
1975). The main and the recursive part of the algorithm          rected graph connecting the event trigger to each event
are formalized in Algorithm 1 and Algorithm 2.                   argument is extracted using the Dijkstra’s algorithm
   For each sentence, the algorithm returns all the              (Cormen et al., 2001) with equal weight for edges. Sen-
matched rules together with the corresponding injec-             tence matching is performed following the procedure of
tive mappings from rule nodes to sentence tokens. Bio-           Algorithm 1 and Algorithm 2.
logical events are then extracted by applying the event
descriptions of tokens in each matched rule such as the          5 Results and Evaluation
type to the corresponding tokens of the sentence. In
practice, it only takes the algorithm a couple of seconds        5.1 Dataset
to return the results.                                           We use the BioNLP’09 Shared Task datasets for evalu-
                                                                 ation. A training set and a development set are provided
4 Implementation
                                                                 for the purpose of developing task solution. They are
We assume that a sentence is a suitable level of text            prepared based on the publicly available portion of the
granularity in event extraction. The target text is first        GENIA event corpus (Kim et al., 2008) with the gold
segmented into sentences. Then, each sentence is to-             protein annotation and the gold event annotation given.
kenized with whitespace separating tokens. We re-                A testing set is prepared from a held-out part of the cor-
quire that every protein be separated from surrounding           pus and provided without the gold event annotation.
text and become one individual token. All the protein               Table 1 shows the nine event types considered in the

                                                                112
Algorithm 2 Recursive subroutine                                                     Event type              Primary arguments
Recursive subroutine: MatchNode(σ, rIMparent , Gri , Gs )                        1   Gene expression         Theme(P)
 1: IMcurrent ← IMparent                       //assign IMparent from the        2   Transcription           Theme(P)
 2: //parent level to the current IMcurrent
                                                                                 3   Protein catabolism      Theme(P)
 3: while stack σ is not empty do
 4:    pop node pair (vr , vs ) from stack σ                                     4   Phosphorylation         Theme(P)
 5:    if an injective match between vr and vs already exists                    5   Localization            Theme(P)
 6:    in IMcurrent then                                                         6   Binding                 (Theme(P))+
 7:         do nothing
 8:    else if an injective match is possible between vr and
                                                                                 7   Regulation              Theme(P/E), (Cause(P/E))?
 9:    vs then                                                                   8   Positive regulation     Theme(P/E), (Cause(P/E))?
10:         IMcurrent ← IMcurrent ∪ { the match between                          9   Negative regulation     Theme(P/E), (Cause(P/E))?
11:         vr and vs }
12:     else                                                                           Table 1: Event types and primary arguments
13:         return FALSE
14:     for all edges er adjacent to node vr in Gri do
15:         let (vr , nr ) be the edge er                                       which are distributed over nine event types.
16:         for all edges es adjacent to node vs in Gs do
                                                                                   We observed that some event rules of an event
17:              let (vs , ns ) be the edge es
18:              if er and es share a same direction and                        type are overlapped with rules of other event types.
19:              possess identical edge labels then                             For instance, a Transcription rule is isomorphic to a
20:                   S ← S ∪ ns                  //S : the set of candidate    Gene expression rule in terms of the graph represen-
21:                   //nodes for matching nr                                   tation and they also share a same event trigger token.
22:         for all ns ∈ S do
23:              if an injective match between nr and ns                        In fact, tokens like “gene expression” and “induction”
24:              already exists in IMcurrent then                               are used as event trigger of both Transcription and
25:                   go to Line 14 and proceed with next edge er               Gene expression in training data. Therefore, the de-
26:              else if an injective match is possible between                 tection of some Gene expression events is always ac-
27:              nr and ns then
28:                   σn ← σ                   //copy σ to a new stack σn
                                                                                companied by certain Transcription events.
29:                   push (vr , vs , nr , ns ) onto the stack σn                  In tackling this problem, we processed the rules and
30:                   call MatchNode(σn , rIMcurrent , Gri , Gs )               built a non-overlapping rule set. When the dependency
31:                   //rIMcurrent : reference of IMcurrent                     graphs of two rules across different event types are iso-
32:                   if MatchNode returned TRUE then
33:                        IMparent ← IMcurrent
                                                                                morphic to each other and two rules share a same event
34:                        //update IMparent using IMcurrent                    trigger token, we keep the rule of the event type in
35:                        return TRUE                                          which the trigger token of the rule occurs more frequent
36:         return FALSE                                                        as a trigger in the training data, and remove the rule of
37: IMparent ← IMcurrent
38: return TRUE
                                                                                the other event type from the set.

                                                                                5.3 Event Extraction Results on Development Set
shared task. Since these types are all related to protein                       The non-overlapping rule sets in terms of different
biology, they take proteins (P) as their theme. Regu-                           combinations of matching features are then applied
lation events always take a theme argument and, when                            to the 988 candidate development sentences using our
expressed, also a cause argument. As a unique feature                           graph matching algorithm. Table 2 shows the event ex-
of the shared task, regulation events may take another                          traction results based on each feature.
event (E), namely sub-event, as its theme or cause.                                The least specific matching criterion when match-
                                                                                ing between rules and sentences is “E”, which assumes
5.2 Rule Induction Results                                                      that, without checking any information about nodes, as
For training data, only sentences that contain at least                         long as edge directions and labels are the same, both
one protein and one event are considered candidates                             edges and nodes of a rule and a sentence can match
for further processing. For testing data, candidate sen-                        with each other. It achieves the highest recall among all
tences contain at least one protein. Our proposed graph                         the runs and captures more than half of the gold events
matching-based method focuses on extracting biolog-                             in the sentences. However, the precision is quite low,
ical events from sentences. Therefore, only sentence-                           leading to a low F-score as too many false positives are
based events are considered in this work. After re-                             generated due to the disregard of node information.
moving duplicate rules, we obtained 6,435 event rules,                             As the strictest matching criteria, “E+P+A” requires

                                                                               113
that the edges (E), the POS tags (P) and all tokens (A)       method. Table 3 gives the event extraction results on
be exactly the same for the edges and the nodes of a rule     the 1,670 testing sentences in terms of the 4 features.
and a sentence to match with each other. It achieves
the highest precision 69.72% and an F-score over 40%.               Feature     Prec.(%)    Recall(%)    F-score(%)
This indicates that a certain number of biological events           E             0.84        52.17          1.65
are described in very similar way in the literature, in-            E+P+A        58.64        26.02         36.05
volving the same grammatical structures and identical               E+P*+T*      41.77        33.66         37.28
contextual contents. Comparing to “P+A”, adding the                 P*+A+T*      39.61        32.18         35.51
edge features improves the overall precision of event
                                                              Table 3: Event extraction of non-overlapping set on testing
extraction by a large margin, nearly 13%. “E+P+T” re-         sentences using different features
quires that edge directions and labels of all edges be
identical, POS tags of all tokens be identical, and to-          “E+P*+T*” achieves the best overall F-score of
kens of only event triggers (T) be identical. It achieves     37.28% among all the runs. Similarly to the develop-
better performance than “E+P+A” when relaxing the             ment set, the highest precision 58.64% on the testing
matching criteria from all tokens being the same to only      sentences is achieved by the strictest matching criteria
event trigger tokens having to be identical. The best 2       “E+P+A”. The highest recall 52.17% is obtained by the
of the first 6 runs in Table 2 are “E+P+T” and “P+A”.         least specific matching criterion “E”, indicating that a
   Next, we attempted to relax the matching criterion         large amount of biological events is described in quite
of POS tags for nouns and verbs. For nouns, the plural        different grammatical structures in the literature. Al-
form of nouns is allowed to match with the singular           though “P*+A+T*” produced the best performance on
form, and proper nouns are allowed to match with reg-         the development set, it does not perform as well on the
ular nouns. For verbs, past tense, present tense and base     testing set. This clearly suggests that when requiring
present form are allowed to match with each other. Fur-       every token to be exactly the same for matching nodes
ther, the event trigger tokens are stemmed to their root      of a rule and a sentence, the event rules have less stable
forms allowing the trigger tokens derived from a same         generalization power to capture the underlying events.
root word to match. “E+P*+T*” and “P*+A+T*” in                   Table 4 gives the performance comparison of our
Table 2 demonstrate the improved performance to the           method with top-performing teams in the task. The of-
above best two runs. These modifications improve the          ficial evaluation shows that our best results would rank
recall but produce many incorrect events, leading to          6th in extracting biological events in the testing data
only a small increase on the overall F-score.                 compared to the results of the 24 participating teams.

   Feature      Prec.(%)     Recall(%)    F-score(%)               Team          Prec.(%)    Recall(%)    F-score(%)
   E              1.22         52.26          2.38                 UTurku         58.48        46.73         51.95
   E+P            2.23         45.33          4.25                 JULIELab       47.52        45.82         46.66
   E+P+A         69.72         28.06         40.02                 ConcordU       61.59        34.98         44.62
   E+P+T         58.85         31.02         40.63                 UT+DBCLS       55.59        36.90         44.35
   P+A           57.00         32.53         41.42                 VIBGhent       51.55        33.41         40.54
   P+T           40.65         36.95         38.71                 DalhousieU     41.77        33.66         37.28
   E+P*+T*       50.86         34.71         41.26                 UTokyo         53.56        28.13         36.88
   P*+A+T*       51.51         35.22         41.84                 UNSW           45.78        28.22         34.92

Table 2: Event extraction of non-overlapping set on devel-     Table 4: Performance comparison with participating teams
opment set using different features

                                                              6 Conclusion and future work
5.4 Event Extraction Results on Testing Set                   We use dependency graphs to automatically induce bi-
We decided to conduct four runs on the testing                ological event rules from annotated events. These rules
sentences in terms of 4 features: “E”, “E+P+A”,               are then used to extract biological events from the lit-
“E+P*+T*” and “P*+A+T*”. For “E” and “E+P+A”,                 erature. The extraction process is treated as a subgraph
aiming to investigate the highest recall and precision        matching problem to search for the graph of an event
on the testing sentences that can be achieved by our          rule within the dependency graph of a sentence. We

                                                             114
conducted the experiments to tackle the primary task                 Processing in Biomedicine (BioNLP’09), pages 1–9.
of the BioNLP shared task, and our method achieves                   ACL.
an 37.28% F-score on the testing data in detecting bio-            Dan Klein and Christopher D. Manning. 2003. Accurate
logical events across nine event types.                              unlexicalized parsing. In ACL ’03: Proceedings of the
                                                                     41st Annual Meeting on Association for Computational
   In future work, we would like to experiment with
                                                                     Linguistics, pages 423–430, Morristown, NJ, USA. Asso-
more matching criteria when mapping event rules to                   ciation for Computational Linguistics.
sentences. We also plan to expand the coverage of event            Yusuke Miyao, Kenji Sagae, Rune Saetre, Takuya Mat-
trigger tokens using external lexical resources for new              suzaki, and Jun’ichi Tsujii. 2009. Evaluating contribu-
event triggers and synonyms of existing triggers.                    tions of natural language parsers to protein–protein inter-
                                                                     action extraction. Bioinformatics, 25(3):394–400.
                                                                   Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, James
References                                                           Dowdall, Christos Andronis, Andreas Persidis, and Oura-
Muhammad Abulaish and Lipika Dey. 2007. Biological re-               nia Konstanti. 2004. Mining relations in the GENIA cor-
   lation extraction and query answering from medline ab-            pus. In Proceedings of the Second European Workshop
   stracts using ontology-based text mining. Data & Knowl-           on Data Mining and Text Mining for Bioinformatics, Pisa,
   edge Engineering, 61(2):228–262.                                  Italy.
Antti Airola, Sampo Pyysalo, Jari Björne, Tapio Pahikkala,        Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael
   Filip Ginter, and Tapio Salakoski1. 2008. All-paths graph         Hess, Christos Andronis, Ourania Konstandi, and An-
   kernel for protein-protein interaction extraction with eval-      dreas Persidis. 2007. Mining of relations between pro-
   uation of cross-corpus learning. BMC Bioinformatics, 9            teins over biomedical scientific literature using a deep-
   Suppl 11:s2.                                                      linguistic approach. Artif. Intell. Med., 39(2):127–136.
Jari Björne, Juho Heimonen, Filip Ginter, Antti Airola,           Yuanyuan Tian, Richard C. Mceachin, Carlos Santos,
   Tapio Pahikkala, and Tapio Salakoski. 2009. Extracting            David J. States, and Jignesh M. Patel. 2007. Saga: a
   complex biological events with rich graph-based feature           subgraph matching tool for biological graphs. Bioinfor-
   sets. In BioNLP ’09: Proceedings of the Workshop on               matics, 23(2):232–239.
   BioNLP, pages 10–18, Morristown, NJ, USA. Associa-              Yoshimasa Tsuruoka, Yuka Tateishi, Jin-Dong Kim,
   tion for Computational Linguistics.                               Tomoko Ohta, John McNaught, Sophia Ananiadou, and
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest,            Jun ichi Tsujii. 2005. Developing a Robust Part-of-
   and Clifford Stein. 2001. Introduction to Algorithms.             Speech Tagger for Biomedical Text. LNCS 3746:382–
   The MIT Press.                                                    392.
Marie-Catherine de Marneffe and Christopher D. Manning.            John Wilbur, Lawrence Smith, and Lorraine Tanabe. 2007.
   2008. The Stanford typed dependencies representa-                 Biocreative 2. gene mention task. In Proceedings of Sec-
   tion. In CrossParser ’08: Coling 2008: Proceedings                ond BioCreative Challenge Evaluation Workshop, pages
   of the workshop on Cross-Framework and Cross-Domain               7–16.
   Parser Evaluation, pages 1–8, Morristown, NJ, USA. As-          Xifeng Yan, Feida Zhu, Jiawei Han, and Philip S. Yu. 2006.
   sociation for Computational Linguistics.                          Searching substructures with superimposed distance. In
Katrin Fundel, Robert Küffner, and Ralf Zimmer. 2007.               ICDE ’06: Proceedings of the 22nd International Con-
   Relex—relation extraction using dependency parse trees.           ference on Data Engineering, page 88, Washington, DC,
   Bioinformatics, 23(3):365–371.                                    USA. IEEE Computer Society.
Gérard P. Huet. 1975. A unification algorithm for typed
   lambda-calculus. Theor. Comput. Sci., 1(1):27–57.
Lawrence Hunter, Zhiyong Lu, James Firby, William
   A. Baumgartner Jr., Helen L. Johnson, Philip V. Ogren,
   and K. Bretonnel Cohen. 2008. Opendmap: An open
   source, ontology-driven concept analysis engine, with ap-
   plications to capturing knowledge regarding protein trans-
   port, protein interactions and cell-type-specific gene ex-
   pression. BMC Bioinformatics, 9.
Jin-Dong Kim, Tomoko Ohta, and Jun ichi Tsujii. 2008.
   Corpus annotation for mining biomedical events from lit-
   erature. BMC Bioinformatics, 9(1):10.
Jin-Dong Kim, Yoshinobu Kano Tomoko Ohta,
   Sampo Pyysalo, and Jun’ichi Tsujii. 2009. Overview of
   bionlp’09 shared task on event extraction. In Proceedings
   of the NAACL-HLT 2009 Workshop on Natural Language

                                                                  115