<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving the extraction of complex regulatory events from scientific text by using ontology-based inference</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jung-jae Kim</string-name>
          <email>jungjae.kim@ntu.edu.sg</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dietrich Rebholz-Schuhmann</string-name>
          <email>rebholz@ebi.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>EMBL-EBI, Wellcome Trust Genome Campus</institution>
          ,
          <addr-line>Hinxton, Cambridge</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Engineering, Nanyang Technological University</institution>
          ,
          <country country="SG">Singapore</country>
        </aff>
      </contrib-group>
      <fpage>36</fpage>
      <lpage>44</lpage>
      <abstract>
        <p>Background: The extraction of complex events from biomedical text is a challenging task and requires in-depth semantic analysis. Previous approaches associate lexical and syntactic resources with ontologies for the semantic analysis, but fall short in testing the benefits from the use of domain knowledge. Results: We developed a system that deduces implicit events from explicitly expressed events by using inference rules that encode domain knowledge. We evaluated the system with the inference module on three tasks: First, when tested against a corpus with manually annotated events, the inference module of our system contributes 53.2% of correct extractions, but does not cause any incorrect results. Second, the system overall reproduces 33.1% of the transcription regulatory events contained in RegulonDB (up to 85.0% precision) and the inference module is required for 93.8% of the reproduced events. Third, we applied the system with minimum adaptations to the identification of cell activity regulation events, confirming that the inference improves the performance of the system also on this task. Conclusions: Our research shows that the inference based on domain knowledge plays a significant role in extracting complex events from text. This approach has great potential in recognizing the complex concepts of such biomedical ontologies as Gene Ontology in the literature.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Background</title>
      <p>
        The task of extracting events from text, called event
extraction, is a complex process that requires various
semantic resources to decipher the semantic features
in the event descriptions. Previous approaches
identify and represent the textual semantics of events
(e.g. gene regulation, gene-disease relation) by
associating lexical and syntactic resources with
ontologies [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1–5</xref>
        ]. We further explore the usage of an
ontology for incorporating domain knowledge into an
event extraction system.
      </p>
      <p>Events from text that have been hand-curated
into relational databases by biologists are actually
the products of scientific reasoning supported by the
domain knowledge of the biologists. This process of
reasoning is based on linguistic evidence of such
language patterns as “A regulates B” and “expression
of Gene C” which refer to the basic events of
regulation and gene expression. These basic events can
be combined into an event with the compositional
structure “A regulates (the expression of Gene C)”,
where the parentheses enclose the embedded event.
In this paper, we call such an event consisting of
multiple basic events a complex event and say that
it has a compositional structure. We will show that
the use of inference based on domain knowledge
supports the extraction of complex events from text.</p>
      <p>The previous approaches to extracting complex
events combine the basic events into compositional
structures according to the syntactic structures of
source sentences. However, there are two open
issues in curating the compositional structures into
relational databases. First, the event descriptions
in scientific papers are so complicated that it is
often required to transform the compositional
structures into the structures compatible with the
semantic templates of the target databases. Second, an
event can be represented across sentence boundaries,
even in multiple sentences which are not linked via
anaphoric expressions (e.g. ‘it’, ‘the gene’).</p>
      <p>Biologists with sufficient domain knowledge have
little problem in carrying out the two required tasks
of structural transformation and evidence
combination. Structural transformation is to find an event
that has the same meaning as the original event but
with a different structure, while evidence
combination is to identify a new event that can be deduced
from multiple events. We should encode the domain
knowledge into a logical form so that our text
mining systems can process the compositional structures
of events, which are explicitly expressed in text and
can be extracted by language patterns, to deduce
the events with alternative structures and those
implied by a combination of multiple events. We call
the explicitly expressed events explicit events and
the deduced events implicit events.</p>
      <p>
        Several text mining systems have employed
inference based on domain knowledge to fill in event
templates [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6–8</xref>
        ]. They can also go beyond sentence
boundaries and combine into an event frame the
event attributes collected from different sentences.
However, they do not use an ontology for
representing the inference rules. Moreover, they primarily
deal with flat-structured event frames whose
participants are physical entities (e.g. protein, residue).
To address these issues, we present a novel approach
that represents events and domain knowledge with
an ontology and combines basic events into a
compositional structure where an event participant can
be another simpler event.
      </p>
      <p>
        We utilize Gene Regulation Ontology (GRO), a
conceptual model for the domain of gene regulation
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The ontology has been designed for representing
the compositional semantics of both biomedical text
and the referential databases. GRO provides basic
concepts and properties of the domain, which are
from, and cross-linked to, such biomedical
ontologies as Gene Ontology and Sequence Ontology. We
use the concepts and properties of GRO to represent
the domain knowledge in the form of P→Q implications,
which we call inference rules. We also represent
explicit events from text with GRO and apply modus
ponens to the inference rules and the explicit events
to deduce implicit events.
      </p>
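<p>As an illustration only (our hypothetical mini-representation, not the authors' implementation), the modus ponens step over P→Q inference rules can be sketched with events encoded as nested structures:</p>

```python
# Minimal sketch of rule-based inference over nested event structures.
# Events are dicts: {"type": concept, "hasAgent": ..., "hasPatient": ...}.
# A rule pairs a condition pattern with a conclusion; variables start with "?".

def match(pattern, event, bindings):
    """Match a condition pattern against an event, collecting variable bindings."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        bindings[pattern] = event
        return True
    if not (isinstance(pattern, dict) and isinstance(event, dict)):
        return pattern == event
    return all(key in event and match(value, event[key], bindings)
               for key, value in pattern.items())

def substitute(conclusion, bindings):
    """Instantiate the conclusion pattern with the collected bindings."""
    if isinstance(conclusion, str) and conclusion.startswith("?"):
        return bindings[conclusion]
    if isinstance(conclusion, dict):
        return {k: substitute(v, bindings) for k, v in conclusion.items()}
    return conclusion

def infer(events, rules):
    """One round of modus ponens: apply every rule to every explicit event."""
    implicit = []
    for event in events:
        for condition, conclusion in rules:
            bindings = {}
            if match(condition, event, bindings):
                implicit.append(substitute(conclusion, bindings))
    return implicit

# A structural-transformation rule in the style described in the Methods
# section: a RegulatoryProcess whose patient is a GeneExpression is a
# RegulationOfGeneExpression.
rule3 = ({"type": "RegulatoryProcess",
          "hasPatient": {"type": "GeneExpression", "hasPatient": "?g"}},
         {"type": "RegulationOfGeneExpression",
          "hasPatient": {"type": "GeneExpression", "hasPatient": "?g"}})

explicit = [{"type": "RegulatoryProcess",
             "hasPatient": {"type": "GeneExpression",
                            "hasPatient": {"type": "Gene", "name": "fimA"}}}]
print(infer(explicit, [rule3])[0]["type"])  # RegulationOfGeneExpression
```

<p>A full system would iterate the inference to a fixed point and also handle rules whose condition combines several events; the sketch shows only a single application step.</p>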
      <p>We implemented a system of event extraction
with the proposed inference module and evaluated
it on three tasks, reporting that the inference
significantly improves the system performance.</p>
    </sec>
    <sec id="sec-2">
      <title>Results</title>
      <p>
        We performed three evaluations to test our system.
Each evaluation takes two steps to answer the
following two questions, respectively: 1) How well does
the system with the inference module extract events
from text and 2) how much does the inference
module contribute to the event extraction? First, we ran
the system on a manually annotated corpus to
estimate the performance of the system. Second, we
used the system for a real-world task of populating
RegulonDB, the referential database of E. coli
transcription regulatory network, to prove the robustness
of the system. The first two evaluations are based
on the corpora used for our previously reported
experiments [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Finally, we applied the system to a
related task of extracting regulatory events on cell
activities and compared the results with the GOA
database [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. While the first two evaluation tasks
focus on E. coli, a prokaryotic model organism, the
last task deals with human genes and cells.
      </p>
      <p>Table 1 shows the event templates for the
evaluations. The first two evaluations are to extract
instances of the first three event templates in the
table, while the last evaluation is to extract instances
of the last two event templates. Our system deals
with four properties of events: 1) agents which bind
to gene regulatory regions or control gene expression
and cell activities; 2) patients which are regulated by
the agents; 3) polarity, which tells whether the agent
regulates the patient positively or negatively; and 4)
physical contact, which indicates whether the agent
regulates the patient directly by binding or
indirectly through other agents. Since the three
evaluations only consider the agents and patients, the event
templates in Table 1 include only the two properties.</p>
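<p>A sketch (assumed dict encoding, with a tiny hypothetical is-a fragment) of how an extracted event can be checked against one of the semantic templates of Table 1:</p>

```python
# Sketch: an event template and a conformance check. The ISA fragment is a
# toy stand-in for the GRO concept hierarchy, for illustration only.

ISA = {
    "TranscriptionFactor": "Protein",
    "Protein": "MolecularEntity",
    "Gene": "MolecularEntity",
}

def is_a(concept, ancestor):
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ISA.get(concept)
    return False

TEMPLATE = {  # first row of Table 1: regulation of gene expression
    "type": "RegulationOfGeneExpression",
    "hasAgent": "Protein",
    "hasPatient": {"type": "GeneExpression", "hasPatient": "Gene"},
}

def fits(event, template):
    if isinstance(template, str):  # slot constrained by a concept name
        return is_a(event.get("type"), template)
    if event.get("type") != template["type"]:
        return False
    return all(slot in event and fits(event[slot], template[slot])
               for slot in template if slot != "type")

event = {"type": "RegulationOfGeneExpression",
         "hasAgent": {"type": "TranscriptionFactor", "name": "IHF"},
         "hasPatient": {"type": "GeneExpression",
                        "hasPatient": {"type": "Gene", "name": "fimA"}}}
print(fits(event, TEMPLATE))  # True
```

<p>A slot filled by a sub-concept of the required concept (here a TranscriptionFactor for a Protein slot) conforms, mirroring how the templates use GRO concepts rather than exact types.</p>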
      <sec id="sec-2-1">
        <title>Table 1. Semantic templates and the corresponding Gene Ontology concepts</title>
        <p>Regulation of gene expression (GO:0010468):
&lt;RegulationOfGeneExpression
  hasAgent=?Protein
  hasPatient=&lt;GeneExpression hasPatient=?Gene&gt;&gt;</p>
        <p>Regulation of transcription (GO:0045449):
&lt;RegulationOfTranscription
  hasAgent=?Protein
  hasPatient=&lt;Transcription hasPatient=?Gene&gt;&gt;</p>
        <p>Transcription factor binding (GO:0008134):
&lt;BindingOfTFToTFBindingSiteOfDNA
  hasAgent=?TranscriptionFactor
  hasPatient=&lt;RegulatoryDNARegion hasPatient=?Gene&gt;&gt;</p>
        <p>Regulation of cell growth (GO:0001558):
&lt;RegulatoryProcess
  hasAgent=?MolecularEntity
  hasPatient=&lt;CellGrowth hasAgent=?Cell&gt;&gt;</p>
        <p>Regulation of cell killing (GO:0031341):
&lt;RegulatoryProcess
  hasAgent=?MolecularEntity
  hasPatient=&lt;CellDeath hasAgent=?Cell&gt;&gt;</p>
      </sec>
      <sec id="sec-2-5">
        <title>Evaluation against event annotation</title>
        <p>
          We evaluated our system first against a manually
annotated corpus. The corpus consists of 209 MEDLINE
abstracts that contain at least one E. coli transcription
factor (TF) name. Two curators have annotated E. coli
gene regulatory events on the corpus and have agreed
on the final release of the annotated corpus, which is
available online1 (see [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] for details, including
inter-annotator agreement).
        </p>
        <p>We randomly divided the corpus into two sets: one
for system development (i.e. training corpus) and the
other for system evaluation (i.e. test corpus). The
training corpus, consisting of 109 abstracts, has 250 events
annotated, while the test corpus, consisting of 100
abstracts, has 375 events annotated. We manually
constructed language patterns and inference rules based on
the training corpus and a review paper (see the Methods
section for details).</p>
        <p>
          The system successfully extracted 79 events from
the test corpus (21.1% recall) and incorrectly produced
15 events (84.0% precision). We consider an extracted
event as correct if its two participants and their roles
(i.e. agent, patient) are correctly identified, following
the evaluation criteria of the previous approaches [
          <xref ref-type="bibr" rid="ref12 ref3">3, 12</xref>
          ].
        </p>
      </sec>
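<p>The evaluation criterion above (an extracted event counts as correct when both participants and their roles match the annotation) can be sketched as follows; the flat event encoding is our simplification:</p>

```python
# Sketch (hypothetical data shapes): scoring extracted events against gold
# annotations, counting an event as correct when its type, agent, and
# patient all match a gold event.

def key(event):
    return (event["type"], event["agent"], event["patient"])

def score(extracted, gold):
    gold_keys = {key(e) for e in gold}
    correct = sum(1 for e in extracted if key(e) in gold_keys)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = [{"type": "RegulationOfTranscription", "agent": "IHF", "patient": "fimA"},
        {"type": "RegulationOfTranscription", "agent": "Lrp", "patient": "fimB"}]
extracted = [{"type": "RegulationOfTranscription", "agent": "IHF", "patient": "fimA"}]
print(score(extracted, gold))  # (1.0, 0.5)
```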
      <sec id="sec-2-6">
        <title>Analysis of the extracted events</title>
        <p>
          Among the 79 events, the system has correctly identified
polarity of 46 events (58.2% precision) and
          fied polarity of 46 events (58.2% precision) and
physical contact of 51 events (64.6% precision), while these
two features are not considered for estimating the
system performance, following the evaluation criteria of the
previous approaches [
          <xref ref-type="bibr" rid="ref12 ref3">3, 12</xref>
          ].
        </p>
        <p>To understand the contribution of the inference to
the system, we ran the system without the inference
module. It then extracts only 37 of the 79 successfully
extracted events, which indicates that the inference
contributes 53.2% of the correct results. In addition,
the inference was involved in the extraction of only three
out of the 15 incorrectly extracted events. This result
supports our claim that logical inference can effectively
deduce implicit textual semantics from explicit textual
semantics.</p>
        <p>
          We have further focused on the events whose agents
are TFs for the purpose of comparing our system with
[
          <xref ref-type="bibr" rid="ref12 ref3">3, 12</xref>
          ]. The test corpus has 305 events with TFs as
agents. The system has successfully extracted 66 events
among them (21.6% recall) and incorrectly produced 6
events (91.7% precision). This performance is slightly
better than that of [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] (90% precision, ∼20% recall) and
of [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] (84% precision).
        </p>
      </sec>
      <sec id="sec-2-7">
        <title>Error analysis</title>
        <p>We analyzed the errors of the system as follows: the
false positives, in total 15 errors, are mainly due to the
inappropriate application of the loose pattern matching
method (7 errors) (see the Methods section for details).
The other causes include parse errors (2), the neglect of
negation (1), and an error in conversion from predicate
argument structure to dependency structure (1). These
results of error analysis indicate that the three incorrect
events, which were extracted by the system with the
inference module, are actually due to the incorrect outputs
of the prior modules (e.g. pattern matching) passed to
the inference module. In short, the inference module
caused no incorrect results.</p>
      </sec>
      <sec id="sec-2-8">
        <title>False negatives</title>
        <p>We also analyzed the false negatives. We found that
29.7% of the missing events (88/296) are due to the
deficiency of the gene name dictionary and that 30.0%
(68/296) are due to the lack of anaphora resolution.
The rest of the missing events (40.3%) are thus
dependent upon pattern matching and inference. It is hard
to distinguish errors by pattern matching from those
by the inference, because the inference module takes
into consideration all semantics from an entire
document (i.e. MEDLINE abstract) for the evidence
combination. Therefore, the inference together with the
pattern matching affects at most 40% of the false negatives.</p>
      </sec>
      <sec id="sec-2-9">
        <title>Evaluation against RegulonDB</title>
        <p>
          Additionally, we analyzed the effect of event types. The
precision for the events of the type “regulation of
transcription” is 85%, higher than that of [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] (77%
precision), while the overall precision (67%) is predictably
lower than that since the system of [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] is developed
specifically for extracting regulatory events on gene
transcription. We included the events of the other two types,
which are hypernyms of “regulation of transcription”,
into the result set for the evaluation, because of the
low recall for the events of “regulation of transcription”
(5%). The overall recall (33%) is still lower than that
of [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] (45% recall) because of the small size of the
regulon.fulltext corpus (436 fulltexts). Note that [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]
extracted 42% of RegulonDB events from 2,475 fulltexts
of RegulonDB references. We plan to analyze a larger
number of fulltexts in the future.
        </p>
        <p>It is remarkable that the inference is indispensable for
extracting 93.8% of the RegulonDB events that are
extracted by our system from the corpora. In contrast, the
inference module is involved in the extraction of only
3.2% of the false negative events. The percentage 93.8%
is much higher than 53.2% of the first evaluation. The
difference may be due to the fact that this second
evaluation only counts unique events, while the first
evaluation against the event annotations counts all extracted
event instances. If so, these results may indicate that
only a small amount of well-known events are frequently
mentioned in papers in concise language forms, thus
extracted by language patterns even without the help of
inference, and that the rest of the events are expressed
in papers with the detailed procedures of experiments
which led to the discovery of the events.</p>
      </sec>
      <sec id="sec-2-10">
        <title>Adaptation for regulation of cell activities</title>
        <p>Rule-based systems are criticized for being too specific
to the domains for which they have been developed, so
much so that they cannot be straightforwardly adapted
to other domains. To prove the adaptability of our
system, we applied it to a related topic: regulation of
cell activities.</p>
        <p>The goal of this new task is to populate the GOA [11],
concerning two Gene Ontology (GO) concepts:
regulation of cell growth (GO:0001558) (shortly, RCG) and
regulation of cell death (GO:0031341) (shortly, RCD).
GOA is a database which provides GO annotations of
proteins. In short, the task is to identify the proteins
that can be annotated with the two GO concepts. The
semantic templates of the two event types are defined in
Table 1.</p>
        <p>
          The adaptation included only the following work: we
manually collected keywords of the concepts ‘growth’
and ‘death’ from WordNet and constructed 40 patterns
for the keywords by using MedEvi [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. As candidate
agents, we collected human gene/protein names from
UniProt, and we collected cell type names from MeSH.
These are newly built resources that were not required
for the first two evaluation tasks. Existing language
patterns and inference rules, for example for the concept
‘regulation’, were reused. We have not used any training
corpus to further adjust the system to the new task.
        </p>
        <p>We constructed a test corpus consisting of 13,136
abstracts by querying PubMed with the two MeSH terms “Cell
Death” and “Cell Enlargement”. The system with the
inference module extracted 244 unique UniProt proteins
associated with RCG events and 266 unique proteins
associated with RCD events from the corpus. This
evaluation also uses the two measures: precision, the
percentage of unique proteins found in GOA among the
extracted proteins, and recall, the percentage of extracted
proteins among the protein records in GOA. GOA
contains 16 proteins among the 244 proteins of RCG events
(6.6% precision) and 100 proteins among the 266 proteins
of RCD events (37.6% precision). Currently (July 2010),
GOA has 155 proteins associated with RCG (10.3%
recall) and 908 proteins associated with RCD (11.0%
recall). These results show that our system can be applied
to a related task with minimal adaptations.</p>
        <p>We also tested the system without the inference
module against the cell corpus. It identifies 193 proteins
associated with RCG events and 198 proteins associated
with RCD events. GOA contains 13 proteins among the
193 proteins of RCG events (6.7% precision) and 78
proteins among the 198 proteins of RCD events (39.4%
precision). The precision barely changes when the system
runs without the inference module, while the recall
drops by about 20%. This finding is similar to the results
of the second evaluation, in that the precision is
independent of the inference, while the recall drops
significantly without the inference module. The relatively
smaller drop of recall for the new task may indicate that
the inference rules developed for the first two evaluations
have less effect on the third evaluation than on the other
two.</p>
        <p>We manually inspected 20 of the proteins, for each
event type, that were extracted by our system but not
found in GOA. Among the 20 ‘false positive’
proteins of the RCD concept, we found evidence that can
support the association of 15 proteins with RCD
concepts (75%). This means that the real precision can go
up to 80% and, more importantly, that we can identify
new protein instances of GO concepts by using our
system. Among the 20 ‘false positive’ proteins of the RCG
concept, we located evidence only for 8 proteins (40%).
After careful inspection, we realized that the precision of
the RCG-related proteins is much lower than that of the
RCD-related proteins because the language patterns for
RCG events, which we collected from WordNet, are not
specific to cell size growth, but may also refer to cell
proliferation and development, which should be linked to the
other GO concepts “cell proliferation” (GO:0008283) and
“cell development” (GO:0048468). The lack of a training
corpus led to this problem, and so we plan to extend the
experiment to other GO concepts, establishing training
corpora for concept identification in text.</p>
      </sec>
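<p>The two protein-level measures used in this third evaluation can be sketched over sets of UniProt accessions; the accessions below are placeholders, not actual results:</p>

```python
# Sketch: precision is the fraction of extracted proteins found in GOA;
# recall is the fraction of GOA proteins that were extracted.

def protein_level_scores(extracted, goa):
    found = extracted.intersection(goa)
    precision = len(found) / len(extracted) if extracted else 0.0
    recall = len(found) / len(goa) if goa else 0.0
    return precision, recall

extracted_rcd = {"P00001", "P00002", "P00003", "P00004"}  # placeholder IDs
goa_rcd = {"P00002", "P00004", "P00005", "P00006", "P00007"}
print(protein_level_scores(extracted_rcd, goa_rcd))  # (0.5, 0.4)
```

<p>As noted above, an extracted protein absent from GOA is not necessarily wrong; it may be a candidate for a new annotation, which this set-based measure cannot distinguish.</p>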
    </sec>
    <sec id="sec-3">
      <title>Discussion</title>
      <p>As explained in the Background section, the inference rules we
introduce in this paper are to deduce implicit events from
explicit events. Note that unless the explicit events
contain enough evidence for an implicit event, we cannot
deduce the implicit event from the explicit events. In
other words, the implicit events are alternative
representations of the extracted information, where the implicit
events do not convey new information compared to the
explicit events. The performance comparison between
the system with the inference and that without the
inference is, in a sense, to see which representations better
fit the target templates, where the inference rules are
designed to produce results that better match the target
templates.</p>
      <p>
        The previous event extraction systems often utilize
rules or models whose semantics directly reflect the
target event templates, thus embedding linguistic and
domain knowledge together. In contrast, our approach
of separating the inference rules from the linguistic
resources has the following characteristics: 1) We can
represent the semantics of sentences, which are relevant to
event extraction, according to the syntactic structures of
the sentences, independently from target semantic
templates [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]; 2) we can construct language patterns for
event extraction without respect to target semantics,
considering the compositional aspect of events, which has
led to the development of phrase-level patterns rather
than sentence-level or clause-level patterns [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]; and 3)
we can add or remove language patterns according to
their semantic categories, not worrying about the
side effects of domain-specific patterns, which makes the
patterns highly reusable, as shown in the third test case.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>We proposed a novel approach to event extraction, using
an ontology to represent the semantics of lexical,
syntactic, and pragmatic resources. We focused on extracting
regulatory events on gene expression and cell activities,
which are very important to molecular biology and
disease studies. Our system demonstrates the full complexity
of identifying such complex events in the literature
and may guide the ontology development to innovative
ways of integrating various knowledge resources.</p>
    </sec>
    <sec id="sec-5">
      <title>Methods</title>
      <sec id="sec-5-1">
        <title>Overview</title>
        <p>Our system first recognizes mentions of individual GRO
instances in text, which can be the event components.
It then combines them into compositional structures of
explicit events by using language patterns. The system
performs inference based on domain knowledge to
deduce implicit events from the explicit events. It finally
extracts the events that match pre-defined event
templates. Both explicit and implicit events may fit the
database event templates.</p>
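<p>The pipeline described above can be sketched as follows; this is a simplification with toy stand-in stages and our own names (the parsing step is omitted), not the authors' code:</p>

```python
# Skeleton of the extraction pipeline: entity recognition, pattern matching
# into explicit events, inference into implicit events, template filtering.

def recognize_entities(text, dictionary):
    # toy stand-in for dictionary-based named entity recognition
    return [name for name in dictionary if name in text]

def match_patterns(entities, patterns):
    # toy stand-in: emit one explicit event per recognized entity
    return [{"type": "GeneExpression", "hasPatient": e} for e in entities]

def infer(explicit, rules):
    # stand-in for the inference module: apply each rule (a function) once
    return [rule(e) for e in explicit for rule in rules if rule(e)]

def extract(text, dictionary, patterns, rules, wanted_types):
    explicit = match_patterns(recognize_entities(text, dictionary), patterns)
    implicit = infer(explicit, rules)
    return [e for e in explicit + implicit if e["type"] in wanted_types]

# a structural-transformation rule as a plain function
rule3 = lambda e: ({"type": "RegulationOfGeneExpression", "hasPatient": e}
                   if e["type"] == "GeneExpression" else None)

events = extract("expression of fimA", {"fimA"}, [], [rule3],
                 {"RegulationOfGeneExpression"})
print(events[0]["type"])  # RegulationOfGeneExpression
```

<p>Note that only the implicit event survives the final filter here, mirroring the point above that both explicit and implicit events are matched against the pre-defined templates.</p>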
      </sec>
      <sec id="sec-5-2">
        <title>Examples of extracted events</title>
        <p>Figures 1a and 1b show examples of the extracted
events. Figure 1a depicts the three types of
structures from the input text: Dependency structure,
explicit event, and implicit event. An arrow between
the syntactic and semantic structures indicates a
correspondence between them.</p>
        <p>Input Text: In addition, both himA and himD lesions caused a sevenfold
reduction in expression of a phi(fimA-lacZ) operon fusion in strains in
which fimA was locked in the on phase.</p>
        <p>Dependency Structure:
(caused/VB,
(-Subject- lesions/NN,
(-Object- and/CC,
-- both/CC,
-- himA/NN:Gene,
-- himD/NN:Gene)),
(-Object- reduction/NN,
-- a/DT,
-- sevenfold/JJ,
(-- in/IN,
(-Object- expression/NN,
(-- of/IN,
(-Object- fusion/NN,
(-Object- operon/NN,
(-- (/LRB,
-- )/RRB,
-- fimA/NN:Gene,
-- lacZ/NN:Gene))))))))
Implicit Event (fit for Database Template):
&lt;RegulationOfGeneExpression
hasAgent = &lt;Protein name="himA"&gt;
hasPatient = &lt;GeneExpression hasPatient=&lt;Gene name="fimA"&gt;&gt;
hasPolarity="positive"&gt;
(a) Example 1</p>
        <p>Explicit Event:
&lt;RegulatoryProcess
hasAgent =
&lt;RegulatoryProcess
hasPatient =
&lt;Protein name="himA"&gt;
hasPolarity="negative"&gt;
hasPatient =
&lt;RegulatoryProcess
hasPatient =
&lt;GeneExpression
hasPatient=
&lt;Gene name="fimA"&gt;&gt;
hasPolarity="negative"&gt;&gt;</p>
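<p>The polarity composition behind this example (a negative regulation of a negative regulation yields a positive regulation) can be sketched as follows; the function name follows the polarity_sum notation of the inference rules, and the handling of unknown polarity is our assumption:</p>

```python
# Sketch of polarity composition used when nested regulatory events are
# collapsed into one event, as in the himA/fimA example above.

def polarity_sum(p1, p2):
    if "unknown" in (p1, p2):
        return "unknown"  # assumed behavior for missing polarity
    return "positive" if p1 == p2 else "negative"

# "himA lesions caused a reduction in expression of fimA":
# lesion (negative) composed with reduction (negative)
print(polarity_sum("negative", "negative"))  # positive
```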
        <p>Figure 2 depicts the overall workflow of the system:
input text → named entity recognition → named-entity-annotated
text → parsing → dependency structure → pattern matching →
(explicit) textual semantics represented with GRO → inference →
(explicit+implicit) textual semantics → extraction → events of
pre-defined types.</p>
        <p>The implicit event in Figure 1a is deduced from the
explicit events by using the inference rules 1 to 3 in
Table 4. TFBS stands for
TranscriptionFactorBindingSiteOfDNA. Figure 1b shows that the explicit
events of the two sentences are combined to deduce the
implicit event. Rule 4 in Table 4 is used for the deduction.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Language patterns and inference rules</title>
        <p>Table 3. Example language patterns (syntactic pattern /
semantic pattern):
1. (expression Noun (of Prep Object:Gene)) /
&lt;GeneExpression hasPatient=Gene&gt;
2. (reduction Noun (in Prep Object:Patient)) /
&lt;RegulatoryProcess hasPatient=Patient hasPolarity=“negative”&gt;
3. (lesion Noun Object:Patient) /
&lt;RegulatoryProcess hasPatient=Patient hasPolarity=“negative”&gt;
4. (cause Verb Subject:Agent Object:Patient) /
&lt;RegulatoryProcess hasAgent=Agent hasPatient=Patient&gt;</p>
        <p>Table 4. Example inference rules (condition(s) ⇒ conclusion):
1. &lt;RegulatoryProcess hasPolarity=Polarity2
hasAgent=&lt;RegulatoryProcess hasPatient=Patient
hasPolarity=Polarity1&gt;&gt;
⇒ &lt;RegulatoryProcess hasAgent=Patient
hasPolarity=polarity_sum(Polarity1, Polarity2)&gt;
2. &lt;RegulatoryProcess hasPolarity=Polarity2
hasPatient=&lt;RegulatoryProcess hasPatient=Patient
hasPolarity=Polarity1&gt;&gt;
⇒ &lt;RegulatoryProcess hasPatient=Patient
hasPolarity=polarity_sum(Polarity1, Polarity2)&gt;
3. &lt;RegulatoryProcess hasPatient=GeneExpression&gt;
⇒ &lt;RegulationOfGeneExpression hasPatient=GeneExpression&gt;
4. &lt;RegulationOfGeneExpression hasAgent=TranscriptionFactor
hasPatient=&lt;GeneExpression hasPatient=Gene&gt;&gt; and
&lt;RegulatoryDNARegion hasAgent=Gene
hasPart=&lt;TFBS hasAgent=TranscriptionFactor&gt;&gt;
⇒ &lt;RegulationOfTranscription hasAgent=TranscriptionFactor
hasPatient=&lt;Transcription hasPatient=Gene&gt;
hasPhysicalContact=“yes”&gt;</p>
      </sec>
      <sec id="sec-5-3-1">
        <title>Named entity recognition</title>
        <p>We have adopted a dictionary-based approach for named
entity recognition. The dictionary contains 15,881
gene/protein and operon names of E. coli, including
169 E. coli TF names, collected from
RegulonDB and SwissProt. The recognized names are
grounded with UniProt identifiers and labeled with the
relevant GRO concepts among the following: Gene, Protein,
Operon, and TranscriptionFactor.</p>
      </sec>
      <sec id="sec-5-5">
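<p>A minimal sketch of dictionary-based, token-level name recognition in the spirit of the approach above; the toy dictionary stands in for the real 15,881-name dictionary:</p>

```python
# Sketch: look each token up in a name-to-concept dictionary and record
# the position, surface form, and GRO concept label of every hit.

DICTIONARY = {"himA": "Gene", "himD": "Gene", "fimA": "Gene",
              "IHF": "TranscriptionFactor"}

def recognize(tokens):
    mentions = []
    for i, token in enumerate(tokens):
        if token in DICTIONARY:
            mentions.append((i, token, DICTIONARY[token]))
    return mentions

tokens = "both himA and himD lesions caused a reduction".split()
print(recognize(tokens))
# [(1, 'himA', 'Gene'), (3, 'himD', 'Gene')]
```

<p>A production dictionary matcher would additionally handle multi-word names and ground each hit to a UniProt identifier, as described above.</p>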
        <sec id="sec-5-5-1">
          <title>Parsing</title>
          <p>
            We have utilized Enju, the HPSG parser [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], for
syntactic analysis of sentences. While the Enju parser
produces predicate-argument structures, we have developed
a module to convert them into dependency structures
and selectively merged the predicate-argument structure
into the dependency structure. We use the resulting
dependency structure for the loose matching of language
patterns explained below.
          </p>
        </sec>
      </sec>
      <sec id="sec-5-6">
        <title>Pattern matching</title>
        <p>To identify the explicit events from sentences, the
system utilizes syntactic-semantic paired patterns,
matching the syntactic patterns to the dependency structures
and combining the semantic patterns into a semantic
structure.</p>
        <p>Each pattern is a pair of a syntactic pattern and a
semantic pattern. Syntactic patterns comply with
dependency structures. The leftmost item within a pair of
parentheses (e.g. cause Verb, lesion Noun) is the head
of the other items within the parentheses (e.g.
Subject:Agent, Object:Patient). A dependent item may be
surrounded by another pair of parentheses, which forms
an embedded structure (e.g. Pattern 1, Pattern 2). The
lexical items in the syntactic patterns are labeled with
part-of-speech (POS) tags (e.g. Verb, Noun, Prep), and
should be matched to words with the same POS tags.</p>
        <p>The dependent items have syntactic constraints that
indicate their roles with respect to their head items (e.g.
Subject, Object), and should be matched to those with
the syntactic roles. The dependent items may have
semantic variables (e.g. Agent, Patient, Gene), which
indicate the semantics of the dependent items. If the
semantic variable of a dependent item is a concept of GRO
(e.g. Gene), the variable should match a semantic
category that is identical to, or a sub type of, the specified
concept.</p>
      </sec>
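      <p>The subtype constraint amounts to a walk up an is-a hierarchy. A minimal sketch, assuming a toy GRO fragment (the parent links below are illustrative, not the real ontology):</p>

```python
# Toy GRO fragment (parent links are assumptions for illustration): a semantic
# variable constrained to a concept accepts that concept or any of its subtypes.

GRO_PARENT = {
    "Gene": "MolecularEntity",
    "Protein": "MolecularEntity",
    "TranscriptionFactor": "Protein",
}

def satisfies(category, concept):
    """True if category is identical to, or a subtype of, concept."""
    while category is not None:
        if category == concept:
            return True
        category = GRO_PARENT.get(category)
    return False

print(satisfies("TranscriptionFactor", "Protein"))  # True
print(satisfies("Gene", "Protein"))                 # False
```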
      <sec id="sec-5-8">
        <p>The semantic pattern expresses the semantics of its corresponding syntactic pattern. It is represented with GRO concepts (e.g. RegulatoryProcess, GeneExpression) and properties (e.g. hasAgent, hasPatient).</p>
        <p>The system tries to match the syntactic patterns to the dependency structures of sentences in a bottom-up way. For example, it matches Patterns 1 to 4 in Table 3 to the dependency structure of the example (1) depicted in Figure 1a. In the process, it considers the syntactic and semantic constraints of the syntactic patterns. For instance, the item ‘cause’ of the fourth pattern in Table 3 should match the verb ‘cause’ that has both a subject and an object.</p>
        <p>Once a syntactic pattern is successfully matched to a node of the dependency structure, its corresponding semantic pattern is assigned to the node as one of its semantics. If the syntactic pattern has dependent items with semantic variables (e.g. Subject:Agent, Object:Patient), the variables (e.g. Agent, Patient) are replaced with the semantics of the children of the node that have been matched to the dependent items. In this way, the semantics of multiple phrases is combined into sentential semantics. In Figure 1a, the small boxes with dashed lines show the semantics assigned to the internal nodes of the example (1), which are later combined into the sentential semantics of the text.</p>
        <p>Note that the node ‘lesions’ is assigned two pieces of semantics for the two gene names that are its children (i.e. himA, himD). The explicit textual semantics of Figure 1a is one of the two, while the other is a duplicate of Sem1 except that the gene name ‘himA’ is replaced with ‘himD’.</p>
      </sec>
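      <p>The variable-replacement step can be sketched as a simple substitution over the semantic pattern; the dict encoding and names below are illustrative assumptions:</p>

```python
# Sketch of the bottom-up combination step: when a syntactic pattern matches a
# node, the semantic variables in its semantic pattern are replaced with the
# semantics already assigned to the matched children. Encoding is invented.

def instantiate(semantic_pattern, child_semantics):
    """Replace variable values (e.g. 'Agent') with the children's semantics."""
    result = {}
    for key, value in semantic_pattern.items():
        # Non-variable values (e.g. the GRO type) are kept as they are.
        result[key] = child_semantics.get(value, value)
    return result

sem = {"type": "RegulatoryProcess", "hasAgent": "Agent", "hasPatient": "Patient"}
children = {"Agent": {"type": "Gene", "name": "himA"},
            "Patient": {"type": "GeneExpression"}}
print(instantiate(sem, children))
```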
      <sec id="sec-5-10">
        <p>One important feature of the pattern matching is that we loosely match the syntactic patterns to the dependency structures. For instance, the gene name ‘fimA’ is not a direct child of the preposition ‘of’, but is matched to the item Object:Gene of the first pattern in Table 4. We decided to match a dependent item not only to a direct child of the node matched to the head item, but also to any descendant of that node. This feature rests on two reasons: First, it is practically impossible to construct all potential patterns for the event extraction, though we have accumulated a reasonably large number of patterns for gene regulation; second, the lexical entries not matched by any of the patterns for gene regulation (e.g. ‘sevenfold’, ‘operon’, ‘fusion’) might not affect the extraction of the events.</p>
      </sec>
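      <p>A minimal sketch of the loose matching, assuming a toy tree encoding and stop list (the clause-boundary handling reflects the conditions stated in the text):</p>

```python
# Sketch of loose matching: a dependent item may match any descendant of the
# node matched to the head item, not only a direct child, but the search does
# not cross clausal boundaries (e.g. 'which') or exceptional words ('except').
# The tree encoding and the stop list are assumptions for illustration.

BOUNDARIES = {"which", "except"}

def find_descendant(tree, node, wanted):
    """Depth-first search below node for a word in wanted, stopping at BOUNDARIES."""
    for child in tree.get(node, []):
        if child in BOUNDARIES:
            continue  # do not jump over clausal boundaries
        if child in wanted:
            return child
        found = find_descendant(tree, child, wanted)
        if found is not None:
            return found
    return None

# 'fimA' is not a direct child of 'of', but is still matched.
tree = {"of": ["expression"], "expression": ["fimA"]}
print(find_descendant(tree, "of", {"fimA"}))  # fimA
```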
      <sec id="sec-5-11">
        <p>This loose matching still works under the following strict conditions: 1) an item with a syntactic role (e.g. Subject) can be matched only to a descendant within the sub-tree carrying that syntactic role; 2) once an item is matched to a node, it is not further matched to the node’s descendants; and 3) the matching does not jump over clausal boundaries (e.g. ‘which’) or several exceptional words (e.g. ‘except’).</p>
      </sec>
      <sec id="sec-5-12">
        <sec id="sec-5-12-1">
          <title>Inference</title>
          <p>The inference step transduces explicit textual semantics (or events) into implicit semantics (or events). It deduces a new, specific event instance, where possible, by combining two or more general events. The inference module takes as input the explicit events identified in a text (i.e. a MEDLINE abstract or a full text) by the preceding pattern-matching module. It applies to the explicit events the inference rules that reflect common-sense knowledge and domain knowledge, as exemplified in Table 4.</p>
        </sec>
      </sec>
      <sec id="sec-5-14">
        <p>An inference rule has the propositional-logic form P→Q, where P is a set of conditions and Q is the conclusion. It works with the modus ponens rule (i.e. P, P→Q ⊢ Q). That is, if all the conditions P of a rule match some of the events identified in a text, the conclusion Q is instantiated and added as an additional event of the text. As the input events are represented with GRO, the inference rules and their resultant events are also represented with GRO.</p>
      </sec>
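      <p>The rule form and the modus ponens step can be sketched as follows; the triple encoding and the Rule 3-style example are assumptions for illustration, not the system’s internal representation:</p>

```python
# Minimal sketch of an inference rule in the P→Q form described above: if every
# condition in P is among the identified events, the conclusion Q is
# instantiated (modus ponens). Event triples here are invented for illustration.

def modus_ponens(rule, facts):
    """P, P→Q ⊢ Q: if every condition in P is among the facts, conclude Q."""
    conditions, conclusion = rule
    if all(c in facts for c in conditions):
        return conclusion
    return None

# Toy rule in the spirit of Rule 3: a RegulatoryProcess whose patient is a
# GeneExpression is a RegulationOfGeneExpression.
rule = ((("e1", "type", "RegulatoryProcess"),
         ("e1", "hasPatient", "e2"),
         ("e2", "type", "GeneExpression")),
        ("e1", "type", "RegulationOfGeneExpression"))

facts = {("e1", "type", "RegulatoryProcess"),
         ("e1", "hasPatient", "e2"),
         ("e2", "type", "GeneExpression")}
print(modus_ponens(rule, facts))  # ('e1', 'type', 'RegulationOfGeneExpression')
```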
      <sec id="sec-5-17">
        <p>
          We have constructed 28 inference rules for dealing with the compositional structures of gene regulation events (e.g. Rules 1 and 2) and for deducing biological events from combinations of linguistic events (e.g. Rules 3 and 4), by consulting the training corpus and the review paper [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] (see Table 4).
        </p>
        <p>For example, Rules 1 and 2 flatten, where possible, the compositional structure of event descriptions. The explicit events in Figure 1a have a cascaded structure with four basic event instances (i.e. three RegulatoryProcess, one GeneExpression) and are transformed by Rules 1 and 2 to fit the database template, which has only two event instances (i.e. RegulationOfGeneExpression, GeneExpression). Rule 3 deduces the specific event type RegulationOfGeneExpression from a general type of event (i.e. RegulatoryProcess).</p>
      </sec>
      <sec id="sec-5-20">
        <p>Rule 4 reflects the domain knowledge that if a transcription factor both binds to the regulatory region of a gene and regulates the gene’s expression level, it is a transcriptional regulator of the gene.</p>
        <p>Note that the two conditions of Rule 4 can be matched to events from any sentences; in other words, Rule 4 can merge multiple pieces of evidence from different sentences into a single fact. The function polarity sum works exactly like the NXOR (Not Exclusive OR) operation in Boolean logic.</p>
        <p>The rules are repeatedly applied over the explicit events from a given text until no additional event is generated.</p>
      </sec>
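      <p>Both behaviours described above, the NXOR-like polarity combination and the repeated rule application until no new event appears, can be sketched as follows (the function name polarity_sum and the fact encoding are illustrative assumptions):</p>

```python
# Sketch of two behaviours: polarity combination as NXOR (equal polarities
# yield positive, unequal yield negative), and forward chaining to a fixpoint.
# The fact encoding and rule callables are assumptions for illustration.

def polarity_sum(a, b):
    """NXOR on polarities: equal inputs give 'positive', unequal 'negative'."""
    return "positive" if a == b else "negative"

def saturate(rules, facts):
    """Apply every rule repeatedly until no additional fact is generated."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for conclusion in rule(facts):
                if conclusion not in facts:
                    facts.add(conclusion)
                    changed = True
    return facts

print(polarity_sum("negative", "negative"))  # positive

# A toy rule that fires once its condition is present.
grows = lambda facts: [("derived",)] if ("seed",) in facts else []
print(sorted(saturate([grows], {("seed",)})))
```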
      <sec id="sec-5-24">
        <p>
          We have implemented a program that converts the inference rules into Prolog code, together with a Prolog application that executes the rules over the input events. We could not use OWL-DL reasoners (e.g. Pellet) because of their DL-safe restriction, which assumes that all instances in the rules, both in conditions and in conclusions, are already available in the knowledge base [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The rules for event extraction, however, generate new instances of events and event attributes in their conclusions. Nonetheless, we can still utilize the reasoners to validate the ontology populated with the extracted events.
        </p>
      </sec>
      <sec id="sec-5-25">
        <sec id="sec-5-25-1">
          <title>Extraction</title>
          <p>The system finally selects, from the events resulting from either pattern matching or inference, those that match the given semantic templates. Table 1 shows the event templates. The variables are marked with ‘?’ and are matched to instances of the concepts referred to by the variables. For example, the variable “?Protein” can be matched to a protein name. Non-variable concepts and properties are used as semantic restrictions on the events to be extracted. For example, the last template in Table 1 can be matched to an instance of NegativeRegulation, which is a child of RegulatoryProcess. In addition, the patient of the instance should be an instance of CellDeath, and the agent can be a gene, where Gene is a descendant of MolecularEntity.</p>
        </sec>
      </sec>
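      <p>The template-matching step can be sketched as binding ‘?’-variables against event attributes; the dict encoding is an assumption, and the GRO subtype check that the real system performs is omitted here for brevity:</p>

```python
# Sketch of the final template-matching step: slots beginning with '?' are
# variables bound to instances; other slots are fixed restrictions the event
# must satisfy. The template and event encodings are invented for illustration.

def match_template(template, event):
    """Bind '?'-variables against an event dict; return None on any mismatch."""
    bindings = {}
    for slot, expected in template.items():
        value = event.get(slot)
        if value is None:
            return None
        if expected.startswith("?"):
            bindings[expected] = value
        elif value != expected:
            return None
    return bindings

template = {"type": "NegativeRegulation", "hasPatient": "CellDeath", "hasAgent": "?Gene"}
event = {"type": "NegativeRegulation", "hasPatient": "CellDeath", "hasAgent": "fimA"}
print(match_template(template, event))  # {'?Gene': 'fimA'}
```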
    </sec>
    <sec id="sec-6">
      <title>Authors' contributions</title>
      <sec id="sec-6-1">
        <p>JJK conceived the study, designed and implemented the system, carried out the evaluations, and drafted the manuscript. DRS motivated and coordinated the study and revised the manuscript.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <sec id="sec-7-1">
        <p>We would like to thank Vivian Lee and Ruth Lovering for their contribution to the event annotation, and Nick Luscombe and Aswin Seshasayee for helping us to learn the domain knowledge of gene transcription regulation. We would also like to thank the anonymous reviewers for their valuable comments.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Daraselia</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuryev</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Egorov</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novichkova</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikitin</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazo</surname>
            <given-names>I</given-names>
          </string-name>
          :
          <article-title>Extracting human protein interactions from MEDLINE using a full-sentence parser</article-title>
          .
          <source>Bioinformatics</source>
          <year>2004</year>
          ,
          <volume>20</volume>
          (
          <issue>5</issue>
          ):
          <fpage>604</fpage>
          -
          <lpage>611</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cimiano</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reyle</surname>
            <given-names>U</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saric</surname>
            <given-names>J</given-names>
          </string-name>
          :
          <article-title>Ontology-based discourse analysis for information extraction</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          <year>2005</year>
          ,
          <volume>55</volume>
          :
          <fpage>59</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Saric</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensen</surname>
            <given-names>LJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rojas</surname>
            <given-names>I</given-names>
          </string-name>
          :
          <article-title>Large-scale extraction of gene regulation for model organisms in an ontological context</article-title>
          .
          <source>In Silico Biology</source>
          <year>2005</year>
          ,
          <volume>5</volume>
          :
          <fpage>21</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hunter</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Firby</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baumgartner</surname>
            <given-names>WA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            <given-names>HL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ogren</surname>
            <given-names>PV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            <given-names>KB</given-names>
          </string-name>
          :
          <article-title>OpenDMAP: an open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression</article-title>
          .
          <source>BMC Bioinformatics</source>
          <year>2008</year>
          , 9:
          <fpage>78</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kim</surname>
            <given-names>JD</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ohta</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pyysalo</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kano</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsujii</surname>
            <given-names>J</given-names>
          </string-name>
          :
          <article-title>Overview of BioNLP'09 shared task on event extraction</article-title>
          .
          <source>In Proceedings of the Workshop on BioNLP: Shared Task</source>
          <year>2009</year>
          :
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gaizauskas</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demetriou</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artymiuk</surname>
            <given-names>PJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Willett</surname>
            <given-names>P</given-names>
          </string-name>
          :
          <article-title>Protein structures and information extraction from biological texts: The PASTA system</article-title>
          .
          <source>Bioinformatics</source>
          <year>2003</year>
          ,
          <volume>19</volume>
          :
          <fpage>135</fpage>
          -
          <lpage>143</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Narayanaswamy</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravikumar</surname>
            <given-names>KE</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vijay-Shanker</surname>
            <given-names>K</given-names>
          </string-name>
          :
          <article-title>Beyond the clause: extraction of phosphorylation information from medline abstracts</article-title>
          .
          <source>Bioinformatics</source>
          <year>2005</year>
          ,
          <volume>21</volume>
          (
          <issue>Suppl 1</issue>
          ):
          <fpage>i319</fpage>
          -
          <lpage>i327</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Culotta</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Betz</surname>
            <given-names>J</given-names>
          </string-name>
          :
          <article-title>Integrating probabilistic extraction models and data mining to discover relations and patterns in text</article-title>
          .
          <source>In Proceedings of Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics</source>
          <year>2006</year>
          :
          <fpage>296</fpage>
          -
          <lpage>303</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Beisswanger</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            <given-names>V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            <given-names>JJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rebholz-Schuhmann</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Splendiani</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dameron</surname>
            <given-names>O</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schulz</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hahn</surname>
            <given-names>U</given-names>
          </string-name>
          :
          <article-title>Gene Regulation Ontology (GRO): Design Principles and Use Cases</article-title>
          .
          <source>Studies in Health Technology and Informatics</source>
          <year>2008</year>
          ,
          <volume>136</volume>
          :
          <fpage>9</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hahn</surname>
            <given-names>U</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomanek</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buyko</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            <given-names>JJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rebholz-Schuhmann</surname>
            <given-names>D</given-names>
          </string-name>
          :
          <article-title>How Feasible and Robust is the Automatic Extraction of Gene Regulation Events? A Cross-Method Evaluation under Lab and RealLife Conditions</article-title>
          .
          <source>In Proceedings of BioNLP 2009</source>
          <year>2009</year>
          :
          <fpage>37</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Barrell</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimmer</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huntley</surname>
            <given-names>RP</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Binns</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Donovan</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Apweiler</surname>
            <given-names>R</given-names>
          </string-name>
          :
          <article-title>The GOA database in 2009 - an integrated Gene Ontology Annotation resource</article-title>
          .
          <source>Nucleic Acids Research</source>
          <year>2009</year>
          ,
          <volume>37</volume>
          :
          <fpage>D396</fpage>
          -
          <lpage>D403</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rodríguez-Penagos</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salgado</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martínez-Flores</surname>
            <given-names>I</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collado-Vides</surname>
            <given-names>J</given-names>
          </string-name>
          :
          <article-title>Automatic reconstruction of a bacterial regulatory network using Natural Language Processing</article-title>
          .
          <source>BMC Bioinformatics</source>
          <year>2007</year>
          , 8:
          <fpage>293</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kim</surname>
            <given-names>JJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pezik</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rebholz-Schuhmann</surname>
            <given-names>D</given-names>
          </string-name>
          :
          <article-title>MedEvi: Retrieving textual evidence of relations between biomedical concepts from Medline</article-title>
          .
          <source>Bioinformatics</source>
          <year>2008</year>
          ,
          <volume>24</volume>
          (
          <issue>11</issue>
          ):
          <fpage>1410</fpage>
          -
          <lpage>1412</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kim</surname>
            <given-names>JJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chae</surname>
            <given-names>YS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choi</surname>
            <given-names>KS</given-names>
          </string-name>
          :
          <article-title>Phrase-Pattern-based Korean to English Machine Translation using Two Level Translation Pattern Selection</article-title>
          .
          <source>In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL)</source>
          <year>2000</year>
          :
          <fpage>31</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sagae</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miyao</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsujii</surname>
            <given-names>J</given-names>
          </string-name>
          :
          <article-title>HPSG parsing with shallow dependency constraints</article-title>
          .
          <source>In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics</source>
          , Prague, Czech Republic,
          <year>2007</year>
          :
          <fpage>624</fpage>
          -
          <lpage>631</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Browning</surname>
            <given-names>DF</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Busby</surname>
            <given-names>SJW</given-names>
          </string-name>
          :
          <article-title>The regulation of bacterial transcription initiation</article-title>
          .
          <source>Nature Reviews Microbiology</source>
          <year>2004</year>
          ,
          <volume>2</volume>
          :
          <fpage>57</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Motik</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            <given-names>U</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Studer</surname>
            <given-names>R</given-names>
          </string-name>
          :
          <article-title>Query answering for OWL-DL with rules</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <year>2005</year>
          ,
          <volume>3</volume>
          :
          <fpage>41</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>