<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Knowledge-based highly-specialized terrorist event extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jakub Dutkiewicz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Czesław Jędrzejek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jolanta Cybulka</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maciej Falkowski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Control and Information Engineering</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Pl. M. Skłodowskiej-Curie 5</institution>
          ,
          <addr-line>60-965 Poznań</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Poznan University of Technology</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present a prototype of a system for event extraction that uses linguistic patterns with semantic classes. The process is aided by an auxiliary tool for mapping verb statistics across messages. The sentence analyzer uses linguistic associations based on VerbNet, both across a message and between messages' sentences, to select semantic role fillers. We restrict ourselves to one event type only, namely kidnapping, and to two event template slots (semantic roles): a perpetrator and a person_target (a human target). We designed rules for semantic role filling that build on previous work on coreference. We used the Sundance parser and the AutoSlog extraction pattern generator, and then applied SRL Master, a semantic role filler and event resolution tool. Our approach yields high performance on the MUC-4 data set.</p>
      </abstract>
      <kwd-group>
        <kwd>knowledge-based information extraction</kwd>
        <kwd>semantic roles</kwd>
        <kwd>terrorist event discovery</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Event extraction is one of the most important tasks of knowledge discovery. It may be
regarded as the core of knowledge-based systems that aim to provide the public
(people, organizations, government agencies, etc.) with condensed and filtered
information concerning events. These events are described in texts written in natural
language, so the problem is an instance of information extraction (IE).
Specifically, the task is to extract data concerning the described action (the event) and
its arguments (called event roles). Different approaches are applied to this task.
They can be classified according to the provenance of the
approach (pattern-based linguistic methods vs. classifier-based (statistical) methods) or
according to its ‘openness’ (fully open extraction vs. extraction trained on corpora). Another
important classification criterion is the context of the extraction,
namely local (one sentence only) or a larger context that takes into account
consecutive sentences (a discourse). In many cases hybrid methods are used that
combine the different approaches. Open extraction systems (operating within a
single-sentence context) scale well to open corpora [
        <xref ref-type="bibr" rid="ref1 ref3">1,3</xref>
        ], especially those acquired from
the Web. However, the most accurate IE systems are domain-specific: they use linguistic
patterns and are trained to some extent with the aid of statistics. Our work follows the
latter approach in that we use a domain-specific training corpus. Let us characterize it
briefly.
      </p>
      <p>
        Thanks to a series of DARPA Message Understanding Conferences (MUCs),
significant progress in pattern-based (NLP-based) extraction technologies has been
achieved. In this work we capitalise on the results of the MUC-3 and MUC-4 conferences [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], held in 1991-1992, which used a corpus of news reports (the MUC corpus)
on terrorist activities in Latin America. The MUC conferences developed standards for
evaluation, e.g. the adoption of metrics such as precision and recall.
      </p>
      <p>The goal of MUC was to extract from texts information concerning 7 classes of
terrorist events: Attack, Kidnapping, Hijacking, Bombing, Arson, Robbery and
Forced Work Stoppage, plus several variations on each (for accomplished, threatened
and attempted incidents). The extraction process was augmented by the generation of
knowledge frames (event templates). Every such template consisted of 24
attributes (slots). A document (a multi-sentence message concerning an event) could be labeled
with more than one template type. The MUC-4 corpus consists of 1700 documents,
of which 1300 (DEV) were used in MUC-4 for training, 200 documents
(TST1+TST2) were used as a tuning set, and the last 200 documents (TST3+TST4)
were used as the test set. The resulting knowledge base frames are called “key
templates”. We select messages concerning one event type only, namely
kidnapping. Also, from among the 24 slots we consider only two: a perpetrator and a
person_target.</p>
      <p>The main contributions of this paper are:
        <list list-type="bullet">
          <list-item><p>a method of comparing events to check whether two given events are in fact
identical or different, on the basis of the semantic typing (semantic
classes) of the events’ arguments; it relies on several types of rules, namely
atomic rules, rules for filling thematic roles, and rules for comparing whole events; the method
may also be used in coreference resolution;</p></list-item>
          <list-item><p>an implementation of a corpus crawling tool that looks for words and phrases that
lexicalize the kidnapping event;</p></list-item>
          <list-item><p>additional lexical rules for identifying victims and perpetrators.</p></list-item>
        </list>
      </p>
      <p>The paper is organised as follows. Section 2 contains some notes on
related work. In Section 3 our extraction method is presented. Section 4 describes a
prototype implementation of the Word Statistics tool and its use. Section 5
presents our information extraction results. In Section 6 we give concluding remarks
and outline our future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related works</title>
      <p>
        The main drawback of open information extraction [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is that it relies on natural
language features that do not classify (semantically type) the arguments of an extracted
relation. Additionally, in such methods the syntactic patterns (for example, regular
expressions) do not match verb arguments that are distant from the verb phrase in a
sentence. Both features have a strong negative impact on the ability to
decide whether events described in different sentences are identical.
In our work we avoid this drawback.
      </p>
      <p>
        The authors of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] use language resources (dictionaries) to obtain sets of words that
are relevant to a semantic class (a type of a verb argument). They then use these
extensionally defined types (semantic classes) in the extraction process. In this
work we also show how to apply such classes in the process of event comparison.
      </p>
      <p>
        The method of event comparison is also described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. There, the authors
compare events (and extract their arguments) on the basis of the head parts of noun phrases.
For example, the events described in the following two sentences:
      </p>
      <p>1) A customer in the store was shot by masked men.</p>
      <p>2) The two men used 9mm semi-automatic pistols.</p>
      <p>are treated as the same event because both sentences use the same word “men”. In our
approach the events may be unified (or differentiated) on the basis of the membership
(or non-membership) of the two “linking” words in the same semantic class. Also, it
is not known in advance which pairs of sentences should be analyzed with respect to the event (we
describe this problem later on).</p>
    </sec>
    <sec id="sec-3">
      <title>The extraction method</title>
      <sec id="sec-3-1">
        <title>Preliminary definitions</title>
        <p>First, let us define the terms used in the paper.</p>
        <p>Event (denoted by En, where n stands for the event’s name) is an entity representing
the event described in the text (conceptually it is an occurrent that plays the central role in some situation,
which represents a state of affairs). The event is connected with a
syntactic phrase (a verb phrase), called an anchor, that helps to identify it in a sentence.
There are also participants in the event; we identify them via
thematic roles that are arguments of the anchoring phrase.</p>
        <p>Anchor (marked as Ak, where k stands for an anchor name) is a verb or a verb
phrase whose appearance in a derivation (i.e. a syntactically parsed sentence) triggers
the process of recognizing an event (for example, a kidnapping).</p>
        <p>Thematic role (a semantic role label, marked as Rm, where m is a role name) is an
entity representing an argument of a verb or a verb phrase (an anchor) denoting the
event. For example, there may be such roles as Agent (in our considerations, a
perpetrator), Patient (a victim), Instrument, Location, Time and others.</p>
        <p>Role filler is a text phrase that instantiates a thematic role in the text (marked with
the symbol RpFv, where p is a role name and v identifies a filler).</p>
        <p>Syntactic similarity. Let simsyn(W1, W2) be a two-argument function that,
given two words (or phrases) W1 and W2, returns
a binary value, true or false. The function returns true if W1 and W2 have
the same syntactic properties (i.e. number and gender); otherwise it returns false.</p>
        <p>Semantic class (denoted by Cs, where s is a class name) is defined as an entity
that is expressed by all of its verbalizations. For example, the verbalizations of the
semantic class concerning kidnapping are Ckidnapping={kidnap, seize, abduct, capture,
intercept, take hostage}. Note that we do not use all the meanings of the
listed words, but only those that fit the specific context.</p>
        <p>Atomic formula is a triple of the form &lt;sub, pred, obj&gt;, where sub means the
subject of the sentence (and semantically it may play a thematic role Rm), pred means
the predicate (represents an event in terms of a certain semantic class Cs) and obj
means the object (semantically playing a role Rp). An atomic formula could be
considered as a rule representing a fact.</p>
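        <p>The definitions of a semantic class and an atomic formula can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the names AtomicFormula and verbalizes are hypothetical and do not come from the described system, and the class extension is copied from the definition of Ckidnapping given above.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import FrozenSet

# Extension of the kidnapping semantic class, as listed in the definition above.
C_KIDNAPPING: FrozenSet[str] = frozenset(
    {"kidnap", "seize", "abduct", "capture", "intercept", "take hostage"}
)


@dataclass(frozen=True)
class AtomicFormula:
    """A triple <sub, pred, obj> treated as a rule representing a fact."""
    sub: str
    pred: str
    obj: str


def verbalizes(lemma: str, semantic_class: FrozenSet[str]) -> bool:
    """Hypothetical helper: is the lemma one of the class verbalizations?"""
    return lemma.lower() in semantic_class


# Illustrative fact with a lemmatized predicate (an assumption of this sketch).
fact = AtomicFormula(sub="unidentified individuals", pred="abduct", obj="Oqueli")
print(verbalizes(fact.pred, C_KIDNAPPING))
```
]]></preformat>
        <p>A frozen dataclass is used so that facts are hashable and can be stored in sets, which is convenient when rules derive new facts from existing ones.</p>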
        <p>Let us illustrate the introduced notions with an example message,
DEV-MUC3-0018 (the text in this corpus is given in upper case). We decorated the text
with roles, role fillers, events and anchors. One of the considered sentences is:</p>
        <p>OQUELI, LEADER OF THE NATIONAL REVOLUTIONARY MOVEMENT
(MNR) AND HILDA FLORES, A GUATEMALAN SOCIAL DEMOCRATIC
LEADER(RvictimF1) WERE ABDUCTED(EkidnappingAkidnapping1) AND KILLED IN
JANUARY(RtimeF1) BY UNIDENTIFIED INDIVIDUALS(RperpetratorF1) IN
GUATEMALA CITY(RlocationF1) AS THEY WERE HEADING TO THE LA
AURORA AIRPORT.</p>
        <p>Assume that there exists another sentence concerning the same event but with the
new fillers for the victim and perpetrator roles:</p>
        <p>IT TURNED OUT THAT POLITICIANS(RvictimF2) WERE
KIDNAPPED(EkidnappingAkidnapping2) BY URBAN TERRORISTS OF FARABUNDO
MARTI NATIONAL LIBERATION FRONT(RperpetratorF2).</p>
        <p>After decorating the two sentences we need to check whether the two pairs
EkidnappingAkidnapping1 and EkidnappingAkidnapping2 concern the same event. We show
how to approach this issue in Section 3.3.</p>
        <p>
          We are motivated by VerbNet (VN) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] thematic/semantic role methodology.
VerbNet verb classes are organized according to the syntactic behavior of verbs.
VerbNet uses 109 verb classes and 29 semantic role labels for arguments of the
&lt;sub, pred, obj&gt; triple pattern (which resembles our atomic formulae). We adhere to
VerbNet semantics rather than to ontologies, because we are not aware of any
publicly available ontology with adequate expressive power and rich verbalization of classes
(ontological entities). We are in the process of using our CATIE ontology for the
general extraction of facts from MUC-4 corpus [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>We are interested in such event specifying verbs as: kidnap, abduct, seize (VN
index/vn/steal-10.5.php#steal-10.5; sense number 3: take or capture by force or
authority) belonging to class steal-10.5. However, instead of a role Agent [+animate |
+organization] we need a role Agent/Patient [+person | +a group of persons |
+organization]. In Unified Verb Index collection (VerbNet generalization) the word
capture belonging to class steal-10.5.1
(http://verbs.colorado.edu/verbindex/wn/wordnet.cgi?v3-0.capture.1.capture-2:36:00#1) apparently has not been
assigned a meaning kidnap.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Basic rules for identifying thematic roles</title>
        <p>The next type of rule (besides the atomic formulae described earlier, which represent
facts) states that we use as direct anchors all the verbs of interest (Ckidnapping) in their
past tense forms. Using a special function that retrieves the predicate of a given triple,
namely predicate_of(&lt;s,p,o&gt;) = p, we denote such rules as triples of the form
&lt;predicate_of(&lt;s,p,o&gt;), tense_of, “Past”&gt;. We assume that tense_of is a built-in predicate
representing verb tenses, i.e. “Past” and “Past Participle”. Another built-in predicate,
voice_of, represents the voice of a verb phrase, namely “active_voice” or
“passive_voice”. The third built-in predicate, plays, represents a fact concerning
the deduced thematic role of the subject or the object of a triple (as assumed earlier,
we only consider the agentive role, a perpetrator, and the patientive (beneficiary) role,
a victim).</p>
        <p>Now we are ready to give the rules to identify thematic roles of a predicate given in
the past tense form. We are concerned with predicates expressed by verbs being
members of a Ckidnapping semantic class.</p>
        <p>The first rule states that, for a given triple, if its predicate is in the past tense and in
the active voice, then the subject plays the agentive thematic role of a perpetrator
while the object plays the patientive thematic role of a victim (a kind of
person_target). Rule (1) is as follows:</p>
        <p>&lt;predicate_of(&lt;sub,pred,obj&gt;), tense_of, “Past”&gt; ∧
&lt;predicate_of(&lt;sub,pred,obj&gt;), voice_of, “active_voice”&gt; →
&lt;sub, plays, “agentive_role”&gt; ∧ &lt;obj, plays, “beneficiary_role”&gt; (1)</p>
        <p>The second rule (2) differs only in the voice specification, which changes the order
of the atomic formulae in the conclusion. The rule is as follows:</p>
        <p>&lt;predicate_of(&lt;sub,pred,obj&gt;), tense_of, “Past”&gt; ∧
&lt;predicate_of(&lt;sub,pred,obj&gt;), voice_of, “passive_voice”&gt; →
&lt;obj, plays, “agentive_role”&gt; ∧ &lt;sub, plays, “beneficiary_role”&gt; (2)</p>
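        <p>The two role-identification rules can be sketched in Python as follows. This is a hedged illustration, not the implementation used in the system: the function name assign_roles is hypothetical, the passive branch follows the prose description of rule (2), in which only the voice differs and the roles of subject and object are swapped, and the example phrases are invented.</p>
        <preformat><![CDATA[
```python
from typing import List, Tuple

Fact = Tuple[str, str, str]  # an atomic formula <sub, pred, obj>


def assign_roles(sub: str, obj: str, tense: str, voice: str) -> List[Fact]:
    """Derive <phrase, plays, role> facts for a triple whose predicate
    is assumed to belong to the kidnapping semantic class."""
    if tense != "Past":
        return []  # neither rule fires outside the past tense
    if voice == "active_voice":
        # Rule (1): the subject is the perpetrator, the object is the victim.
        return [(sub, "plays", "agentive_role"),
                (obj, "plays", "beneficiary_role")]
    if voice == "passive_voice":
        # Rule (2): same premises except the voice; the roles are swapped.
        return [(obj, "plays", "agentive_role"),
                (sub, "plays", "beneficiary_role")]
    return []


# Passive example: "POLITICIANS WERE KIDNAPPED BY URBAN TERRORISTS"
for fact in assign_roles("politicians", "urban terrorists", "Past", "passive_voice"):
    print(fact)
```
]]></preformat>
        <p>Returning the derived facts as triples keeps the output in the same shape as the atomic formulae, so they can be fed to later rules without conversion.</p>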
      </sec>
      <sec id="sec-3-3">
        <title>Rules for event identification</title>
        <p>In many cases information about certain roles and events is spread over several
sentences. Thus, matching different phrases to one thematic role is one of the key
tasks. We define a set of rules to identify such cases, and eventually we either unify
different events or differentiate them (the are_different predicate). One of these rules
is based on two sentences whose verb phrases are denoted as two pairs, each containing an event
and an anchor: En1Am1 and En2Am2. Each of these sentences contains a phrase that
represents a filler of the same role, namely Rp1Fk1 and Rp1Fk2. To activate such a rule we need
to find at least two sentences with these role fillers and event anchors. If we
find more than two such sentences, we need to analyze them in pairs. To
describe such a rule, we need to define two predicates. The “belongs_to” predicate
holds if a given phrase belongs to a certain semantic class (this means that the main
word in the phrase is a member of the considered class). The “is_equal_to” predicate
decides whether two semantic classes contain the same set of elements, or whether two role
fillers are syntactically equivalent.</p>
        <p>The analysis starts with a search for such a pair of sentences. Let us
denote the role filler and the anchor found in the first sentence as R1F1 and
E1A1, and the role filler and the anchor found in the second sentence as R1F2 and
E2A1. Once we have found these pairs we need to decide whether the two event
anchors belong to the same semantic class (denoted as C1). This is formalized as:
&lt;E1A1, belongs_to, C1&gt; ∧ &lt;E2A1, belongs_to, C1&gt;.
This basic condition should be considered preemptive: its result decides whether
a pair of sentences is worth executing the rule on.</p>
        <p>The second part of the analysis starts with determining whether the role fillers belong to
classes that are different but related to each other.
Furthermore, we need to check whether the role fillers have the same syntactic properties. If these
conditions hold, we can assume that the phrases describe the same event. Additionally,
the relation that exists between the semantic classes may also be projected onto the
role fillers (in particular, it may be a subsumption). Let us formalize these
considerations in the form of rule (3). In this rule, we mark “some relation” as a variable “?rel”:</p>
        <p>&lt;R1F1, belongs_to, C2&gt; ∧ &lt;R1F2, belongs_to, C3&gt; ∧ &lt;C2, ?rel, C3&gt; ∧
¬&lt;C2, is_equal_to, C3&gt; ∧ simsyn(R1F1, R1F2) →
are_the_same(E1, E2) ∧ &lt;R1F1, ?rel, R1F2&gt; (3)</p>
        <p>However, if the role fillers belong to the same class but are different, or if the role fillers have
different syntactic properties, it is necessary to classify the two events as different (4):</p>
        <p>(&lt;R1F1, belongs_to, C4&gt; ∧ &lt;R1F2, belongs_to, C4&gt; ∧ ¬&lt;R1F1, is_equal_to, R1F2&gt;) ∨
¬simsyn(R1F1, R1F2) →
are_different(E1, E2) (4)</p>
        <p>We illustrate these rules with the following examples.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Example 1</title>
        <sec id="sec-3-4-1">
          <title>There are two consecutive sentences in the message:</title>
        </sec>
        <sec id="sec-3-4-2">
          <title>1) John Smith (RvictimF1) has been kidnapped (Ekidnapping1A1).</title>
          <p>
            2) President (RvictimF2) was taken hostage (Ekidnapping2A2) by unknown perpetrators.
The preemptive constraints are:
&lt;“kidnap”, belongs_to, Ckidnapping&gt; ∧ &lt;“take_hostage”, belongs_to, Ckidnapping&gt;.
The following rule activation captures lexical associations between two neighboring
sentences by pairing as similar each noun in the role of a victim (person_target). This
is similar to the lexical bridge features used in [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. The rule for these sentences is as
follows:
          </p>
          <p>&lt;“John Smith”, belongs_to, CPerson&gt; ∧ &lt;“President”, belongs_to, CPolitician&gt; ∧
&lt;CPerson, represents, CPolitician&gt; ∧ ¬&lt;“John Smith”, is_equal_to, “President”&gt; ∧
simsyn(“President”, “John Smith”) →
are_the_same(Ekidnapping1, Ekidnapping2).</p>
        </sec>
        <sec id="sec-3-4-3">
          <title>As the result we obtain a fact (an atomic formula) of the form:</title>
          <p>&lt;“John Smith”, represents, “President”&gt;.</p>
          <p>The confidence of this rule could be measured by the distance between the considered
sentences (measured in the number of sentences). In particular, this
rule may be restricted to consecutive sentences only.</p>
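          <p>The comparison of two events through the fillers of one role can be sketched in Python as below. The sketch rests on several assumptions that are not part of the described system: the Filler record, the hand-assigned number and gender features, the CLASS_RELATIONS table and the function names are all hypothetical; a missing gender is treated as compatible with any gender; and the preemptive condition (both anchors belonging to the same semantic class) is assumed to have been checked already.</p>
          <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Filler:
    text: str
    semantic_class: str      # e.g. "CPerson"
    number: str              # "sg" or "pl"
    gender: Optional[str]    # "m", "f", or None when the text gives no clue


# Hypothetical relations between semantic classes (the ?rel variable).
CLASS_RELATIONS: Dict[Tuple[str, str], str] = {
    ("CPerson", "CPolitician"): "represents",
}


def simsyn(a: Filler, b: Filler) -> bool:
    """Syntactic similarity: same number and compatible gender.
    Treating None as compatible is an assumption of this sketch."""
    same_gender = a.gender is None or b.gender is None or a.gender == b.gender
    return a.number == b.number and same_gender


def compare_events(f1: Filler, f2: Filler):
    """Return (verdict, derived_fact) for two fillers of the same role."""
    if not simsyn(f1, f2):
        return "are_different", None          # second disjunct of the difference rule
    same_class = f1.semantic_class == f2.semantic_class
    if same_class and f1.text.lower() != f2.text.lower():
        return "are_different", None          # same class but different fillers
    rel = CLASS_RELATIONS.get((f1.semantic_class, f2.semantic_class))
    if not same_class and rel is not None:
        # Different but related classes: unify and project the relation.
        return "are_the_same", (f1.text, rel, f2.text)
    return "undecided", None                  # not covered by the two rules


john = Filler("John Smith", "CPerson", "sg", "m")
president = Filler("President", "CPolitician", "sg", None)
print(compare_events(john, president))
```
]]></preformat>
          <p>The "undecided" verdict makes explicit that the two rules do not cover every combination, for example identical fillers of the same class or unrelated classes; how such cases are resolved is left open here.</p>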
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>Example 2</title>
        <sec id="sec-3-5-1">
          <title>We have three sentences, not necessarily in one document.</title>
          <p>1. Ricardo Alfonso Castellar, mayor of Achi, (RvictimF1) who was
kidnapped (Ekidnapping1A1) on 5 January, apparently by Army Of National Liberation
guerillas, was found dead.</p>
          <p>2. Castellar (RvictimF2) was kidnapped (Ekidnapping2A1) by a group of armed men.</p>
          <p>3. A politician condemned kidnapping (Ekidnapping3A1) of mayor of Achi (RvictimF3).</p>
          <p>In this case we need to process the sentences in pairs. First, we take sentences 1 and 2.
We execute the rule and as a result we get the unification of Ekidnapping1 and Ekidnapping2.
This means that unifying the Ekidnapping3 event with both of the previous events would
be redundant; we only need to establish whether Ekidnapping3 can be unified with either of
them. However, if Ekidnapping1 and Ekidnapping2 were not unified, all events
would need to be compared separately. In this case we get three fillers of the victim role,
and the relation between those fillers is quite specific. It could be
marked as “is_substring_of”. The left-hand side argument of this relation is always
less expressive than its right-hand side, and thus we can find the most expressive
filler: “Ricardo Alfonso Castellar, mayor of Achi”.</p>
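          <p>Selecting the most expressive filler through the “is_substring_of” relation can be sketched as follows. The function names are hypothetical, and plain case-insensitive substring matching is an assumption of this sketch; a real system would probably compare token sequences.</p>
          <preformat><![CDATA[
```python
from typing import List


def is_substring_of(left: str, right: str) -> bool:
    """True if left is a proper, case-insensitive substring of right."""
    l, r = left.lower(), right.lower()
    return l != r and l in r


def most_expressive(fillers: List[str]) -> List[str]:
    """Keep the fillers that are not a substring of any other filler."""
    return [f for f in fillers
            if not any(is_substring_of(f, g) for g in fillers)]


victims = [
    "Ricardo Alfonso Castellar, mayor of Achi",
    "Castellar",
    "mayor of Achi",
]
print(most_expressive(victims))
```
]]></preformat>
          <p>The function returns a list rather than a single string because several fillers may be maximal when none of them contains the others.</p>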
          <p>
            Our method of unification is conceptually more powerful than the methods used so far for
coreference resolution (for example in [
            <xref ref-type="bibr" rid="ref11 ref9">11, 9</xref>
            ]). However, so far we use it only for
establishing the agreement of semantic classes and the noun-pronoun agreement features,
that is, features 2-3 and 8 out of the 12 features proposed in [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ].
          </p>
        </sec>
      </sec>
      <sec id="sec-3-6">
        <title>Additional lexical rules</title>
        <p>The examples shown in the previous subsection illustrate the need for rules that go
beyond searching for sentences with verb phrases corresponding to the event-related semantic
class. To make the task of identifying an event easier for the annotators, it is necessary to
use a secondary semantic class containing words that are in a fuzzy relation to the
core event term. We introduce the class:
Cfuzzy_kidnapping = {disappear, release}
Following the Automatic Content Extraction (ACE) Programme guidelines:
An event trigger refers to the term within the event mention that most clearly
expresses the occurrence of the event instance; it is based on a direct anchor and
corresponds to Ckidnapping.</p>
        <p>
          An event mention refers to the sentence within which an event instance is reported –
corresponds to Cfuzzy_kidnapping. An event can have multiple mentions associated with it.
Apart from the sentence that initially reports the event, other coreferring sentences
that contain anaphors of events (such as pronouns and definite descriptions of
previously mentioned events) are taggable mentions of that event [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>In general there always exists a direct connection between the roles of events
corresponding to C1 and Cfuzzy_1. For example, a victim of a kidnapping directly corresponds
to the subject of a release or a disappearance. To measure the confidence of the fuzzy
classes we look at the statistics of all words/stems, in various part-of-speech forms,
that could directly or indirectly indicate a kidnapping event. These are words
corresponding to the Ckidnapping and Cfuzzy_kidnapping classes: verbs for kidnap (heads of verb
phrases) in the past tense or the attributive form kidnapped, other verbs in the past tense, verbs
in the infinitive or the -ing form (gerund), and nouns related to an act of kidnapping or a
perpetrator, namely:
kidnap, kidnapping, kidnapped, kidnapper
stem seiz, seized, seizing,
abduct, abducted, abducting,
stem captur, capturing, captured,
intercept, intercepting, intercepted,
stem releas, released, releasing,
disappear, disappeared, disappearing,
take/hold hostage.</p>
        <p>Finally, we apply coreference rules for both Cfuzzy_kidnapping and Ckidnapping semantic
classes.</p>
      </sec>
      <sec id="sec-3-7">
        <title>Example 3:</title>
        <p>1. Ricardo Alfonso Castellar(RvictimF1), mayor of Achi, was released(E1A1) on 15
January.</p>
        <p>2. Kidnapping(E2A1) of Castellar(RobjectF1) was a brutal act.</p>
        <p>Even though events E1 and E2 belong to different semantic classes, we can unify
specific role fillers within those events.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Word Statistics Tool</title>
      <p>Designing pattern-based linguistic rules is very tedious work, which
constitutes the main disadvantage of such methods. To alleviate this burden we
implemented the MUC Word Statistics Analyzer (Figure 1). The tool provides the following useful
functions:
1) it presents graphically the statistics of words across a document or a corpus;
2) it displays, in two separate panels, the fragments of text to which these
statistics pertain.</p>
      <p>The extraction method considered in this paper relies on the quality of the typing of verb
arguments (semantic classes). To obtain good extensions of the
semantic classes Ckidnapping and Cfuzzy_kidnapping we designed and implemented a statistics
tool. It estimates the frequency of occurrences of words (more precisely, their stems) in a
message or in the whole corpus. The tool also enables the analysis of the sentences (or
messages) in which the stems appear. The upper right corner of the screen shown in
Figure 1 contains a histogram that depicts the number of occurrences of a word (stem)
in the message and in the sentence. The example message is shown in the lower left
corner. The bottom panel lists the sentences in which the stem appears with
different endings, for example the stem kidnap with the word forms kidnapped,
kidnapper or kidnapping.</p>
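      <p>The kind of stem statistics described above can be approximated with a few lines of Python. This is a rough sketch and not the MUC Word Statistics Analyzer itself: the function names are hypothetical, stems are matched as simple word prefixes, multi-word expressions such as take/hold hostage are ignored, and the example message is invented for illustration.</p>
      <preformat><![CDATA[
```python
import re
from collections import Counter
from typing import List

# Stems taken from the word list of the two semantic classes.
STEMS = ("kidnap", "seiz", "abduct", "captur", "intercept", "releas", "disappear")


def stem_counts(text: str) -> Counter:
    """Count, per stem, the word forms in the text that start with it."""
    counts: Counter = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        for stem in STEMS:
            if token.startswith(stem):
                counts[stem] += 1
                break
    return counts


def sentences_with_stem(text: str, stem: str) -> List[str]:
    """Return the sentences in which some word form of the stem appears."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(stem))
    return [s for s in sentences if pattern.search(s.lower())]


# Invented message in the upper-case style of the corpus.
message = ("THE MAYOR WAS KIDNAPPED ON MONDAY. "
           "THE KIDNAPPERS RELEASED HIM TODAY. "
           "POLICE SEIZED WEAPONS.")
print(stem_counts(message))
print(sentences_with_stem(message, "kidnap"))
```
]]></preformat>
      <p>Summing the counters over all messages gives corpus-level frequencies, while the sentence list corresponds to the panel that shows the text fragments behind a given statistic.</p>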
      <p>In summary, by quickly inspecting the frequency of words and
their correlation, and by varying the trigger term lists, we can assess the effectiveness of
linguistic features.</p>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>
        For many years the results achieved in the MUC-4 evaluation were not significantly improved. Only recently has
significant progress [
        <xref ref-type="bibr" rid="ref11 ref9">9,11</xref>
        ] been made.
      </p>
      <p>
        According to the assessment of the MUC-4 community [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], 159 events in the 1700 documents are resolved as kidnappings.
      </p>
      <p>We define the following numbers of word occurrences:
X1: at least a single occurrence of words from Ckidnapping or Cfuzzy_kidnapping</p>
      <sec id="sec-5-1">
        <title>X2: only from Ckidnapping at least once</title>
      </sec>
      <sec id="sec-5-2">
        <title>X3: only from Cfuzzy_kidnapping at least once</title>
        <p>X4: from Ckidnapping at least once and from Cfuzzy_kidnapping at least once together</p>
      </sec>
      <sec id="sec-5-3">
        <title>X5: only from Ckidnapping ending with –ed at least once</title>
      </sec>
      <sec id="sec-5-4">
        <title>X6: only from Cfuzzy_kidnapping ending with –ed at least once</title>
        <p>X7: as in X1 from Ckidnapping at least once and from Cfuzzy_kidnapping at least once,
together ending with –ed</p>
      </sec>
      <sec id="sec-5-5">
        <title>X8: only from {kidnap} set</title>
      </sec>
      <sec id="sec-5-6">
        <title>X9: only kidnapped</title>
        <p>Y1-Y9: occurrences defined as for X1-X9, but for the set of documents that do not belong to a kidnapping event.</p>
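        <p>The first four occurrence categories can be computed per document as in the following Python sketch. It reflects one possible reading of the definitions of X1 to X4 and is not the procedure used to obtain the reported numbers: the function name is hypothetical, stems are matched as word prefixes, and the multi-word expression take hostage is left out.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List

KIDNAPPING_STEMS = ("kidnap", "seiz", "abduct", "captur", "intercept")
FUZZY_STEMS = ("releas", "disappear")


def occurrence_flags(tokens: List[str]) -> Dict[str, bool]:
    """Flags X1..X4 for one document given as lower-case tokens."""
    has_core = any(t.startswith(KIDNAPPING_STEMS) for t in tokens)
    has_fuzzy = any(t.startswith(FUZZY_STEMS) for t in tokens)
    return {
        "X1": has_core or has_fuzzy,       # a word from either class
        "X2": has_core and not has_fuzzy,  # only the kidnapping class
        "X3": has_fuzzy and not has_core,  # only the fuzzy class
        "X4": has_core and has_fuzzy,      # both classes together
    }


print(occurrence_flags("the mayor was kidnapped".split()))
```
]]></preformat>
        <p>Counting how many documents set each flag, separately for documents with and without a kidnapping key template, would give the X and Y numbers respectively.</p>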
        <p>In Table 3 the symbols have the following meaning: EN = event name (e.g.
kidnapping, crime, etc.), which serves as the anchor (in all other patterns the VP is the anchor); NP = noun
phrase; VP = verb phrase; PVP = passive verb phrase; AdjP = adjective phrase; PP =
prepositional phrase starting with specific prepositions; Pron = noun phrase
represented by a pronoun; Perp = perpetrator.</p>
        <p>
          The effectiveness of our system is due to several factors:
          <list list-type="bullet">
            <list-item><p>Our patterns are mostly triples, whereas most previous works were based on
syntax patterns consisting of 2 elements, see e.g. Fig. 1 of [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].</p></list-item>
            <list-item><p>Non-triple patterns are more likely to extract irrelevant
patterns. For a pattern to be relevant, it needs at least one of two sentence parts,
a location or a date (first sought in the simple sentence, then in the
complex sentence, and finally in adjacent sentences).</p></list-item>
            <list-item><p>One of the main contributions of this work is the introduction of VP(S), a
supplementary verb phrase. Particularly effective ones involving NP=EN are:
take place, claim responsibility, be responsible for, carry out. To a lesser
degree this helps to identify perpetrators and victims.</p></list-item>
          </list>
        </p>
        <p>In this paper an extraction is counted as correct (for recall) if it provides all of the following
kidnapping event roles: perpetrator individuals, perpetrator organizations,
human_target/victim, location and date. This set of roles is narrower than the 24 slots of the
MUC-4 contest.</p>
        <p>Table 4 presents the recall for the kidnapping events (here the same events in
different documents are counted separately, as in the MUC-4 evaluation).</p>
        <p>
          The recall numbers are significantly higher than in the MUC-4 contest (where the
best contribution achieved around 60% for both precision and recall), but they were achieved on
an easier task and for only one type of terrorist event. They are also higher than those in
Table 3 of [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>The system is presented at http://draco.kari.put.poznan.pl/ruleml2013_Extraction.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>
        The recent wave of methods [
        <xref ref-type="bibr" rid="ref11 ref3 ref4 ref8 ref9">11,9,8,3,4</xref>
        ] is capable of significantly improving
extraction measures. The MUC Conferences provided benchmarks that reduce the
arbitrariness of evaluating a given method. For example, the open extraction system ReVerb
gives good precision but poor recall [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We plan to apply our method to the full MUC-4
benchmark. The MUC Word Statistics Analyzer should be helpful for this task. There is
room for improvement in using a better syntax parser and Named
Entity Recognition, and in using a wider set of coreference comparisons.
      </p>
      <p>Our choice of anchor words could be improved. In general, our patterns
presented in Table 3 are more compatible with ontology-driven extraction than with purely
linguistic methods. Rather than using one general dictionary, as most MUC-related
works do, we can have a lexicalization specific to each ontology element. We are
working in this direction.</p>
      <p>Acknowledgement. This work was supported by the Polish National Centre for
Research and Development (NCBR) under grant No O ROB 0025 01 and by the DS 45-085/13 and
DS-PB grants. We would like to thank Prof. Ellen Riloff for making the Sundance and AutoSlog
tools available to us, and Bartosz Zaremba for calculating some statistics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bonial</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corvey</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petukhova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bunt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>A Hierarchical Unification of LIRICS and VerbNet Semantic Roles</article-title>
          .
          <source>Proceedings of the ICSC Workshop on Semantic Annotation for Computational Linguistic Resources (SACL-ICSC</source>
          <year>2011</year>
          ), Sep,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Etzioni</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banko</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Weld</surname>
            <given-names>D. S.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Open information extraction from the web</article-title>
          .
          <source>Commun. ACM</source>
          <volume>51</volume>
          ,
          <issue>12</issue>
          (
          <year>December 2008</year>
          ),
          <fpage>68</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Etzioni</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fader</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christensen</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mausam</surname>
          </string-name>
          :
          <article-title>Open Information Extraction: The Second Generation</article-title>
          .
          <source>IJCAI</source>
          <year>2011</year>
          :
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Riloff</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Multi-faceted Event Recognition with Bootstrapped Dictionaries</article-title>
          ,
          <source>Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT</source>
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Riloff</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Modeling Textual Cohesion for Event Extraction</article-title>
          ,
          <source>Proceedings of the 26th Conference on Artificial Intelligence (AAAI</source>
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Jedrzejek</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cybulka</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <article-title>CATIE ontology for the MUC-4 events extraction</article-title>
          ,
          <source>in progress.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lehnert</surname>
            ,
            <given-names>W.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fisher</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCarthy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riloff</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <article-title>Evaluating an Information Extraction System</article-title>
          ,
          <source>Journal of Integrated Computer-Aided Engineering</source>
          ,
          <volume>1</volume>
          (
          <issue>6</issue>
          ), (
          <year>1995</year>
          ), pp.
          <fpage>453</fpage>
          -
          <lpage>472</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nakashole</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suchanek</surname>
            <given-names>F. M.</given-names>
          </string-name>
          :
          <article-title>PATTY: A Taxonomy of Relational Patterns with Semantic Types</article-title>
          . EMNLP-CoNLL
          <year>2012</year>
          :
          <fpage>1135</fpage>
          -
          <lpage>1145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Naughton</surname>
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Sentence-Level Event Detection and Coreference Resolution</article-title>
          . School of Computer Science and Informatics, University College Dublin, PhD Thesis, October
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <source>Proceedings of the 4th Conference on Message Understanding, MUC 1992</source>
          , McLean, Virginia, USA, June 16-18,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Soon</surname>
            ,
            <given-names>W. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            <given-names>H. T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lim</surname>
            <given-names>D. C. Y.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>A machine learning approach to coreference resolution of noun phrases</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>27</volume>
          (
          <issue>4</issue>
          ),
          <fpage>521</fpage>
          -
          <lpage>544</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Riloff</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Phillips</surname>
            <given-names>M.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>An Introduction to the Sundance and AutoSlog Systems</article-title>
          .
          <source>Technical Report UUCS-04-015</source>
          , School of Computing, University of Utah, http://www.cs.utah.edu/~riloff/pdfs/official-sundance-tr.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Patwardhan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Riloff</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2006</year>
          )
          <article-title>Learning Domain-Specific Information Extraction Patterns from the Web</article-title>
          ,
          <source>ACL 2006 Workshop on Information Extraction Beyond the Document.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>