<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Knowledge-based highly-specialized terrorist event extraction</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jakub</forename><surname>Dutkiewicz</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Control and Information Engineering</orgName>
								<orgName type="institution">Poznan University of Technology</orgName>
								<address>
									<addrLine>Pl. M. Skłodowskiej-Curie 5</addrLine>
									<postCode>60-965</postCode>
									<settlement>Poznań</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Czesław</forename><surname>Jędrzejek</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Control and Information Engineering</orgName>
								<orgName type="institution">Poznan University of Technology</orgName>
								<address>
									<addrLine>Pl. M. Skłodowskiej-Curie 5</addrLine>
									<postCode>60-965</postCode>
									<settlement>Poznań</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jolanta</forename><surname>Cybulka</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Control and Information Engineering</orgName>
								<orgName type="institution">Poznan University of Technology</orgName>
								<address>
									<addrLine>Pl. M. Skłodowskiej-Curie 5</addrLine>
									<postCode>60-965</postCode>
									<settlement>Poznań</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Maciej</forename><surname>Falkowski</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Control and Information Engineering</orgName>
								<orgName type="institution">Poznan University of Technology</orgName>
								<address>
									<addrLine>Pl. M. Skłodowskiej-Curie 5</addrLine>
									<postCode>60-965</postCode>
									<settlement>Poznań</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Knowledge-based highly-specialized terrorist event extraction</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">220E056CFE48ECD635E2267A786A27F8</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T01:07+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>knowledge-based information extraction</term>
					<term>semantic roles</term>
					<term>terrorist event discovery</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we present a prototype of a system for event extraction that uses linguistic patterns with semantic classes. The process is aided by an auxiliary tool for mapping verb statistics across messages. The sentence analyzer uses VerbNet-based linguistic associations, across the message and between messages' sentences, to select semantic role fillers. We restrict ourselves to the coverage of one event type only, namely kidnapping, and to two event template slots (semantic roles): a perpetrator and a person_target (a human target). We designed rules for semantic role filling that build on previous work on coreference. We used the Sundance parser and the AutoSlog extraction pattern generator. Then we applied SRL Master, a semantic role filler and event resolution tool. Our approach yields high performance on the MUC-4 data set.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Event extraction is one of the most important tasks of knowledge discovery. It may be regarded as the core of knowledge-based systems that aim to provide the public (people, organizations, government agencies etc.) with condensed and filtered information concerning events. These events are described in texts written in natural language, so the problem belongs to information extraction (IE). In particular, the task is to extract data concerning the described action (the event) and its arguments (called event roles). Different approaches are applied to this task. They can be classified according to their provenance (pattern-based linguistic methods vs. classifier-based (statistical) methods) or to their 'openness' (fully open extraction vs. extraction trained on corpora). Another important classification criterion is the context of the extraction, namely locality (one sentence only) versus a larger context that takes into account consecutive sentences (a discourse). In many cases hybrid methods that combine different approaches are used. Open extraction systems (operating within the context of one sentence) scale well to open corpora <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b2">3]</ref>, especially those acquired from the Web. However, the most accurate IE systems are domain-specific: they use linguistic patterns and are to some extent trained with the aid of statistics. Our work follows the latter approach in that we use a domain-specific training corpus. Let us characterize it briefly.</p><p>Due to a series of DARPA Message Understanding Conferences (MUCs), significant progress in pattern-based (NLP-based) extraction technologies has been achieved. 
In this work we capitalise on the results of the MUC-3 and MUC-4 conferences <ref type="bibr" target="#b9">[10]</ref>, held in 1991-1992, which used a corpus of news reports (the MUC corpus) on terrorist activities in Latin America. The MUC conferences developed standards for evaluation, e.g. the adoption of metrics such as precision and recall.</p><p>The goal of MUC was to extract from texts information concerning 7 classes of terrorist events: Attack, Kidnapping, Hijacking, Bombing, Arson, Robbery and Forced Work Stoppage, plus several variations on each (for accomplished, threatened and attempted incidents). The extraction process was augmented by the generation of knowledge frames (event templates). Every such template consisted of 24 attributes (slots). A document (a multi-sentence message concerning an event) could be labeled with more than one template type. The MUC-4 corpus consists of 1700 documents, of which 1300 (DEV) were used in MUC-4 for training, 200 documents (TST1+TST2) were used as a tuning set, and the last 200 documents (TST3+TST4) were used as the test set. The resulting knowledge base frames are called "key templates". We filter the corpus down to messages concerning one event type only, namely kidnapping. Also, from among the 24 slots we consider only two: a perpetrator and a person_target.</p><p>The main contributions of this paper are:</p><p>a method of comparing events to check whether two given events are in fact identical or different, on the basis of semantic typing (semantic classes) of the events' arguments; it relies on several types of rules, namely atomic rules, rules filling thematic roles and rules comparing whole events; the method may also be used in coreference resolution;</p><p>an implementation of a corpus crawling tool that looks for words/phrases that lexicalize the kidnapping event;</p><p>additional lexical rules related to the identification of victims and perpetrators.</p><p>The paper is organised as follows. 
Section 2 contains some notes on related work. In Section 3 our extraction method is presented. Section 4 describes a prototype implementation of the Word Statistics tool and its use. Section 5 presents our information extraction results. In Section 6 we give concluding remarks and outline our future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related works</head><p>The main drawback of open information extraction <ref type="bibr" target="#b2">[3]</ref> is that it relies on natural language features that do not classify (semantically type) the arguments of an extracted relation. Additionally, in such methods the syntactic patterns (for example, regular expressions) do not match verb arguments that are distant from the verb phrase in a sentence. These features have a strong negative impact on the ability to compare events described in different sentences (to decide whether they are identical or not). In our work we avoid this drawback.</p><p>The authors of <ref type="bibr" target="#b3">[4]</ref> use language resources (dictionaries) to obtain sets of words that are relevant to a semantic class (a type of a verb argument). Having such extensionally defined types (semantic classes), they use them in the extraction process. In this work it is also shown how to apply such classes in the process of event comparison.</p><p>A method of event comparison is also described in <ref type="bibr" target="#b4">[5]</ref>. There, the authors compare events (and extract their arguments) on the basis of the head parts of noun phrases. For example, the events described in the following two sentences:</p><p>1) A customer in the store was shot by masked men.</p><p>2) The two men used 9mm semi-automatic pistols.</p><p>are in fact the same because both sentences use the same word "men". In our approach the events may be unified (or differentiated) on the basis of the membership (non-membership) of the two "linking" words in the same semantic class. Also, it is not known in advance which pairs of sentences should be analyzed with respect to a given event (we describe this problem later on).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The extraction method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Preliminary definitions</head><p>First, let us define the terms used in the paper.</p><p>Event (denoted by E_n, where n stands for the event's name) is an entity representing the event described in the text (conceptually it is an occurrent that plays the central role in some situation, which represents a state of affairs). The event is connected with a syntactic phrase (a verb phrase), called an anchor, that helps to identify it in a sentence. Also, there are some participants in the event; we identify them via thematic roles that are arguments of the anchoring phrase. Anchor (marked as A_k, where k stands for an anchor name) is a verb or a verb phrase whose appearance in a derivation (i.e. a syntactically parsed sentence) triggers the process of recognition of an event (such as, for example, kidnapping).</p><p>Thematic role (a semantic role label, marked as R_m, where m is a role name) is an entity representing an argument of a verb or a verb phrase (an anchor) denoting the event. For example, there may be such roles as Agent (in our considerations, a perpetrator), Patient (a victim), Instrument, Location, Time and others.</p><p>Role filler is a text phrase that instantiates a thematic role in the text (marked with the symbol R_p F_v, where p is a role name and v identifies a filler).</p><p>Syntactic similarity. Let simsyn(W_1, W_2) be a two-argument function that, given two words (or phrases), returns a Boolean value, true or false. The function returns true if W_1 and W_2 have the same syntactic properties (i.e. number and gender); otherwise it returns false.</p><p>Semantic class (denoted by C_s, where s is a class name) is defined as an entity that is expressed by all of its verbalizations. 
For example, the verbalizations of the semantic class concerning kidnapping are C_kidnapping = {kidnap, seize, abduct, capture, intercept, take hostage}. It should be noted that we do not use all the meanings of the listed words, but only those fitting a specific context.</p><p>Atomic formula is a triple of the form &lt;sub, pred, obj&gt;, where sub denotes the subject of the sentence (semantically it may play a thematic role R_m), pred denotes the predicate (it represents an event in terms of a certain semantic class C_s) and obj denotes the object (semantically playing a role R_p). An atomic formula can be considered as a rule representing a fact.</p><p>Let us illustrate the introduced notions with an example message, DEV-MUC3-0018 (the text in this corpus is given in upper case). We decorated the text with roles, role fillers, events and anchors. One of the considered sentences is: OQUELI, LEADER OF THE NATIONAL REVOLUTIONARY MOVEMENT (MNR) AND HILDA FLORES, A GUATEMALAN SOCIAL DEMOCRATIC LEADER (R_victim F_1) WERE ABDUCTED (E_kidnapping A_kidnapping1) AND KILLED IN JANUARY (R_time F_1) BY UNIDENTIFIED INDIVIDUALS (R_perpetrator F_1) IN GUATEMALA CITY (R_location F_1) AS THEY WERE HEADING TO THE LA AURORA AIRPORT.</p><p>Assume that there exists another sentence concerning the same event but with new fillers for the victim and perpetrator roles:</p><formula xml:id="formula_0">IT TURNED OUT THAT POLITICIANS (R_victim F_2) WERE KIDNAPPED (E_kidnapping A_kidnapping2) BY URBAN TERRORISTS OF FARABUNDO MARTI NATIONAL LIBERATION FRONT (R_perpetrator F_2).</formula><p>After decorating the two sentences we have to check whether the two pairs E_kidnapping A_kidnapping1 and E_kidnapping A_kidnapping2 concern the same event. We will show how to approach this issue in Section 3.3.</p><p>We are motivated by the VerbNet (VN) <ref type="bibr" target="#b0">[1]</ref> thematic/semantic role methodology. 
VerbNet verb classes are organized according to the syntactic behavior of verbs. VerbNet uses 109 verb classes and 29 semantic role labels for arguments of the &lt;sub, pred, obj&gt; triple pattern (which resembles our atomic formulae). We adhere to VerbNet semantics rather than to ontologies, because we are not aware of any publicly available ontology with adequate expressive power and a rich verbalization of classes (ontological entities). We are in the process of using our CATIE ontology for the general extraction of facts from the MUC-4 corpus <ref type="bibr" target="#b5">[6]</ref>.</p><p>We are interested in such event-specifying verbs as kidnap, abduct and seize (VN index/vn/steal-10.5.php#steal-10.5; sense number 3: take or capture by force or authority), which belong to the class steal-10.5. However, instead of a role Agent <ref type="bibr">[</ref> </p></div>
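<div xmlns="http://www.tei-c.org/ns/1.0"><p>The definitions of this subsection can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation used in the described system: the names AtomicFormula and belongs_to (as a function) and the dictionary encoding of syntactic properties passed to simsyn are hypothetical; only the word list of C_kidnapping and the example phrases are taken from the text.</p><p>
```python
from dataclasses import dataclass

# Verbalizations of the semantic class C_kidnapping, as listed in Section 3.1.
C_KIDNAPPING = frozenset(
    ["kidnap", "seize", "abduct", "capture", "intercept", "take hostage"]
)


@dataclass(frozen=True)
class AtomicFormula:
    """A fact written as a (sub, pred, obj) triple. Hypothetical helper type."""
    sub: str
    pred: str
    obj: str


def belongs_to(word, semantic_class):
    """Return True if the word is one of the verbalizations of the class."""
    return word.lower() in semantic_class


def simsyn(props_1, props_2):
    """Syntactic similarity: True if number and gender agree.

    Assumption: each argument is a dict with "number" and "gender" keys,
    produced by some upstream parser that is not modelled here.
    """
    return (props_1["number"] == props_2["number"]
            and props_1["gender"] == props_2["gender"])


fact = AtomicFormula("UNIDENTIFIED INDIVIDUALS", "abduct", "OQUELI")
print(belongs_to(fact.pred, C_KIDNAPPING))  # True
```
</p><p>Representing a semantic class as a plain set of verbalizations mirrors its extensional definition in the text; a richer encoding would be needed to restrict each word to the sense that fits the context.</p></div>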
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Basic rules for identifying thematic roles</head><p>The next type of rule (besides the atomic formulae described earlier, which represent facts) says that we use all the verbs of interest (C_kidnapping) in their past tense forms as direct anchors. Using a special function that retrieves the predicate of a given triple, namely predicate_of(&lt;s,p,o&gt;) = p, we denote such rules as triples of the form &lt;predicate_of(&lt;s,p,o&gt;), tense_of, "Past"&gt;. We assume that tense_of is a built-in predicate representing verb tenses, i.e. "Past" and "Past Participle". Another built-in predicate, named voice_of, represents the voice of a verb phrase, namely "active_voice" or "passive_voice". The third built-in predicate, named plays, represents a fact concerning the deduced thematic role of the subject or the object of some triple (as assumed above, we only consider the agentive role, a perpetrator, and the patientive (beneficiary) role, a victim). Now we are ready to give the rules that identify the thematic roles of a predicate given in the past tense form. We are concerned with predicates expressed by verbs that are members of the C_kidnapping semantic class.</p><p>The first rule states that, for a given triple, if its predicate is in the past tense and in the active voice, then the subject plays the agentive thematic role of a perpetrator while the object plays the patientive thematic role of a victim (a kind of a person_target). Rule (1) is as follows: &lt;predicate_of(&lt;sub,pred,obj&gt;), tense_of, "Past"&gt; ∧ &lt;predicate_of(&lt;sub,pred,obj&gt;), voice_of, "active_voice"&gt; → &lt;sub, plays, "agentive_role"&gt; ∧ &lt;obj, plays, "beneficiary_role"&gt;. (1)</p><p>The second rule (2) differs only in the voice specification, which swaps the roles assigned to the subject and the object in the conclusion. 
The rule is as follows: &lt;predicate_of(&lt;sub,pred,obj&gt;), tense_of, "Past"&gt; ∧ &lt;predicate_of(&lt;sub,pred,obj&gt;), voice_of, "passive_voice"&gt; → &lt;sub, plays, "beneficiary_role"&gt; ∧ &lt;obj, plays, "agentive_role"&gt;. (2)</p></div>
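<div xmlns="http://www.tei-c.org/ns/1.0"><p>Rules (1) and (2) can be sketched as a single Python function. This is a hedged illustration: the function name assign_roles and its argument layout are hypothetical, and the tense and voice values are assumed to be supplied by a syntactic parser that is not modelled here. The role labels and the example phrases come from the text.</p><p>
```python
C_KIDNAPPING = frozenset(
    ["kidnap", "seize", "abduct", "capture", "intercept", "take hostage"]
)


def assign_roles(sub, pred, obj, tense, voice, anchors=C_KIDNAPPING):
    """Apply rules (1) and (2) to one triple.

    Returns a list of (phrase, "plays", role) facts, or an empty list
    when the predicate is not an anchor in the past tense.
    """
    if pred not in anchors or tense != "Past":
        return []
    if voice == "active_voice":
        # Rule (1): the subject is the perpetrator, the object is the victim.
        return [(sub, "plays", "agentive_role"),
                (obj, "plays", "beneficiary_role")]
    if voice == "passive_voice":
        # Rule (2): the roles of subject and object are swapped.
        return [(sub, "plays", "beneficiary_role"),
                (obj, "plays", "agentive_role")]
    return []


roles = assign_roles("POLITICIANS", "kidnap", "URBAN TERRORISTS",
                     "Past", "passive_voice")
print(roles)
```
</p><p>For the passive example sentence of Section 3.1 the sketch marks POLITICIANS as the beneficiary (victim) and URBAN TERRORISTS as the agent (perpetrator).</p></div>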
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Rules for event identification</head><p>In many cases information about certain roles and events is spread over several sentences. Thus, matching different phrases to one thematic role is one of the key tasks. We define a set of rules to identify such cases, and eventually we either unify different events or differentiate them (the are_different predicate). One of these rules is based on two sentences whose verb phrases are denoted as two pairs, each containing an event and an anchor: E_n1 A_m1 and E_n2 A_m2. Each of these sentences contains a phrase that represents a filler of the same role, namely R_p1 F_k1 and R_p1 F_k2. To activate such a rule we need to find at least two sentences with these role fillers and event anchors. If we find more than two sentences of this kind, we need to analyze them in pairs. To describe such a rule, we need to define two predicates. The "belongs_to" predicate holds if a given phrase belongs to a certain semantic class (this means that the main word in the phrase is a member of the considered class). The "is_equal_to" predicate decides whether two semantic classes contain the same set of elements or whether two role fillers are syntactically equivalent.</p><p>The analysis starts with searching for the described pair of sentences. Let us denote the role filler and the anchor found in the first sentence as R_1 F_1 and E_1 A_1, and the role filler and the anchor found in the second sentence as R_1 F_2 and E_2 A_1. Once we have found these pairs we need to decide whether the described event anchors belong to the same semantic class (denoted as C_1). 
This is formalized as:</p><formula xml:id="formula_2">&lt;E_1 A_1, belongs_to, C_1&gt; ∧ &lt;E_2 A_1, belongs_to, C_1&gt;.</formula><p>This basic condition should be considered preemptive: its result decides whether a pair of sentences is worth executing the rule on.</p><p>The second part of the analysis starts with determining whether the role fillers belong to classes that are different but related to each other. Furthermore, we need to check whether the role fillers have the same syntactic properties. If these conditions hold, we can assume that the phrases describe the same event. Additionally, the relation that holds between the semantic classes may also be projected onto the role fillers (in particular it may be a subsumption). Let us formalize these considerations in the form of rule (<ref type="formula" target="#formula_3">3</ref>). In this rule, we mark "some relation" as a variable "?rel".</p><formula xml:id="formula_3">&lt;R_1 F_1, belongs_to, C_2&gt; ∧ &lt;R_1 F_2, belongs_to, C_3&gt; ∧ &lt;C_2, ?rel, C_3&gt; ∧ ¬&lt;C_2, is_equal_to, C_3&gt; ∧ simsyn(R_1 F_1, R_1 F_2) → are_the_same(E_1, E_2) ∧ &lt;R_1 F_1, ?rel, R_1 F_2&gt;<label>(3)</label></formula><p>However, if the role fillers belong to the same class but are different, or if the role fillers have different syntactic properties, it is necessary to classify the two events as different (<ref type="formula" target="#formula_4">4</ref>):</p><formula xml:id="formula_4">((&lt;R_1 F_1, belongs_to, C_4&gt; ∧ &lt;R_1 F_2, belongs_to, C_4&gt; ∧ ¬&lt;R_1 F_1, is_equal_to, R_1 F_2&gt;) ∨ ¬simsyn(R_1 F_1, R_1 F_2)) → are_different(E_1, E_2).<label>(4)</label></formula><p>We illustrate that rule with the following examples.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 1</head><p>There are two consecutive sentences in the message:</p><p>1) John Smith (R_victim F_1) has been kidnapped (E_kidnapping1 A_1).</p><p>2) President (R_victim F_2) was taken hostage (E_kidnapping2 A_2) by unknown perpetrators.</p><p>The preemptive constraints are: &lt;"kidnap", belongs_to, C_kidnapping&gt; ∧ &lt;"take_hostage", belongs_to, C_kidnapping&gt;.</p><p>The following rule activation captures lexical associations between two neighboring sentences by pairing as similar the nouns in the role of a victim (person_target). This is similar to the lexical bridge features used in <ref type="bibr" target="#b4">[5]</ref>. The rule for these sentences is as follows:</p><p>&lt;"John Smith", belongs_to, C_Person&gt; ∧ &lt;"President", belongs_to, C_Politician&gt; ∧ &lt;C_Person, represents, C_Politician&gt; ∧ ¬&lt;"John Smith", is_equal_to, "President"&gt; ∧ simsyn("President", "John Smith") → are_the_same(E_kidnapping1, E_kidnapping2).</p><p>As a result we obtain a fact (an atomic formula) of the form: &lt;"John Smith", represents, "President"&gt;.</p><p>The confidence of this rule could be measured by the distance between the considered sentences (counted in sentences). In particular, the rule may be applied only to consecutive sentences.</p></div>
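<div xmlns="http://www.tei-c.org/ns/1.0"><p>The event comparison rules (3) and (4), together with the preemptive condition on anchors, can be sketched in Python as follows. This is only an illustration: the lookup tables CLASS_OF and CLASS_RELATIONS, the filler "Jane Doe" and the function names are hypothetical, and the result of simsyn is passed in as a Boolean instead of being computed.</p><p>
```python
C_KIDNAPPING = frozenset(
    ["kidnap", "seize", "abduct", "capture", "intercept", "take_hostage"]
)
# Hypothetical toy knowledge: filler to class, and relations between classes.
CLASS_OF = {"John Smith": "Person", "Jane Doe": "Person",
            "President": "Politician"}
CLASS_RELATIONS = {("Person", "Politician"): "represents"}


def anchors_match(anchor_1, anchor_2, semantic_class=C_KIDNAPPING):
    """Preemptive condition: both anchors belong to the same semantic class."""
    return anchor_1 in semantic_class and anchor_2 in semantic_class


def compare_events(filler_1, filler_2, same_syntax):
    """Return ("same", relation), ("different", None) or ("unknown", None)."""
    class_1 = CLASS_OF[filler_1]
    class_2 = CLASS_OF[filler_2]
    relation = CLASS_RELATIONS.get((class_1, class_2))
    # Rule (3): different but related classes, matching syntactic properties.
    if class_1 != class_2 and relation is not None and same_syntax:
        return ("same", relation)
    # Rule (4): same class with different fillers, or a syntactic mismatch.
    if (class_1 == class_2 and filler_1 != filler_2) or not same_syntax:
        return ("different", None)
    return ("unknown", None)


if anchors_match("kidnap", "take_hostage"):
    print(compare_events("John Smith", "President", same_syntax=True))
```
</p><p>For the two sentences of Example 1 the sketch reports the events as the same and returns the relation "represents", which corresponds to the derived fact linking "John Smith" and "President".</p></div>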
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 2</head><p>We have three sentences, not necessarily in one document.</p><p>1. Ricardo Alfonso Castellar, mayor of Achi, (R_victim F_1) who was kidnapped (E_kidnapping1 A_1) on 5 January, apparently by Army Of National Liberation guerillas, was found dead.</p><p>2. Castellar (R_victim F_2) was kidnapped (E_kidnapping2 A_1) by a group of armed men.</p><p>3. A politician condemned kidnapping (E_kidnapping3 A_1) of mayor of Achi (R_victim F_3).</p><p>In this case we need to process the sentences in pairs. First, we take sentences 1 and 2. We execute the rule and as a result we get the unification of E_kidnapping1 and E_kidnapping2. This means that unifying the E_kidnapping3 event with both of the previous events would be redundant; we only need to check whether E_kidnapping3 can be unified with either of them. However, if E_kidnapping1 and E_kidnapping2 were not unified, all events would need to be compared separately. In this case we get three fillers of the victim role, and the relation between those fillers is quite specific. That relation could be marked as "is_substring_of". The left-hand side argument of this relation is always less expressive than its right-hand side, and thus we can find the most expressive filler: "Ricardo Alfonso Castellar, mayor of Achi".</p><p>Our method of unification is conceptually more powerful than those used so far for coreference resolution (for example in <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b8">9]</ref>). So far, however, it is used only for establishing the agreement of semantic classes and the noun-pronoun agreement features, that is, features 2-3 and 8 out of the 12 features proposed in <ref type="bibr" target="#b10">[11]</ref>.</p></div>
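<div xmlns="http://www.tei-c.org/ns/1.0"><p>The selection of the most expressive filler through the "is_substring_of" relation of Example 2 can be sketched as below. The function names and the case-insensitive comparison are assumptions made for this illustration; the three fillers are those of the example.</p><p>
```python
def is_substring_of(left, right):
    """True if left is a proper, case-insensitive substring of right."""
    left_l, right_l = left.lower(), right.lower()
    return left_l != right_l and left_l in right_l


def most_expressive(fillers):
    """Return the longest filler that is not a proper substring of another.

    Assumes a non-empty list of fillers of the same role.
    """
    maximal = [
        filler for filler in fillers
        if not any(is_substring_of(filler, other) for other in fillers)
    ]
    return max(maximal, key=len)


victims = [
    "Ricardo Alfonso Castellar, mayor of Achi",
    "Castellar",
    "mayor of Achi",
]
print(most_expressive(victims))
```
</p><p>Both shorter fillers are substrings of the first one, so the sketch selects "Ricardo Alfonso Castellar, mayor of Achi", in agreement with the example.</p></div>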
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Additional lexical rules</head><p>The examples shown in the previous subsection illustrate the need for rules that go beyond searching for sentences with verb phrases corresponding to the event-related semantic class. To make the task of identifying events easier for the annotators, it is necessary to use a secondary semantic class containing words that are in a fuzzy relation to the core event term. We introduce a class:</p><formula xml:id="formula_5">C_fuzzy_kidnapping = {disappear, release}</formula><p>Following the Automatic Content Extraction (ACE) Programme guidelines, an event trigger refers to the term within the event mention that most clearly expresses the occurrence of the event instance; it is based on a direct anchor and corresponds to C_kidnapping.</p><p>An event mention refers to the sentence within which an event instance is reported; it corresponds to C_fuzzy_kidnapping. An event can have multiple mentions associated with it.</p><p>Apart from the sentence that initially reports the event, other coreferring sentences that contain anaphors of events (such as pronouns and definite descriptions of previously mentioned events) are taggable mentions of that event <ref type="bibr" target="#b8">[9]</ref>. In general there always exists a direct connection between the roles of events corresponding to C_1 and C_fuzzy_1. For example, a victim of a kidnapping directly corresponds to the subject of a release or a disappearance. To measure the confidence of fuzzy classes we look at the statistics of all words/stems, in various part-of-speech forms, that could directly or indirectly indicate a kidnapping event. 
They are words corresponding to the C_kidnapping and C_fuzzy_kidnapping classes: verbs for kidnap (heads of verb phrases) in the past tense or the attributive kidnapped, verbs in the past tense, verbs (infinitive, -ing form, gerund), and nouns related to an act of kidnapping or a perpetrator, namely: kidnap, kidnapping, kidnapped, kidnapper, stem seiz, seized, seizing, abduct, abducted, abducting, stem captur, capturing, captured, intercept, intercepting, intercepted, stem releas, released, releasing, disappear, disappeared, disappearing, take/hold hostage. Finally, we apply coreference rules for both the C_fuzzy_kidnapping and C_kidnapping semantic classes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 3</head><p>1. Ricardo Alfonso Castellar (R_victim F_1), mayor of Achi, was released (E_1 A_1) on 15 January.</p><p>2. Kidnapping (E_2 A_1) of Castellar (R_object F_1) was a brutal act.</p><p>Even though the events E_1 and E_2 belong to different semantic classes, we can unify specific role fillers within those events.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Word Statistics Tool</head><p>Designing pattern-based linguistic rules is very tedious work, which is the main disadvantage of such methods. To alleviate this burden we implemented the MUC Word Statistics Analyzer (Figure <ref type="figure" target="#fig_0">1</ref>). The tool provides the following functions:</p><p>1) it graphically presents statistics of words across a document or a corpus, and 2) it displays, in two separate panels, the fragments of text pertaining to these statistics.</p><p>The extraction method considered in this paper relies on the quality of the typing of verb arguments (semantic classes). To obtain good extensions of the semantic classes C_kidnapping and C_fuzzy_kidnapping we designed and implemented a statistical tool. It estimates the frequency of occurrences of words (more precisely, their stems) in a message or in the whole corpus. The tool also enables the analysis of the sentences (or messages) in which the stems appear. The histogram in the upper right corner of the screen shown in Figure 1 depicts the number of occurrences of a word (stem) in the message and in the sentence. The example message is shown in the lower left corner. The bottom panel lists the sentences in which the stems appear with different endings, for example the stem kidnap with the words kidnapped, kidnapper or kidnapping. Summing up, by quickly inspecting the frequency of words and their correlation, and by varying the trigger term lists, we can assess the effectiveness of linguistic features.</p></div>
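<div xmlns="http://www.tei-c.org/ns/1.0"><p>The kind of stem statistics computed by such a tool can be sketched in a few lines of Python. This is not the code of the MUC Word Statistics Analyzer: the function name stem_counts, the prefix matching used in place of a real stemmer and the sample message are assumptions made for this illustration, while the stems are those listed in Section 3.4.</p><p>
```python
import re
from collections import Counter

# Stems of the C_kidnapping and C_fuzzy_kidnapping words listed in Section 3.4.
STEMS = ("kidnap", "seiz", "abduct", "captur",
         "intercept", "releas", "disappear")


def stem_counts(text, stems=STEMS):
    """Count, for every stem, the words of the text that start with it.

    Prefix matching is a simplification standing in for a real stemmer.
    """
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for stem in stems:
            if word.startswith(stem):
                counts[stem] += 1
    return counts


# Invented sample message, written in upper case like the MUC corpus.
message = ("CASTELLAR WAS KIDNAPPED ON 5 JANUARY. "
           "THE KIDNAPPERS RELEASED HIM LATER.")
print(stem_counts(message))
```
</p><p>Running the same function on single sentences instead of whole messages would give the per-sentence counts that the histogram described above displays next to the per-message counts.</p></div>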
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results</head><p>There are five overall IE-related tasks that evolved from MUC.</p><p>Named entity (NE) aims to extract all instances of persons, organisations, locations, dates, times, percentages and monetary entities.</p><p>Coreference (CO): given a set of entities, this task aims to generate a set of entity coreference chains, such that mentions that corefer to the same entity appear in the same chain.</p><p>Template element (TE) aims to extract all entity attributes. As an example, for the entity mention "Castellar", the aim is to extract its name ("Ricardo Alfonso Castellar"), type ("PERSON") and descriptor ("the mayor of Achi").</p><p>Template relation (TR) aims to extract all well-defined facts from each newswire text. In MUC-4 this was related to the knowledge frame (24 slots) of 8 types of terrorist events. In MUC-7 the facts were limited to relationships with organisations: employee of, product of and location of.</p><p>Scenario template (ST) aims to extract pre-specified event information from anywhere in the given text, and relate it to the particular organisation and person entities etc. involved in the event.</p><p>The figures presented in this table are based on the performance levels of systems participating in the MUC evaluations. More detailed figures can be found in Table <ref type="table" target="#tab_1">1</ref>. For many years these results were not significantly improved. Only recently has significant progress <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b10">11]</ref> been made.</p><p>There are 159 events resolved as kidnappings in the 1700 documents, according to the assessment of the MUC-4 community <ref type="bibr" target="#b6">[7]</ref>.</p><p>We define the following numbers of word occurrences: The detailed analysis of these results will be presented at the Challenge event. 
We used the Sundance and AutoSlog systems for syntactic parsing and the generation of extraction patterns <ref type="bibr" target="#b11">[12]</ref>, together with Named Entity Extraction (with slightly modified dictionaries). Then we applied SRL Master, a semantic role filler and event resolution tool.</p><p>In Table <ref type="table">3</ref> the meaning of the symbols is the following: EN = event name (e.g. kidnapping, crime, etc.); these are anchors, while in all other patterns VPs are anchors; NP = noun phrase, VP = verb phrase, PVP = passive verb phrase, AdjP = adjective phrase, PP = prepositional phrase starting with specific prepositions, Pron = noun phrase represented by a pronoun, Perp = perpetrator. The effectiveness of our system is due to several factors:</p><p>Our patterns are mostly triples, whereas most previous works were based on syntactic patterns consisting of 2 elements, see e.g. Fig. <ref type="figure" target="#fig_0">1</ref> of <ref type="bibr" target="#b12">[13]</ref>.</p><p>Non-triple patterns are more likely to generate extractions of non-relevant patterns. For a pattern to be relevant we need at least one of two sentence parts, a location or a date (first sought in a simple sentence, then in the complex sentence, and finally in adjacent sentences).</p><p>One of the main contributions of this work is the introduction of VP(S) = supplementary verb phrase (particularly effective ones involving NP=EN are: take place, claim responsibility, be responsible for, carry out). To a lesser degree this helps to identify perpetrators and victims.</p><p>In this paper an extraction is counted as correct if it provides all of the following kidnapping event roles (recall): perpetrator individuals, perpetrator organizations, human_target/victim, location and date. 
These roles are narrower than the 24 slots of the MUC-4 contest.</p><p>Table <ref type="table" target="#tab_3">4</ref> presents the recall for the kidnapping events (here the same events in different documents are counted separately, as in the MUC-4 evaluation). The recall numbers are significantly higher than in the MUC-4 contest (where the best contribution achieved around 60% for both precision and recall), but they were achieved for an easier task and for only one type of terrorist event. They are also higher than those in Table <ref type="table">3</ref> of <ref type="bibr" target="#b4">[5]</ref>.</p><p>The system is presented at http://draco.kari.put.poznan.pl/ruleml2013_Extraction.</p></div>
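<div xmlns="http://www.tei-c.org/ns/1.0"><p>The recall criterion described in this section, under which a kidnapping event counts as extracted only when all five roles are provided, can be sketched as follows. This is a minimal illustration and not the evaluation code of the system: the names ROLES, is_correct and recall, the dictionary representation of events, the use of exact equality for comparing role fillers, and the placeholder values in the example are all assumptions.</p>

```python
# Hypothetical sketch: the five role names follow the text of this section;
# the data layout and the exact-match comparison are assumptions.
ROLES = (
    "perpetrator_individuals",
    "perpetrator_organizations",
    "human_target",
    "location",
    "date",
)


def is_correct(extracted, gold):
    """True only if every one of the five roles matches the gold annotation."""
    return all(extracted.get(role) == gold[role] for role in ROLES)


def recall(extractions, gold_events):
    """Percentage of gold events for which a fully correct extraction exists.

    Both arguments map an assumed (document id, event index) key to a role
    dictionary, so the same event reported in two documents is counted twice.
    Gold events with no extraction at all count as misses.
    """
    if not gold_events:
        raise ValueError("no gold events to evaluate against")
    hits = sum(
        1
        for key, gold in gold_events.items()
        if key in extractions and is_correct(extractions[key], gold)
    )
    return 100.0 * hits / len(gold_events)


# Placeholder values only; they do not come from the MUC-4 corpus.
gold = {
    ("DOC-A", 0): dict(zip(ROLES, ("P1", "ORG1", "V1", "LOC1", "DATE1"))),
    ("DOC-B", 0): dict(zip(ROLES, ("P2", "ORG2", "V2", "LOC2", "DATE2"))),
}
found = {
    ("DOC-A", 0): dict(zip(ROLES, ("P1", "ORG1", "V1", "LOC1", "DATE1"))),
    ("DOC-B", 0): dict(zip(ROLES, ("P2", "ORG2", "V2", "LOC2", "WRONG"))),
}
print(recall(found, gold))
```

<p>Under this all-or-nothing reading, a single wrong role (the date of the second placeholder event above) makes the whole event a miss, which is stricter than scoring each slot separately.</p></div>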
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions</head><p>The recent wave of methods <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref> is capable of significantly improving extraction measures. The MUC conferences provided benchmarks that reduce the arbitrariness of evaluating a given method. For example, the open extraction system ReVerb gives good precision but poor recall <ref type="bibr" target="#b2">[3]</ref>. We plan to apply our system to the full MUC-4 benchmark. The MUC Word Statistics Analyzer would be helpful for this task. Possible improvements include using a better syntactic parser, better Named Entity Recognition, and a wider set of coreference comparisons.</p><p>Our choice of anchor words can be further optimized. In general, our patterns presented in Table <ref type="table">3</ref> are more compatible with ontology-driven extraction than purely linguistic methods are. Rather than using one general dictionary, as most MUC-related works do, we can have a lexicalization specific to each ontology element. We are working in this direction.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1.</head><label>1</label><figDesc>Figure 1. A snapshot of the results of the MUC Word Statistics Analyzer.</figDesc><graphic coords="10,124.70,153.35,367.40,194.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1. MUC evaluation tasks</head><label>1</label><figDesc></figDesc><table><row><cell>Year</cell><cell>Evaluation</cell><cell cols="5">MUC Tasks</cell></row><row><cell></cell><cell></cell><cell>NE</cell><cell>CO</cell><cell>TE</cell><cell>TR</cell><cell>ST</cell></row><row><cell>1991</cell><cell>MUC-3</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>F &lt; 58%</cell></row><row><cell>1992</cell><cell>MUC-4</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>F &lt; 56% [9]</cell></row><row><cell>1998</cell><cell>MUC-7</cell><cell>F &lt; 94%</cell><cell>F &lt; 62%</cell><cell>F &lt; 87%</cell><cell>F &lt; 76%</cell><cell>F &lt; 51%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2. Statistics of MUC evaluation tasks</head><label>2</label><figDesc></figDesc><table><row><cell>X1: at least a single occurrence of words from C_kidnapping or C_fuzzy_kidnapping</cell></row><row><cell>X2: only from C_kidnapping at least once</cell></row><row><cell>X3: only from C_fuzzy_kidnapping at least once</cell></row><row><cell>X4: from C_kidnapping at least once and from C_fuzzy_kidnapping at least once together</cell></row><row><cell>X5: only from C_kidnapping ending with -ed at least once</cell></row><row><cell>X6: only from C_fuzzy_kidnapping ending with -ed at least once</cell></row><row><cell>X7: as in X1 from C_kidnapping at least once and from C_fuzzy_kidnapping at least once, together ending with -ed</cell></row><row><cell>X8: only from {kidnap} set</cell></row><row><cell>X9: only kidnapped</cell></row><row><cell>Y1-Y9: occurrence as for X but for the set of documents that do not belong to a kidnapping event.</cell></row></table></figure>
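<div xmlns="http://www.tei-c.org/ns/1.0"><p>The first four occurrence counts of Table 2 can be sketched as below. This is an illustration only and is not the MUC Word Statistics Analyzer: the members of the two word sets are placeholders (the actual sets are not listed in this section), the tokenizer is a simple assumption, "only from" is read here as "from this set and not from the other", and the names occurrence_flags and count_flags are hypothetical.</p>

```python
import re

# Placeholder word sets: the actual kidnapping and fuzzy-kidnapping sets are
# not listed in this section, so these members are assumptions.
C_KIDNAPPING = frozenset({"kidnap", "kidnapped", "kidnapping"})
C_FUZZY_KIDNAPPING = frozenset({"abduct", "abducted", "seize", "seized"})


def occurrence_flags(text, strict=C_KIDNAPPING, fuzzy=C_FUZZY_KIDNAPPING):
    """Return the X1-X4 indicators of Table 2 for a single document."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    has_strict = bool(tokens.intersection(strict))
    has_fuzzy = bool(tokens.intersection(fuzzy))
    return {
        "X1": has_strict or has_fuzzy,       # a word from either set
        "X2": has_strict and not has_fuzzy,  # only from the strict set
        "X3": has_fuzzy and not has_strict,  # only from the fuzzy set
        "X4": has_strict and has_fuzzy,      # both sets together
    }


def count_flags(documents):
    """Count, for each indicator, the number of documents that satisfy it."""
    totals = {"X1": 0, "X2": 0, "X3": 0, "X4": 0}
    for text in documents:
        for name, value in occurrence_flags(text).items():
            totals[name] += int(value)
    return totals


# Invented sentences used only to exercise the functions.
sample = [
    "Rebels kidnapped the mayor.",
    "Troops seized the town.",
    "He was abducted and later kidnapped again.",
    "No incident reported.",
]
print(count_flags(sample))
```

<p>The Y counts would follow by calling count_flags on the documents that do not belong to a kidnapping event; the -ed variants (X5 to X7) would additionally restrict both sets to words ending in -ed.</p></div>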
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 : Recall for the kidnapping events for the MUC-4 development and test sets</head><label>4</label><figDesc></figDesc><table><row><cell cols="2">Recall Measure [per cent]</cell></row><row><cell>DEV set</cell><cell>TST sets</cell></row><row><cell>78</cell><cell>73</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgement. This work was supported by the Polish National Centre for Research and Development (NCBR) No O ROB 0025 01 and DS 45-085/13 and DS-PB grants. We would like to thank Prof. Ellen Riloff for making Sundance and AutoSlog tools available to us, and Bartosz Zaremba for calculating some statistics.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A Hierarchical Unification of LIRICS and VerbNet Semantic Roles</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bonial</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Corvey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Palmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Petukhova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Bunt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ICSC Workshop on Semantic Annotation for Computational Linguistic Resources (SACL-ICSC 2011)</title>
				<meeting>the ICSC Workshop on Semantic Annotation for Computational Linguistic Resources (SACL-ICSC 2011)</meeting>
		<imprint>
			<date type="published" when="2011-09">Sep, 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Open information extraction from the web</title>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Banko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soderland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Weld</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="page" from="68" to="74" />
			<date type="published" when="2008-12">December 2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Open Information Extraction: The Second Generation</title>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Fader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Christensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soderland</surname></persName>
		</author>
		<author>
			<persName><surname>Mausam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IJCAI</title>
		<imprint>
			<biblScope unit="page" from="3" to="10" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Multi-faceted Event Recognition with Bootstrapped Dictionaries</title>
		<author>
			<persName><forename type="first">R</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Riloff</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT)</title>
				<meeting>the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT)</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Modeling Textual Cohesion for Event Extraction</title>
		<author>
			<persName><forename type="first">R</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Riloff</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th Conference on Artificial Intelligence (AAAI)</title>
				<meeting>the 26th Conference on Artificial Intelligence (AAAI)</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Jedrzejek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cybulka</surname></persName>
		</author>
		<title level="m">CATIE ontology for the MUC-4 events extraction</title>
				<imprint/>
	</monogr>
	<note>in progress</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Evaluating an Information Extraction System</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">G</forename><surname>Lehnert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cardie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fisher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>McCarthy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Riloff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soderland</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Integrated Computer-Aided Engineering</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="453" to="472" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">PATTY: A Taxonomy of Relational Patterns with Semantic Types</title>
		<author>
			<persName><forename type="first">N</forename><surname>Nakashole</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Suchanek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EMNLP-CoNLL</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="1135" to="1145" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Sentence-Level Event Detection and Coreference Resolution</title>
		<author>
			<persName><forename type="first">M</forename><surname>Naughton</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009-10">October 2009</date>
		</imprint>
		<respStmt>
			<orgName>School of Computer Science and Informatics, University College Dublin, PhD Thesis</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m">Proceedings of the 4th Conference on Message Understanding</title>
				<meeting>the 4th Conference on Message Understanding<address><addrLine>MUC; McLean, Virginia, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1992-06-16">June 16-18, 1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A machine learning approach to coreference resolution of noun phrases</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M</forename><surname>Soon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">T</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">C Y</forename><surname>Lim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="521" to="544" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">An Introduction to the Sundance and AutoSlog Systems</title>
		<author>
			<persName><forename type="first">E</forename><surname>Riloff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Phillips</surname></persName>
		</author>
		<ptr target="http://www.cs.utah.edu/~riloff/pdfs/official-sundance-tr.pdf" />
	</analytic>
	<monogr>
		<title level="m">Technical Report UUCS-04-015</title>
				<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
		<respStmt>
			<orgName>School of Computing, University of Utah</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Learning Domain-Specific Information Extraction Patterns from the Web</title>
		<author>
			<persName><forename type="first">S</forename><surname>Patwardhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Riloff</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACL 2006 Workshop on Information Extraction Beyond the Document</title>
				<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
