Semi-Automatic Generation of Linear Event Extraction Patterns for Free Texts

Daria Dzendzik (HP Labs, daria.dzendzik@hp.com)
Sergey Serebryakov (HP Labs, sergey.serebryakov@hp.com)

Proceedings of the Ninth Spring Researcher's Colloquium on Database and Information Systems, Kazan, Russia, 2013

Abstract

In this paper we describe a semi-automatic approach to generating event extraction patterns for free texts. The algorithm is composed of four steps: we automatically extract possible events from a corpus of free documents, cluster them using dependency-based parse tree paths, validate random samples from each cluster and generate linear patterns using positive event clusters. We compare our algorithm with a system that uses manually created patterns.

1 Introduction

The event extraction (EE) task is formulated as "to automatically identify the events in free texts and derive detailed information about them, ideally identifying who did what to whom, when, with what methods, where and why". Events involve entities and relations between them and imply a change of state. For instance, the sentence "Palm was acquired by Hewlett-Packard for $1.2 billion two days ago" mentions an event of the type Mergers & Acquisitions (M&A) with four arguments: acquirer (Hewlett-Packard), acquiree (Palm), a monetary expression ($1.2 billion) and a temporal expression (two days ago). There is also an event indicator (was acquired). An event indicator is a word or a sequence of words that clearly signals the possible presence of an event.

EE has been the subject of active research for more than twenty years, since the series of MUC conferences started in 1987. Initially, the EE task was limited to several domains and had small-sized corpora at its disposal. Nowadays, with the rapid evolution of the socially-oriented Internet, it is becoming a crucial task for businesses to analyze millions of documents with multilingual content each day with minimal latency in order to gain insight and make critical decisions in time. We believe that modern and effective solutions to the EE task would allow businesses to dramatically reduce the time required to start making use of new sources of information and to reduce the operating costs of such systems.

One of the popular approaches to information extraction (IE), and to EE as a sub-problem of IE in particular, is the use of domain-specific extraction patterns, such as linear extraction patterns for annotation graphs and patterns for dependency-based parse tree (DPT) structures. In this paper we present our approach to semi-automatic generation of linear extraction patterns for annotation graphs that uses DPT patterns in the process of constructing the former.

The paper is structured as follows. In the next section we briefly outline related work; in particular, we give an overview of papers that propose algorithms minimizing the human effort required to build extraction patterns. In section 3 we describe the four steps of the algorithm in detail. Section 4 presents the results of experiments. We conclude in section 5 by summarizing the results and outlining current and future work.

2 Related Work

One of the best-known algorithms that learn multi-slot extraction patterns for free texts is Whisk [10]. It requires an annotated corpus of documents together with syntactic parse data.

The problem of learning conventional character-based regular expressions from annotated corpora is described in [8]. The authors demonstrate the applicability of their method to extracting such structured entities as software names, URLs, phone and course numbers. It remains an open question whether their approach can be effectively applied to extracting higher-level structures such as relations or events.

Snowball [1, 3, 9] and ExDisco [3, 11] are examples of algorithms for learning information extraction patterns from unannotated corpora. Snowball is a logical successor of DIPRE [4, 1, 3, 9], which was intended for extracting binary relations from semi-structured web data. Snowball also extracts binary relations, but it is intended for free texts.
The main idea is that there is a small seed of known relations, and occurrences of these relations are first found in the text. After that, the algorithm generalizes the sentence context (creates a pattern). The next step is to find new relations within this context.

ExDisco uses a different approach to deal with the absence of annotated corpora. It divides a corpus into two parts (relevant and irrelevant documents) and relies on the following assumption: relevant documents contain relevant events, and relevant events are present in relevant documents. ExDisco uses a small seed of 2 or 3 known patterns to first divide the documents; it then performs syntactic analysis of every sentence and uses the results for pattern generation.

As opposed to conventional IE from relatively small annotated corpora, Open Information Extraction (OIE) deals with large Web-sized unannotated sets of documents. The two best-known information extraction systems that do OIE are KnowItAll [6, 5, 3] and TextRunner [12, 5, 3]. They rely upon syntactic information rather than annotated or any lexical data.

3 Generating Linear Patterns

The motivation behind this research is to minimize the human effort in constructing event extraction patterns for new types of events. There are three assumptions underlying our algorithm.

The first one concerns how we define the verity of events. We define an event to be a true (positive) event if its indicator and arguments linguistically represent a valid event mention, regardless of the polarity, modality, etc. of that mention. For instance, in the sentence "Reuters announces that HP will not acquire Palm" the pair {Reuters, announces} represents a valid event mention of the type Company Announcement, while the pair {HP, announces} does not. On the other hand, the triple {HP, Palm, will not acquire} represents a valid event mention of the type M&A, while the triple {Palm, Reuters, will not acquire} does not.

Second, the algorithm requires the documents to be annotated with named entities that might be the arguments of events. These include not only such named entities as persons, companies and positions, and event indicators, but also temporal and monetary expressions. The complete set of required named entities depends on the types of events to be extracted. We use available NERs (in particular, the OpenNLP library) which we trust. This means that we consider texts annotated by the appropriate NERs as the gold standard, and we do not take into account the confidence of extracted entities (if the NERs provide such information) when evaluating the extraction patterns.

Third, we assume that if two events are extracted by the same DPT pattern, they are both either true or false. More generally, a group of N events all extracted by the same DPT pattern contains only true or only false events. The verity of the group is determined by the verity of a random sample drawn from it. This is the main assumption that allows us to minimize the human effort required to produce extraction rules: the assessor validates only a small random sample rather than the entire group, which might contain tens of thousands of events.

As we have mentioned, the algorithm is composed of four steps. At the first step we extract possible events from an unannotated corpus of documents.

3.1 Extracting Possible Events

Any event type is described by an indicator (a word that most clearly signals the event presence, usually a verb) and by arguments together with their semantic types. For instance, an event of the type M&A is described by an event indicator of the type AcquisitionIndicator and two arguments: acquirer of the type Company and acquiree, also of the type Company. Another example is an event of the type PersonAnnouncement, which is described by an indicator of the type AnnouncementIndicator and one argument announcer of the type Person. Formally, an event is defined as the following triple:

    E = <T, I, {(A_i^N, A_i^T)}_{i=1}^m>    (1)

where T is the event type, I is the type of the event indicator, the event has m arguments, and the i-th argument has a name A_i^N and a type A_i^T.
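To make the definition concrete, the sketch below shows one way the event model of (1) could be represented in Java. All class and field names here are our own illustrative assumptions, not the authors' implementation.

    import java.util.List;

    public class EventModel {
        // The i-th argument has a name A_i^N and a semantic type A_i^T.
        record Argument(String name, String type) {}

        // An event type T with an indicator type I and m typed arguments.
        record EventType(String type, String indicatorType, List<Argument> arguments) {}

        public static void main(String[] args) {
            // The M&A event from the running example: an AcquisitionIndicator
            // plus two Company-typed arguments (acquirer and acquiree).
            EventType mna = new EventType("M&A", "AcquisitionIndicator",
                    List.of(new Argument("acquirer", "Company"),
                            new Argument("acquiree", "Company")));

            // PersonAnnouncement: an AnnouncementIndicator and one Person argument.
            EventType announcement = new EventType("PersonAnnouncement",
                    "AnnouncementIndicator",
                    List.of(new Argument("announcer", "Person")));

            System.out.println(mna);
            System.out.println(announcement);
        }
    }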
The core idea of the first step is to generate all possible events for every sentence in the corpus. To do this, we have built the processing pipeline presented in fig. 1. The primary task of this pipeline is to recognize instances of event indicators and named entities that might be arguments of events.

Figure 1: Pipeline that extracts possible events.

The pipeline has a typical architecture intended for extracting named entities. Initially, it splits text into sentences and tokens. After that, it uses Named Entity and Event Indicator Recognizers to extract named entities and event indicators. We use the OpenNLP library to extract named entities and a dictionary-based recognizer to extract event indicators.

The last component of the pipeline (Possible Event Extractor) uses the previously extracted information together with the description of events in order to extract possible events. First, this component determines whether a sentence may contain at least one event of any type. To do this, it iterates over the descriptions of event types. For each event type, it determines whether the sentence contains (1) an indicator of the appropriate type and, if so, (2) a minimal number of the appropriate named entities. For instance, for an event of the type PersonAnnouncement a sentence must contain at least one indicator of the type AnnouncementIndicator and at least one named entity of the type Person. For an event of the type M&A, a sentence must contain at least one indicator of the type AcquisitionIndicator and at least two named entities of the type Company. If a sentence cannot contain a mention of at least one event, it is not processed further.

If there can be at least one event mention in the sentence, we apply the Stanford parser to obtain the DPT. For every event type T for which there can be at least one event mention, we generate all possible events. First, we construct the list of all appropriate event indicators {I_i}_{i=1}^n found in the sentence. Then we construct the set of named entities that can be part of possible events. We compute the shortest paths in the DPT from the indicators to every named entity in this set. We then generate all possible events around each event indicator.

For every possible event, we construct its pattern id, a non-unique string that is composed of the event type information and the DPT paths. Two events of the same type having the same paths from indicators to arguments will have the same pattern id. The format of the pattern id is the following: Indicator Attribute1Type(CoveredText1):Path1, Attribute2Type(CoveredText2):Path2, ... For instance, the event mentioned in the sentence "Google acquires Neotonic Software" has the pattern id AcquisitionIndicator Company(Google):nsubj, Company(Neotonic Software):dobj.
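As an illustration, the following sketch assembles a pattern id of this shape from an indicator type and a list of arguments. The Arg record and its fields are hypothetical names we introduce for the example, not the system's actual types.

    import java.util.List;
    import java.util.stream.Collectors;

    public class PatternId {
        // An extracted argument: its semantic type, surface text and DPT path
        // from the indicator (an illustrative assumption about the data model).
        record Arg(String type, String coveredText, String path) {}

        // Builds a pattern id string in the format described above.
        static String patternId(String indicatorType, List<Arg> args) {
            return indicatorType + " " + args.stream()
                    .map(a -> a.type() + "(" + a.coveredText() + "):" + a.path())
                    .collect(Collectors.joining(", "));
        }

        public static void main(String[] args) {
            // Prints: AcquisitionIndicator Company(Google):nsubj,
            //         Company(Neotonic Software):dobj
            System.out.println(patternId("AcquisitionIndicator",
                    List.of(new Arg("Company", "Google", "nsubj"),
                            new Arg("Company", "Neotonic Software", "dobj"))));
        }
    }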
It is possible to count the number N of events that are generated around each indicator. Let us denote the number of distinct argument types as m'; for each j-th distinct type, let S_j be the number of arguments of this type in the event definition and N_j the number of named entities of this type in the sentence. Then there are P_j(N_j, S_j) = N_j! / (N_j - S_j)! variants of filling S_j arguments with N_j named entities. To compute the number of possible events, we multiply these values over all distinct types: N = prod_{j=1}^{m'} P_j(N_j, S_j). Due to limitations of the Stanford parser, only sentences that have fewer than 80 tokens are processed.
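For example, a sentence with three Company mentions yields P(3, 2) = 3!/1! = 6 possible M&A events around each AcquisitionIndicator. The following sketch (method names are ours) computes N:

    public class EventCount {
        // Number of ordered ways to fill s argument slots with n entities
        // of the same type: P(n, s) = n! / (n - s)!.
        static long permutations(int n, int s) {
            long p = 1;
            for (int i = 0; i < s; i++) p *= (n - i);
            return p;
        }

        // N = product over the m' distinct argument types of P(N_j, S_j).
        static long possibleEvents(int[] entitiesPerType, int[] slotsPerType) {
            long n = 1;
            for (int j = 0; j < entitiesPerType.length; j++)
                n *= permutations(entitiesPerType[j], slotsPerType[j]);
            return n;
        }

        public static void main(String[] args) {
            // M&A in a sentence with three Company mentions:
            // one distinct type, N_1 = 3, S_1 = 2 -> 3!/1! = 6 possible events.
            System.out.println(possibleEvents(new int[]{3}, new int[]{2})); // 6
        }
    }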
3.2 Grouping Possible Events into Clusters

Once the input corpus has been processed and all possible events have been identified, at the second step we group the events according to their pattern id. All events inside a group have the same pattern id; in other words, they are all extracted by the same DPT pattern. We sort the groups by cardinality and generate a random subset of each group.

3.3 Assessor Validation

At the third step an assessor validates a small random subset of each cluster using the UIMA CAS Editor (http://uima.apache.org/) inside the Eclipse IDE. This editor (fig. 2) highlights text fragments that mention possible events. For every possible event, the tool provides detailed information such as its indicator and arguments, which are also highlighted in the text. The assessor iterates through every possible event, decides whether a particular event is true or false and, in case it is true, sets its verity flag to true.

Figure 2: UIMA CAS Editor perspective (Eclipse IDE).

3.4 Generating Linear Patterns

Finally, the system annotates the groups of events as positive or negative based on the subsets validated by the assessor at the previous step. This is done by counting the number of true and the number of false events in the subset. If there are more positive events in the subset, the whole group is annotated as positive, and vice versa. Positive groups are then used to generate and generalize event extraction patterns.

Figure 3: Typesystem of the TextMARKER rule engine.

The input data for the pattern generation algorithm is the sentence annotation graph. The graph contains such annotations as events, their indicators and arguments (named entities), and the sentence context, which is represented as types of tokens in the notation of the TextMARKER [7] type system (fig. 3). We use a bottom-up generalization strategy: we start with the most specific version of a pattern and generalize it as long as the F1-measure of the new patterns increases.

    1:  procedure GeneralizeRule(rule)
    2:      curF1 <- 0
    3:      newF1 <- TestRule(rule)
    4:      while (newF1 - curF1) > 0 do
    5:          curF1 <- newF1
    6:          rules <- empty list
    7:          rules.add(BestRule(BottomUpGen(rule)))
    8:          rules.add(BestRule(QuantModif(rule)))
    9:          if NewRulesUnknown(rules) then
    10:             newRule <- BestRule(rules)
    11:             newF1 <- TestRule(newRule)
    12:             rule <- newRule
    13:         end if
    14:     end while
    15:     return rule
    16: end procedure

Figure 4: Pattern generalization algorithm.

There are two operations that we use during the pattern generalization process (fig. 4):

1. BottomUpGen: bottom-up generalization (see fig. 3):

   (a) generalizing a word to its type: "recently" -> SW; "May" -> CW; "INC" -> CAP
   (b) generalizing the type of a word to its immediate parent type: SW, CW, CAP -> W
   (c) generalizing the types of punctuation marks to their parent type: COLON, COMMA, SEMICOLON -> PM
   (d) generalizing the types to their parent type: W, PM -> ANY

   After the type of a token has been changed, its neighbors are visited. If two nearby tokens have the same type, they are generalized to one token of that type with an expanded quantifier:

   "company" "based" "in" "Tel" "Aviv" -> SW SW SW CW CW -> SW[3] CW[2] -> W[5]

2. QuantModif: quantifier modification (expansion or restriction of a quantifier):

   (a) Expansion: SW[3] -> SW[2,4]
   (b) Restriction: CW[2,7] -> CW[2,6]

Using these two operations, we iteratively generalize the patterns. At every step we select the best pattern, and we stop the generalization process when the F1-measure (newF1) stops improving. We keep a list of previously generalized patterns; if the current pattern has already been processed, it is not generalized further. This corresponds to line 9 of the algorithm (fig. 4).
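The sketch below is a rough Java transcription of fig. 4. The RuleOps callbacks standing in for TestRule, BottomUpGen and QuantModif are our own assumed interface; the toy implementation in main only demonstrates the control flow, while the real operations work on TextMARKER rules evaluated against the validated groups.

    import java.util.*;

    public class Generalizer {
        interface RuleOps {
            double testRule(String rule);           // F1 of a rule on the validated groups
            List<String> bottomUpGen(String rule);  // candidates from type generalization
            List<String> quantModif(String rule);   // candidates from quantifier changes
        }

        // Transcription of the procedure in fig. 4: greedily apply the two
        // operations while the F1-measure keeps improving, skipping rules
        // that have already been visited (line 9 of the algorithm).
        static String generalizeRule(String rule, RuleOps ops) {
            Set<String> seen = new HashSet<>(List.of(rule));
            double curF1 = 0, newF1 = ops.testRule(rule);
            while (newF1 - curF1 > 0) {
                curF1 = newF1;
                List<String> rules = new ArrayList<>();
                rules.add(bestRule(ops.bottomUpGen(rule), ops));
                rules.add(bestRule(ops.quantModif(rule), ops));
                rules.removeIf(r -> r == null || !seen.add(r)); // keep only unseen rules
                if (rules.isEmpty()) break;                     // no unknown candidates left
                String newRule = bestRule(rules, ops);
                newF1 = ops.testRule(newRule);
                rule = newRule;
            }
            return rule;
        }

        // Selects the candidate with the highest F1-measure.
        static String bestRule(List<String> candidates, RuleOps ops) {
            return candidates.stream()
                    .max(Comparator.comparingDouble(ops::testRule))
                    .orElse(null);
        }

        public static void main(String[] args) {
            // Toy RuleOps: shorter rules score higher, mimicking generalization
            // (for demonstration only; real F1 comes from the validated groups).
            RuleOps toy = new RuleOps() {
                public double testRule(String r) { return 1.0 / r.length(); }
                public List<String> bottomUpGen(String r) {
                    return r.length() > 1 ? List.of(r.substring(1)) : List.of(r);
                }
                public List<String> quantModif(String r) { return List.of(r); }
            };
            System.out.println(generalizeRule("SW SW SW CW CW", toy));
        }
    }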
4 Experiments

Sentences annotated by the assessor with true events are considered the gold standard and are used to measure the performance of event extraction patterns. Once the generated patterns have been applied over the gold standard, we compare the gold event annotations with the event annotations created by the patterns under evaluation. Two event annotations are considered the same if they cover the same event indicators and arguments (named entities).
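Under this matching criterion, precision, recall and F1-measure reduce to set comparisons between gold and predicted annotations. A minimal sketch, with assumed span keys (the actual matching works on indicator and argument annotations in the CAS):

    import java.util.*;

    public class Evaluation {
        // An event annotation is identified by its indicator span and the
        // spans of its arguments (illustrative keys; offsets stand in for spans).
        record EventAnnotation(String indicator, Set<String> argumentSpans) {}

        // Two annotations are the same iff they cover the same indicator and
        // the same arguments, so set intersection gives the true positives.
        static double[] score(Set<EventAnnotation> gold, Set<EventAnnotation> predicted) {
            Set<EventAnnotation> tp = new HashSet<>(predicted);
            tp.retainAll(gold);
            double precision = predicted.isEmpty() ? 0 : (double) tp.size() / predicted.size();
            double recall = gold.isEmpty() ? 0 : (double) tp.size() / gold.size();
            double f1 = (precision + recall == 0) ? 0
                    : 2 * precision * recall / (precision + recall);
            return new double[]{precision, recall, f1};
        }

        public static void main(String[] args) {
            EventAnnotation e1 = new EventAnnotation("acquires@7-15",
                    Set.of("Google@0-6", "Neotonic Software@16-33"));
            EventAnnotation e2 = new EventAnnotation("resigned@40-48", Set.of("J. Smith@30-38"));
            double[] prf = score(Set.of(e1, e2), Set.of(e1));
            System.out.printf("P=%.2f R=%.2f F1=%.2f%n", prf[0], prf[1], prf[2]);
            // P=1.00 R=0.50 F1=0.67
        }
    }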
We ran preliminary experiments on the English part of the Reuters RCV2 dataset [2]. (Initially, we wanted to use RCV1, which is much bigger; however, due to the execution-time characteristics of the first implementation of the algorithm, we decided to use the smaller corpus and to experiment on RCV1 after we optimize the algorithm.) Table 1 presents the details of the entire dataset that we used (after processing all English articles with the pipeline that detects possible events).

Table 1: Dataset description

    type of event           M&A   MPC   Res
    number of sentences      83    45    80
    number of true events    12    16    25

We ran experiments on extracting three types of events: Mergers & Acquisitions (M&A), Management Position Change (MPC) and Resignation (Res). We compare the performance of automatically constructed patterns with patterns that we built manually based on RSS articles obtained from such sources as Yahoo and Google news feeds. In total, we manually created 10 patterns for M&A, 3 patterns for MPC and 5 patterns for Res events.

In the first experiment, we created 2 train/test splits. In each split, we constructed rules on the train set and validated them on the test set. Table 2 provides the split details as well as the number of patterns constructed automatically in each case. Tables 3-5 give an overview of the results.

Table 2: Splits of the entire dataset (train/test (patterns#))

            1st split     2nd split
    M&A     61/22 (8)     55/28 (8)
    MPC     38/7 (9)      30/15 (8)
    Res     54/26 (10)    56/24 (10)

For M&A events, automatically constructed patterns outperformed manually created ones in terms of precision, recall and F1-measure on both the train and test sets in all splits (table 3), even though the number of automatically constructed patterns (8) is smaller than the number of manually created ones (10). Note that in the first split on the test data and in the second split on the train data the manually constructed patterns did not extract any events, which is why we do not report performance for these cases.

Table 3: M&A patterns performance (train / test)

                      Manually      Automatically
    1st split
      Precision       0.50 / --     1.00 / 1.00
      Recall          0.11 / 0.00   1.00 / 0.33
      F1-measure      0.18 / --     1.00 / 0.50
    2nd split
      Precision       -- / 0.50     1.00 / 1.00
      Recall          0.00 / 0.50   1.00 / 0.50
      F1-measure      -- / 0.50     1.00 / 0.66

In the case of MPC events, in the first split we obtained higher results with automatically constructed patterns (see table 4). In the second split, manually and automatically constructed patterns demonstrated the same results; however, our algorithm constructed 8 patterns as opposed to 3 patterns constructed by a human.

Table 4: MPC patterns performance (train / test)

                      Manually      Automatically
    1st split
      Precision       0.78 / 0.50   0.92 / 1.00
      Recall          0.58 / 0.33   1.00 / 0.33
      F1-measure      0.67 / 0.40   0.96 / 0.50
    2nd split
      Precision       0.67 / 1.00   1.00 / 1.00
      Recall          0.60 / 0.40   1.00 / 0.40
      F1-measure      0.63 / 0.57   1.00 / 0.57

Automatically constructed patterns for Res events also outperformed the manually created ones in terms of F1-measure in both splits. However, the precision of the manually constructed patterns was higher in both splits.

Table 5: Res patterns performance (train / test)

                      Manually      Automatically
    1st split
      Precision       1.00 / 1.00   0.83 / 0.50
      Recall          0.18 / 0.33   0.86 / 0.67
      F1-measure      0.30 / 0.50   0.84 / 0.57
    2nd split
      Precision       1.00 / 1.00   0.90 / 0.33
      Recall          0.19 / 0.25   0.90 / 0.75
      F1-measure      0.32 / 0.40   0.90 / 0.46

In the second experiment, we aimed at a more realistic comparison of manually and automatically constructed patterns. We validated the manually constructed patterns on the entire dataset (table 7) and ran the algorithm through a 5-fold cross-validation repeated 5 times (since the dataset we used is relatively small); these results are presented in table 6. As can be seen, we obtained quite varying results for different partitions (F1-measure: 0.4523-0.8333 for M&A, 0.5428-0.7023 for MPC and 0.5000-0.6764 for Res events). The conclusion we draw from these results is that we need a larger and more representative dataset to construct and evaluate comprehensive event extraction patterns.

Table 6: F1-measure of automatically constructed rules (5-fold cross-validation, 5 runs)

    M&A      MPC      Res
    0.4523   0.5428   0.5000
    0.4666   0.5714   0.5988
    0.4888   0.6750   0.6321
    0.5555   0.6904   0.6426
    0.8333   0.7023   0.6764

Table 7: Performance of manually constructed rules on the entire dataset

                  M&A    MPC    Res
    Precision     0.50   0.73   1.00
    Recall        0.08   0.53   0.20
    F1-measure    0.14   0.62   0.33

5 Conclusions and Future Work

In this paper we gave an overview of the current progress in developing an algorithm for semi-automatic generation of linear event extraction patterns for free texts and presented our preliminary experimental results. Our next steps are to enhance the algorithm with additional capabilities, to optimize it in terms of execution performance and to validate it on a larger dataset.

Currently, we extract only mandatory event arguments. We will add the capability to generalize patterns that include optional arguments as well (such as temporal and monetary expressions). We plan to explore ways to improve and optimize the algorithm. We will work on enhancing the generalization algorithm by providing additional operations as well as an intelligent selection of the operations to apply at each iteration. We will also explore the possibility of employing negative examples. We have a dataset of approximately 110,000 news articles that we will use for validating the algorithm; we will also use the Reuters RCV1 dataset for this purpose. Finally, we will study the theoretical properties of the proposed algorithm in much more detail.

References

[1] Eugene Agichtein and Luis Gravano. Snowball: extracting relations from large plain-text collections. In Proceedings of the Fifth ACM Conference on Digital Libraries, DL '00, pages 85-94, New York, NY, USA, 2000. ACM.

[2] Massih-Reza Amini, Nicolas Usunier, and Cyril Goutte. Learning from multiple partially observed views - an application to multilingual text categorization. In Advances in Neural Information Processing Systems 22 (NIPS 2009), pages 28-36, 2009.

[3] Nguyen Bach and Sameer Badaskar. A review of relation extraction. 2007.

[4] Sergey Brin. Extracting patterns and relations from the world wide web. In Selected Papers from the International Workshop on The World Wide Web and Databases, WebDB '98, pages 172-183, London, UK, 1999. Springer-Verlag.

[5] Oren Etzioni, Michele Banko, Stephen Soderland, and Daniel S. Weld. Open information extraction from the web. Communications of the ACM, 51(12):68-74, December 2008.

[6] Oren Etzioni, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. Web-scale information extraction in KnowItAll (preliminary results). In Proceedings of the 13th International Conference on World Wide Web, WWW '04, pages 100-110, New York, NY, USA, 2004. ACM.

[7] Peter Kluegl, Martin Atzmueller, and Frank Puppe. Integrating the rule-based IE component TextMarker into UIMA. In Joachim Baumeister and Martin Atzmueller, editors, LWA-2008 (Special Track on Information Retrieval), pages 73-77, 2008.

[8] Yunyao Li, Rajasekar Krishnamurthy, Sriram Raghavan, Shivakumar Vaithyanathan, and H. V. Jagadish. Regular expression learning for information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '08, pages 21-30, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics.

[9] Ryan McDonald. Extracting relations from unstructured text. Technical report, 2005.

[10] Stephen Soderland. Learning information extraction rules for semi-structured and free text. Machine Learning, 34(1-3):233-272, February 1999.

[11] Roman Yangarber, Ralph Grishman, and Pasi Tapanainen. Automatic acquisition of domain knowledge for information extraction. In Proceedings of the 18th International Conference on Computational Linguistics, pages 940-946, 2000.

[12] Alexander Yates, Michele Banko, Matthew Broadhead, Michael J. Cafarella, Oren Etzioni, and Stephen Soderland. TextRunner: open information extraction on the web. In HLT-NAACL (Demonstrations) '07, pages 25-26, 2007.