An Event Extraction System via Neural Networks

Alapan Kuila, Indian Institute of Technology Kharagpur, alapan.cse@iitkgp.ac.in
Sudeshna Sarkar, Indian Institute of Technology Kharagpur, sudeshna@cse.iitkgp.ernet.in

ABSTRACT
In this paper we describe the IIT KGP team's participation in the Event Extraction task at FIRE 2017. We have developed an event extraction system that extracts event-phrases from tweets written in Indian-language scripts as well as Roman script. We designed our system for Hindi and then applied the same system to Malayalam and Tamil. We submitted two systems: one uses a pipelined architecture, the other a non-pipelined architecture. In the pipelined architecture we first identify the tweets that contain an event and then extract the event-phrase from those tweets. In the non-pipelined system all tweets are passed directly to the event extraction module. Though conceptually simpler, the non-pipelined approach gives better results than the pipelined approach and achieves F1-scores of 50.01, 48.29 and 51.80 on the Hindi, Malayalam and Tamil datasets respectively.

1 INTRODUCTION
Event extraction from unstructured text is one of the most important and challenging tasks in information extraction and natural language processing. It deals with the automatic extraction of events depicting accidents, crime, natural disasters, political events etc. from newswires, discussion forums and social media texts. Most existing event extraction systems [2, 8, 14] deal with English text, where the main objective is to detect event trigger words and classify those trigger words into predefined event classes [11, 14]. Although there are several successful efforts for English, such as the ACE and TAC (https://tac.nist.gov/2017/KBP/) evaluation tracks, there is no such standard event extraction tool for Indian languages. The Event Extraction task at FIRE 2017 aims to identify and extract events from newswires and social media text, specifically tweets. The tweets are written in three Indian language scripts (Hindi, Malayalam and Tamil) along with romanized script. Unlike typical event extraction systems [8, 14], where the objective is to detect trigger words in sentences and classify them into predefined event types, the FIRE 2017 shared task deals with the extraction of the event-phrase (the phrase that depicts an event) from the given tweets. In this paper, we present the system we developed for this task.

2 RELATED WORK
Many approaches have been taken to extract events from text. Judea and Strube (2015) formulated event extraction as frame-semantic parsing [4]. McClosky et al. (2011) [12] used dependency parsing to extract events. Earlier work used feature-based approaches [3, 9, 18], but such features are domain dependent and require extensive linguistic knowledge [15]. To overcome the difficulties of complicated feature engineering and domain dependency, researchers have turned to neural network approaches for event classification [2, 11, 14]. All of these works, however, deal with English, and their principal objective is to detect the trigger word in the text that indicates an event. Some of them also identify the arguments related to these event triggers and their corresponding roles in the events [2, 14, 18].

3 TASK DEFINITION
The Event Extraction task at FIRE 2017 requires participants to detect the event-phrase in given tweets. In the training set, tweets are written in three Indian languages (Hindi, Malayalam and Tamil) along with romanized script. The objective is to detect the phrase within the tweet that depicts an event such as a natural disaster (floods, earthquakes etc.), a man-made disaster (accidents, crime etc.), a political event (inaugurations by political leaders, political rallies etc.) or a cultural/social event (seminars, conferences, light music events etc.).

4 DATASETS
The dataset contains tweets written in the three Indian-language scripts (Hindi, Malayalam and Tamil) as well as Roman script. The training dataset contains two files for each language. One file contains all the tweets, obtained using the Twitter API. The other is an annotation file containing the event phrases extracted from the tweets in the first file. Each line in the annotation file contains: tweet-id, user-id, the event phrase of the tweet, the index where this phrase starts in the tweet string, and the string length of the event phrase. The test file contains only the tweets with their corresponding tweet-id and user-id. The details of the training and test datasets are given in Table 1.

Table 1: Dataset description: number of tweets

Language   | Training data | No. of events in annotation file | Test data
Hindi      | 1024          | 402                              | 4451
Malayalam  | 2218          | 674                              | 5173
Tamil      | 3843          | 1109                             | 5304

5 SYSTEM DESCRIPTION
In this section we describe our event extraction system. We experimented with two types of systems: (1) a non-pipelined approach and (2) a pipelined approach. We used neural networks as the main technique in both cases.

5.1 Preprocessing
The training file contains tweets written mainly in Indian-language script with some romanized script. Some of the tweets end with URLs. To avoid a data sparseness problem we replaced all URLs with a unique token.
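The URL-normalization step can be sketched as follows. This is a minimal illustration only; the regular expression and the `<URL>` placeholder token are our assumptions, not details taken from the system description:

```python
import re

# Simple URL pattern; real tweets may need a more robust expression.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

def normalize_urls(tweet: str, placeholder: str = "<URL>") -> str:
    """Replace every URL in a tweet with one shared placeholder token,
    so that unique links do not inflate the vocabulary."""
    return URL_RE.sub(placeholder, tweet).strip()
```

Mapping all links to a single token means the embedding layer sees one frequent symbol instead of thousands of one-off strings.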
The event annotation file also contains some event-phrases that are taken from the same tweet, indicate the same event, and consist of more or less the same words. We omitted those redundant event-phrases.

5.2 Run1: Non-pipelined approach
In the non-pipelined approach we formulated event extraction as a sequence labelling problem. Every token in the input tweet is tagged with '0' or '1', i.e. 'outside event-phrase' or 'inside event-phrase' respectively. For this task we used a combination of a convolutional neural network [7] and a bidirectional LSTM [16]. To prepare the input to the convolution layer we fixed the sequence length to the maximum tweet length and padded shorter tweets with a special token where necessary. An embedding layer in the network transforms each token into a real-valued vector [13, 17], and the resulting sequence of real-valued vectors is fed to the model. The main architecture is a convolutional neural network (CNN) [7] followed by a bidirectional LSTM [16].
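The 0/1 tagging scheme above can be made concrete using the annotation format of Section 4, which gives each event phrase's character start index and length inside the tweet string. The helper below is our own sketch (whitespace tokenization is an assumption; the paper does not specify its tokenizer):

```python
def tag_tokens(tweet: str, start: int, length: int) -> list:
    """Tag each whitespace-delimited token with 1 ('inside event-phrase')
    or 0 ('outside event-phrase'), given the phrase's character start
    index and length as provided in the annotation file."""
    tags, pos = [], 0
    for token in tweet.split():
        pos = tweet.index(token, pos)  # character offset of this token
        tags.append(1 if start <= pos < start + length else 0)
        pos += len(token)
    return tags
```

For example, a tweet whose annotated phrase starts at character 18 with length 16 yields a tag of 1 for exactly the tokens covered by that span.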
The input to the convolution layer is a matrix of size n * m, where n is the sequence length and m is the dimensionality of the word vectors. The input matrix is passed through a convolution layer with a fixed filter length and filter count, and then, without any intermediate pooling layer, through a second convolution layer with another fixed filter length and count, keeping the sequence length equal to the input sequence length. The resulting internal representation has size n * mc, where mc is the dimension of the internal vector representation. This representation is fed to a bidirectional LSTM with one hidden layer, whose output is followed by a softmax layer that computes the probability distribution over the possible tags '0' and '1' for each token in the sequence.

5.3 Run2: Pipelined approach
We noticed in the training corpus that only about 40% of the tweets contain an event phrase, so it is unnecessary to run event-phrase extraction on every tweet. Based on this observation, we placed an event classification module before the event extraction module described in the non-pipelined approach. In the pipelined approach, tweets are first classified as event-tweets or non-event-tweets. Tweets classified as event-tweets by our classification module are fed to the event extraction module; tweets classified as non-event-tweets are discarded. The classification module is similar to [5, 6], where the authors perform sentence modelling and sentence classification using a convolutional neural network.

Figure 2: Block diagram of pipelined approach

5.3.1 Tweet Classification. We used a convolutional neural network (CNN) based architecture for tweet classification. As the tweets are of different lengths, padding is applied to make them a fixed size. The padded sequences are fed to an embedding layer that converts the tokens into fixed-size real-valued vectors. These vector sequences are then fed to a convolution layer followed by a max-pooling layer, and the resulting internal representation is fed to a second convolution-plus-pooling combination. The model uses multiple filter sizes to obtain multiple features. Finally, the output is fed to a fully connected softmax layer that gives the probability distribution over the two classes: event-tweet or non-event-tweet. The performance of the tweet classification module is reported in Table 2.
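The convolution-plus-max-pooling feature extraction at the heart of the classifier can be sketched in plain NumPy. This is an illustrative forward pass only, not the trained model; the function name, shapes, and toy values are our own:

```python
import numpy as np

def conv1d_maxpool(x, filters):
    """x: (n, m) matrix of n word vectors of dimension m.
    filters: (k, h, m) array of k convolution filters of width h.
    Each filter slides over the token sequence; max-pooling keeps its
    strongest response, yielding one feature per filter."""
    n, _ = x.shape
    k, h, _ = filters.shape
    # responses[i, j] = activation of filter i at the window starting at j
    responses = np.array([
        [np.sum(f * x[j:j + h]) for j in range(n - h + 1)]
        for f in filters
    ])  # shape (k, n - h + 1)
    return responses.max(axis=1)  # shape (k,)

# Toy example: 5 tokens with 4-dimensional embeddings, 3 filters of width 2
rng = np.random.default_rng(0)
features = conv1d_maxpool(rng.standard_normal((5, 4)),
                          rng.standard_normal((3, 2, 4)))
```

Using several filter widths (here the model uses 3, 4 and 5) and concatenating the pooled outputs gives the multi-feature representation fed to the softmax layer.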
Eventually, the tweets classified as event-tweets are fed to the event extraction module described in the non-pipelined section. The architecture of the event extraction module in the pipelined approach is the same as in the non-pipelined approach; the only difference is that at training time the pipelined approach uses only those tweets that contain events, discarding tweets with no event from the training data. The event extraction module outputs the event span (i.e. the event phrase) within the tweet.

Figure 1: Event-extraction architecture

Figure 3: Tweet Classification Module

Table 2: Tweet classification accuracy

Language   | Precision (%) | Recall (%)
Hindi      | 82.92         | 64.15
Malayalam  | 86.08         | 62.26
Tamil      | 83.33         | 63.69

5.4 Postprocessing
The event phrase inside a tweet consists of consecutive word sequences. Therefore, after sequence tagging, if there are '0's inside a sequence of '1's, the first '1' is taken as the starting point of the event-phrase and the last '1' in the sequence as its end; all tokens inside this boundary are considered part of the event-phrase. We use this heuristic to maintain the constraint that every event-phrase consists of consecutive tokens.

5.5 Parameters and training
The event extraction model uses the same architecture and hyperparameters in both the pipelined and non-pipelined approaches. We used 100-dimensional word embeddings in the embedding layer. The first convolution layer uses a filter size of 3 with mf = 30 filters; the second convolution layer uses a filter size of 4 with mh = 20 filters. The bidirectional LSTM uses one hidden layer of size 60. For event classification we used the CNN-based approach with word embeddings of size 100. These vectors are randomly initialized and fed to the embedding layer. We employed filter sizes of {3, 4, 5} with 20 filters per size for the convolution operation. Finally, we trained the neural network models using the Adam optimizer with shuffled minibatches, a dropout rate of 0.5, and backpropagation for gradient calculation and parameter updates.

6 RESULT AND ERROR ANALYSIS
Table 3 shows the performance of event extraction in all three languages using both the pipelined and non-pipelined approaches. Examining the results for each language, we found that the non-pipelined system gives a better F-score than the pipelined approach. On the Hindi dataset the pipelined system achieves an F-score of 40.35, while the non-pipelined approach reaches 50.01. For Malayalam the F-scores of the pipelined and non-pipelined approaches are 47.17 and 48.29 respectively, which are comparable. In Tamil the non-pipelined system (F-score 51.80) again beats the pipelined system (F-score 44.01). Error propagation in the pipelined approach may be responsible for its lower performance: the performance of the tweet classification module directly influences the event extraction stage. It is also evident from Table 3 that precision is quite low in both the pipelined and non-pipelined systems; we will investigate our model to improve the precision score.

Table 3: Result on the final test set [P: Precision, R: Recall]

           |        Run1           |        Run2
Language   | P (%)  R (%)  F (%)   | P (%)  R (%)  F (%)
Hindi      | 36.58  79.02  50.01   | 31.42  56.37  40.35
Malayalam  | 32.98  90.20  48.29   | 39.98  57.50  47.17
Tamil      | 43.16  64.77  51.80   | 39.73  49.33  44.01

7 CONCLUSION AND FUTURE WORK
We have taken two strategies for event extraction. In the non-pipelined approach we classified each word with the tag '1' or '0', indicating inside or outside the event-phrase. But there are many tweets that do not indicate any event, so in the pipelined approach we first detect the tweets that contain an event and then identify the span of the event inside the tweet. The accuracy of the pipelined approach depends on the accuracy of the tweet classification module, so we will try to improve the performance of that module. The number of training tweets in our experiments is quite low; with more training data the event extraction accuracy may increase. In future work we will try to improve the event extraction system using more training data and other advanced strategies [1, 10].

REFERENCES
[1] Yubo Chen, Shulin Liu, Xiang Zhang, Kang Liu, and Jun Zhao. 2017. Automatically Labeled Data Generation for Large Scale Event Extraction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Volume 1: Long Papers. 409-419. https://doi.org/10.18653/v1/P17-1038
[2] Yubo Chen, Liheng Xu, Kang Liu, Daojian Zeng, and Jun Zhao. 2015. Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks.
[3] Yu Hong, Jianfeng Zhang, Bin Ma, Jianmin Yao, Guodong Zhou, and Qiaoming Zhu. 2011. Using cross-entity inference to improve event extraction. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. 1127-1136.
[4] Alex Judea and Michael Strube. 2015. Event Extraction as Frame-Semantic Parsing. In *SEM@NAACL-HLT. 159-164.
[5] Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A Convolutional Neural Network for Modelling Sentences. arXiv preprint arXiv:1404.2188.
[6] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of EMNLP 2014. 1746-1751. http://aclweb.org/anthology/D/D14/D14-1181.pdf
[7] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25. Curran Associates, Inc., 1097-1105. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[8] Qi Li, Heng Ji, and Liang Huang. 2013. Joint Event Extraction via Structured Prediction with Global Features. In ACL (1). 73-82.
[9] Shasha Liao and Ralph Grishman. 2010. Using document level cross-event inference to improve event extraction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 789-797.
[10] Shulin Liu, Yubo Chen, Shizhu He, Kang Liu, and Jun Zhao. 2016. Leveraging FrameNet to Improve Automatic Event Detection. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Volume 1: Long Papers. http://aclweb.org/anthology/P/P16/P16-1201.pdf
[11] Shulin Liu, Yubo Chen, Kang Liu, and Jun Zhao. 2017. Exploiting Argument Information to Improve Event Detection via Supervised Attention Mechanisms. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1789-1798. https://doi.org/10.18653/v1/P17-1164
[12] David McClosky, Mihai Surdeanu, and Christopher D. Manning. 2011. Event Extraction As Dependency Parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (HLT '11). 1626-1635. http://dl.acm.org/citation.cfm?id=2002472.2002667
[13] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013. Linguistic Regularities in Continuous Space Word Representations.
[14] Thien Huu Nguyen, Kyunghyun Cho, and Ralph Grishman. 2016. Joint Event Extraction via Recurrent Neural Networks. In HLT-NAACL. 300-309.
[15] Thien Huu Nguyen and Ralph Grishman. 2015. Event Detection and Domain Adaptation with Convolutional Neural Networks.
[16] M. Schuster and K. K. Paliwal. 1997. Bidirectional Recurrent Neural Networks. IEEE Transactions on Signal Processing 45, 11 (Nov. 1997), 2673-2681. https://doi.org/10.1109/78.650093
[17] Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 384-394.
[18] Bishan Yang and Tom M. Mitchell. 2016. Joint Extraction of Events and Entities within a Document Context. CoRR abs/1609.03632. arXiv:1609.03632 http://arxiv.org/abs/1609.03632