A neural network based Event extraction system
             for Indian languages

          Alapan Kuila, Sarath chandra Bussa, and Sudeshna Sarkar

                Indian Institute of Technology Kharagpur, India
alapan.cse@iitkgp.ac.in, bussasarath2@gmail.com, sudeshna@cseiitkgp.ac.in



      Abstract. In this paper we describe a neural network based approach
      to the Event Extraction (EE) task, which aims to discover different
      types of events along with their event arguments from text documents
      written in Indian languages like Hindi and Tamil as well as English,
      as part of our participation in the task on Event Extraction from
      Newswires and Social Media Text in Indian Languages at the Forum for
      Information Retrieval Evaluation (FIRE) 2018. A neural network model
      that combines a Convolutional Neural Network (CNN) and a Recurrent
      Neural Network (RNN) is employed for the event identification task.
      In addition to event detection, the system also extracts the event
      arguments, which carry the information related to the events (i.e.
      when [Time], where [Place], Reason, Casualty, After-effect etc.). Our
      proposed Event Extraction model achieves F-scores of 39.71, 37.42 and
      39.91 on the Hindi, Tamil and English datasets respectively, which
      reflects the overall performance of the event identification and
      argument extraction tasks in these three languages.

      Keywords: Event extraction · Convolutional Neural Network (CNN) · Re-
      current Neural Network (RNN).


1   Introduction
News reports on events occurring in different corners of the world appear every
moment in online and printed media. To keep track of this news, it is very
important to identify relevant events and extract the spatio-temporal aspects
of those events. Understanding events and their descriptions in raw text is the
key factor in automatic event extraction, an important and challenging task in
Natural Language Processing (NLP) and Information Extraction (IE). It is also
essential in practical applications like news summarization, information
retrieval and knowledge base construction. Event extraction aims to detect,
from text, the occurrence of events of specific types, and to discover the
arguments (event participants or attributes) that are associated with each
event. Event arguments represent the event related information, i.e. capturing
who does what to whom, how, when and where. For example,
 – S1: Mild earthquake has been found in Indonesia’s Sulawesi island.

A typical automatic event extraction system will identify from S1 that the
sentence depicts an Earthquake event, and it will also extract some important
information related to the earthquake, such as the place of the event:
Indonesia’s Sulawesi island.
     Current event extraction systems rely on examining event trigger words to
check whether an event occurs and, if so, what the type of the event is. An
event trigger is a word or phrase that clearly expresses the occurrence of an
event. For example, in S1, earthquake is the trigger word, as it is the most
salient clue in the sentence for identifying the Earthquake event.
Understanding and identifying event triggers in raw text is an important and
challenging problem, as the same event may be expressed by various trigger
expressions, and a specific event trigger might represent different events in
different contexts. Previous research makes it evident that event detection is
not a simple trigger-token searching task, and that contextual information is
very important for the identification and classification of event triggers.
Event argument extraction is another challenging task. Unlike entity
identification, event argument extraction deals with discovering the specific
entities that are relevant to the events appearing in the document, so typical
NER systems are not sufficient to solve the problem. Previously, we have worked
on detecting segments of text which contain event related information in
tweets [18]. This task is an extension of that work. Here we have tried to
identify the events as well as the event arguments in news documents using a
supervised learning procedure, with the help of an annotated corpus where
event triggers and argument phrases are tagged. Our contribution in this task
is twofold. First, we have designed a hybrid neural network based model for
event extraction from news documents written in Indian languages like Hindi
and Tamil as well as English, and second, we have represented the extracted
events in a structured format (Table 1).


2   Related Studies

A plethora of work has already been done on event extraction from text
documents. Supervised methods are dominant for this problem; they require
annotated training data to employ machine learning based approaches.
Previously, researchers have designed various feature based (lexical,
contextual, syntactic) approaches for event detection [3, 4]. The problem with
these approaches is that they require thorough feature engineering and
extensive language knowledge, which also impede the design of a language
independent ED model.
    Neural networks are now widely applied to NLP tasks, including Event
Extraction (EE), as they automatically extract the underlying features. Some
papers apply bidirectional recurrent neural networks (Bi-RNNs), which are
capable of capturing both the preceding and following context of each token in
the sentence [1, 7]. Some have also used convolutional neural networks (CNNs),
as CNNs are good at capturing features from a sequence of objects [1, 2, 5, 7].
Liu et al. (2017) used event argument information to enhance event detection
accuracy [6]. Peng et al. (2016) tried to detect events from text documents
using a minimal supervision approach where the system needs only annotation
guidelines [8]; it formalizes the problem as a semantic similarity problem [7].
Some work also exists where researchers have tried to solve the task with a
frame-semantic parser [9].
     Here, we are interested in news events depicting man-made or natural
disasters. In the disaster domain, most research is centered around extracting
events from user generated content such as social media text. Extracting
events from social media text consists of two subtasks: identifying
informative tweets and detecting events in those informative tweets [10].
Artificial Intelligence for Disaster Response (AIDR) [11] is one such
successfully deployed platform; it works on streaming microblog text (Twitter)
to classify messages that people post during disasters into a set of
user-defined categories of information. Imran et al. (2013) extracted disaster
relevant information from Twitter and classified it into personal, informative
and other messages [12]. Something similar was done by Chowdhury et al.
(2013), who classified crisis related tweets into pre-incident,
during-incident and post-incident events [13].
     Event extraction has also been represented as a natural language
understanding problem. Previously, natural language understanding tasks
depended heavily on script knowledge [14]. Though script knowledge is very
useful for event-event relation extraction and causality detection, scripts
are heavily domain dependent and too expensive to create [15]. To obtain event
structure knowledge from documents, the narrative event chain was introduced
by Chambers and Jurafsky (2008) [14, 16]. Event detection has also been
formalized as a Semantic Role Labeling (SRL) problem [17], but these works
take only verbs as events, while from our experience we know that events can
be nouns, verbs or adjectives. Chambers and Jurafsky (2008) introduced the
concept of the protagonist [14] for event representation, which is also not
feasible for disaster related news documents. Though several successful
efforts exist for English, such as the ACE and TAC evaluation tracks1, there
is no such standard event extraction tool for Indian languages. So we see that
none of the previous work is adequate to tackle the problem of event
extraction from news documents written in Indian languages.


3     Problem Description

Here, we focus on the sentence level event extraction task. Given a news
document, we want to extract the events in that document and represent them in
a structured format. Our work covers events which depict the occurrence of any
disaster, caused either by nature (e.g. flood, earthquake etc.) or man-made
(e.g. terrorist attack, accident, riot etc.). The term Event covers the type of the
1
    https://tac.nist.gov/2017/KBP/Event/index.html

event and the event participants or arguments, which capture the information
regarding that event (i.e. when [Time], where [Place], cause, effect etc.).
So, while extracting events, we first have to identify the type of each
extracted event based on the trigger word found in the text. Event argument
identification is the other crucial module of our system, as arguments are
really important for describing the events. Each event has some general
attributes like place (where) and time (when), and some specific attributes
like cause, after-effect and magnitude. Our interest is to identify these
attributes for each event present in the document. And if the same event has
more than one argument of the same type, then we have to aggregate the
argument information so that no superfluous information exists in the final
outcome. Table 1 depicts the sentence level event extraction output, where
each row indicates one event mention with the corresponding argument
information.

                     Table 1. Sentence level Event Extraction output

S-id  Event type      Time       Place                           Casualty                     Participant
1     Shoot-Out       Thursday   neighborhood of Dasht-e-Barchi  56 people were wounded       -
2     Shoot-Out       Friday     Eastern city, Ghazni            35 civilians killed          Taliban
3     Suicide-attack  Wednesday  Afghanistan                     at least 35 soldiers killed  -



4      Datasets
This section describes the datasets used in this work. The training and test
data were made available to the task participants by the organizers. The
training dataset consists of annotated documents where event trigger words as
well as event arguments are tagged. The test set contains news documents
without any annotations. The dataset statistics are presented in Table 2.

                               Table 2. Dataset description

Language  Number of Training Docs  Number of Test Docs  Number of Event types  Number of Argument types
Hindi     107                      311                  14                     7
Tamil     64                       1438                 23                     10
English   100                      803                  16                     10

5    Methodology

An event extraction problem can be subdivided into two subtasks: (1) Event
Detection, the task of finding event mentions of predefined event types, and
(2) Event Argument Extraction, the identification of event related information,
like time, place, cause and effect, that is relevant to the event mentions. We
formalize the Event Detection (ED) problem as a multi-class classification
problem via a combination of a convolutional neural network (CNN) and a
bidirectional Long Short-Term Memory network (Bi-LSTM). We expect that the
Bi-LSTM and CNN capture syntactic and semantic information from the text, so
their combination should enhance the overall ED performance. Suppose we have K
predefined event types E1 , E2 , ..., EK . Given a sentence S = w1 w2 w3 ...wn ,
where n is the sentence length, for each word wi in the sentence we want to
predict whether the current token is a valid event trigger, and if it is, with
which event type Ej the word wi should be matched. So, the current word along
with its sentential context constitutes an event trigger candidate, which is
the input to our classification model. We have fixed the context window size
in order to feed the trigger candidate to the CNN.
    If the window size is k, then the trigger candidate for word W is repre-
sented as [w−k , w−k+1 , ..., w0 , ..., wk−1 , wk ], where the current word is in the
middle position (i.e. w0 ). These (2k + 1)-token trigger candidates are taken
as input to the CNN and Bi-LSTM models.
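    The windowing step can be sketched as follows (a minimal sketch; the
padding token and function name are ours, not from the paper):

```python
def trigger_candidate(tokens, i, k, pad="<PAD>"):
    """Return the (2k+1)-token window centred on position i, padding
    the sentence at both ends so edge words also get full windows."""
    padded = [pad] * k + list(tokens) + [pad] * k
    return padded[i : i + 2 * k + 1]
```

For k = 2, the candidate for the first word of a three word sentence is two
padding tokens followed by the three words.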
    Before entering the CNN and Bi-LSTM, each word is converted into a real
valued vector by looking it up in an embedding table, with the intuition that
these real valued vectors capture various semantic characteristics of the
words. So the input to our model is a matrix X of size (2k + 1) ∗ |V | where |V |
is the dimension of the real valued vectors and X = [x−k , x−k+1 , ..., x0 , ..., xk−1 , xk ],
where xj is the vector representation of word wj . This two dimensional
representation is fed to a convolution layer followed by a max-pooling layer.
    In the convolution layer we use a set of filters F1 , F2 , ..., Fp , where each
filter Fi has a window size mi and can be represented as a matrix of size
mi ∗ |V |. After applying max-pooling to the convolution layer output, we get a
hidden vector representation of size |F1 | + |F2 | + ... + |Fp |, where |Fi | denotes
the number of instances of filter type Fi . The hidden vector representation
obtained from the CNN model is denoted FCN N .
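    The convolution and max-over-time pooling step can be sketched in numpy
(a minimal sketch under the settings reported in Section 6: filter sizes 2, 3
and 4 with 200 filters each; the function name is ours):

```python
import numpy as np

def cnn_encode(X, filters):
    """Max-over-time convolution. X has shape (2k+1, |V|); filters is a
    list of weight tensors, one per window size, of shape (n_i, m_i, |V|).
    Returns F_CNN, a vector of length sum(n_i)."""
    L, _ = X.shape
    feats = []
    for W in filters:
        n, m, _ = W.shape
        # score of every filter on every window of m consecutive tokens
        scores = np.stack([
            np.tensordot(W, X[t:t + m], axes=([1, 2], [0, 1]))
            for t in range(L - m + 1)
        ])                                # shape (L - m + 1, n_i)
        feats.append(scores.max(axis=0))  # max pooling over time
    return np.concatenate(feats)
```

With a 7 token window, 300 dimensional embeddings and 200 filters for each of
the sizes 2, 3 and 4, the output has length 600, matching the size of FCN N
reported in Section 6.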
    Besides the CNN, we have also used a bidirectional Long Short-Term Memory
network (Bi-LSTM) in this work. The matrix representation X = [x−k , x−k+1 , ..., x0 , ..., xk−1 , xk ]
of each word W is also fed to the Bi-LSTM model. The hidden states of the
Bi-LSTM are computed in both the forward and backward directions at each time
step. We denote the outputs of the forward and backward LSTMs as →ht and ←h′t
respectively. At each time step t, the hidden vector is computed from the
current input vector xt and the previous hidden vector:

                               →ht = LSTM(→ht−1 , xt )                             (1)
                               ←h′t = LSTM(←h′t−1 , xt )                           (2)

Then the output at time t is ht = [→ht , ←h′t ]. Here we have taken the hidden
representation of x0 , i.e. the concatenation of the forward LSTM hidden vector
→h0 and the backward LSTM hidden vector ←h′0 , as the output of the Bi-LSTM.
Formally, FBi−LST M = [→h0 , ←h′0 ].
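    The two recurrences in Eqs. (1)-(2) and the extraction of FBi−LST M at the
centre position can be sketched in plain numpy (a simplified single-layer
sketch; all function and parameter names are ours, not from the paper):

```python
import numpy as np

def lstm_states(X, Wx, Wh, b):
    """Run one LSTM direction over the rows of X; return all hidden states.
    Wx: (4H, |V|), Wh: (4H, H), b: (4H,) hold the stacked gate weights."""
    H = Wh.shape[1]
    h, c, out = np.zeros(H), np.zeros(H), []
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x in X:
        i, f, o, g = np.split(Wx @ x + Wh @ h + b, 4)
        c = sig(f) * c + sig(i) * np.tanh(g)   # cell state update
        h = sig(o) * np.tanh(c)                # hidden state
        out.append(h)
    return np.stack(out)

def bilstm_feature(X, fwd, bwd):
    """F_BiLSTM: forward and backward hidden states at the centre word x0."""
    k = len(X) // 2                            # index of x0 in the window
    h_fwd = lstm_states(X, *fwd)
    h_bwd = lstm_states(X[::-1], *bwd)[::-1]   # backward pass, realigned
    return np.concatenate([h_fwd[k], h_bwd[k]])
```

With H = 200 hidden units per direction, as in Section 6, the resulting
feature vector has length 400.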
    To combine the outputs of the CNN and Bi-LSTM, we concatenate the
representation vectors FCN N and FBi−LST M and feed the concatenated vector to
a fully connected layer followed by a softmax layer to obtain the event type
of the current word W (see Fig. 1). Gradients are calculated using
back-propagation, and we apply dropout for regularization.




          Fig. 1. Deep neural architecture for Event Trigger classification
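    The final combination step can be sketched as follows (a minimal sketch;
W and b stand for the parameters of the fully connected layer and are ours):

```python
import numpy as np

def event_type_probs(f_cnn, f_bilstm, W, b):
    """Concatenate F_CNN and F_BiLSTM, apply the fully connected layer,
    and return a softmax distribution over the event types."""
    f = np.concatenate([f_cnn, f_bilstm])   # length 600 + 400 = 1000
    z = W @ f + b                           # one logit per event type
    e = np.exp(z - z.max())                 # numerically stable softmax
    return e / e.sum()
```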


    For event argument extraction we apply the same model used for event
trigger classification; only the number of output classes differs between the
two models. We use the BIO annotation scheme for argument identification, as
most arguments contain more than a single token.
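    Under the BIO scheme, the tagged token sequence is decoded back into
argument phrases by collapsing each B- tag and its following I- tags into one
span. A minimal sketch (the function name is ours):

```python
def bio_to_spans(tags):
    """Collapse a BIO tag sequence into (start, end, label) spans,
    with end exclusive. E.g. B-Time I-Time O -> one Time span."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel flushes the last span
        if tag == "O" or tag.startswith("B-"):
            if start is not None:
                spans.append((start, i, label))
                start = None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
        # an "I-" tag simply extends the currently open span
    return spans
```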
    After the event triggers and event arguments have been identified, our
next task is to link the event arguments (identified by the event argument
extractor) to their corresponding event trigger words, so that we can identify
the event participants of a specific event trigger. For event trigger-argument
linking we take a simple heuristic based approach: for each argument phrase we
identify the nearest event trigger word and link that argument-trigger pair.
The distance between a trigger and an argument is calculated by the number of
sentences between them; ties are broken by the number of tokens between the
pair. Sometimes a sentence contains useful argument information regarding an
event, but no event trigger word exists in that very sentence. In that
situation the argument is mapped to the nearest event trigger present in the
previous or next sentence.
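    The linking heuristic can be sketched as follows (a sketch under the
assumption that triggers and arguments are given as (sentence index, token
index) positions; the names are ours):

```python
def link_arguments(triggers, arguments):
    """For each argument position, pick the nearest trigger: compare by
    sentence distance first, then break ties by token distance."""
    links = []
    for a_sent, a_tok in arguments:
        nearest = min(
            range(len(triggers)),
            key=lambda j: (abs(triggers[j][0] - a_sent),
                           abs(triggers[j][1] - a_tok)),
        )
        links.append(nearest)
    return links
```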


6     Resources and Hyperparameters

We use the same parameters for the neural network models used in trigger
identification and argument extraction. In the convolutional neural network we
use filter sizes of 2, 3 and 4, and for each size we take 200 filters to
generate feature maps from the convolution operations. The final output of the
CNN, FCN N , therefore has size 600. The number of hidden units in the Bi-LSTM
is 200, which makes the size of the final Bi-LSTM output FBi−LST M 400.
Regarding embeddings, we use pre-trained word embeddings of size 300 taken
from the fastText toolkit2 . Finally, we train the neural network models using
the Adam optimizer with shuffled minibatches, a dropout rate of 0.5, and
back-propagation for gradient calculation and parameter updates.


7     Result Analysis

According to the FIRE 2018 official evaluation methodology, the performance of
the system is measured by Precision, Recall and F-score. For example, suppose
there is an event mention E1 with six fields: Event Type, Location, Time,
Event-Participants, Causes and Effects. If all of these fields are identified
correctly, then the system gets a full score of 1; otherwise the score is
calculated according to the fields that were identified. Finally, the
macro-average of Precision and Recall is calculated over all events in all
documents. The performance of our system is depicted in Table 3. From the


     Table 3. Event extraction performance on the FIRE 2018 official evaluation

Language           Precision            Recall                F-measure
Hindi              62.85                29.02                 39.71
Tamil              59.98                27.20                 37.42
English            65.16                28.77                 39.91



result, we can conclude that we have to improve the Recall score to enhance
the system performance. In all three languages the precision score is quite
acceptable, but due to low Recall the overall performance of the system is not
yet satisfactory.
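    Under our reading of the field-level scoring described above, the score of
a single event can be sketched as follows (an illustrative sketch, not the
official scorer; the field names follow the six listed above, and fields
absent from both gold and prediction count as correct here):

```python
FIELDS = ("Event Type", "Location", "Time", "Event-Participants", "Causes", "Effects")

def event_score(gold, pred):
    """Fraction of the six event fields that match the gold annotation:
    1.0 only when every field is identified correctly."""
    return sum(pred.get(f) == gold.get(f) for f in FIELDS) / len(FIELDS)
```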
2
    https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md

8      Error Analysis
Our event extraction system has three steps. First we identify the event
trigger words; then we extract the argument phrases; finally we link the
arguments to their nearest trigger words. We split the training data in a
ratio of 80:20 and use the first partition (80%) for training and the
remaining 20% as the development set. In this section we analyze the system’s
performance on the development set and examine its shortcomings.

8.1     Event Identification:
While identifying trigger words, the system sometimes fails to capture the
contextual information, which plays an important role in event identification.
For example:
    – S2: Nepalese Maoists will not attack on ordinary people and politicians any
      more.
The system tagged attack as an event trigger of type Terrorist-Attack. But
from the context it is clear that this sentence does not describe any
terrorist attack event.
    – S3: It is reported that 56 people were also wounded in the bombing in the
      neighborhood of Dasht-e-Barchi.
The event identification module sometimes fails to correctly classify the
event types present in the documents. We have noticed that the system gets
confused among related event types like Terrorist-attack, Shoot-out and
Suicide-attack, or Vehicular-Collision, Transport-hazard and Accident. For
example, in S3, the system classified the trigger word bombing as a Shoot-out
type event, but it is actually an event of type Suicide-attack.
    – S4: Officials blame the Islamic State group for this attack.
In some cases the system fails to detect the correct event triggers in the
sentences. For example, in S4, the system failed to identify attack as an
event trigger word.

8.2     Event-Argument Identification:
We have checked the argument extraction output and identified some errors,
which we discuss below.

Semantic variety: Based on the development set, we have noticed that our
system’s performance is not satisfactory for some frequent arguments like
Place and Time. The possible cause is semantic variety: some arguments are
expressed in very specific ways. For example,
 – S5: The magnitude of the first quake was 4.9.

 – S6: After a short span there is another earthquake whose magnitude is 4.1.
 – S7: The Governor of California was injured in a motorcycle accident near
   the house on Sunday afternoon.
    The magnitude of a quake is expressed in a very specific way (examples S5,
S6) and is always a numeric value. But an argument like Place is not always a
place entity that can be detected by typical Named Entity Recognizer (NER)
systems. For example, in S7, near the house is not a typical location entity,
but it is the place of an event of type VEHICULAR COLLISION.

Token missing in phrase: Most event arguments are phrases of more than one
word, but our system sometimes fails to identify all the words in a phrase,
which leads to missing important information about the event. For example, in
S7 our system extracted Sunday as the time argument, whereas the accurate time
information would be Sunday afternoon.

Reason and After-effect: Our system has failed to identify the Reason and
After-effect arguments for any event instance. Most of the time, the reason or
effect of an event is an event itself and is expressed by a long sequence of
words or a phrase. As there is no specific pattern for these two arguments,
the system fails to identify them. In the cases where a semantic clue is
present in the sentence, the system can identify them. For example:
 – S8: US Airways has been forced to land in Tehran due to mechanical disrup-
   tion on its way to Amsterdam with 255 passengers from Mumbai.
The system accurately identified the reason of the AVIATION HAZARD type event
as due to mechanical disruption; the presence of the term due to may have
helped in this case. But in most cases this type of clue is not present, and
the proposed system’s accuracy is very poor for these two attributes. Smarter
techniques have to be employed to improve our performance.

8.3   Event trigger-argument linking:
In this module, we have linked the event trigger words with their
corresponding arguments using the heuristic approach stated in Section 5.

Incorrect assignment: It is noticed that the heuristic approach, which takes
the nearest trigger for linking, may cause wrong output. Sometimes, due to the
complex syntactic structure of a sentence, a valid argument may appear far
away from its trigger word, with many other tokens between that argument and
the valid trigger word; our heuristic function then selects an incorrect
trigger-argument pair. For example,
 – S9: Translation: Earthquake in Indonesia.

 – S10: Mild earthquake has been found in Indonesia’s Sulawesi island.
 – S11: Meanwhile, one Indian has died from the four floors of Manama Hotel
   in Dubai.
In sentence S11 there is no trigger word of our interest, but our event
argument extractor has identified one Indian has died as a Casualty argument.
Our linking module will wrongly map this to the EARTHQUAKE event in the
previous sentence (S10), though they are totally unrelated.

Mutual exclusiveness: It is highly probable that a single argument is an
attribute of more than one event instance. But according to our heuristic
function, an argument can only be linked to one specific event trigger word.
This problem leads to missing important information in the extracted events
and directly affects system performance.


9    Conclusion and Future Work
The main problem that impedes the design of event extraction models for Indian
language news documents is the scarcity of annotated corpora. The second
hindrance is the unavailability of syntactic and semantic feature extractors
for Indian languages. In this task we have relied on automatically learning
effective features from data, without using language specific resources. Our
hybrid neural network model incorporates both a bidirectional LSTM and a CNN
to capture sequential and structural information from text. Our main insight
is that sentence level event extraction is not effective for news documents.
When we take a news article into consideration, we notice that the detected
events may have their arguments distributed along the length of the document.
Sometimes an event argument and the corresponding event trigger may appear in
two different sentences. And it is evident that the event span (the portion of
the document where event related information exists) may be a single sentence,
multiple sentences or even the whole document. So, identifying events at the
sentence level narrows down the event span, which leads to a substantial loss
in the acquired event related information. Eventually, a sentence level event
extraction system is inadequate to get a document level view of the events. To
cope with this problem, detection and aggregation of coreferring event
mentions could be a useful remedy. As future work we would like to explore
more sophisticated models for event extraction and try to incorporate
background knowledge and other language resources to improve the system
performance.


References
1. Nguyen, Thien Huu, Kyunghyun Cho, and Ralph Grishman. ”Joint event extraction
   via recurrent neural networks.” Proceedings of the 2016 Conference of the North
   American Chapter of the Association for Computational Linguistics: Human Lan-
   guage Technologies. 2016.

2. Chen, Yubo, et al. ”Event extraction via dynamic multi-pooling convolutional neural
   networks.” Proceedings of the 53rd Annual Meeting of the Association for Computa-
   tional Linguistics and the 7th International Joint Conference on Natural Language
   Processing (Volume 1: Long Papers). Vol. 1. 2015.
3. Li, Qi, Heng Ji, and Liang Huang. ”Joint event extraction via structured prediction
   with global features. ” Proceedings of the 51st Annual Meeting of the Association
   for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2013.
4. Hong, Yu, et al. ”Using cross-entity inference to improve event extraction.” Proceed-
   ings of the 49th Annual Meeting of the Association for Computational Linguistics:
   Human Language Technologies-Volume 1. Association for Computational Linguis-
   tics, 2011.
5. Nguyen, Thien Huu, and Ralph Grishman. ”Event detection and domain adaptation
   with convolutional neural networks.” Proceedings of the 53rd Annual Meeting of the
   Association for Computational Linguistics and the 7th International Joint Confer-
   ence on Natural Language Processing (Volume 2: Short Papers). Vol. 2. 2015.
6. Liu, Shulin, et al. ”Exploiting argument information to improve event detection via
   supervised attention mechanisms.” Proceedings of the 55th Annual Meeting of the
   Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2017.
7. Feng, Xiaocheng, Bing Qin, and Ting Liu. ”A language-independent neural network
   for event detection.” Science China Information Sciences 61.9 (2018): 092106.
8. Peng, Haoruo, Yangqiu Song, and Dan Roth. ”Event detection and co-reference with
   minimal supervision.” Proceedings of the 2016 Conference on Empirical Methods in
   Natural Language Processing. 2016.
9. Spiliopoulou, Evangelia, Eduard Hovy, and Teruko Mitamura. ”Event detection us-
   ing frame-semantic parser.” Proceedings of the Events and Stories in the News
   Workshop. 2017.
10. Imran, Muhammad, et al. ”Practical extraction of disaster-relevant information
   from social media.” Proceedings of the 22nd International Conference on World
   Wide Web. ACM, 2013.
11. Imran, Muhammad, et al. ”AIDR: Artificial intelligence for disaster response.”
   Proceedings of the 23rd International Conference on World Wide Web. ACM, 2014.
12. Imran, Muhammad, et al. ”Extracting information nuggets from disaster-related
   messages in social media.” Iscram. 2013.
13. Chowdhury, Soudip Roy, et al. ”Tweet4act: Using incident-specific profiles for clas-
   sifying crisis-related messages.” ISCRAM. 2013.
14. Chambers, Nathanael, and Dan Jurafsky. ”Unsupervised learning of narrative event
   chains.” Proceedings of ACL-08: HLT (2008): 789-797.
15. Chambers, Nathanael, and Daniel Jurafsky. ”A Database of Narrative Schemas.”
   LREC. 2010.
16. Pichotta, Karl, and Raymond J. Mooney. ”Learning Statistical Scripts with LSTM
   Recurrent Neural Networks.” AAAI. 2016.
17. Exner, Peter, and Pierre Nugues. ”Using semantic role labeling to extract events
   from Wikipedia.” Proceedings of the workshop on detection, representation, and
   exploitation of events in the semantic web (DeRiVE 2011). Workshop in conjunction
   with the 10th international semantic web conference. 2011.
18. Kuila, Alapan, and Sudeshna Sarkar. ”An Event Extraction System via Neural
   Networks.” FIRE (Working Notes). 2017.