ssn_nlp@FIRE2020: Automatic extraction of causal relations using deep learning and machine translation approaches

Thenmozhi D, Arunima S, Amlan Sengupta, Avantika Balaji
Department of CSE, SSN College of Engineering, Chennai
theni_d@ssn.edu.in, {arunima17016,amlan17008,avantika17021}@cse.ssn.edu.in

FIRE 2020: Forum for Information Retrieval Evaluation, December 16-20, 2020, Hyderabad, India

Abstract

Causality can be understood as the relationship between two events such that the occurrence of one event results, directly or indirectly, in the occurrence of the other. This paper aims to identify whether given sentences express causality and, if so, to label the cause and effect words/phrases. The classification task uses deep learning algorithms and the annotation task uses machine translation. These models are applied to the dataset provided by CEREX@FIRE2020. The best result for causality identification was obtained with a Bi-LSTM (F1 score of 0.60), and for the second task of annotating causes and effects, with an NMT model using the Bahdanau attention mechanism (F1 score of 0.44).

Keywords: CEREX, Cause, Effect, Causal connective, Logistic Regression, Bi-LSTM, NMT

1. Introduction

Causality is defined as the relation between a cause and its effect. A cause is why an event happens; an effect is an event that happens because of the cause. Any sentence with a causal expression has three components: cause, effect, and causal connective. For example, in the sentence "Due to inflation, the dollar is worth less than before", the event "inflation" is the cause, the event "the dollar is worth less than before" is the effect, and "Due to" is the causal connective.

In recent times, automatic extraction of semantic relations, in particular causal relations, has become essential for many natural language processing (NLP) applications such as question answering, document summarization, opinion mining, and event analysis. The simplest ways to express a cause-effect relation are the forms "A causes B" or "B is caused by A". However, causality can be expressed through a wide variety of syntactic constructions, and these variations are hard to represent with a single model. The complex grammatical structures found in sentences therefore make the automatic extraction of causal relations a hard NLP problem.

The task here is two-fold.¹ The first task is to identify whether a given sentence contains a causal event (cause/effect). Two models were employed for this task: Logistic Regression and Bidirectional Long Short-Term Memory (Bi-LSTM). The second task is to annotate each word in a sentence with one of four labels (C - cause, E - effect, CC - causal connective, and O - none). This task was implemented using an NMT model.

¹ https://sites.google.com/view/cerex-fire2020/home
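To make the annotation target concrete, the example sentence above would be rendered roughly as follows under the word/label format described in Section 4. This is our illustration of the scheme, not an example from the dataset; words labeled O are left unannotated:

Due/CC to/CC inflation/C the/E dollar/E is/E worth/E less/E than/E before/E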
2. Related Works

Several research works have been reported on the automatic extraction of cause and effect from natural language text. The authors of [1] proposed a linguistically informed recursive neural network architecture for extracting cause-effect relations from text. The architecture uses word-level embeddings and other linguistic features to detect causal events and their effects mentioned within a sentence.

Cause-effect relations from documents in Metallurgy and Materials Science are extracted in [2]. The authors used a Bi-LSTM model to annotate each word of their dataset, which was created using distant supervision, and an LSTM-based binary classifier to predict whether a sentence expresses causality or not.

In [3], plausible cause-effect pairs are identified through a set of logical rules based on dependencies between words; Bayesian inference is then used to reduce the number of pairs produced by ambiguous patterns.

The authors of [4] extract causal knowledge from a medical database using graphical patterns. The sentences were parsed with Conexor's Functional Dependency Grammar (FDG) parser for English, which generates a representation of the syntactic structure of each sentence, i.e., its parse tree. The information extraction process matched causality patterns against the parse trees of the sentences, both represented in linear conceptual graph notation.

Roxana Girju [5] uses explicit intra-sentential patterns in which the verb is a simple causative. A transitive relation between verb synsets is known as the CAUSE-TO relation. WordNet contains numerous causation relationships between nouns that always hold; one way to discover such relationships is to look for all patterns that occur between a noun entry and another noun in the corresponding gloss definition. This is the basis for the detection of causal relations for question answering described in [5].

The approach used by Blanco et al. [6] for the detection and extraction of causation is based on syntactic patterns that may encode causation. They redefine the problem as classification between two classes, encoding or not encoding causation (cause or ¬cause), and use an implementation of Bagging with C4.5 decision trees.

3. Data Analysis and Pre-processing

The training data for both tasks released by CEREX 2020 comprised 6000 rows with 4 columns: sno, sentence, cause, and effect. The sno is unique for every row, sentence contains the entire sentence, cause is the causal part of the sentence, and effect is the effect part. For Task A, the test dataset consisted of 764 sentences with 2 columns, sno and sentence. The test dataset for Task B contained 178 rows with two columns, Sent_id (the sentence id) and sentence. The sno and Sent_id values are distinct for every sentence.

The pre-processing for Task A included removing emojis and adding a label column with two values, 0 and 1. A sentence is labeled 0 (not causal) if it contains neither a cause nor an effect; if it has both a cause and an effect, or either one of them, it is causal and labeled 1. The cause and effect columns are then dropped. Finally, the data is split in the ratio 80%-20% and given to the Bi-LSTM model; for the Logistic Regression classifier, the data is split in the ratio 70%-30%.

For Task B, the data was preprocessed by removing emojis and punctuation with the help of regular expressions, and extra spaces in each sentence were removed.
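The paper does not list the exact regular expressions, so the following is a minimal sketch of this cleaning step under assumed patterns:

import re

def clean_sentence(text):
    # Drop emojis and other non-ASCII symbols (assumed pattern).
    text = re.sub(r"[^\x00-\x7F]+", " ", text)
    # Remove punctuation, keeping word characters and spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse the extra spaces left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()

print(clean_sentence("Due to inflation, the dollar is worth less than before!"))
# Due to inflation the dollar is worth less than before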
We created a list of common causal connectives so that, along with the cause and effect words, the causal connectives in a sentence can be identified too. The list contained both words and phrases. Using WordNet², synonyms of each causal connective were found, and their past, present, and future participle forms were added to the list. Each word in the dataset was then annotated with its specific label; if a word belonged to neither the cause nor the effect and was not a causal connective, it was labeled O.

For training, the dataset was split into multiple files. 'train.in' consists of the first 4200 sentences from the dataset and 'train.out' of the corresponding labels of each word in 'train.in'. The labels were C (cause), E (effect), CC (causal connective), and O (none). 'dev.in' consists of the remaining 1800 sentences and 'dev.out' of the labels generated by the model for each word in 'dev.in'. 'vocab.in' contains the set of distinct words present in the dataset and 'vocab.out' the set of distinct labels to be generated by the model, i.e., CC, C, E, and O.

4. Methodology and Implementation

The general methodology we followed consists of two steps: model training and post-processing. For the first task, the input to the model consisted of the sentences with their respective labels, 0 or 1. For the second task, the preprocessed data was stored in an input file, train.in; the output file, train.out, consisted of the corresponding labels for each word in train.in.

After the result is obtained from the model, post-processing is performed. For the first task, the predicted labels along with their sentences are returned in a file. For the second task, the resulting data is post-processed by removing extra spaces, followed by annotating each word with its respective label (C, E, or CC) in the format word/label; a word with the label O is not annotated. The result is stored in a new CSV file.

4.1. TASK A

In this task, our goal is to identify whether or not a sentence contains a causal event, classifying it as 1 (the sentence contains a causal event) or 0 (it does not). Two approaches have been proposed for this task, Logistic Regression and Bi-LSTM; their scores on the training data are given below.

Model     Accuracy  Precision  Recall  F1-Score
Bi-LSTM   0.92      0.94       0.95    0.95
LR        0.91      0.72       0.77    0.74

We employed the Logistic Regression classifier as it works well with small datasets. Since the data is not evenly distributed across the two classes, we oversampled the minority class 0 using SMOTE (Synthetic Minority Oversampling Technique)³ so that the ratio of the two classes becomes 1:1. The Logistic Regression classifier was then used to classify sentences as having causal events or not; a sketch of this pipeline appears at the end of this subsection.

Our second approach was based on Bi-LSTM, as it effectively increases the amount of information available to the network, improving the context available to the algorithm. It outperformed Logistic Regression on the training dataset.

² https://wordnet.princeton.edu/
³ https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.over_sampling.SMOTE.html
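The oversampling and classification pipeline for Task A can be sketched as follows. This is a minimal reconstruction: the TF-IDF features, split seed, and classifier settings are our assumptions, since the paper specifies only the 70%-30% split, SMOTE with a 1:1 target ratio, and the Logistic Regression classifier itself.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Assumed inputs: sentences is a list of cleaned sentence strings,
# labels is a list of ints with 1 = causal, 0 = not causal.
X = TfidfVectorizer().fit_transform(sentences)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, stratify=labels, random_state=42)

# Oversample the minority class (0) until the class ratio is 1:1.
X_res, y_res = SMOTE(sampling_strategy=1.0, random_state=42).fit_resample(
    X_train, y_train)

clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
print("held-out accuracy:", clf.score(X_test, y_test))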
4.2. TASK B

For this task, our aim is to annotate each word with its respective label: CC, C, E, or O. We used three NMT⁴ models: NMT with the Scaled Luong attention mechanism, NMT with the Normed Bahdanau attention mechanism, and NMT without an attention mechanism. In all three models, the recurrent unit is an LSTM. Their accuracies on the training data are given below.

Model                             Accuracy
NMT with Normed Bahdanau          22.90%
NMT with Scaled Luong             22.62%
NMT without attention mechanism   21.95%

We started with the NMT model without an attention mechanism: the input is encoded into a fixed-dimensional vector, which is then decoded into the target labels. With the Scaled Luong attention mechanism, the attention scores over the encoded input are computed by simple matrix multiplication, which makes the model faster and more space-efficient; the scores are then used in decoding into the target labels present in train.out. Finally, we used Normed Bahdanau, which scores a linear combination of the encoder states and the decoder state; the model predicts a target word based on the context vectors associated with the source positions and the previously generated target words. We chose Normed Bahdanau as it works better with smaller datasets and achieved a higher accuracy on the training dataset than the other two models.

⁴ https://opennmt.net/
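For reference, the two attention variants differ in how they score a decoder state $s_t$ against the encoder states $h_i$. The following are the standard formulations, which the paper does not spell out:

$$\mathrm{score}_{\mathrm{Bahdanau}}(s_{t-1}, h_i) = v_a^{\top}\tanh(W_1 s_{t-1} + W_2 h_i), \qquad \mathrm{score}_{\mathrm{Luong}}(s_t, h_i) = s_t^{\top} W_a h_i$$

$$\alpha_{t,i} = \frac{\exp(\mathrm{score}(s_t, h_i))}{\sum_{j}\exp(\mathrm{score}(s_t, h_j))}, \qquad c_t = \sum_i \alpha_{t,i}\, h_i$$

The "scaled" and "normed" variants multiply the Luong score by a learned scalar and apply weight normalization to $v_a$, respectively; the context vector $c_t$ is what the decoder conditions on when emitting the next label.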
5. Results

We evaluated our models on the test data of CEREX@FIRE2020. The performance was analyzed using the metrics precision, recall, and F1-score. We secured second place in the binary classification task; in the tagging task, we outperformed the other teams and came first.

Task                        Team        Precision  Recall  F1-Score
A (Binary Classification)   CSECU.DSG   0.51       0.91    0.65
A (Binary Classification)   ssn_nlp     0.46       0.87    0.60
B (Tagging)                 ssn_nlp     0.36       0.57    0.44
B (Tagging)                 CSECU.DSG   0.32       0.51    0.39

The submission made by ssn_nlp for Task A, i.e., classification of sentences as causal or not, was the deep learning approach using Bi-LSTM. The model obtained an F1-score of 0.60 on the test set, with a precision of 0.46 and a recall of 0.87. For Task B, i.e., tagging each word with its respective label, our team ssn_nlp officially submitted the NMT model with the Normed Bahdanau attention mechanism. On the test set, the model achieved an F1-score of 0.44, a precision of 0.36, and a recall of 0.57.

6. Conclusion

We used a Bi-LSTM model for classifying sentences as containing causal events or not. It performed better than the Logistic Regression classifier, achieving an accuracy of 0.92 and an F1 score of 0.95 on the training data. For annotating the sentences with their respective labels, NMT with the Normed Bahdanau attention mechanism was employed, as it works better with smaller datasets than the Scaled Luong attention mechanism, which is generally used for larger datasets. It achieved an F1 score of 0.44 on the test set. Future work includes constructing an enriched corpus of causal connectives and incorporating more linguistic features.

References

[1] T. Dasgupta, R. Saha, L. Dey, A. Naskar, Automatic extraction of causal relations from text using linguistically informed deep neural networks, in: Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018, pp. 306–316.
[2] S. Pawar, R. Sharma, G. K. Palshikar, P. Bhattacharyya, V. Varma, Cause–effect relation extraction from documents in metallurgy and materials science, Transactions of the Indian Institute of Metals 72 (2019) 2209–2217.
[3] A. Sorgente, G. Vettigli, F. Mele, Automatic extraction of cause-effect relations in natural language text, DART@AI*IA 2013 (2013) 37–48.
[4] C. S. Khoo, S. Chan, Y. Niu, Extracting causal knowledge from a medical database using graphical patterns, in: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000, pp. 336–343.
[5] R. Girju, Automatic detection of causal relations for question answering, in: Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering, 2003, pp. 76–83.
[6] E. Blanco, N. Castell, D. I. Moldovan, Causal relation extraction, in: LREC, volume 66, 2008, p. 74.