<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Preliminary Investigation on Causality Information Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pankaj Dadure</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Partha Pakray</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sivaji Bandyopadhyay</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science &amp; Engineering, National Institute of Technology Silchar</institution>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Causality refers to the association between variables in a system. Humans communicate causal interactions directly through natural language, which helps them gain insight into how a system works. In a general context, causality is the study of the specific connection that allows the action of one event to affect others. Several approaches have been developed for this task, but the door remains open for new technologies and concepts. The main focus of this work is the extraction of causal relations from unstructured text data. We have implemented a word embedding approach using a Universal Sentence Encoder model trained with a Deep Averaging Network encoder. The article data are split into one instance per text body of a news article, and an embedding is created for each instance. Cosine similarity is used for similarity measurement. This is our preliminary investigation toward developing a baseline system for the retrieval of causally related articles/documents.</p>
      </abstract>
      <kwd-group>
        <kwd>Causality extraction</kwd>
        <kwd>Universal sentence encoder</kwd>
        <kwd>Word embedding</kwd>
        <kwd>Cosine similarity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>With the rapid growth and evolution of the web, users have moved from only being able to access information to being able to generate it [1]. This process of information retrieval and generation provides research directions for many applications such as data representation, data management, and data analysis. In recent times, social media (like Facebook) and blogging (such as Twitter) have undoubtedly become well-embraced technologies that mainly focus on opinion sharing, chatting, data sharing (like photos and videos), comments, and profile creation [2]. In addition to social platforms, traditional information-sharing platforms such as news channels and newspapers tend to be more active when they make it possible to share, comment on, construct, and link documents together. These co-operative activities provide the bridge for generating huge amounts of data online.</p>
      <p>In the current era, the automatic extraction of semantic relationships is becoming highly influential for question answering, information retrieval, event prediction, generating future scenarios, and decision processing [3]. Relations such as part-whole, if-then, and cause-effect show how events and entities are recognized and condense the pivotal information. Because of its capacity to influence decision-making, the relation between cause and effect is considered to play a very important role. However, the representation of cause-effect may vary, and it is hard to understand and formalize these variations using a single style of grammar. Existing cause-effect studies use numerous syntactic representations, and each of them needs refinement to achieve state-of-the-art results in a particular domain. These studies argue that a cause-effect extraction tool should be able to extract an undefined association between the cause and the effect component in order to construct a more accurate and generic causal relation and to handle the disambiguation of any relation. A novel research direction in information retrieval is to retrieve an additional list of documents that are causally relevant to the search results [4]. In contrast to conventional information retrieval, which typically retrieves a set of documents with respect to the user’s query, causal retrieval retrieves a set of documents that describe the potential causes leading to an effect specified in the query. In a causal retrieval system, the nature of relevance differs from traditional topical relevance. In this paper, we have implemented word embedding using the Universal Sentence Encoder (USE) model trained with a Deep Averaging Network (DAN) encoder.</p>
      <p>The paper is structured as follows: Section 2 describes prior work in the domain of causal relation extraction. Section 3 gives a detailed account of the dataset. Section 4 provides a detailed description of the methodology. Section 5 describes the experimental results. Section 6 concludes with a summary and directions for further research and development.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Extracting causal knowledge from natural text has been in the limelight for some time and has become a significant contribution to many applications such as question answering [5], information retrieval [6], summarization [7], and decision making [8]. The exploration of non-taxonomical relations has been described as the toughest task in the learning process. Research in this area provides an improved context for the discovery and classification of ontological relations through machine learning techniques. For instance, in the classification of semantic relations, the meaning of the input text has been taken into consideration [9]. In that work, initial semantic patterns are drawn from the input data, and the extracted patterns help to identify the causal senses of input word pairs and to decide whether a causal relation is present in the phrase. Dependency trees allow long-range relationships between words to be captured. Current approaches may ignore key details by pruning the dependency tree too aggressively, or are computationally inefficient because the tree structures are difficult to correlate. For instance, graph convolutional networks have been designed for relation extraction that efficiently combine information over arbitrary dependency structures [10]. In addition, a new pruning technique has been applied to the input trees to retain relevant information while maximally eliminating irrelevant material, by keeping the words immediately around the shortest path between the two entities among which a relation might hold.</p>
      <p>A standard framework for inference using Constrained Conditional Models (CCMs) [11] formulates the joint problem as an integer linear programming problem that is consistent with the constraints of time and causality. The joint inference mechanism indicates that the extraction of both temporal and causal relations from a document is improved to a statistically significant degree. Existing causality extraction approaches use patterns, constraints, deep learning, and machine learning techniques, which require considerable domain knowledge and huge computational power. The BiLSTM-CRF model [12] extracts cause and effect without extracting candidate causal pairs and identifying their relations separately. Moreover, contextual string embeddings have been deployed to handle the data insufficiency problem, together with a multi-head self-attention mechanism that learns the dependencies between causal words [13]. To extract causality from social science text describing experiences from women’s lives, an applied machine learning approach has been used that extracts causality along with other relevant information. To speed up relationship extraction, hypothesis identification, and the extraction of cause-effect entities from social science articles [14], V. Chen et al. developed models to classify these articles into business and management. The models further categorize whether hypotheses are causal relationships and, if they are categorized as causal, extract the cause-effect entities.</p>
      <p>The automatic extraction of cause-effect relations using a recursive neural network [15] addresses the fact that such relationships are conveyed in arbitrarily complex ways. The technique uses embeddings and additional linguistic features to detect causal events and their effects in a phrase. After clustering and appropriately generalizing the identified events and their relationships, the resulting causal graph is used for predictive modeling. To extract causality from text [16], the Restricted Hidden Naive Bayes architecture considers contextual features, syntactic features, position features, and causal connectives. These features are extracted from the tree kernel similarity of sentences and are otherwise treated as independent. In contrast, the Restricted Hidden Naive Bayes model considers the interactions among features, and this interaction avoids the over-fitting problem.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset Description</title>
      <p>The provided dataset contains 303,291 news articles extracted from the Telegraph India<sup>1</sup>. The metadata of the provided dataset is given in Table 1. The dataset contains two fields, namely doc_id and text, where text is the main body or content of the news article. Generating a causally related dataset is time-consuming and laborious. Sometimes the cause and the effect are disclosed in the same event. The queries in a causality-based information need system hold a cause-effect relationship with the needed information.</p>
      <p><sup>1</sup> https://cair-miners.github.io/CAIR-2020-website/</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <sec id="sec-4-1">
        <title>4.1. Preprocessing</title>
        <p><bold>4.1.1. Extraction of abbreviations and full forms.</bold> Abbreviations and acronyms are common in any text article [17]. All types of abbreviations must be recognized in order to determine their meaning, which supports natural language processing and the collection of knowledge and literature. To facilitate the mapping from abbreviation to full form, we extracted all abbreviations and their full forms from the provided news articles and created a dictionary in which the keys are abbreviations and the values are full forms. Afterward, all abbreviations were replaced with their respective full forms.</p>
        <p><bold>4.1.2. Removal of hyperlinks.</bold> For more refined and structured data, we removed all email IDs, web links, and paper references.</p>
        <p><bold>4.1.3. Removal of stopwords.</bold> Stopwords (I, an, is, the, etc. in English) are the most common terms in any natural language [18]. These stopwords may add little to the meaning of a document when analyzing text data and constructing NLP models. Thus, to save execution time and effort when processing huge amounts of text, we removed stopwords. For this task, we used spaCy’s built-in stopword removal functionality.</p>
        <p><bold>4.1.4. Lower casing.</bold> Words like Book and book have the same syntactic and semantic meaning, but unless they are converted to lower case they are represented as two different words in the vector space model<sup>2</sup>. To handle this, we converted all the text contained in the articles to lower case.</p>
        <p><bold>4.1.5. Stemming.</bold> Stemming reduces a word (e.g. flying) to its root form (e.g. fly). In stemming, the resulting root is not necessarily an actual root word; it is just a canonical form of the original word. Stemming uses a simple heuristic to trim the ends of words in the expectation that they are correctly reduced to their source [19]. Thus the words "benefactor", "benevolent", and "beneficial" might all be converted to "bene". There are numerous stemming algorithms. The Porter algorithm appears to be the most popular and is considered empirically useful for English.</p>
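        <p>The abbreviation mapping, hyperlink removal, lower casing, and stopword removal steps can be sketched as follows. This is a minimal illustration and not the authors' code: the regular expressions, the tiny STOPWORDS set (a stand-in for spaCy's list), and all function names are assumptions, and stemming is left out.</p>
<preformat>
```python
import re

# Hypothetical stand-in for spaCy's stop-word list (the paper uses spaCy's own).
STOPWORDS = {"i", "a", "an", "is", "the", "of", "and", "to", "in", "was", "by"}

# Assumed pattern "Full Form (ABBR)": capitalised words, then an acronym in parentheses.
ABBR_PATTERN = re.compile(r"((?:[A-Z][a-z]+\s+){1,6}[A-Z][a-z]+)\s*\(([A-Z]{2,})\)")
URL_OR_EMAIL = re.compile(r"https?://\S+|www\.\S+|\S+@\S+\.\S+")


def build_abbreviation_dict(text):
    """Map each acronym to its full form when the word initials spell the acronym."""
    mapping = {}
    for full_form, abbr in ABBR_PATTERN.findall(text):
        # Keep only the trailing words, one per letter of the acronym.
        candidate = full_form.split()[-len(abbr):]
        if "".join(word[0] for word in candidate) == abbr:
            mapping[abbr] = " ".join(candidate)
    return mapping


def expand_abbreviations(text, mapping):
    """Drop each "(ABBR)" definition and replace remaining whole-word uses."""
    for abbr, full_form in mapping.items():
        text = text.replace("(" + abbr + ")", "")
        text = re.sub(r"\b" + re.escape(abbr) + r"\b", full_form, text)
    return text


def preprocess(text):
    """Expand abbreviations, strip links and emails, lower-case, drop stopwords."""
    text = expand_abbreviations(text, build_abbreviation_dict(text))
    text = URL_OR_EMAIL.sub(" ", text)
    tokens = re.findall(r"[a-z0-9$]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


sample = ("The Supreme Court (SC) said the SC will hear the case. "
          "Details at https://example.com or mail news@example.com")
tokens = preprocess(sample)
```
</preformat>
        <p>Abbreviations are expanded before lower casing because the acronym pattern relies on capital letters; the remaining steps do not depend on each other's order in this sketch.</p>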
        <sec id="sec-4-1-1">
          <title>4.1.6. Converting Numbers</title>
          <p>A user may phrase a query as $10 or as ten dollars, and both forms express the same information need. However, some IR models handle them separately and store $10 and ten dollars as different tokens. To boost our model, we convert 10 to ten. To achieve this, we used the num2words library<sup>3</sup>.</p>
          <p><sup>2</sup> https://thehelloworldprogram.com/python/python-string-methods/</p>
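          <p>The number normalization can be illustrated with a small self-contained converter. The paper relies on the num2words library; the hand-written number_to_words helper below is only a stand-in covering 0 to 999, and rewriting "$10" as "ten dollars" is an assumption made here for illustration.</p>
<preformat>
```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999 (stand-in for num2words)."""
    if n >= 100:
        hundreds, rest = divmod(n, 100)
        words = ONES[hundreds] + " hundred"
        return words if rest == 0 else words + " and " + number_to_words(rest)
    if n >= 20:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + "-" + ONES[ones]
    return ONES[n]


def normalise_numbers(text):
    """Replace whole-number tokens (optionally prefixed by $) with words."""
    out = []
    for token in text.split():
        core = token.rstrip(".,;:!?")          # keep trailing punctuation aside
        tail = token[len(core):]
        has_dollar = core.startswith("$")
        digits = core[1:] if has_dollar else core
        if digits.isascii() and digits.isdigit() and 999 >= int(digits):
            words = number_to_words(int(digits))
            if has_dollar:
                words += " dollar" if int(digits) == 1 else " dollars"
            out.append(words + tail)
        else:
            out.append(token)                   # decimals, larger numbers, words
    return " ".join(out)
```
</preformat>
          <p>With this normalization a query containing $10 and a document containing ten dollars share the same tokens.</p>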
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Word embeddings using USE</title>
        <p>For cause-effect extraction, the model is trained and optimized for longer text such as the text contained in the provided news articles. It is trained on a variety of data, such as sport, business, crime, and national and international issues, and performs well across a wide spectrum of natural language understanding tasks. The input to this trained model is variable-length, lower-cased, tokenized English text, and the output is a 128-dimensional vector. The USE model [20] is trained with a DAN encoder. The text bodies of the news articles are split by newline into a list of individual instances, so that an embedding is created for each text body. In the DAN encoder, the embeddings of the words and bi-grams of the text body are averaged and then passed through a feed-forward deep neural network to produce the embedding of the text body. The created embeddings form a tensor object of shape n×128, where n is the number of text bodies of news articles; here n = 303291, so the shape of the tensor is 303291×128. The key benefit of this model is that its computation time is linear in the length of the textual input.</p>
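        <p>The averaging step of a DAN encoder can be sketched in plain Python. This is a toy illustration under stated assumptions, not the pretrained USE model: the hash-derived toy_embedding vectors and layer weights replace learned parameters, DIM is 8 instead of 128, and every function name is hypothetical.</p>
<preformat>
```python
import hashlib
import math

DIM = 8  # toy size; the paper reports 128-dimensional outputs


def toy_embedding(token, dim=DIM):
    """Deterministic pseudo-embedding from a hash (stand-in for learned weights)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(digest[i] / 255.0) * 2.0 - 1.0 for i in range(dim)]


def average(vectors):
    """Component-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def make_layer(name, dim=DIM):
    """Build a (weights, bias) pair with fixed pseudo-random weights."""
    weights = [toy_embedding(name + "-row-" + str(i), dim) for i in range(dim)]
    return weights, [0.0] * dim


def feed_forward(vector, layers):
    """Apply each dense layer followed by a tanh non-linearity."""
    for weights, bias in layers:
        vector = [math.tanh(sum(w * v for w, v in zip(row, vector)) + b)
                  for row, b in zip(weights, bias)]
    return vector


def dan_encode(text, layers):
    """Average word and bi-gram embeddings, then pass them through the network."""
    tokens = text.lower().split()
    bigrams = [a + " " + b for a, b in zip(tokens, tokens[1:])]
    units = tokens + bigrams
    if not units:
        return [0.0] * DIM
    return feed_forward(average([toy_embedding(u) for u in units]), layers)


LAYERS = [make_layer("hidden-1"), make_layer("hidden-2")]
# One instance per line, mirroring the split of text bodies by newline.
bodies = "court grants bail\nmarkets fall after budget".split("\n")
matrix = [dan_encode(body, LAYERS) for body in bodies]  # shape: n x DIM
```
</preformat>
        <p>Each text body costs one pass over its tokens plus a fixed-size network evaluation, which is why the encoding time grows linearly with input length.</p>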
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Similarity</title>
        <p>To estimate the similarity between the text body of a news article and the user’s query, cosine similarity is used. Cosine similarity compares the documents with the query and, based on the resulting scores, ranks the documents with respect to the user’s query. Mathematically, it calculates the cosine of the angle between two vectors in a multidimensional space [21].</p>
        <p>In this work, the two vectors compared are the embedding of the text body of a news article and the embedding of the user’s text query. Consider two vectors x and y. The mathematical form of cosine similarity is given below:</p>
        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math>\mathrm{sim}(x, y) = \frac{x \cdot y}{\|x\| \cdot \|y\|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \times \sqrt{\sum_{i=1}^{n} y_i^2}}</tex-math>
          </disp-formula>
          where ‖x‖ is the Euclidean norm of the vector x = (x_1, x_2, ..., x_n), defined as <inline-formula><tex-math>\sqrt{x_1^2 + x_2^2 + \dots + x_n^2}</tex-math></inline-formula>. Conceptually, it is the length of the vector. Similarly, ‖y‖ is the Euclidean norm of the vector y. A cosine value of 0 indicates that the two vectors have no correlation with each other and are at a 90-degree angle. The closer the cosine value is to 1, the smaller the angle and the better the match between the vectors. Cosine similarity is useful because even when two similar texts are far apart in Euclidean distance due to their size, they can still have a small angle between them. The smaller the angle, the higher the similarity.</p>
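        <p>The cosine computation and the ranking it drives can be written as a short sketch. The function names, the dictionary of document vectors, and the choice to return 0 for zero vectors are assumptions made for illustration.</p>
<preformat>
```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_x * norm_y)


def rank_documents(query_vec, doc_vecs, top_k=40):
    """Return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


docs = {"d1": [1.0, 0.0], "d2": [1.0, 1.0], "d3": [0.0, 1.0]}
ranking = rank_documents([1.0, 0.0], docs)
```
</preformat>
        <p>Vectors that point in the same direction score close to 1 regardless of their length, which is the property the section relies on when comparing a short query with a long article.</p>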
        <p><sup>3</sup> https://pypi.org/project/num2words/</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Results</title>
      <p>The proficiency of the proposed approach is tested using 15 queries, each consisting of a ’title’ (usually a small number of keywords) and a ’narrative’ (a paragraph describing the relevance criteria in detail). For each user query, the proposed approach retrieves the top 40 articles based on the keywords present in the query. The retrieved results are stored in a TSV file with six fields, namely query_id, iteration, document id, rank, similarity score, and run number. Here query_id is the unique identification number of the query; iteration is fixed to 0 and is not used by trec_eval; document id is the unique identifier of the retrieved article; rank is the rank of the retrieved article in the result set; score is the similarity score between the user’s query and the retrieved article; and run number represents the number of runs submitted by the participant. For evaluation, the trec_eval tool<sup>4</sup> is used, which compares the gold dataset (qrel file) with the result set obtained from the proposed system. The results of article retrieval for the cause-effect relation are shown in Table 2<sup>1</sup>. All participants’ results are ranked by the MAP metric. For a more generic and comparative analysis, the results are also reported in terms of P@5. The results for the queries "Accused Sanjay Dutt" and "Babri Masjid demolition case against Advani" are shown in Tables 3 and 4. Each provided query contains two parts, i.e. title and narrative; to estimate the similarity with documents and for retrieval, we used only the "title" part.</p>
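      <p>The six-field result file and the two reported metrics can be illustrated with a short sketch. The official scores come from trec_eval; the helper names, the four-decimal score format, and the example document ids below are assumptions used only to show the shape of the data.</p>
<preformat>
```python
def format_run(query_id, ranked, run_tag):
    """One tab-separated line per retrieved article, six fields each."""
    lines = []
    for rank, (doc_id, score) in enumerate(ranked, start=1):
        fields = [str(query_id), "0", doc_id, str(rank),
                  format(score, ".4f"), run_tag]
        lines.append("\t".join(fields))
    return lines


def precision_at_k(ranked_ids, relevant, k=5):
    """Fraction of the first k retrieved ids that are relevant (P@k)."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant) / k


def average_precision(ranked_ids, relevant):
    """Mean of the precision values at each rank where a relevant id occurs."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """MAP: average of per-query average precision over all judged queries."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


ranked_ids = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
p5 = precision_at_k(ranked_ids, relevant)
ap = average_precision(ranked_ids, relevant)
```
</preformat>
      <p>MAP rewards placing relevant articles early across the whole ranking, whereas P@5 only looks at the first five results, so the two numbers can diverge for the same run.</p>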
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>This is our preliminary investigation in the field of retrieving causally related documents. We have implemented word embedding using a Universal Sentence Encoder model trained with a Deep Averaging Network. The text bodies of the news articles are split into separate instances, and an embedding is created for each. Cosine similarity is used for similarity measurement. This DAN-based universal encoding model produces sentence-level embeddings for long text, which showed a strong influence on the retrieval of causally related documents. However, the proposed approach is computationally less expensive at the cost of slightly lower accuracy. This leaves room for improvement and for the inclusion of better-balanced methodologies.</p>
      <p>The powerful ability of feature abstraction helps to capture indirect and unclear causal relations efficiently, which makes the majority of current systems more accurate. So, in the future, a recursive neural network will be adopted to effectively capture implicit and ambiguous causal relations.</p>
      <p><sup>4</sup> https://github.com/usnistgov/trec_eval</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgment</title>
      <p>The authors would like to express their gratitude to the Department of Computer Science and Engineering and Center for Natural Language Processing, National Institute of Technology Silchar, India for providing the infrastructural facilities and support.</p>
      <sec>
        <title>References</title>
        <p>[1] D. Sánchez, L. Martínez-Sanahuja, M. Batet, Survey and evaluation of web search engine hit counts as research tools in computational linguistics, Information Systems, 73 (2018) pp. 50–60.</p>
        <p>[2] S. Amer-Yahia, L. Lakshmanan, C. Yu, Socialscope: Enabling information discovery on social content sites, 4th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California, USA (2009) pp. 1–11.</p>
        <p>[3] C. D. Ta, T. P. Thi, Automatic extraction of semantic relations from text documents, in: International Conference on Future Data and Security Engineering, Springer, 2016, pp. 344–351.</p>
        <p>[4] S. Datta, D. Ganguly, D. Roy, F. Bonin, C. Jochim, M. Mitra, Retrieving potential causes from a query event, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 1689–1692.</p>
        <p>[5] R. Girju, Automatic detection of causal relations for question answering, in: Proceedings of the ACL 2003 workshop on Multilingual summarization and question answering, 2003, pp. 76–83.</p>
        <p>[6] C. S. Khoo, J. Kornfilt, R. N. Oddy, S. H. Myaeng, Automatic extraction of cause-effect information from newspaper text without knowledge-based inferencing, Literary and Linguistic Computing, 13 (1998) pp. 177–186.</p>
        <p>[7] V. Gupta, G. S. Lehal, A survey of text summarization extractive techniques, Journal of emerging technologies in web intelligence, 2 (2010) pp. 258–268.</p>
        <p>[8] G. Loewenstein, J. S. Lerner, The role of affect in decision making, Handbook of affective science, 619 (2003) pp. 1–24.</p>
        <p>[9] A. S. H. Al Hashimy, N. Kulathuramaiyer, Ontology enrichment with causation relations, in: IEEE Conference on Systems, Process &amp; Control (ICSPC), IEEE, 2013, pp. 186–192.</p>
        <p>[10] Y. Zhang, P. Qi, C. D. Manning, Graph convolution over pruned dependency trees improves relation extraction, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, Association for Computational Linguistics, (2018) pp. 2205–2215.</p>
        <p>[11] Q. Ning, Z. Feng, H. Wu, D. Roth, Joint reasoning for temporal and causal relations, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, (2018) pp. 2278–2288.</p>
        <p>[12] Z. Li, Q. Li, X. Zou, J. Ren, Causality extraction based on self-attentive bilstm-crf with transferred embeddings, arXiv preprint arXiv:1904.07629 (2019) pp. 1–39.</p>
        <p>[13] R. P. Kumar, P. Aswathi, Extraction of causality and related events using text analysis, in: 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), volume 1, IEEE, 2019, pp. 1448–1453.</p>
        <p>[14] V. Zitian Chen, F. Montano-Campos, W. Zadrozny, Causal knowledge extraction from scholarly papers in social sciences, arXiv e-prints (2020) pp. 1–12.</p>
        <p>[15] T. Dasgupta, R. Saha, L. Dey, A. Naskar, Automatic extraction of causal relations from text using linguistically informed deep neural networks, in: Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018, pp. 306–316.</p>
        <p>[16] S. Zhao, T. Liu, S. Zhao, Y. Chen, J.-Y. Nie, Event causality extraction based on connectives analysis, Neurocomputing, 173 (2016) pp. 1943–1950.</p>
        <p>[17] H. Yu, G. Hripcsak, C. Friedman, Mapping abbreviations to full forms in biomedical articles, Journal of the American Medical Informatics Association, 9 (2002) pp. 262–272.</p>
        <p>[18] D. Munková, M. Munk, M. Vozár, Influence of stop-words removal on sequence patterns identification within comparable corpora, in: International Conference on ICT Innovations, Springer, 2013, pp. 67–76.</p>
        <p>[19] A. G. Jivani, et al., A comparative study of stemming algorithms, International journal of computer technology and applications, 2 (2011) pp. 1930–1938.</p>
        <p>[20] D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, et al., Universal sentence encoder, arXiv preprint arXiv:1803.11175 (2018) pp. 1–7.</p>
        <p>[21] F. Rahutomo, T. Kitasuka, M. Aritsugi, Semantic cosine similarity, in: The 7th International Student Conference on Advanced Science and Technology ICAST, volume 4, 2012, pp. 1–2.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>