<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Real-World Causal Relationship Discovery from Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Constantine Lignos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chester Palen-Michel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oskar Singer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pedro Szekely</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elizabeth Boschee</string-name>
          <email>boscheeg@isi.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Automatic extraction of causal relations from text has the potential to aid in the understanding of complex scenarios, but to date there has been limited work exploring extraction from natural data at scale. We describe a system that implements a rich language processing pipeline for the purpose of extracting causal relations between events described in text. The system uses a syntactic pattern-based approach to causality, using mutual bootstrapping to expand a set of seed patterns by discovering additional high-reliability patterns through a human-in-the-loop approach. We evaluate the performance of the system on newswire data and explore the properties of the causal relations it identifies.</p>
      </abstract>
      <kwd-group>
        <kwd>Causal relationship extraction</kwd>
        <kwd>Information extraction</kwd>
        <kwd>Natural language processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Existing manual methods to holistically represent the causal relationships
contained in complex geopolitical, sociopolitical, and economic environments are
both labor- and time-intensive. Our goal is to empower the understanding of
such environments by automatically organizing the information available from
vast and diverse data sources, ultimately allowing decision makers to explore the
impact of their possible actions via quantitative analysis and what-if simulations.
For this poster, we focus specifically on the extraction and organization of causal
information from text, which is critical to downstream planning.</p>
      <p>
        Causality extraction from text has been previously explored in both
unsupervised and supervised settings, but rarely in the context of a real-world application.
For instance, the natural language processing (NLP) community developed a 2010
shared task [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] that provided data annotated with semantic relations, including
causality, between nominal phrases in sentences, e.g. <italic>the outbreak is the biggest
ever caused by the vaccine</italic>. However, a model can be very successful on these
curated sentences but perform extremely poorly "in the wild". A simple model
we trained achieved a precision of 0.88 and recall of 0.91 on this data set, but
precision plummeted to &lt; 0.10 when run directly on a broad sample of newswire,
where sentences were significantly more complex and true causality was rarer.
      </p>
      <p>Copyright (c) 2019 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0). This material is based
upon work supported by the United States Air Force under Contract No.
FA8650-17-C7715.</p>
      <p>For this poster, we describe two critical elements of our approach to real-world
extraction of causal relationships from text: a bootstrapping approach to causal
pattern discovery and a flexible end-to-end document processing pipeline that
integrates a complex variety of customized and off-the-shelf NLP components.</p>
    </sec>
    <sec id="sec-2">
      <title>Causal Relationship Discovery</title>
      <p>
        The core concepts of mutual bootstrapping have been well-studied; Riloff and
Jones provide an excellent retrospective summary [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] in honor of their seminal
work from 1999 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. To be effective, the techniques we use here require two
different "views" of the data: the lexical and ontological content of two possibly
causally-related events and the syntactic connections between them.
      </p>
      <p>We begin by manually generating a small set of simple causal sentences, e.g.
<italic>the protests caused violence</italic>, which we use to generate an initial set of syntactic
patterns. We use these patterns as the seed for our automatic, iterative pattern
discovery process, applying them to a large unannotated corpus. At each iteration,
a small amount of human feedback can be included to reduce semantic drift.</p>
      <p>To begin an iteration, the system applies all already-discovered patterns
(including the seed set) to generate pairs of events likely to be causally related,
e.g. Attack("bombing") and Death("fatalities") from the sentence <italic>The bombing
caused three fatalities</italic>. Our hypothesis is that these pairs of events may contain
characteristics that can predict these causal relationships; perhaps Attacks are
likely to be causes, and particularly likely to be causes of Deaths. The use of
ontological classes is critical in moving beyond the simple patterns that capture
the low-hanging fruit of this task (e.g. <italic>X causes Y</italic>). For this effort, we use the
ontology developed for DARPA's Causal Exploration program, which contains
500 event types, mostly in the domain of sociopolitical and military events,
e.g. CounterTerrorismOperation or TariffOrTradeSanctions. At the same time,
our system also finds causal relationships between unontologized events, e.g.
<italic>expressing fluorescent protein in some tissue allows us to see individual cells</italic>.</p>
      <p>For example, in our first iteration, the system estimates that Attack/Death
pairs are twice as likely as a random pair of events to be causally related; if the
Attack is also ontologically classified as a MilitaryAction, a causal relationship is
forty-one times more likely. We assign each event pair a score that reflects that
likelihood, combining a measure of the "causality" of each individual word (e.g.
<italic>bombing</italic> as a cause and <italic>fatalities</italic> as an effect), the word pair together, and, if available,
the same for their ontological classes (e.g. cause=event:Attack, effect=event:Death,
as discussed above). Event pairs with scores above an experimentally-tuned
threshold are deemed causally-likely for this iteration.</p>
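      <p>The scoring step above can be illustrated with a minimal sketch. The exact scoring function is not specified here, so the sketch assumes a smoothed log-lift (how much more frequent a feature is among pattern-matched pairs than among all candidate pairs) averaged over the available views of a pair. The function names, the feature keys, and the smoothing constant are all hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter


def log_lift(key, matched, background, smoothing=1.0):
    """Log ratio of the key's relative frequency among pattern-matched
    event pairs to its relative frequency among all candidate pairs.
    The smoothing constant keeps unseen keys from causing a division by zero."""
    p_matched = (matched.get(key, 0) + smoothing) / (sum(matched.values()) + smoothing)
    p_background = (background.get(key, 0) + smoothing) / (sum(background.values()) + smoothing)
    return math.log(p_matched / p_background)


def score_event_pair(cause, effect, matched, background):
    """Average the log-lift of every available view of the pair.
    `cause` and `effect` are (word, ontology_class_or_None) tuples."""
    cause_word, cause_class = cause
    effect_word, effect_class = effect
    keys = [
        ("cause_word", cause_word),
        ("effect_word", effect_word),
        ("word_pair", cause_word, effect_word),
    ]
    # Ontological evidence is only used when both events are ontologized.
    if cause_class is not None and effect_class is not None:
        keys.append(("class_pair", cause_class, effect_class))
    return sum(log_lift(k, matched, background) for k in keys) / len(keys)


def causally_likely(pairs, matched, background, threshold):
    """Keep the pairs whose score reaches the tuned threshold."""
    return [
        pair for pair in pairs
        if score_event_pair(pair[0], pair[1], matched, background) >= threshold
    ]
```
      </preformat>
      <p>Under this assumed formulation, a lift of 2 for an Attack/Death class pair corresponds to the "twice as likely as a random pair" estimate mentioned above.</p>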
      <p>The second stage of each iteration then generates new patterns using all
instances of these causally-likely event pairs in the corpus. For instance, the
bombing/fatalities pair might suggest a new pattern <monospace>(cause) -[nsubj]-&gt; result
&lt;-[prep]- in &lt;-[pobj]- (effect)</monospace> from the sentence <italic>The bombing resulted in
five deaths</italic>. Each pattern represents a dependency path between two nodes, where
each node consists of the original text, stemmed text, and optional input and
output labels. The input labels categorize the node as a member of a particular
ontological class (e.g. Person or Attack). The output labels indicate the role
that a node plays in a causal relation (e.g. cause, precondition, mitigating factor,
effect). The labels on the paths (e.g. <monospace>[nsubj]</monospace>) represent universal dependency
relations. Each node attribute and dependency edge label can become a wildcard,
allowing one path to become many patterns with varying levels of granularity.</p>
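      <p>The wildcard mechanism described above can be sketched as follows. This is a simplified illustration, not the system's actual pattern representation: a path is flattened into a tuple of node attributes and edge labels, edge direction is omitted, and the names <monospace>generalizations</monospace> and <monospace>matches</monospace> are hypothetical.</p>
      <preformat>
```python
from itertools import product

WILDCARD = "*"


def generalizations(path):
    """Yield every pattern obtained by independently keeping or
    wildcarding each element of a dependency path."""
    for mask in product((False, True), repeat=len(path)):
        yield tuple(WILDCARD if hide else item for item, hide in zip(path, mask))


def matches(pattern, path):
    """A pattern matches a path of equal length when every
    non-wildcard element is identical to the path element."""
    return len(pattern) == len(path) and all(
        p == WILDCARD or p == item for p, item in zip(pattern, path)
    )


# Flattened path for "The bombing resulted in five deaths":
# cause node class, edge, verb, edge, preposition, edge, effect node class.
path = ("event:Attack", "nsubj", "result", "prep", "in", "pobj", "event:Death")
candidates = list(generalizations(path))
```
      </preformat>
      <p>A path with seven elements yields 2^7 = 128 candidate patterns, ranging from the fully specified path to one that matches any path of the same length, which is why candidates need to be ranked and filtered before being adopted.</p>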
      <p>Each pattern is evaluated on the basis of the event pairs it generates and how
causally-likely each one is deemed by the system. The highest scoring patterns
are retained, and their matches can then be used to inform the scoring decisions
in the next iteration. In practice, we use the scoring mechanism to generate a
ranked list of candidate patterns, and use a human-in-the-loop approach to select
new patterns to avoid semantic drift by only adding high-precision patterns. Here
is a selection of patterns suggested by our system at an early iteration, along
with examples of causal relationships they extract.</p>
      <list list-type="order">
        <list-item>
          <p><monospace>(effect) -[ccomp]-&gt; mean &lt;-[nsubj]- (cause)</monospace>: <italic>Safina's ranking means the Williams sisters will be in the same half of the draw.</italic></p>
        </list-item>
        <list-item>
          <p><monospace>(effect / event:Death) &lt;-[advcl]- (cause / event:Attack)</monospace>: <italic>Two police officers died after a car rigged with explosives was detonated.</italic></p>
        </list-item>
        <list-item>
          <p><monospace>(cause / event:Decrease) -[advcl]-&gt; (effect / event:Decrease)</monospace>: <italic>Automakers are expected to reduce vehicle production by 25 percent from last year, when auto sales fell 18 percent from 2007 levels.</italic></p>
        </list-item>
      </list>
      <p>The first and second are good, high-precision additions to our collection,
the former applying to all possible pairs and the latter restricted by ontological
classes (Attack/Death). The third, however, is too general, since it frequently
produces false positives where the two events are merely coincident, for example
<italic>Domestic traffic fell 4.6 percent while international traffic fell 11.2 percent</italic>.</p>
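      <p>The pattern evaluation step, in which candidate patterns are ranked before a human selects among them, can be sketched as below. The ranking criterion is an assumption made for illustration: each pattern is scored by the fraction of its extracted event pairs that the current iteration deems causally likely, with a minimum support cutoff. The function name and the <monospace>min_matches</monospace> parameter are hypothetical.</p>
      <preformat>
```python
def rank_patterns(pattern_matches, likely_pairs, min_matches=2):
    """Rank candidate patterns by the share of their extracted event
    pairs that the current iteration deems causally likely.

    pattern_matches maps a pattern to the list of event pairs it extracted;
    likely_pairs is the set of pairs scored above the threshold."""
    ranked = []
    for pattern, pairs in pattern_matches.items():
        # Skip patterns with too little evidence to estimate precision.
        if len(pairs) >= min_matches:
            precision = sum(pair in likely_pairs for pair in pairs) / len(pairs)
            ranked.append((precision, len(pairs), pattern))
    # Highest estimated precision first; ties broken by larger support.
    ranked.sort(key=lambda item: (-item[0], -item[1]))
    return [(pattern, precision, support) for precision, support, pattern in ranked]
```
      </preformat>
      <p>The resulting list is only a proposal: as described above, a reviewer still decides which of the top-ranked patterns are precise enough to add.</p>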
      <p>We evaluated our system on 1,000 newswire documents using a baseline of 660
seed patterns developed by pattern-writing experts over several months. Even on
top of this strong baseline, our iteratively-discovered patterns improve recall by
6.6% while maintaining the precision of the seed patterns (approximately 70%).</p>
    </sec>
    <sec id="sec-3">
      <title>End-to-End System</title>
      <p>
        To provide the rich structure required for applying causal patterns, we developed
an NLP pipeline which combines off-the-shelf software with components developed specifically
for this system. Documents are processed in the following sequence: DetectorMorse
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] identifies sentence boundaries; spaCy [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] provides tokenization, part-of-speech
tagging, and dependency parses; a conditional random field-based system trained
on ACE [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] data identifies named entities; a sieve-based [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] system provides
entity coreference; Joint IE [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and BBN ACCENT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] provide ACE and CAMEO
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] ontology events; and JAMR [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] provides abstract meaning representation
(AMR) parses. We use a new, flexible Python NLP framework (ISI VistaNLP) to
integrate all components, implement the named entity extraction, coreference,
and causal pattern discovery and matching systems, and perform experiments.
This framework enables the simultaneous representation of multiple analyses of a
document (e.g. events extracted by multiple systems using different ontologies),
allowing the merging of information extracted by all components to a single
representation using a single ontology.
      </p>
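      <p>To make the idea of holding several analyses side by side and merging them into one ontology concrete, here is a minimal sketch. It does not reproduce the ISI VistaNLP API, which is not described here: the <monospace>Event</monospace> and <monospace>Document</monospace> classes, their methods, and the example labels and mappings are all hypothetical.</p>
      <preformat>
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    start: int        # character offset where the trigger begins
    end: int          # character offset where the trigger ends
    event_type: str   # label in the producing system's ontology


@dataclass
class Document:
    text: str
    # One list of events per producing component, kept side by side.
    analyses: dict = field(default_factory=dict)

    def add_analysis(self, source, events):
        self.analyses[source] = list(events)

    def merged_events(self, type_maps):
        """Map every analysis into one target ontology and drop duplicates
        that cover the same span with the same target type."""
        merged = set()
        for source, events in self.analyses.items():
            mapping = type_maps.get(source, {})
            for event in events:
                target = mapping.get(event.event_type)
                if target is not None:
                    merged.add(Event(event.start, event.end, target))
        return sorted(merged, key=lambda e: (e.start, e.end, e.event_type))


doc = Document("The bombing caused three fatalities.")
# Made-up source labels standing in for two event extractors' outputs.
doc.add_analysis("system_a", [Event(4, 11, "a:Attack"), Event(25, 35, "a:Die")])
doc.add_analysis("system_b", [Event(4, 11, "b:Bombing")])
type_maps = {
    "system_a": {"a:Attack": "Attack", "a:Die": "Death"},
    "system_b": {"b:Bombing": "Attack"},
}
merged = doc.merged_events(type_maps)
```
      </preformat>
      <p>In this sketch, the two systems' overlapping analyses of <italic>bombing</italic> collapse into a single Attack event once both are mapped to the shared ontology, while events with no mapping are left out of the merged view.</p>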
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The system described here provides the structure needed to identify causal
relations between linguistically-rich events and a framework for matching and
discovering causal patterns. This precision-centric pattern-based approach can be
bootstrapped from a relatively small set of seed patterns and human-in-the-loop
filtering, enabling the identification of relations without annotated data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Boschee</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lautenschlager</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Brien</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shellman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Starz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ward</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>BBN ACCENT event coding evaluation</article-title>
          .
          <source>In: ICEWS Coded Event Data. Harvard Dataverse</source>
          (
          <year>2015</year>
          ), https://doi.org/10.7910/DVN/28075/GBAGXI
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Doddington</surname>
            ,
            <given-names>G.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Przybocki</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramshaw</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strassel</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weischedel</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          :
          <article-title>The automatic content extraction (ACE) program-tasks, data, and evaluation</article-title>
          .
          <source>In: Proc. LREC</source>
          . vol.
          <volume>2</volume>
          , p.
          <fpage>1</fpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Flanigan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carbonell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>A discriminative graph-based parser for the abstract meaning representation</article-title>
          .
          <source>In: Proc. ACL</source>
          . vol.
          <volume>1</volume>
          , pp.
          <fpage>1426</fpage>
          –
          <lpage>1436</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gerner</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schrodt</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yilmaz</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abu-Jabr</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Conflict and mediation event observations (CAMEO): A new event data framework for the analysis of foreign policy interactions</article-title>
          (
          <year>2002</year>
          ), presented at the Annual Meeting of the ISA
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gorman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <source>DetectorMorse</source>
          (
          <year>2014</year>
          ), https://pypi.org/project/DetectorMorse/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hendrickx</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozareva</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ó Séaghdha</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Padó</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pennacchiotti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romano</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szpakowicz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>SemEval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals</article-title>
          .
          <source>In: Proc. International Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>33</fpage>
          –
          <lpage>38</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Honnibal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montani</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Joint event extraction via structured prediction with global features</article-title>
          .
          <source>In: Proc. ACL</source>
          . vol.
          <volume>1</volume>
          , pp.
          <fpage>73</fpage>
          –
          <lpage>82</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Raghunathan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangarajan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chambers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Surdeanu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A multi-pass sieve for coreference resolution</article-title>
          .
          <source>In: Proc. EMNLP</source>
          . pp.
          <fpage>492</fpage>
          –
          <lpage>501</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Riloff</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Learning dictionaries for information extraction by multi-level bootstrapping</article-title>
          .
          <source>In: Proc. AAAI</source>
          . pp.
          <fpage>474</fpage>
          –
          <lpage>479</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Riloff</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A retrospective on mutual bootstrapping</article-title>
          .
          <source>AI Magazine</source>
          <volume>39</volume>
          (
          <issue>1</issue>
          ),
          <fpage>51</fpage>
          –
          <lpage>61</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>