Recognizing Implicit Discourse Relations
           through Abductive Reasoning
        with Large-scale Lexical Knowledge

                  Jun Sugiura, Naoya Inoue, and Kentaro Inui

    Tohoku University, 6-3-09 Aoba, Aramaki, Aoba-ku, Sendai, 980-8579, Japan
                   {jun-s,naoya-i,inui}@ecei.tohoku.ac.jp


      Abstract. Discourse relation recognition is the task of identifying the
      semantic relationships between textual units. Conventional approaches to
      discourse relation recognition exploit surface information and syntactic
      information as machine learning features. However, the performance of
      these models is severely limited for implicit discourse relation recognition.
      In this paper, we propose an abductive theorem proving (ATP) approach
      for implicit discourse relation recognition. The contribution of this paper
      is that we give a detailed discussion of an ATP-based discourse relation
      recognition model with open-domain web texts.

      Keywords: Discourse Relation, Abductive Reasoning, Lexical Knowledge,
      Association Information


1   Introduction
Discourse relation recognition is the task of identifying the semantic relationship
between textual units. For example, given the text “John pushed Paul towards a
hole. (S1) Paul didn’t get hurt. (S2)”, we identify a contrast relation between the
textual units S1 and S2. Discourse relation recognition is useful for many NLP
tasks such as summarization, question answering, and coreference resolution.
    Traditional studies on discourse relation recognition divide discourse relations
into two types according to the presence of discourse connectives between textual
units: (i) explicit discourse relations, i.e. relations marked by a discourse
connective (e.g. John hit Tom because he was angry.), and (ii) implicit discourse
relations, i.e. relations without a discourse connective (e.g. John hit Tom. He got
angry.). Identifying an implicit discourse relation is much more difficult than
identifying an explicit one. In this paper, we focus on the task of implicit
discourse relation recognition.
    Conventional approaches to implicit discourse relation recognition exploit
surface information (e.g. bag of words) and syntactic information (e.g. syntactic
dependencies between words) to identify discourse relations [1, 2, etc.]. However,
the performance of these models is severely limited, as discussed in Sec. 2.2. We
believe that the problem with these approaches is twofold: (i) they do not capture
causality between the events mentioned in each textual unit, and (ii) they
do not capture the factuality of these events. We believe that this information
plays a key role in implicit discourse relation recognition. Suppose we want to
recognize the contrast relation between S1 and S2 in the first example. To do so,
we need to know at least the following:
(i) commonsense knowledge: pushing someone into a hole usually causes them to get
hurt; (ii) factuality: Paul did not get hurt. Finally, combining (i) and (ii), we
need to recognize the unexpectedness of the discourse: something contrary to our
commonsense knowledge happened in S2. As described in Sec. 3, our investigation
revealed that there are several patterns of reasoning and that several kinds of
reasoning must be combined to identify a discourse relation.
    Motivated by this observation, we propose an abductive theorem proving
(ATP) approach to implicit discourse relation recognition. The key advantage of
using ATP is that its declarative nature abstracts away the flow of information
in the modeling phase: we do not have to explicitly specify when and where to
use a particular type of reasoning. Once we give several primitive inference rules
to an ATP system, the system automatically returns the best answer to a problem
by combining the inference rules.
    In this paper, we address the following open questions about ATP-based
discourse relation recognition: (i) Does it really work on real-life texts? (ii) Does
it work with a large knowledge base that is not customized for the target texts?
The contributions of this paper are as follows. We give a detailed discussion
of an ATP-based discourse relation recognition model on open-domain web
texts. In addition, we show that our ATP-based model is computationally feasible
with a large knowledge base.
    The structure of this paper is as follows. In Sec. 2, we describe abduction and
give an overview of previous efforts on discourse relation recognition. In Sec. 3,
we describe our ATP-based discourse relation recognition model. In Sec. 4, we
report the results of a pilot evaluation of our model with large-scale lexical
knowledge.


2      Background

2.1     Weighted abduction

Abduction is inference to the best explanation. Formally, logical abduction is
defined as follows:

    – Given: Background knowledge B, and observations O, where both B and
      O are sets of first-order logical formulas.
    – Find: A hypothesis (or explanation, abductive proof) H such that $H \cup B \models O$
      and $H \cup B \not\models \bot$, where H is a conjunction of literals. We say that p
      is hypothesized if $H \cup B \models p$, and that p is explained if
      $(\exists q)\, (q \rightarrow p) \in B$ and $H \cup B \models q$.

Typically, several hypotheses H explaining O exist. We call each of them a
candidate hypothesis and each literal in a hypothesis an elemental hypothesis.
The goal of abduction is to find the best hypothesis among candidate hypotheses
according to a specific evaluation measure. Several kinds of evaluation measures
have been proposed in the literature, including cost-based and probability-based
ones [3, 4, etc.].
    In this paper, we adopt the evaluation measure of weighted abduction
proposed by Hobbs et al. [4]. In essence, the measure penalizes assuming
specific and unreliable information but rewards inferring the same information
from different observations. We summarize its primary features as follows
(see [4] for details):
 – Each elemental hypothesis has a positive real-valued cost;
 – The cost of a candidate hypothesis is defined as the sum of the costs of its
   elemental hypotheses;
 – The best hypothesis is defined as the minimum-cost candidate hypothesis;
 – If an elemental hypothesis is explained by another elemental hypothesis, its
   cost becomes zero.
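The four properties listed above can be sketched in Python as follows. This is a simplified illustration, not Hobbs et al.'s full formulation: it assumes each candidate hypothesis is given with its elemental costs already computed, together with the set of elemental hypotheses explained by others in the same candidate, and it ignores how axiom weights propagate costs during backward chaining. The class and function names and the cost values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate hypothesis: its elemental hypotheses with their costs."""
    name: str
    costs: dict[str, float]                  # elemental hypothesis -> positive cost
    explained: frozenset[str] = frozenset()  # literals explained by other literals in H

    def total_cost(self) -> float:
        # Sum of elemental costs; explained elemental hypotheses cost zero.
        return sum(c for lit, c in self.costs.items() if lit not in self.explained)


def best_hypothesis(candidates: list[Candidate]) -> Candidate:
    """The best hypothesis is the minimum-cost candidate."""
    return min(candidates, key=lambda h: h.total_cost())


if __name__ == "__main__":
    # Hypothetical costs: assume the observations p and q directly ...
    h1 = Candidate("assume p and q", {"p": 1.0, "q": 1.0})
    # ... or assume a single r that explains both p and q.
    h2 = Candidate("assume r", {"r": 1.5, "p": 1.0, "q": 1.0}, frozenset({"p", "q"}))
    best = best_hypothesis([h1, h2])
    print(best.name, best.total_cost())  # assume r 1.5
```

Assuming one literal r that explains both p and q costs less than assuming p and q separately, which is the reward for inferring the same information from different observations.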

2.2   Related work
Discourse relation recognition is a prominent research area in NLP. Most
researchers have focused primarily on explicit discourse relation recognition,
employing statistical machine learning models [5, 6, etc.] with surface and
syntactic information. The performance of explicit discourse relation recognition
is comparatively high; for instance, Lin et al. [7] achieved an F-score of 80.6%.
    The performance of implicit discourse relation recognition, however, is
relatively low (25.5% F-score). Most existing work on implicit discourse relation
recognition [1, 2, etc.] extends the feature set of [5] with richer lexico-syntactic
information. For example, Pitler et al. [2] use syntactic parse trees and the
sentiment polarity of words in the textual units to generate features. However,
the performance is still far from a practical level.
    An abductive discourse relation recognition model was originally presented by
Hobbs et al. [4]. However, they reported results in a fairly closed setting: they
tested their model on two texts with manually encoded background knowledge
tailored to the discourse relation recognition problems in those two texts.
Therefore, it remains an open question whether the abductive discourse relation
recognition model works in an open setting with a wider range of real-life texts
and a large knowledge base.


3     Abductive theorem proving for discourse relation
      recognition
In this section, we describe our discourse relation recognition model. We employ
ATP to recognize a discourse relation. Given target discourse segments, we
abductively prove that there exists a coherence relation (i.e. some discourse
relation) between the discourse segments using background knowledge. As
background knowledge, we axiomatize (i) the definitions of discourse relations
and (ii) lexical knowledge (e.g. causal knowledge about events); these axioms are
used to prove the existence of a coherence relation.

    The motivation for using abductive theorem proving is that, as mentioned in
Sec. 2.1, we can assume a proposition at some cost even if we fail to find a
complete proof of a coherence relation between discourse segments. By choosing
the minimum-cost abductive proof, we can identify the most likely discourse
relation.
    We first show how to axiomatize the definitions of discourse relations (Sec.
3.1). We then conduct an example-driven investigation of the lexical knowledge
required to solve a few real-life discourse relation recognition problems, in order
to identify the types of lexical knowledge needed for an ATP-based recognition
model (Sec. 3.2). In Sec. 3.2, we also confirm that the theory works as expected
on a general-purpose inference engine. We use Henry, a lifted first-order
abductive inference engine developed by one of the authors [8]. To analyze the
inference results in more depth, we also improved the existing visualization
module provided by Henry. The inference engine and visualization tool are
publicly available at https://github.com/naoya-i/henry-n700/.


3.1      Axiomatizing definitions of discourse relations

We follow the definitions of discourse relations provided by the Penn Discourse
TreeBank (PDTB) 2.0 [9], a widely used, large-scale corpus annotated with
discourse relations.1 The PDTB defines four coarse-grained discourse relations,
but it is still rather difficult to distinguish all four. Therefore, we adopt a
two-way classification: whether a relation is adversative (Comparison in PDTB)
or resultative (Temporal, Contingency, or Expansion in PDTB). Because a
resultative relation can be regarded as any relation other than an adversative
one, we first axiomatize the definition of the adversative relation and then
consider the resultative relation.
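A minimal sketch of this two-way label mapping, assuming PDTB sense labels are available as dotted strings such as `Comparison.Contrast` (a common way of writing PDTB 2.0 senses; the exact format in a given release or toolkit may differ). `PDTB_TO_BINARY` and `binary_label` are hypothetical names.

```python
# Two-way label for each PDTB 2.0 top-level (coarse-grained) sense class.
PDTB_TO_BINARY = {
    "Comparison": "Adversative",
    "Temporal": "Resultative",
    "Contingency": "Resultative",
    "Expansion": "Resultative",
}


def binary_label(pdtb_sense: str) -> str:
    """Map a dotted PDTB sense string (e.g. 'Comparison.Contrast') to the two-way label."""
    top_level = pdtb_sense.split(".", 1)[0]
    return PDTB_TO_BINARY[top_level]


if __name__ == "__main__":
    for sense in ["Comparison.Concession", "Contingency.Cause.Reason", "Expansion.Conjunction"]:
        print(sense, "->", binary_label(sense))
```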
    According to the PDTB Annotation Manual [12], an adversative relation
consists of two subtypes: Concession and Contrast. These subtypes are defined
as follows.

Concession: One of the arguments describes a situation A which causes C, while
  the other asserts (or implies) ¬C. One argument denotes a fact that triggers
  a set of potential consequences, while the other denies one or more of them.
Contrast: Arg1 and Arg2 share a predicate or a property, and the difference is
  highlighted with respect to the values assigned to this property.

The condition of Concession can be described as the following axiom:

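    $$\begin{aligned}
    &\mathit{event}(e_1, \mathit{type}_1, F, x_1, x_2, x_3, s_1) \land \mathit{event}(e_2, \mathit{type}_2, \mathit{NF}, y_1, y_2, y_3, s_2) \\
    &\land \mathit{cause}(e_1, F, e_u, \mathit{EF}) \land \mathit{event}(e_u, \mathit{type}_2, \mathit{EF}, y_1, y_2, y_3, s_u) \Rightarrow \mathit{Adversative}(s_1, s_2).
    \end{aligned}$$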

This axiom says that if event e1 occurs in segment s1 (roughly corresponding to
Arg1 in PDTB) and that event is expected to cause an event of type2, while such
an event of type2 does not actually occur in segment s2 (roughly corresponding
to Arg2 in PDTB), then the discourse relation between s1 and s2 is Adversative.
Examples that use this type of axiom to recognize discourse relations will be
given later.
1
     For other corpora, see [10, 9, 11, etc.].

On the other hand, a typical pattern in which the Contrast relation holds can be
described, for example, as follows:
         $$\mathit{value}(e_1, \mathit{Pos}, s_1) \land \mathit{value}(e_2, \mathit{Neg}, s_2) \Rightarrow \mathit{Adversative}(s_1, s_2).$$
This axiom says that when the sentiment polarity of e1 in segment s1 is Positive
and the sentiment polarity of e2 in segment s2 is Negative, the discourse relation
between s1 and s2 is Adversative. The axioms described here represent conditions
under which Adversative holds; their variants differ in the factuality or
sentiment values they involve.
     The same kind of axiom can also represent conditions for Resultative. For
instance, if the sentiment polarity of e2 in segment s2 is the same as that of e1
in segment s1, then the discourse relation between s1 and s2 is Resultative:
         $$\mathit{value}(e_1, \mathit{Pos}, s_1) \land \mathit{value}(e_2, \mathit{Pos}, s_2) \Rightarrow \mathit{Resultative}(s_1, s_2).$$
In total, we created 21 axioms defining discourse relations.
    Finally, we add the following axioms to connect the definition of discourse
relations with the existence of a coherence relation between discourse segments:
                        $$\mathit{Adversative}(s_1, s_2) \Rightarrow \mathit{CoRel}(s_1, s_2) \tag{1}$$
                        $$\mathit{Resultative}(s_1, s_2) \Rightarrow \mathit{CoRel}(s_1, s_2), \tag{2}$$
where CoRel(s1 , s2 ) indicates that there exists a coherence relation between seg-
ments s1 and s2 . Given target discourse segments S1 , S2 , we prove CoRel(S1 , S2 )
using the axioms described above and lexical knowledge which is described in
the next section.
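To illustrate how the two example polarity axioms combine with axioms (1) and (2), here is a Python sketch that applies them forward over ground value literals. This is a deliberate simplification: our model proves CoRel(S1, S2) abductively, so a missing premise can be assumed at a cost rather than causing failure, and it uses 21 definition axioms, of which only the two shown above are encoded here. All names are hypothetical.

```python
# value(e, v, s): event e has sentiment polarity v ("Pos" or "Neg") in segment s.
Value = tuple[str, str, str]
Atom = tuple[str, str, str]  # (predicate, s1, s2)


def derive_relations(values: list[Value], s1: str, s2: str) -> set[Atom]:
    """Forward-apply the two example polarity axioms, then axioms (1) and (2)."""
    derived: set[Atom] = set()
    left = [v for (_, v, seg) in values if seg == s1]
    right = [v for (_, v, seg) in values if seg == s2]
    for v1 in left:
        for v2 in right:
            if (v1, v2) == ("Pos", "Neg"):
                derived.add(("Adversative", s1, s2))  # value(Pos) ∧ value(Neg) ⇒ Adversative
            elif (v1, v2) == ("Pos", "Pos"):
                derived.add(("Resultative", s1, s2))  # value(Pos) ∧ value(Pos) ⇒ Resultative
    # Axioms (1) and (2): either relation implies a coherence relation.
    if derived:
        derived.add(("CoRel", s1, s2))
    return derived


if __name__ == "__main__":
    print(derive_relations([("e1", "Pos", "S1"), ("e2", "Neg", "S2")], "S1", "S2"))
```

With only these two axioms, a Neg/Pos pair derives nothing; in the abductive setting, CoRel would still be provable by assuming the missing literals at a cost.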
    We now formally describe our meaning representation. First, we use
cause(ea, fa, ec, fc) to represent that event ea with factuality fa causes event
ec with factuality fc. Second, we represent an event by event(e, t, f, x1, x2, x3,
s), where e is the event variable, t is the event type of e, f is the factuality of
e, x1, x2, x3 are the arguments of the event, and s is the segment to which e
belongs. The factuality of an event e takes one of the following four values:
F (Fact; e occurred), NF (NonFact; e did not occur), EF (Expected-Fact; e is
expected to occur), and ENF (Expected-NonFact; e is expected not to occur).
In addition, the value (sentiment polarity) of event e is represented as
value(e, v, s), where v is either Pos (Positive) or Neg (Negative).
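One way to hold this meaning representation in code is sketched below, assuming Python dataclasses; the class and field names are illustrative choices, not the data structures of Henry or of any other system. The four factuality values and the Pos/Neg restriction follow the definitions above.

```python
from dataclasses import dataclass
from enum import Enum


class Factuality(Enum):
    F = "Fact"                # e occurred
    NF = "NonFact"            # e did not occur
    EF = "Expected-Fact"      # e is expected to occur
    ENF = "Expected-NonFact"  # e is expected not to occur


@dataclass(frozen=True)
class Event:
    """event(e, t, f, x1, x2, x3, s)"""
    e: str
    t: str
    f: Factuality
    args: tuple[str, str, str]
    s: str

    def __str__(self) -> str:
        return f"event({self.e}, {self.t}, {self.f.name}, {', '.join(self.args)}, {self.s})"


@dataclass(frozen=True)
class Cause:
    """cause(ea, fa, ec, fc): ea with factuality fa causes ec with factuality fc."""
    ea: str
    fa: Factuality
    ec: str
    fc: Factuality

    def __str__(self) -> str:
        return f"cause({self.ea}, {self.fa.name}, {self.ec}, {self.fc.name})"


@dataclass(frozen=True)
class Value:
    """value(e, v, s): sentiment polarity v of event e in segment s."""
    e: str
    v: str
    s: str

    def __post_init__(self) -> None:
        if self.v not in ("Pos", "Neg"):
            raise ValueError(f"polarity must be Pos or Neg, got {self.v!r}")

    def __str__(self) -> str:
        return f"value({self.e}, {self.v}, {self.s})"


if __name__ == "__main__":
    closed = Event("X8", "Close", Factuality.F, ("y1", "y2", "y3"), "S2")
    print(closed)  # event(X8, Close, F, y1, y2, y3, S2)
    print(Cause("X8", Factuality.F, "e7", Factuality.ENF))  # cause(X8, F, e7, ENF)
```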

3.2     Example-driven investigation of lexical knowledge
Next, we manually analyzed a small number of samples for each discourse relation
to investigate what types of knowledge are required to explain those samples and
how to axiomatize them. In this paper, we manually convert each sample text
into logical form, extracting the main verbs of its matrix clauses as predicates.2
2
    In future work, we will use an off-the-shelf semantic parser (e.g. Boxer [13]) to
    obtain the logical forms automatically. Using automatic semantic parsers raises
    some challenges, e.g. how to represent the verbs in embedded clauses. We do not
    address these issues in this work, because we want to focus on investigating the
    types of world knowledge required to identify discourse relations.

[Fig. 1: abductive proof graph for example (2)]


Fig. 1. Example of the abductive proof automatically produced by our system. Black
directed edges: backward chaining. Red undirected edges: unification; their labels
give the unifier. Terms starting with a capital letter are constants; all other
terms are variables. “X ∼ Y”: Y is unified with X. Gray nodes: explained literals.
Red nodes: hypothesized literals.


   The dataset consists of texts collected from a website.3 This website
provides English texts for adult learners of English as a second language. We
collected 16 discourse segment pairs, half of which can be regarded as Adversative
and the other half as Resultative.
   Let us take one of the simplest samples, example (2).

(2) S1: A lot of traffic once used Folsom Dam Road.
    S2: Right now, the road is closed.
    (Topic=Working, StoryID=174)

In this example, S1 and S2 are in the Adversative relation. While Folsom Dam
Road was once used by a lot of traffic, it is not usable now because it is closed.
Something was once used but is now unusable; therefore, the Adversative relation
holds. This can be described by the following axiom:

Condition of Adversative

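      $$\begin{aligned}
      &\mathit{event}(e_1, \mathit{type}_1, F, x_1, x_2, x_3, s_1) \land \mathit{event}(e_2, \mathit{type}_2, F, y_1, y_2, y_3, s_2) \\
      &\land \mathit{cause}(e_2, F, e_u, \mathit{ENF}) \land \mathit{event}(e_u, \mathit{type}_1, \mathit{ENF}, x_1, x_2, x_3, s_1) \Rightarrow \mathit{Adversative}(s_1, s_2)
      \end{aligned}$$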

The causality relation between the “closed” event and the “unusable” event can
be described as:
3
    http://www.cdlponline.org/

Relation between events
      $$\mathit{event}(e_1, \mathit{Use}, \mathit{ENF}, x_1, x_2, x_3, s_1) \land \mathit{cause}(e_2, F, e_1, \mathit{ENF}) \Rightarrow \mathit{event}(e_2, \mathit{Close}, F, y_1, x_2, y_3, s_2)$$
Note that we have cause(e2, F, e1, ENF) on the left-hand side of the axiom. We
use this literal to keep track of the type of reasoning: in an abductive proof,
we expect it to unify with an elemental hypothesis generated by the axioms
defining discourse relations.
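The unification step mentioned above can be sketched for flat (function-free) literals, which is all this representation needs. The sketch assumes the convention stated in the figure captions: terms starting with a capital letter are constants and all other terms are variables. It is a textbook most-general-unifier routine with hypothetical names and illustrative example literals, not Henry's implementation.

```python
from typing import Optional

# An atom is (predicate, arg1, arg2, ...); all arguments are flat terms.
Atom = tuple[str, ...]


def is_variable(term: str) -> bool:
    # Convention from the figure captions: terms starting with a capital
    # letter are constants; everything else is a variable.
    return not term[:1].isupper()


def walk(term: str, subst: dict[str, str]) -> str:
    """Follow variable bindings until reaching a constant or unbound variable."""
    while is_variable(term) and term in subst:
        term = subst[term]
    return term


def unify(a: Atom, b: Atom, subst: Optional[dict[str, str]] = None) -> Optional[dict[str, str]]:
    """Return a most general unifier extending subst, or None if none exists."""
    subst = dict(subst or {})
    if a[0] != b[0] or len(a) != len(b):
        return None
    for s, t in zip(a[1:], b[1:]):
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if is_variable(s):
            subst[s] = t
        elif is_variable(t):
            subst[t] = s
        else:
            return None  # two distinct constants clash
    return subst


if __name__ == "__main__":
    from_axiom = ("cause", "_6", "F", "_7", "ENF")
    hypothesized = ("cause", "X8", "F", "_0", "ENF")
    print(unify(from_axiom, hypothesized))  # {'_6': 'X8', '_7': '_0'}
```

Binding `_6` to `X8` and `_7` to `_0` is the kind of unifier that labels the red edges of the proof graphs.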
    Fig. 1 shows the result of applying the proposed model to example (2).4 In
Fig. 1, the observations consist of three literals: the occurrence of an event of
type Use in segment S1, the occurrence of an event of type Close in segment S2,
and CoRel(S1, S2), which represents the existence of some discourse relation
between the segments.
    To see how our model combines multiple pieces of knowledge, let us take
another example.
(3) S1: Right now, the road is closed.
    S2: Most of the people who used the road every day are angry.
    (Topic=Working, StoryID=174)
The “closed” event causes the “unusable” event, and the “unusable” event then
further causes the “angry” event. The latter step can be explained by combining
the knowledge that being “unusable” has negative sentiment polarity with the
knowledge that a negative event may make someone angry. These pieces of
knowledge can be axiomatized as follows. The proof graph is shown in Fig. 2.
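Condition of Resultative
       $$\begin{aligned}
       &\mathit{event}(e_1, \mathit{type}_1, F, x_1, x_2, x_3, s_1) \land \mathit{event}(e_2, \mathit{type}_2, F, y_1, y_2, y_3, s_2) \\
       &\land \mathit{cause}(e_1, F, e_u, \mathit{EF}) \land \mathit{event}(e_u, \mathit{type}_2, \mathit{EF}, y_1, y_2, y_3, s_2) \Rightarrow \mathit{Resultative}(s_1, s_2)
       \end{aligned}$$
Relation between events
       $$\mathit{event}(e_1, \mathit{Use}, \mathit{ENF}, x_1, x_2, x_3, s_1) \land \mathit{cause}(e_2, F, e_1, \mathit{ENF}) \Rightarrow \mathit{event}(e_2, \mathit{Close}, F, y_1, x_2, y_3, s_2)$$
       $$\mathit{event}(e_1, \mathit{Angry}, \mathit{EF}, x_1, x_2, x_3, s_1) \land \mathit{cause}(e_2, f, e_1, \mathit{EF}) \Rightarrow \mathit{value}(e_2, \mathit{Neg}, s_2) \land \mathit{event}(e_2, \mathit{type}, f, y_1, y_2, y_3, s_2)$$
Transitivity
       $$\mathit{cause}(e_1, f_1, e_2, f_2) \land \mathit{cause}(e_2, f_2, e_3, f_3) \Rightarrow \mathit{cause}(e_1, f_1, e_3, f_3)$$
Polarity
       $$\mathit{value}(e_1, \mathit{Neg}, s_1) \Rightarrow \mathit{event}(e_1, \mathit{Use}, \mathit{ENF}, x_1, x_2, x_3, s_1)$$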
   Through the investigation illustrated above, we concluded that the types of
axioms in Table 1 can recognize the discourse relations in most of the examples.
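The Transitivity axiom can be sketched as a fixpoint computation over ground cause literals. Two assumptions apply: the event names are hypothetical stand-ins for the closed/unusable/angry events, with the link from “unusable” to “angry” collapsed into a single cause fact (in our model it is obtained through the polarity and causal axioms), and the closure is computed forward, whereas the ATP engine applies the axiom by backward chaining only where a proof needs it.

```python
# cause(ea, fa, ec, fc) as a tuple (ea, fa, ec, fc).
CauseAtom = tuple[str, str, str, str]


def transitive_closure(causes: set[CauseAtom]) -> set[CauseAtom]:
    """Apply cause(e1,f1,e2,f2) ∧ cause(e2,f2,e3,f3) ⇒ cause(e1,f1,e3,f3) to a fixpoint."""
    closed = set(causes)
    while True:
        new = {
            (e1, f1, e3, f3)
            for (e1, f1, e2, f2) in closed
            for (e2b, f2b, e3, f3) in closed
            if e2 == e2b and f2 == f2b  # middle event and its factuality must match
        } - closed
        if not new:
            return closed
        closed |= new


if __name__ == "__main__":
    chain = {("close", "F", "use", "ENF"), ("use", "ENF", "angry", "EF")}
    print(("close", "F", "angry", "EF") in transitive_closure(chain))  # True
```

The factuality check matters: a fact about an event that did occur cannot be chained with a fact about the same event being expected not to occur.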
4
    Throughout this paper, we omit the arguments of events from the proof graphs
    for readability. Since we do not have lexical knowledge linking nouns and
    verbs, this simplification does not affect the result of inference.


[Fig. 2: abductive proof graph for example (3)]


Fig. 2. Example of the abductive proof automatically produced by our system. See the
description of Fig. 1.


4      Pilot large-scale evaluation


As mentioned in the previous section, our model assumes that axioms encoding
lexical knowledge are automatically extracted from large lexical resources (see
the automatically acquired axiom types in Table 1). In this section, we extract
axioms for causal relations and synonym/hyperonym relations from WordNet [14]
and FrameNet [15], two popular large-scale lexical resources, and then apply our
model to the example texts presented in Sec. 3. For sentiment polarity, we plan
to extract axioms from a large-scale sentiment polarity lexicon such as [16] in
future work.
    We emphasize that our primary focus here is the feasibility of our ATP-based
discourse relation recognition model with a large knowledge base. Quantitative
evaluation of our model (e.g. the predictive accuracy of discourse relations) is
left for future work. We first report how we incorporate WordNet and FrameNet
axioms into our knowledge base (Sec. 4.1), and then give preliminary measurements
of the inference time required to solve the example problems, together with some
interesting output (Sec. 4.2).


                              Table 1. Types of axioms.

Scale                 Type of knowledge     Example axioms
Small-scale           Definitions of        event(e1, type1, F, x1, x2, x3, s1)
(manually written)    discourse relations     ∧ event(e2, type2, F, y1, y2, y3, s2)
                                              ∧ cause(e2, F, eu, ENF)
                                              ∧ event(eu, type1, ENF, x1, x2, x3, s1)
                                              ⇒ Adversative(s1, s2);
                                            value(e1, Pos, s1) ∧ value(e2, Pos, s2)
                                              ⇒ Resultative(s1, s2)
                      Transitivity          cause(e1, f1, e2, f2) ∧ cause(e2, f2, e3, f3)
                                              ⇒ cause(e1, f1, e3, f3)
Large-scale           Causal relations      event(e1, Use, ENF, x1, x2, x3, s1) ∧ cause(e2, F, e1, ENF)
(automatically                                ⇒ event(e2, Close, F, y1, x2, y3, s2);
acquired)                                   event(e1, Angry, EF, x1, x2, x3, s1) ∧ cause(e2, f, e1, EF)
                                              ⇒ value(e2, Neg, s2) ∧ event(e2, type, f, y1, y2, y3, s2)
                      Synonym/Hyponym       event(e1, Attack, F, x1, x2, x3, s1)
                                              ⇒ event(e1, Destroy, F, x1, x2, x3, s1)
                      Sentiment polarity    value(e1, Neg, s1) ⇒ event(e1, Die, F, x1, x2, x3, s1);
                                            value(e1, Pos, s1) ⇒ event(e1, Die, NF, x1, x2, x3, s1)


4.1    Automatic axiom extraction from linguistic resources

We summarize the axioms extracted from WordNet and FrameNet in Table 2.
For each resource, we extract two kinds of axioms. First, we generate axioms that
map a word to the corresponding WordNet synset or FrameNet frame (the
Word-Synset and Word-Frame types). For example, the WordNet axiom in Table 2
enables us to hypothesize that a “die”-typed event can be mapped to WordNet
synset 200358431. Second, we encode semantic relations between synsets or
between frames. For example, the causal relation between the Getting frame and
the Giving frame is encoded as an axiom (see Table 2).5
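The effect of the bidirectional mapping axioms (footnote 5) can be sketched as a path search over a graph in which every mapping axiom is an undirected edge. The mappings below are hypothetical toy entries mirroring the footnote's pass away / synsetA / die / FNDeath example, not real WordNet or FrameNet data, and the search only illustrates reachability; the abductive engine itself applies the axioms during proof search rather than running a separate path search.

```python
from collections import deque
from typing import Optional

# Hypothetical toy mappings; real ones would be extracted from WordNet/FrameNet.
WORD_SYNSET = {"pass away": ["synsetA"], "die": ["synsetA"]}
WORD_FRAME = {"die": ["FNDeath"]}


def mapping_axioms(word_map: dict[str, list[str]]):
    """Render each word-to-synset/frame pair as a bidirectional axiom string."""
    for word, targets in word_map.items():
        for target in targets:
            yield (f"event(e, {word}, f, x1, x2, x3, s) "
                   f"⇔ event(e, {target}, f, x1, x2, x3, s)")


def build_graph(*maps: dict[str, list[str]]) -> dict[str, set[str]]:
    """Undirected graph: a ⇔ axiom can be used in either direction."""
    graph: dict[str, set[str]] = {}
    for word_map in maps:
        for word, targets in word_map.items():
            for target in targets:
                graph.setdefault(word, set()).add(target)
                graph.setdefault(target, set()).add(word)
    return graph


def find_chain(graph: dict[str, set[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search for a shortest chain of mapping steps."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    g = build_graph(WORD_SYNSET, WORD_FRAME)
    print(" → ".join(find_chain(g, "pass away", "FNDeath")))
    # pass away → synsetA → die → FNDeath
```

The chain exists only because the step from synsetA back to die uses a mapping axiom in its reverse direction.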


5
    Note that the mapping axioms are bidirectional implications. By using them, we
    can combine knowledge from FrameNet and WordNet to perform robust inference.
    For instance, even without a direct mapping from pass away to FNDeath, we can
    infer pass away → synsetA → die → FNDeath. Since the framework is declarative,
    we do not have to specify when and where to use a particular type of knowledge,
    which makes the reasoning robust.

        Table 2. Axioms automatically extracted from WordNet and FrameNet.

Type              Resource               Example                                            # axioms
Word-Synset       WordNet                event(e, Die, f, x1, x2, x3, s)                     169,362
                                         ⇔ event(e, WNSynset200358431, f, x1, x2, x3, s)
Word-Frame        FrameNet               event(e, Shoot, f, x1, x2, x3, s)                    10,358
                                         ⇔ event(e, FNUseFireArm, f, x1, x2, x3, s)
Causal relations  WordNet relations A*1  event(e1, WNSynset20036712, EF, x1, x2, x3, s1)      35,440
                                         ∧ cause(e1, EF, e2, F)
                                         ⇒ event(e2, WNSynset200358431, F, y1, y2, y3, s2)
                  FrameNet relations*2   event(e1, FNGiving, EF, x1, x2, x3, s1)               6,584
                                         ∧ cause(e1, EF, e2, F)
                                         ⇒ event(e2, FNGetting, F, y1, y2, y3, s2)
Synonym/Hyponym   WordNet relations B*3  event(e, WNSynset200060063, f, x1, x2, x3, s)       177,837
                                         ⇒ event(e, WNSynset200358431, f, x1, x2, x3, s)

*1: Causality, Entailment, Antonym. *2: IsCausativeOf, InheritsFrom, PerspectiveOn,
Precedes, SeeAlso, SubFrameOf, Uses. *3: Meronym, Hyperonym.


4.2    Results and discussion

We tested our large-scale discourse relation recognition model on seven texts
randomly selected from the dataset described in Sec. 3. We restricted the maximum
number of backward-chaining steps to 2 for computational feasibility. For each
problem, the number of potential elemental hypotheses was 13,034 on average, and
the number of (typed) axioms used to generate candidate abductive proofs was 142
on average. The inference time per problem was 7.00 seconds on average.
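The restriction on backward-chaining depth can be illustrated with a propositional Python sketch that collects every literal reachable from the observations in at most `max_depth` backward-chaining steps, i.e. the pool from which candidate hypotheses are built. It assumes ground rules written as body ⇒ head and ignores unification, so it only approximates what a first-order engine such as Henry enumerates; the rules and names are hypothetical.

```python
Rule = tuple[frozenset[str], str]  # (body, head), read as body ⇒ head


def potential_hypotheses(observations: set[str], rules: list[Rule], max_depth: int = 2) -> set[str]:
    """Collect literals reachable by at most max_depth backward-chaining steps."""
    found = set(observations)
    frontier = set(observations)
    for _ in range(max_depth):
        step: set[str] = set()
        for literal in frontier:
            for body, head in rules:
                if head == literal:  # backward-chain on a rule whose head matches
                    step |= body
        frontier = step - found  # only expand literals not seen before
        found |= frontier
    return found


if __name__ == "__main__":
    rules = [(frozenset({"b"}), "a"), (frozenset({"c"}), "b"), (frozenset({"d"}), "c")]
    print(sorted(potential_hypotheses({"a"}, rules)))               # ['a', 'b', 'c']
    print(sorted(potential_hypotheses({"a"}, rules, max_depth=3)))  # ['a', 'b', 'c', 'd']
```

Each additional step can multiply the number of reachable literals when many axioms share a head, which is why a small depth limit helps keep inference time manageable.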
   Now let us show one of the proof graphs automatically produced by our sys-
tem.6 In Figure 3, we show the abductive proof graph for the following discourse:

S1 : Only 56 people died from the explosion,
S2 : but many other problems have been caused because of it.
     (Topic=Activity, StoryID=241)

Although our lexical knowledge is insufficient, the abductive engine returned a
best proof in which the two segments are linked by a Resultative relation. In the
proof graph, the Die and CauseProblem events are used to prove the “event”
literals hypothesized by the discourse relation axiom. Note that the causal
relation between these events is not proven but assumed at a cost of $0.36.
    The overall results indicate that we now have a good environment for
developing ATP-based discourse processing.


5      Conclusions

We have explored an abductive theorem proving (ATP)-based approach to implicit
discourse relation recognition. We have investigated the types of axioms required
for ATP-based discourse relation recognition and identified five types of axioms.
Our result is based on real-life Web texts, and no previous work has
6
     For simplicity, we used only one discourse relation axiom.
                                                                                                                                                                             Large-scale Abductive Discourse Relation Recognition                                                                                                                                              11
                                                                                                                                                                                                                                                                                                                                                                                        cause                         cause
                                                                                                                                                                                                                                                                                                                                                                                  (X0~_6, F, _12, _13)        (_12, _13, X5~_7, F)

                                                                                                                                                                                                                                                                   best proof	
                                                                                                        $0.22/52                     $0.22/53

                                                                                                                                                                                                                                                                                                                                                                                                    Transitive    Transitive

                                                                                                                                                                                                                                                                             event                     Uses                 Precedes                  Is_Causative_of             See_also                           Inherits_from
                                                                                                                                                                                                                                                                   (_10, Die-vb~_8, EF, _11)    (X0~_6, F, X5~_7, F)   (X0~_6, F, X5~_7, F)        (X0~_6, F, X5~_7, F)      (X0~_6, F, X5~_7, F)         ^       (X0~_6, F, X5~_7, F)
                                                                                                                                                                                                                                                                           $0.36/51                  $0.43/55               $0.43/56                     $0.43/57                 $0.43/58                             $0.43/54

                                                                                                                                                                                                                                                 _0=_10, _8=Synset201323958, _1=_11                                          ?          ?               ?            ?                                ?

                                                                                                                                                                                                 event                         cause                              event                        cause                                    cause                             event
                                                                                                                                                                                     (_0, Synset201323958, EF, _1)       (CAG_0, X0, F, _0, F)       (_4, Synset200360932, ENF, _5)     (CAG_2, X0, F, _4, NF)    REScause       (X0~_6, F, X5~_7, F)       (X5~_7, Cause_problem-vb~_9, F, S2)
                                                                                                                                                                                                $1.22/23                      $0.00/24                          $1.22/35                     $0.00/36                                 $0.36/48                           $0.36/50

t200359806    _6=X0, _8=Synset200360501      _6=X0, _8=Pop-o-vb     _6=X0, _8=Snu-it-vb      _6=X0, _8=Decease-vb    _6=X0, _8=Drop-dead-vb          _6=X0, _8=Die-vb                                                 wncsrev wncsrev                             wnant             wnant         REScause                     REScause          REScause

                   event                      event                    event                     event                      event                        event
S1)    (X0, Synset200360501, F, S1)   (X0, Pop-o-vb, F, S1)   (X0, Snu-it-vb, F, S1)   (X0, Decease-vb, F, S1)   (X0, Drop-dead-vb, F, S1)       (X0, Die-vb, F, S1)      _6=X0, _8=Die-vb                                ^                                      ^                                             ^
                 $1.22/30                   $1.22/21                 $1.22/22                   $1.22/12                   $1.22/14                     $1.22/13

                    wnrev                        wnrev                     wnrev                     wnrev                         wnrev                                              _6=X0, _8=Die-vb

                                                                                                                                                            event
                                                                                                                                                      (X0, Die-vb, F, S1)                                        _6=X0, _8=Die-vb
                                                                                                                                                           $2.39/38

                                                                                                                                                                                                                                       _6=X0, _8=Synset202109818

                                                                                                                                                     event
                                                                                                                                               (X0, Die-vb, F, S1)                          _6=X0, _8=Die-vb                                                                                                                                                             _7=X5, _9=Cause_problem-vb
                                                                                                                                                    $2.39/46

                                                                                                                                   wnrev

                                                                                                           event                                                     event
                                                                                               (X0, Synset201784953, F, S1)      wn                            (X0, Die-vb, F, S1)
                                                                                                          $1.99/6                                                   $2.39/47

                                                                                                                                                                                              wnrev

                                                                                                                                                                                                          event                     Res            Adv
                                                                                                                         wn                                                                   (X0, Synset202109818, F, S1)        (S1, S2)       (S1, S2)
                                                                                                                                                                                                         $1.99/7                  $1.20/9        $1.20/8

                                                                                                                                                                                                        wn                   ?         ?


                                                                                                                                                                                            event             CoRel                   event                     only-rb
                                                                                                                                                                                      (X0, Die-vb, F, S1)    (S1, S2)      (X5, Cause_problem-vb, F, S2)         (X0)
                                                                                                                                                                                           $1.00/2           $1.00/3                 $1.00/1                    $1.00/0


                                                                                                  Fig. 3. Abductive proof with potential elemental hypotheses. The grayed-out nodes are
                                                                                                  those which are potentially included in the best proof, but not actually included in the
                                                                                                  best proof. Similarly, the dotted edges are potential explainer-explainee relationships
                                                                                                  between elemental hypotheses.


                                                                                                  done an investigation in the same setting. Also, we have preliminarily evaluated
                                                                                                  our model with a large knowledge base. We have automatically constructed ax-
                                                                                                  ioms of lexical knowledge from WordNet and FrameNet, which results in around
                                                                                                  four hundred thousand inference rules. Our experiments showed the great po-
                                                                                                  tential of our ATP-based model and that we are ready to develop ATP-based
                                                                                                  discourse processing in a real-life setting.
   Our future work proceeds in three directions. First, we will create a larger
knowledge base by exploiting linguistic resources that have recently become
available. As a first step, we plan to axiomatize Narrative Chain [17],
ConceptNet,7 and Semantic Orientations of Words [16] to extend our axioms with
these large knowledge resources. Second, we plan to apply automatic parameter
tuning for weighted abduction [18] to our model. Third, we plan to create a
dataset for abductive discourse processing in which simple English texts are
annotated with discourse phenomena such as discourse relations and coreference.
As the source text, we will use materials for ESL (English as a Second Language)
learners,8 a set of syntactically and lexically simple texts, so that we can
trace the detailed behavior of the abductive reasoning process. For the semantic
representation, we plan to use the Boxer semantic parser [13] to obtain event
arguments automatically.

7
     http://conceptnet5.media.mit.edu/
8
     http://www.cdlponline.org/

Acknowledgments. This work was supported by JSPS KAKENHI Grant
Number 23240018.


References
 1. Lin, Z., Kan, M., Ng, H.: Recognizing implicit discourse relations in the Penn
    Discourse Treebank. In: Proceedings of the 2009 Conference on Empirical Methods
    in Natural Language Processing. (2009) 343–351
 2. Pitler, E., Nenkova, A.: Using syntax to disambiguate explicit discourse connectives
    in text. In: Proceedings of the ACL-IJCNLP 2009. (2009) 13–16
 3. Charniak, E., Goldman, R.: Probabilistic abduction for plan recognition. Brown
    University, Department of Computer Science (1991)
 4. Hobbs, J., Stickel, M., Appelt, D., Martin, P.: Interpretation as Abduction.
    Artificial Intelligence 63(1-2) (1993) 69–142
 5. Marcu, D., Echihabi, A.: An unsupervised approach to recognizing discourse
    relations. In: Proceedings of the 40th Annual Meeting on Association for
    Computational Linguistics. (2002) 368–375
 6. Subba, R., Di Eugenio, B.: An effective discourse parser that uses rich linguistic
    information. In: NAACL. (2009) 566–574
 7. Lin, Z., Ng, H., Kan, M.: A PDTB-Styled End-to-End Discourse Parser. arXiv
    preprint arXiv:1011.0835 (2010)
 8. Inoue, N., Inui, K.: Large-scale cost-based abduction in full-fledged first-order
    predicate logic with cutting plane inference. In: Proceedings of the 13th European
    Conference on Logics in Artificial Intelligence. (2012) 281–293
 9. Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A., Webber,
    B.: The Penn Discourse TreeBank 2.0. In: Proceedings of the 6th International
    Conference on Language Resources and Evaluation (LREC). (2008) 2961–2968
10. Carlson, L., Marcu, D., Okurowski, M.E.: RST Discourse Treebank, LDC2002T07.
    Linguistic Data Consortium (2002)
11. Wolf, F., Gibson, E., Fisher, A., Knight, M.: The Discourse GraphBank: A
    database of texts annotated with coherence relations. LDC (2005)
12. Prasad, R., Miltsakaki, E., Dinesh, N., Lee, A., Joshi, A., Robaldo, L., Webber,
    B.: The Penn Discourse Treebank 2.0 Annotation Manual. IRCS Technical Report
    (2007)
13. Bos, J.: Wide-Coverage Semantic Analysis with Boxer. In: Bos, J., Delmonte, R.,
    eds.: Proceedings of STEP. Research in Computational Semantics, College
    Publications (2008) 277–286
14. Fellbaum, C., ed.: WordNet: an electronic lexical database. MIT Press (1998)
15. Ruppenhofer, J., Ellsworth, M., Petruck, M., Johnson, C., Scheffczyk, J.: FrameNet
    II: Extended Theory and Practice. Technical report, Berkeley, USA (2010)
16. Takamura, H., Inui, T., Okumura, M.: Extracting semantic orientations of words
    using spin model. In: ACL. (2005) 133–140
17. Chambers, N., Jurafsky, D.: Unsupervised learning of narrative schemas and their
    participants. In: Proceedings of the Joint Conference of the 47th Annual
    Meeting of the ACL and the 4th International Joint Conference on Natural
    Language Processing of the AFNLP. (2009) 602–610
18. Yamamoto, K., Inoue, N., Watanabe, Y., Okazaki, N., Inui, K.: Discriminative
    learning of first-order weighted abduction from partial discourse explanations. In:
    CICLing (1). (2013) 545–558