<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Recognizing Implicit Discourse Relations through Abductive Reasoning with Large-scale Lexical Knowledge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jun Sugiura</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Naoya Inoue</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kentaro Inui</string-name>
          <email>inuig@ecei.tohoku.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tohoku University</institution>
          ,
          <addr-line>6-3-09 Aoba, Aramaki, Aoba-ku, Sendai, 980-8579</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Discourse relation recognition is the task of identifying the semantic relationships between textual units. Conventional approaches to discourse relation recognition exploit surface information and syntactic information as machine learning features. However, the performance of these models is severely limited for implicit discourse relation recognition. In this paper, we propose an abductive theorem proving (ATP) approach for implicit discourse relation recognition. The contribution of this paper is that we give a detailed discussion of an ATP-based discourse relation recognition model with open-domain web texts.</p>
      </abstract>
      <kwd-group>
        <kwd>Discourse Relation</kwd>
        <kwd>Abductive Reasoning</kwd>
        <kwd>Lexical Knowledge</kwd>
        <kwd>Association Information</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Discourse relation recognition is the task of identifying the semantic
relationship between textual units. For example, given the text "John pushed Paul
towards a hole. (S1) Paul didn't get hurt. (S2)", we identify a contrast relationship
between textual units (S1) and (S2). Discourse relation recognition is useful for
many NLP tasks such as summarization, question answering, and coreference
resolution.</p>
      <p>Traditional studies on discourse relation recognition divide discourse
relations into two distinct types according to the presence of discourse connectives
between textual units: (i) explicit discourse relations, i.e. discourse relations with
discourse connectives (e.g. "John hit Tom because he is angry."), and (ii) implicit
discourse relations, i.e. discourse relations without discourse connectives (e.g. "John
hit Tom. He got angry."). Identifying an implicit discourse relation is much more
difficult than identifying an explicit one. In this paper, we focus
on the task of implicit discourse relation recognition.</p>
      <p>Conventional approaches to implicit discourse relation recognition exploit
surface information (e.g. bag of words) and syntactic information (e.g. syntactic
dependencies between words) to identify discourse relations [1, 2, etc.]. However,
the performance of these models is severely limited, as mentioned in Sec. 2.2. We
believe that the problem with these approaches is twofold: (i) they do not
capture causality between the events mentioned in each textual unit, and (ii) they
do not capture the factuality of these events. We believe that this information
plays a key role in implicit discourse relation recognition. Suppose we want to
recognize a contrast relation between S1 and S2 in the first example. To
recognize the discourse relation, we need to know at least the following information:
(i) commonsense knowledge: pushing someone into a hole usually causes getting hurt;
(ii) factuality: Paul did not get hurt. Finally, combining (i) and (ii), we need
to recognize the unusualness of the discourse: something against our commonsense
knowledge happened in S2. As described in Sec. 3, our investigation revealed
that there are several patterns of reasoning and that several kinds of
reasoning must be combined to identify a discourse relation.</p>
      <p>Motivated by this observation, we propose an abductive theorem proving
(ATP) approach for implicit discourse relation recognition. The key advantage of
using ATP is that the declarative nature of ATP abstracts the flow of information
away in the modeling phase: we do not have to explicitly specify when and where
to use a particular kind of reasoning. Once we give several primitive inference rules to
an ATP system, the system automatically returns the best answer to a problem,
combining the inference rules.</p>
      <p>In this paper, we attempt to answer the following open issues of ATP-based
discourse relation recognition: (i) does it really work on real-life texts?; (ii) does
it work with a large knowledge base which is not customized for solving target
texts? The contribution of this paper is as follows. We give a detailed discussion
of an ATP-based discourse relation recognition model with open-domain web
texts. In addition, we show that our ATP-based model is computationally feasible
with a large knowledge base.</p>
      <p>The structure of this paper is as follows. In Sec. 2, we describe abduction and
give an overview of previous efforts on discourse relation recognition. In Sec. 3,
we describe our ATP-based discourse relation recognition model. In Sec. 4, we
report the results of a pilot evaluation of our model with large-scale lexical knowledge.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <sec id="sec-2-1">
        <title>Weighted abduction</title>
        <p>Abduction is inference to the best explanation. Formally, logical abduction is
defined as follows:
– Given: Background knowledge B and observations O, where both B and
O are sets of first-order logical formulas.
– Find: A hypothesis (or explanation, abductive proof) H such that H ∪ B ⊨ O
and H ∪ B ⊭ ⊥, where H is a conjunction of literals. We say that p is
hypothesized if H ∪ B ⊨ p, and that p is explained if (∃q) q → p ∈ B and
H ∪ B ⊨ q.</p>
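        <p>The definition above can be illustrated with a tiny propositional sketch (our own example, not from the paper): with background knowledge B = {rain → wet} and observation O = {wet}, the hypothesis H = {rain} entails O together with B, while H = {sunny} does not.</p>
        <p>
```python
# Minimal propositional illustration of logical abduction (our own
# example): B is a set of implications, O the observations, and a
# candidate hypothesis H explains O if H together with B derives O.

def closure(facts, rules):
    """Forward-chain implications (premise, conclusion) to a fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

def explains(hypothesis, background, observations):
    """True iff H together with B entails O under simple forward chaining."""
    return observations.issubset(closure(hypothesis, background))

B = {("rain", "wet")}             # background knowledge: rain implies wet
O = {"wet"}                       # observation
print(explains({"rain"}, B, O))   # True: rain explains wet
print(explains({"sunny"}, B, O))  # False: sunny does not
```
        </p>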
        <p>Typically, several hypotheses H explaining O exist. We call each of them a
candidate hypothesis and each literal in a hypothesis an elemental hypothesis.
The goal of abduction is to find the best hypothesis among the candidate hypotheses
by a specific measure. In the literature, several kinds of evaluation measures have
been proposed, including cost-based and probability-based ones [3, 4, etc.].</p>
        <p>
          In this paper, we adopt the evaluation measure of weighted abduction, proposed
by Hobbs et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In principle, the evaluation measure penalizes
assuming specific and unreliable information but rewards inferring
the same information from different observations. We summarize the primary
features of this measure as follows (see [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for more detail):
– Each elemental hypothesis has a positive real-valued cost;
– The cost of each candidate hypothesis is defined as the sum of the costs of its
elemental hypotheses;
– The best hypothesis is defined as the minimum-cost candidate hypothesis;
– If an elemental hypothesis is explained by another elemental hypothesis, its
cost becomes zero.
        </p>
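        <p>The cost calculus can be sketched as follows. This is a simplified illustration under our own assumptions (not the Henry engine's implementation; real weighted abduction also propagates weighted costs through backward chaining, which we omit): each elemental hypothesis carries a cost, explained elemental hypotheses cost zero, and the best candidate hypothesis minimizes the total.</p>
        <p>
```python
# Simplified sketch of the weighted-abduction evaluation measure:
# a candidate hypothesis is a list of (literal, cost) pairs; a literal
# explained by another elemental hypothesis via a rule contributes zero.

def hypothesis_cost(candidate, rules):
    """Sum the costs of elemental hypotheses, zeroing explained ones."""
    literals = {lit for lit, _ in candidate}
    total = 0.0
    for lit, cost in candidate:
        explained = any(conclusion == lit and premise in literals
                        for premise, conclusion in rules)
        total += 0.0 if explained else cost
    return total

def best_hypothesis(candidates, rules):
    """The best hypothesis is the minimum-cost candidate."""
    return min(candidates, key=lambda h: hypothesis_cost(h, rules))

rules = [("push_into_hole", "get_hurt")]           # background implication
h1 = [("get_hurt", 1.2)]                           # assume it outright
h2 = [("push_into_hole", 0.4), ("get_hurt", 1.2)]  # get_hurt is explained
print(hypothesis_cost(h1, rules))            # 1.2
print(hypothesis_cost(h2, rules))            # 0.4: get_hurt costs zero
print(best_hypothesis([h1, h2], rules) is h2)  # True
```
        </p>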
      </sec>
      <sec id="sec-2-2">
        <title>Related work</title>
        <p>
          Discourse relation recognition is a prominent research area in NLP. Most
researchers have primarily focused on explicit discourse relation recognition,
employing statistical machine learning-based models [5, 6, etc.] with superficial and
syntactic information. The performance of explicit discourse relation recognition
is comparatively high; for instance, Lin et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] achieved an 80.6% F-score.
        </p>
        <p>
          The performance of implicit discourse relation recognition is, however,
relatively low (25.5% F-score). Most existing work on implicit discourse relation
recognition [1, 2, etc.] extends the feature set of [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] with richer lexico-syntactic
information. For example, Pitler et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] exploit a syntactic parse tree and
sentiment polarity information of the words contained in textual units to generate a
feature set. However, the performance is not yet at a practical level.
        </p>
        <p>
          An abductive discourse relation recognition model was originally presented in
Hobbs et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. However, Hobbs et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] reported results in a fairly closed
setting: they tested their model on two test texts with the manually encoded
background knowledge required to solve the discourse relation recognition
problems that appear in those two texts. Therefore, it is an open question whether the
abductive discourse relation recognition model works in an open setting where
a wider range of real-life texts and a large knowledge base are considered.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Abductive theorem proving for discourse relation recognition</title>
      <p>In this section, we describe our discourse relation recognition model. We
employ ATP to recognize a discourse relation. Given target discourse segments,
we abductively prove that there exists a coherence relation (i.e. some discourse
relation) between the discourse segments using background knowledge. We
axiomatize (i) the definitions of discourse relations and (ii) lexical knowledge (e.g. causal
knowledge of events) in the background knowledge, which serve as a proof of the
existence of a coherence relation.</p>
      <p>The motivation for using abductive theorem proving is that we can assume a
proposition at a cost even if we fail to find a complete proof of a coherence
relation between discourse segments, as mentioned in Sec. 2. By choosing the
minimum-cost abductive proof, we can identify the most likely discourse relation.</p>
      <p>
        We first show how to axiomatize the definitions of discourse relations (Sec.
3.1). We then conduct an example-driven investigation of the lexical knowledge
required to solve a few real-life discourse relation recognition problems,
in order to identify the types of lexical knowledge needed for an ATP-based
recognition model (Sec. 3.2). We also make sure that our developed theory works
on a general-purpose inference engine as expected. We use the lifted first-order
abductive inference engine Henry [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which is developed by one of the authors.
To perform deeper analysis of the inference results, we also improved the existing
visualization module provided by Henry. The inference engine and visualization
tool are publicly available at https://github.com/naoya-i/henry-n700/.
      </p>
      <sec id="sec-3-1">
        <title>Axiomatizing definitions of discourse relations</title>
        <p>
          We follow the definitions of discourse relations provided by the Penn Discourse
TreeBank (PDTB) 2.0 [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], a widely used, large-scale corpus annotated with
discourse relations.1 The PDTB defines four coarse-grained discourse relations, but
it is still rather difficult to identify all of them. Therefore, we adopt a
two-way classification: whether a relation is adversative (Comparison in PDTB) or
resultative (Temporal, Contingency, or Expansion in PDTB). Because a resultative
relation can be regarded as any relation other than an adversative one, we first
axiomatize the definition of adversative and then consider the other relation.
        </p>
        <p>
          According to the PDTB Annotation Manual [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], an adversative relation
consists of two subtypes: Concession and Contrast. These subtypes are defined
below.
        </p>
        <p>Concession One of the arguments describes a situation A which causes C, while
the other asserts (or implies) C. One argument denotes a fact that triggers
a set of potential consequences, while the other denies one or more of them.
Contrast Arg1 and Arg2 share a predicate or a property and the difference is
highlighted with respect to the values assigned to this property.</p>
        <p>The condition of Concession can be described as the following axiom:
event(e1, type1, F, x1, x2, x3, s1) ∧ event(e2, type2, NF, y1, y2, y3, s2)
∧ cause(e1, F, eu, EF) ∧ event(eu, type2, EF, y1, y2, y3, su)
⇒ Adversative(s1, s2).
This axiom says that if event e1 occurs in segment s1 (roughly corresponding to
Arg1 in PDTB) and that event is expected to cause an event of type2, while such
an event of type2 does not actually occur in segment s2 (roughly corresponding
to Arg2 in PDTB), then the discourse relation between s1 and s2 is Adversative.
Examples using this type of axiom to recognize discourse relations will
be mentioned later. On the other hand, a typical pattern where the Contrast
relation holds can be described, for example, as follows:
value(e1, Pos, s1) ∧ value(e2, Neg, s2) ⇒ Adversative(s1, s2).
This axiom says that when the sentiment polarity of e1 in segment s1 is Positive
and the sentiment polarity of e2 in segment s2 is Negative, the discourse relation
between s1 and s2 is Adversative. The example axioms described here
represent the formation conditions of Adversative and take some variation in their
values of factuality or sentiment.
(1: For other corpora, see [10, 9, 11, etc.].)</p>
        <p>Furthermore, analogous axioms can represent the conditions of Resultative. For
instance, if the sentiment polarity of e2 in segment s2 is the same as that of segment
s1, then the discourse relation between s1 and s2 is Resultative:
value(e1, Pos, s1) ∧ value(e2, Pos, s2) ⇒ Resultative(s1, s2).
In total, we created 21 axioms for the definition of discourse relations.</p>
        <p>Finally, we add the following axioms to connect the definitions of discourse
relations with the existence of a coherence relation between discourse segments:
Adversative(s1, s2) ⇒ CoRel(s1, s2)   (1)
Resultative(s1, s2) ⇒ CoRel(s1, s2)   (2)
where CoRel(s1, s2) indicates that there exists a coherence relation between
segments s1 and s2. Given target discourse segments S1, S2, we prove CoRel(S1, S2)
using the axioms described above and the lexical knowledge described in
the next section.</p>
        <p>We now formally describe our meaning representation. First, we use cause(ea, fa, ec, fc)
to represent that event ea with factuality fa causes event ec with factuality fc.
Second, we represent an event as event(e, t, f, x1, x2, x3, s), where e is the
event variable, t is the event type of e, f is the factuality of e, x1, x2, x3
are the arguments of the event, and s is the segment to which event e belongs. The
factuality of event e takes one of the following four values: F (Fact: e occurred), NF
(NonFact: e did not occur), EF (Expected-Fact: e is expected to occur), and
ENF (Expected-NonFact: e is expected not to occur). In addition, the value
(sentiment polarity) of event e is represented as value(e, v, s), where v is either Pos
(Positive) or Neg (Negative).</p>
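        <p>The meaning representation above can be mirrored directly in code. The following sketch (our own lightweight encoding, not part of the paper's system) represents event, cause, and value literals as tuples and checks a pattern corresponding to the Concession axiom against the "pushed towards a hole / didn't get hurt" example.</p>
        <p>
```python
from collections import namedtuple

# Literals of the meaning representation described above (our own
# lightweight encoding): event(e, type, factuality, x1, x2, x3, s),
# cause(ea, fa, ec, fc).
Event = namedtuple("Event", "e type f x1 x2 x3 s")
Cause = namedtuple("Cause", "ea fa ec fc")

F, NF, EF, ENF = "F", "NF", "EF", "ENF"  # the four factuality values

def concession(events, causes, s1, s2):
    """Concession pattern: a factual event e1 in s1 is expected to cause
    an event of some type, but an event of that type does NOT occur in s2."""
    for e1 in events:
        if e1.s != s1 or e1.f != F:
            continue
        for eu in events:
            if eu.f != EF:
                continue
            caused = any(c.ea == e1.e and c.ec == eu.e for c in causes)
            denied = any(e2.s == s2 and e2.type == eu.type and e2.f == NF
                         for e2 in events)
            if caused and denied:
                return True
    return False

# "John pushed Paul towards a hole. (S1) Paul didn't get hurt. (S2)"
events = [Event("e1", "Push", F, "john", "paul", None, "S1"),
          Event("eu", "Hurt", EF, "paul", None, None, "S1"),
          Event("e2", "Hurt", NF, "paul", None, None, "S2")]
causes = [Cause("e1", F, "eu", EF)]
print(concession(events, causes, "S1", "S2"))  # True: Adversative fires
```
        </p>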
      </sec>
      <sec id="sec-3-2">
        <title>Example-driven investigation of lexical knowledge</title>
        <p>
          Next, we manually analyzed a small number of samples for each discourse relation
to investigate what types of knowledge are required to explain those samples and
how to axiomatize them. In this paper, we manually convert each sample text
into logical forms, extracting the main verbs of its matrix clauses as predicates.2
(2: In future work, we will exploit an off-the-shelf semantic parser (e.g. Boxer [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]) to
obtain the logical forms automatically. Using automatic semantic parsers raises some
challenges, e.g. how to represent the verbs in embedded clauses. We do not
address these issues in this work, because we want to focus on investigating the
types of world knowledge required to identify discourse relations.)
        </p>
        <p>[Fig. 1: Proof graph for example (2). The observed Use-typed event in S1 and Close-typed event in S2 are tied by an Adversative relation (axiom ADVcause9-2); the hypothesized cause literal is assumed at a cost, and unifications such as _10=Use-vb bind the event variables.]</p>
        <p>(3: http://www.cdlponline.org/)</p>
      </sec>
      <sec id="sec-3-3">
        <title>Relation between events</title>
        <p>event(e1, Use, ENF, x1, x2, x3, s1) ∧ cause(e2, F, e1, ENF)
⇒ event(e2, Close, F, y1, x2, y3, s2)</p>
        <p>Note that we have cause(e2, F, e1, ENF) on the left-hand side of the axiom. We
use this literal to accumulate the type of reasoning. In an abductive proof, we
expect this literal to unify with an elemental hypothesis generated by the axioms
of discourse relations.</p>
        <p>Fig. 1 shows the result of applying the proposed model to example (2).4 In
Fig. 1, the observations consist of three literals: the occurrence of an event whose type
is Use in segment S1, the occurrence of an event whose type is Close in segment S2, and
CoRel(S1, S2), which represents the existence of some discourse relation between
the segments.</p>
        <p>To see how our model combines multiple pieces of knowledge, let us take
another example.
(3) S1: Right now, the road is closed.</p>
        <p>S2: Most of the people who used the road every day are angry.</p>
        <p>(Topic=Working, StoryID=174)
The \closed" event causes the \unusable" event and the \unusable" event than
further causes the \angry" event, which can be explained by combining the
knowledge that being \unusable" is negative in sentiment polarity and the
knowledge that a negative event may cause someone to be angry. These pieces of
knowledge can be axiomatized as follows. The proof graph is shown in Fig. 2.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Condition of Resultative</title>
        <p>event(e1, type1, F, x1, x2, x3, s1) ∧ event(e2, type2, F, y1, y2, y3, s2)
∧ cause(e1, F, eu, EF) ∧ event(eu, type2, EF, y1, y2, y3, s2) ⇒ Resultative(s1, s2)</p>
      </sec>
      <sec id="sec-3-6">
        <title>Transitivity and Polarity</title>
        <p>event(e1, Use, ENF, x1, x2, x3, s1) ∧ cause(e2, F, e1, ENF)
⇒ event(e2, Close, F, y1, x2, y3, s2)</p>
        <p>event(e1, Angry, EF, x1, x2, x3, s1) ∧ cause(e2, f, e1, EF)
⇒ value(e2, Neg, s2) ∧ event(e2, type, f, y1, y2, y3, s2)</p>
        <p>cause(e1, f1, e2, f2) ∧ cause(e2, f2, e3, f3) ⇒ cause(e1, f1, e3, f3)</p>
        <p>value(e1, Neg, s1) ⇒ event(e1, Use, ENF, x1, x2, x3, s1)</p>
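        <p>The transitivity axiom used to chain "closed" → "unusable" → "angry" can be sketched as a simple saturation computation (our own illustration, not the Henry engine's internals):</p>
        <p>
```python
# Transitive closure of cause(ea, fa, ec, fc) links, a sketch of the
# transitivity axiom:
#   cause(e1,f1,e2,f2) ∧ cause(e2,f2,e3,f3) ⇒ cause(e1,f1,e3,f3)

def cause_closure(causes):
    """Saturate a set of cause links under the transitivity axiom."""
    links = set(causes)
    changed = True
    while changed:
        changed = False
        for (e1, f1, e2, f2) in list(links):
            for (e2b, f2b, e3, f3) in list(links):
                if (e2, f2) == (e2b, f2b) and (e1, f1, e3, f3) not in links:
                    links.add((e1, f1, e3, f3))
                    changed = True
    return links

# "closed" causes "unusable"; "unusable" causes "angry" (expected)
causes = {("close", "F", "unusable", "F"),
          ("unusable", "F", "angry", "EF")}
closed = cause_closure(causes)
print(("close", "F", "angry", "EF") in closed)  # True: chained link derived
```
        </p>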
        <p>Through the investigation illustrated above, we reached the conclusion
that the axioms in Table 1 can recognize discourse relations for most examples.
(4: Throughout this paper, we omit the arguments of events from our representation
in a proof graph for readability. Since we do not have lexical knowledge between
nouns and verbs, this simplification does not affect the result of inference.)</p>
        <p>[Fig. 2: Proof graph for example (3). The observed Close-typed event in S1 and the Angry-typed event in S2 are tied by a Resultative relation (axiom REScause6-2), combining the causal axioms, the transitivity of cause, and the polarity axiom; assumed literals carry costs such as $0.36 and $1.44.]</p>
        <p>As mentioned in the previous section, our model assumes that axioms encoding
lexical knowledge are automatically extracted from large lexical resources (see</p>
        <p>
          the to-be-automatically-acquired axiom types in Sec. 3.2). In this section, we extract
axioms of causal relations and synonym/hypernym relations from WordNet [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]
and FrameNet [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], both popular and large lexical resources, and then apply our
model to the example texts presented in Sec. 3. Regarding sentiment polarity,
we plan to extract axioms from a large-scale sentiment polarity lexicon such
as [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] in future work.
        </p>
        <p>We clarify that our primary focus here is the feasibility of our ATP-based
discourse relation recognition model with a large knowledge base. A
quantitative evaluation of our model (e.g. the predictive accuracy of discourse
relations) is future work. Therefore, we first report how we incorporate WordNet and
FrameNet axioms into our knowledge base (Sec. 4.1) and then preliminarily
report the computational time of inference required to solve the example problems,
showing some interesting output (Sec. 4.2).</p>
        <p>Table 1. Types of knowledge and examples of axioms:
– Definitions of discourse relations:
event(e1, type1, F, x1, x2, x3, s1) ∧ event(e2, type2, F, y1, y2, y3, s2)
∧ cause(e2, F, eu, ENF) ∧ event(eu, type1, ENF, x1, x2, x3, s1)
⇒ Adversative(s1, s2);
value(e1, Pos, s1) ∧ value(e2, Pos, s2) ⇒ Resultative(s1, s2)
– Transitivity:
cause(e1, f1, e2, f2) ∧ cause(e2, f2, e3, f3) ⇒ cause(e1, f1, e3, f3)
– Causal relations:
event(e1, Use, ENF, x1, x2, x3, s1) ∧ cause(e2, F, e1, ENF)
⇒ event(e2, Close, F, y1, x2, y3, s2);
event(e1, Angry, EF, x1, x2, x3, s1) ∧ cause(e2, f, e1, EF)
⇒ value(e2, Neg, s2) ∧ event(e2, type, f, y1, y2, y3, s2)
– Synonym/Hyponym:
event(e1, Attack, F, x1, x2, x3, s1) ⇒ event(e1, Destroy, F, x1, x2, x3, s1)
– Sentiment polarity:
value(e1, Neg, s1) ⇒ event(e1, Die, F, x1, x2, x3, s1);
value(e1, Pos, s1) ⇒ event(e1, Die, NF, x1, x2, x3, s1)</p>
      </sec>
      <sec id="sec-3-7">
        <title>Automatic axiom extraction from linguistic resources</title>
        <p>We summarize the axioms extracted from WordNet and FrameNet in Table 2.
For each resource, we extract two kinds of axioms. First, we generate axioms that
map a word to the corresponding WordNet synset or FrameNet frame
(Word-Synset or Word-Frame types). The example WordNet axiom in Table 2 enables
us to hypothesize that a "die"-typed event can be mapped to WordNet synset
200358431. Second, we also encode semantic relations between synsets or frames.
For example, the causal relation between the Getting frame and the Giving frame is
encoded as the axiom in the table.5</p>
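        <p>Extraction of the Word-Synset mapping type can be sketched as follows. The tiny lexicon below is a hypothetical stand-in for the actual WordNet data (the synset identifier 200360501 is taken from the paper's figure; the word lists are our own illustration), and real extraction would iterate over all synsets:</p>
        <p>
```python
# Sketch of generating bidirectional word-to-synset mapping axioms
# (Word-Synset type). The toy lexicon is a hypothetical stand-in for
# WordNet; real extraction would enumerate every (word, synset) pair.

toy_synsets = {
    "synset200358431": ["die", "decease", "pass-away"],
    "synset200360501": ["pop-off"],
}

def mapping_axioms(synsets):
    """One bidirectional implication per (word, synset) pair."""
    axioms = []
    for synset, words in synsets.items():
        for w in words:
            axioms.append(
                f"event(e, {w.capitalize()}, f, x1, x2, x3, s) "
                f"⇔ event(e, {synset.capitalize()}, f, x1, x2, x3, s)")
    return axioms

axioms = mapping_axioms(toy_synsets)
print(len(axioms))  # 4 axioms from the toy lexicon
print(axioms[0])
```
        </p>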
      </sec>
      <sec id="sec-3-8">
        <title>Results and discussion</title>
        <p>We have tested our large-scale discourse relation recognition model on 7
randomly selected texts such as those presented in Sec. 3. We restricted the maximum
number of backward-chaining steps to 2 for computational feasibility.
For each problem, on average, the number of potential elemental hypotheses
was 13,034 and the number of (typed) axioms used to generate
candidate abductive proofs was 142. The inference time required to solve each
problem was 7.00 seconds on average.
(5: Note that the mapping axioms have bidirectional implications. By using the
bidirectional axioms, we can combine the knowledge from FrameNet and WordNet to
perform robust inference. For instance, we can make an inference like pass away →
synsetA → die → FNDeath even if we do not have a direct mapping from pass away to
FNDeath. Since the framework is declarative, we do not have to specify when and
where to use a particular type of knowledge, which results in robust reasoning.)</p>
        <p>Now let us show one of the proof graphs automatically produced by our
system.6 In Figure 3, we show the abductive proof graph for the following discourse:
S1: Only 56 people died from the explosion,
S2: but many other problems have been caused because of it.</p>
        <p>(Topic=Activity, StoryID=241)
Although we suffer from the insufficiency of lexical knowledge, the abductive
engine gave us the best proof where two segments are tied with a resultative
relation. In the proof graph, Die and CauseProblem events are used to prove
\event" literals hypothesized by the axiom of discourse relation. Note that the
causal relation between these events is not proven but assumed with $0:36.</p>
        <p>The overall results indicate that we now have a good environment for developing
ATP-based discourse processing.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>We have explored an abductive theorem proving (ATP)-based approach to
implicit discourse relation recognition. We have investigated the types of axioms
required for ATP-based discourse relation recognition and identified five types
of axioms. Our result is based on real-life Web texts, and no previous work has
(6: For simplicity, we used only one axiom as the axiom of discourse relations.)</p>
      <p>[Fig. 3: Best proof graph for the discourse above. The Die-vb event in S1 is linked through bidirectional WordNet synonym axioms (e.g. Decease-vb, Pop-off-vb, Snuff-it-vb, Drop-dead-vb, Synset200360501) and tied to the CauseProblem-vb event in S2 by REScause axioms; the causal link between the events is assumed at $0.36.]</p>
      <p>done an investigation in the same setting. Also, we have preliminarily evaluated
our model with a large knowledge base. We have automatically constructed
axioms of lexical knowledge from WordNet and FrameNet, which results in around
four hundred thousand inference rules. Our experiments showed the great
potential of our ATP-based model and that we are ready to develop ATP-based
discourse processing in a real-life setting.</p>
      <p>
        Our future work includes three directions. First, we will create a larger
knowledge base, exploiting linguistic resources that have recently become
available. As a first step, we plan to axiomatize Narrative Chains [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], ConceptNet,7
and Semantic Orientations of Words [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to extend our axioms with these large
knowledge resources. Second, we plan to apply the technique of automatic
parameter tuning for weighted abduction [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to our model. Third, we plan to
create a dataset for abductive discourse processing, in which we annotate simple
English texts with discourse phenomena including discourse relations,
coreference, etc. As the source texts, we will use materials for ESL (English as a
Second Language) learners,8 a set of syntactically and lexically simple texts, so
that we can trace the detailed behavior of the abductive reasoning process. For
the semantic representation, we plan to use the Boxer semantic parser [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] to
automatically obtain the event arguments.
(7: http://conceptnet5.media.mit.edu/)
(8: http://www.cdlponline.org/)
Acknowledgments. This work was supported by JSPS KAKENHI Grant
Number 23240018.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
          </string-name>
          , H.:
          <article-title>Recognizing implicit discourse relations in the penn discourse treebank</article-title>
          .
          <source>In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing</source>
          . (
          <year>2009</year>
          )
          <volume>343</volume>
          –
          <fpage>351</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Pitler</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nenkova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Using syntax to disambiguate explicit discourse connectives in text</article-title>
          .
          <source>In: Proceedings of the ACL-IJCNLP</source>
          <year>2009</year>
          . (
          <year>2009</year>
          )
          <volume>13</volume>
          –
          <fpage>16</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Charniak</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Probabilistic abduction for plan recognition</article-title>
          . Brown University, Department of Computer Science (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hobbs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stickel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Appelt</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Interpretation as Abduction</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>63</volume>
          (
          <issue>1-2</issue>
          ) (
          <year>1993</year>
          )
          <fpage>69</fpage>
          –
          <lpage>142</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Marcu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Echihabi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An unsupervised approach to recognizing discourse relations</article-title>
          .
          <source>In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics</source>
          . (
          <year>2002</year>
          )
          <fpage>368</fpage>
          –
          <lpage>375</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Subba</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eugenio</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          :
          <article-title>An effective discourse parser that uses rich linguistic information</article-title>
          .
          <source>In: NAACL</source>
          . (
          <year>2009</year>
          )
          <fpage>566</fpage>
          –
          <lpage>574</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A PDTB-Styled End-to-End Discourse Parser</article-title>
          .
          <source>arXiv preprint arXiv:1011.0835</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Inoue</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inui</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Large-scale cost-based abduction in full-fledged first-order predicate logic with cutting plane inference</article-title>
          .
          <source>In: Proceedings of the 13th European Conference on Logics in Artificial Intelligence</source>
          . (
          <year>2012</year>
          )
          <fpage>281</fpage>
          –
          <lpage>293</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Prasad</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dinesh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miltsakaki</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robaldo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Webber</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The Penn Discourse TreeBank 2.0</article-title>
          .
          <source>In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC)</source>
          .
          (
          <year>2008</year>
          )
          <fpage>2961</fpage>
          –
          <lpage>2968</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Carlson</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Okurowski</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>RST Discourse Treebank, LDC2002T07</article-title>
          .
          <source>Linguistic Data Consortium</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gibson</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fisher</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knight</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>The Discourse GraphBank: A database of texts annotated with coherence relations</article-title>
          .
          <source>LDC</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Prasad</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miltsakaki</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dinesh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robaldo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Webber</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The Penn Discourse Treebank 2.0 Annotation Manual</article-title>
          .
          <source>IRCS Technical Report</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Bos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Wide-Coverage Semantic Analysis with Boxer</article-title>
          . In
          <string-name>
            <surname>Bos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delmonte</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , eds.:
          <source>Proceedings of STEP. Research in Computational Semantics</source>
          , College Publications (
          <year>2008</year>
          )
          <fpage>277</fpage>
          –
          <lpage>286</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Fellbaum</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , ed.:
          <article-title>WordNet: an electronic lexical database</article-title>
          . MIT Press (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ruppenhofer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ellsworth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petruck</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scheffczyk</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>FrameNet II: Extended Theory and Practice</source>
          .
          <source>Technical report</source>
          , Berkeley, USA (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Takamura</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inui</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Okumura</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Extracting semantic orientations of words using spin model</article-title>
          .
          <source>In: ACL</source>
          . (
          <year>2005</year>
          )
          <fpage>133</fpage>
          –
          <lpage>140</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Chambers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Unsupervised learning of narrative schemas and their participants</article-title>
          .
          <source>In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP</source>
          . (
          <year>2009</year>
          )
          <fpage>602</fpage>
          –
          <lpage>610</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Yamamoto</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inoue</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watanabe</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Okazaki</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inui</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Discriminative learning of first-order weighted abduction from partial discourse explanations</article-title>
          .
          <source>In: CICLing (1)</source>
          . (
          <year>2013</year>
          )
          <fpage>545</fpage>
          –
          <lpage>558</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>