<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Paraphrase Substitution for Recognizing Textual Entailment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chris Callison-Burch</string-name>
          <email>callison-burch@ed.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>General Terms</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Experimentation</institution>
          ,
          <addr-line>Languages, Reliability</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Edinburgh</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Wauter Bosma University of Twente</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe a method for recognizing textual entailment that uses the length of the longest common subsequence (LCS) between two texts as its decision criterion. Rather than requiring strict word matching in the common subsequences, we perform a flexible match using automatically generated paraphrases. We find that the use of paraphrases over strict word matches represents an average F-measure improvement from 0.22 to 0.36 on the CLEF 2006 Answer Validation Exercise for 7 languages.</p>
      </abstract>
      <kwd-group>
        <kwd>Recognizing textual entailment</kwd>
        <kwd>Paraphrase generation</kwd>
        <kwd>Question answering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Recognizing textual entailment has recently generated interest from a wide range of Natural
Language Processing related research areas, such as automatic summarization, information extraction
and question answering. Advances have been made with various techniques, such as aligning
syntactic trees and word overlap. While there is still much room for improvement, Vanderwende
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] showed that current approaches are close to hitting the boundaries of what is feasible with
lexical-syntactic approaches.
      </p>
      <p>
        Proposed directions to cross this boundary include using logical inference, background
knowledge and paraphrasing [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We explore the possibility of applying paraphrasing to obtain a more
reliable match between a given text and hypothesis for which the presence of an entailment relation
is to be determined. For instance, consider the following RTE2 pair.
      </p>
      <p>In this example, text and hypothesis use different words to express the same meaning. Although
deep inference is required to recognize that a mother is part of the family, and that daughter and
baby in this context most likely refer to the same person, most variation occurs on the surface
level. For instance, announced in the hypothesis could be replaced by said without changing its
meaning. Similarly, the phrases the United States and the US would not be matched by a system
relying solely on word overlap.</p>
      <p>The criterion that our system uses to decide whether a text entails a hypothesis is the length
of the longest common subsequence (LCS) between the passages. Rather than identifying the
LCS using word matching, our system employs an automatic paraphrasing method that extends
matches to synonymous, but non-identical phrases. We automatically generate our paraphrases
by extracting them from bilingual parallel corpora.</p>
      <p>Whereas many systems use dependency parsers and other linguistic resources that are only
available for a limited number of languages, our system employes a method that is comparatively
language independent. For this paper we extract paraphrases in Dutch, English, French, German,
Italian, Spanish, and Portuguese.</p>
      <p>The paraphrase extraction algorithm is described in section 2. Section 3 describes how the
entailment score is calculated, and how paraphrases are generated in the entailment detection
system. The results of our participation in the CLEF2006 Answer Validation Exercide (henceforth
AVE) are described in section 4. We will wrap up with a conclusion and suggestions for future
work in section 5.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Paraphrase extraction</title>
      <p>
        Paraphrases are alternative ways of conveying the same information. The automatic generation
of paraphrases has been the focus of a significant amount of research lately [
        <xref ref-type="bibr" rid="ref1 ref12 ref3 ref4">4, 12, 3, 1</xref>
        ]. In this
work, we use Bannard and Callison-Burch’s method [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which extracts paraphrases from bilingual
parallel corpora.
* 6 ! $ !
, ! 6 7
" $ &amp; $ 3 %
" " ! $ !
1 % # + ! $ ! % 3
" . # " ! $ ! #
" 6 ( ! ’ $ ! % $ %
6 ! $ ( ! 7 #6
, $
" 6 8
0 $ &amp; $
% $ 5 $ &amp; ,,
"
$ &amp; 5 "+
1
" 0 (
,
, $ &amp; ! # 8
5 ! % ( 8 +0
, 9 % $ # $ $ # ) $ 5 0"
$ &amp; "
# " % 6 $
" 9 ! # &amp; &amp; ! 3 ! " ### $5. %!$
      </p>
      <p>
        Bannard and Callison-Burch extract paraphrases from a parallel corpus by equating English
phrases which share a common foreign language phrase. English phrases are aligned with their
foreign translations using alignment techniques drawn from recent phrase-based approaches to
statistical machine translation [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Paraphrases are identified by pivoting through phrases in
another language. Candidate paraphrases are found by first identifying all occurrences of the
English phrase to be paraphrased, then finding its corresponding foreign language translations of
the phrase, finally looking at what other English phrases those foreign languages translate back
to.
where c is a parallel corpus from a set of parallel corpora C. Thus multiple corpora may be used
by summing over all paraphrase probabilities calculated from a single corpus (as in Equation 1)
and normalizing by the number of parallel corpora. We calculate the paraphrase probabilities
using the Europarl parallel corpus [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which contains parallel corpora for Danish, Dutch, English,
French, Finnish, German, Greek, Italian, Portuguese, Spanish and Swedish.
      </p>
      <p>The method is multilingual, since it can be applied to any language which has a parallel corpus.</p>
      <p>Thus paraphrases can be easily generated for each of the languages in the CLEF AVE task using
the Europarl corpus.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Recognizing entailment</title>
      <p>
        The longest common subsequence (LCS) is used as a measure of similarity between passages.
LCS is also used by the ROUGE [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] summarization evaluation package to measure recall of a
system summary with respect to a model summary. We use it not to measure recall but precision,
to approximate the ratio of information in the hypothesis which is also in the text. Unlike the
longest common substring, the longest common subsequence does not require adjacency. A longest
common subsequence of a text T = ht1..tni and a hypothesis H = hh1..hni is defined as a longest
possible sequence Q = hq1..qni with words in Q also being words in T and H in the same order.
LCS(T, H) is the length of the longest common subsequence:
      </p>
      <p>LCS(T, H) = max { |Q| | Q ⊆ T ; Q ⊆ H; (ti = hk ∈ Q ∧ tj = hl ∈ Q ∧ j &gt; i) → l &gt; k}
(4)</p>
      <p>From the LCS, the entailment score LCS(T, H)/|H| is derived. In order to account for variation
in natural language text, the LCS is measured after paraphrasing the hypothesis. The underlying
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
random
longest common subseq.</p>
      <p>LCS after paraphrasing
random
longest common subsequence</p>
      <p>LCS after paraphrasing
dependency tree alignment
precision
recall</p>
      <sec id="sec-3-1">
        <title>F-measure precision recall</title>
      </sec>
      <sec id="sec-3-2">
        <title>F-measure</title>
        <p>idea is that whenever a paraphrase of the hypothesis exists which entails the text, the hypothesis
itself also entails the text.</p>
        <p>We attempted to extract paraphrases for every phrase in the hypothesis of up to 8 words.
Note that by “phrase” we simply mean sequence of words. Figure 2 shows all paraphrases that
we were able to extract for the example hypothesis. After generating these candidate mappings
we iteratively transform the hypothesis to be closer to the text by substituting in paraphrases. At
each iteration, the substitution is made which constitutes the greatest increase of the entailment
score. To prevent overgeneration, a word which was introduced in the hypothesis by a paraphrase
substitution cannot be substituted itself. The process stops when no more substitutions can be
made which positively affect the entailment score. By example, the following paraphrase of the
hypothesis from section 1 is obtained by a number of substitutions.</p>
        <p>Hypothesis: Clonaid announced that mother and daughter would be returning to the US on</p>
        <p>Monday.</p>
        <p>Substitutions:
the US → the United States
returning → return
said → announced
would be → is
on Monday → Monday
Paraphrased hypothesis: Clonaid said that mother and daughter is return to the United States
Monday.
6</p>
        <p>In this case, paraphrasing caused the length of the LCS to increase from 43% ( 14 ) to 77%
( 1103 ). The words in italics are the words which are aligned with the text sentence, i.e. which are
part of the longest common subsequence. Table 2 shows a number of CLEF AVE pairs for which
paraphrases are were to recognize entailment.</p>
        <p>
          In order to judge whether a hypothesis is entailed by a text we see if the value of the entailment
score, LCS(T, H)/|H|, is greater than some threshold value. Support vector machines [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] are used
to determine the entailment treshold. Unfortunately, the only suitable training data available was
g
n
r is
        </p>
        <p>
          a
fte r
a ph
the Question Answering subset of the RTE2 data set [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. This is a monolingual collection of
English passage pairs, with for each pair a boolean annotation of the presence of an entailment
relation. Lacking training data for other languages, for our submission we used the RTE2 data
to learn the entailment treshold for all languages. The threshold value use used throughout these
experiments was 0.75.
4
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>We compared the performance of the paraphrasing method with two baselines on seven languages
within the the CLEF 2006 Answer Validation Exercise. The first baseline is a system which
decides at random with a probability distribution reflecting the corpus distribution of entailment
values. The second baseline is a system which measures the longest common subsequence of text
and hypothesis. Given the fact that paraphrasing is a form of query expansion, we expected
that precision drops and recall increases when using paraphrases. Figure 3 (left) shows that
this is indeed the case, but that the system using paraphrases shows considerably better overall
performance, as indicated by the F-measure, compared to plain LCS.</p>
      <p>
        For Dutch, Spanish and English, we made a syntactic analysis of each sentence using the
parsers of [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] respectively. As a fourth entailment recognition system, we measured
the largest common subtree of the dependency trees of the text and hypothesis. The algorithm
of Marsi et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] was used to align dependency trees. Interestingly, as shown in Figure 3
(right), the dependency tree alignment system performs slightly better but comparably to the
largest common subsequence after paraphrasing, while the first uses syntactic information and the
latter uses paraphrase generation. The fact that both systems disagree on 37 percent of all pairs
with positive entailment (see Table 3) indicates that performance can be further increased when
employing both types of information in an integrated system.
5
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We showed that paraphrases can boost the performance of an entailment recognition system
relying on the longest common subsequence to decide for entailment. Our method is applicable to
a wide range of languages, since no language specific natural language analysis or background
knowledge is used other than paraphrases automatically extracted from bilingual parallel corpora.
Although our system performs similarly to a syntax based system, we showed that there is
relatively little overlap between the sets of correctly recognized pairs of both systems. This indicates
that information conveyed by paraphrases and syntax are largely complementary for the task of
recognizing entailment.</p>
      <p>In the future we plan to investigate three things: a combination of syntax-based and
paraphrasebased approaches to entailment recognition, improved methods for determining the entailment
threshold, and further enhancements to paraphrase extraction techniques, such as taking syntax
into account.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is funded by the Interactive Multimodal Information Extraction (IMIX) program of the
Netherlands Organization for Scientific Research (NWO).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Colin</given-names>
            <surname>Bannard</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          .
          <article-title>Paraphrasing with bilingual parallel corpora</article-title>
          .
          <source>In ACL-2005</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Roy</given-names>
            <surname>Bar-Haim</surname>
          </string-name>
          , Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini, and
          <string-name>
            <given-names>Idan</given-names>
            <surname>Szpektor</surname>
          </string-name>
          .
          <article-title>The second PASCAL Recognising Textual Entailment Challenge</article-title>
          . In Bernardo Magnini and Ido Dagan, editors,
          <source>Proceedings of the Second PASCAL Recognising Textual Entailment Challenge</source>
          , Trento, Italy,
          <year>April 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Regina</given-names>
            <surname>Barzilay</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lillian</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Learning to paraphrase: An unsupervised approach using multiple-sequence alignment</article-title>
          .
          <source>In Proceedings of HLT/NAACL</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Regina</given-names>
            <surname>Barzilay and Kathleen McKeown</surname>
          </string-name>
          .
          <article-title>Extracting paraphrases from a parallel corpus</article-title>
          .
          <source>In ACL-2001</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Gosse</given-names>
            <surname>Bouma</surname>
          </string-name>
          , Gertjan van Noord,
          <string-name>
            <given-names>and Robert</given-names>
            <surname>Malouf</surname>
          </string-name>
          .
          <article-title>Alpino: wide-coverage computational analysis of Dutch</article-title>
          .
          <source>In Proceedings of CLIN</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Xavier</given-names>
            <surname>Carreras</surname>
          </string-name>
          , Isaac Chao,
          <article-title>Llu´ıs Padr´o, and Muntsa Padr´o. FreeLing: an open-source suite of language analyzers</article-title>
          .
          <source>In Proceedings of the 4th international Language Resources and Evaluation Conference</source>
          , Lisbon, Portugal,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Koehn</surname>
          </string-name>
          .
          <article-title>A parallel corpus for statistical machine translation</article-title>
          .
          <source>In Proceedings of MTSummit</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Koehn</surname>
          </string-name>
          , Franz Josef Och, and Daniel Marcu.
          <article-title>Statistical phrase-based translation</article-title>
          .
          <source>In Proceedings of HLT/NAACL</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Chin-Yew Lin</surname>
          </string-name>
          .
          <article-title>ROUGE: a package for automatic evaluation of summaries</article-title>
          .
          <source>In Proceedings of ACL 2004 Workshop Text Summarization Branches Out</source>
          , Barcelona, Spain,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Dekang</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>Dependency-based evaluation of MiniPar</article-title>
          .
          <source>In Proceedings of LREC Workshop on the Evaluation of Parsing Systems</source>
          , Granada, Spain,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Erwin</surname>
            <given-names>Marsi</given-names>
          </string-name>
          , Emiel Krahmer, Wauter Bosma, and
          <string-name>
            <surname>Mari</surname>
          </string-name>
          ¨et Theune.
          <article-title>Normalized alignment of dependency trees for detecting textual entailment</article-title>
          .
          <source>In Bernardo Magnini and Ido Dagan</source>
          , editors,
          <source>Second PASCAL Recognising Textual Entailment Challenge</source>
          , pages
          <fpage>56</fpage>
          -
          <lpage>61</lpage>
          , Venice, Italy,
          <year>April 2006</year>
          . PASCAL.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Bo</surname>
            <given-names>Pang</given-names>
          </string-name>
          , Kevin Knight, and Daniel Marcu.
          <article-title>Syntax-based alignment of multiple translations: Extracting paraphrases and generating new sentences</article-title>
          .
          <source>In Proceedings of HLT/NAACL</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Lucy</given-names>
            <surname>Vanderwende</surname>
          </string-name>
          and
          <string-name>
            <given-names>William B.</given-names>
            <surname>Dolan</surname>
          </string-name>
          .
          <article-title>What syntax can contribute in the entailment task</article-title>
          .
          <source>In PASCAL Challenges Workshop on Recognizing Textual Entailment</source>
          , pages
          <fpage>205</fpage>
          -
          <lpage>216</lpage>
          , Southampton, United Kingdom,
          <year>2005</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Vladimir</surname>
            <given-names>Naumovich</given-names>
          </string-name>
          <string-name>
            <surname>Vapnik</surname>
          </string-name>
          .
          <article-title>The nature of statistical learning theory</article-title>
          . Springer, 2nd edition,
          <year>November 1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>