<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Survey on Paraphrase Recognition</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Simone Magnolini University of Brescia Fondazione Bruno Kessler</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>Paraphrase Recognition is a task of growing interest in natural language processing (NLP) research. The task aims to detect whether two sentences have the same meaning. The paraphrase relationship described in this work is not the strict logical definition, but a more natural-language-oriented one, since it is driven by a large amount of background human knowledge. This type of relation can support many other NLP applications, such as Question Answering, Multi-document Summarization and Machine Translation. This paper presents an overview of the different phenomena that lead to paraphrase, of the methods used for Paraphrase Recognition, of the data-sets used to evaluate the task, and of the main issues still open.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Paraphrase is an important phenomenon that can be used to improve many other
NLP tasks. Possible ways to deal with paraphrase are recognition, generation and
extraction; in this paper we focus only on the first one, but it is easy to show
that the three are strongly connected. For example, a system that can recognize
paraphrases can improve the quality of a paraphrase generator by choosing the
best paraphrase among the ones the generator proposes. The same is true if we have
to validate a candidate produced by a paraphrase extractor; we can therefore assume
that every improvement in one of these tasks will also affect the others.</p>
      <p>
Before describing the task, it is useful to give some examples of possible
applications of paraphrasing. In question answering (QA), paraphrase can be used
in both directions: the QA system may need a paraphrase of the original question
to find the right answer, and a further paraphrase may be needed to present the
answer. In machine translation (MT), paraphrase can be used to improve the
quality of a translation, especially to cover expressions or words that the
system did not learn during the training phase [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
Another important task that makes intensive use of paraphrase recognition
techniques is plagiarism identification [
        <xref ref-type="bibr" rid="ref19">19</xref>
], where the goal is to find ideas and sentences
that are paraphrases of others (the original ones); an intensive use of synonyms
and syntactic modifications can generate very challenging data-sets.
The aim of this paper is to give a wide overview of the problems and of the most
challenging parts of the task, without going deep into details or technical
aspects. We begin our survey with a definition of paraphrase, trying to stress the
critical aspects related to the task. The second part focuses on the different
approaches used for paraphrase recognition; for every type of technique we present
a paper that uses it, since describing every system is not the goal of
this paper. The third part presents some issues of the paraphrase data-sets.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Paraphrase</title>
      <p>
A definition of paraphrase of a sentence, according to common knowledge,
is another sentence with the same meaning but using different words. This
definition introduces two important aspects: same meaning and different words. These
two concepts are quite intuitive but difficult to formalize. For example, in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
three different sentences are proposed:
(1) Wonderworks Ltd. constructed the new bridge.
(2) The new bridge was constructed by Wonderworks Ltd.
(3) Wonderworks Ltd. is the constructor of the new bridge.
      </p>
      <p>
Only (1) and (2) are actually paraphrases, but many people accept
(3) as a paraphrase too. The possibility that in (3) the bridge may not be finished is
usually ignored: we usually accept a small decrease (or increase) of information,
if it is not too large. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
] the concept of “quasi-paraphrase” is introduced to better
describe the notion in linguistics, and to distinguish it from the logical definition.
Another definition of paraphrase is derived from [
        <xref ref-type="bibr" rid="ref2">2</xref>
]: two sentences T1 and T2 are
paraphrases if T1 entails T2 and T2 entails T1.
      </p>
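      <p>The double-entailment definition can be sketched directly in code. The following is a minimal sketch, where "entails" is a hypothetical placeholder for any entailment classifier a real system would plug in:</p>
      <preformat>
```python
# Paraphrase as bidirectional entailment: T1 and T2 are paraphrases
# iff T1 entails T2 and T2 entails T1.

def is_paraphrase(t1: str, t2: str, entails) -> bool:
    """Return True when each sentence entails the other."""
    return entails(t1, t2) and entails(t2, t1)

# Toy stand-in "classifier": t1 entails t2 if t1's words cover t2's.
same_words = lambda a, b: set(a.lower().split()) >= set(b.lower().split())
print(is_paraphrase("the new bridge", "the bridge", same_words))  # False
```
      </preformat>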
      <sec id="sec-2-1">
        <title>Paraphrasing phenomena classification</title>
        <p>
Many linguistic phenomena lead to paraphrase; in [
          <xref ref-type="bibr" rid="ref7">7</xref>
] 25 possible types of “substitution” are listed that keep the meaning of the sentence inside the
boundaries of quasi-paraphrase. This analysis takes a linguistic point of view rather
than a computational approach to the task; this grants a wider classification that
also includes rare phenomena that do not appear in the most used data-sets, but
that are common in spoken language. According to this analysis, it is
interesting to note that the types of paraphrase are not uniformly distributed: three of
them account for more than 75% of the examples in both data-sets considered in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The types (with an example) are:
        </p>
        <sec id="sec-2-1-1">
          <title>Synonym substitution: “He quickly leaves the room” / “He speedily leaves the room”</title>
          <p>
            Function word variation: “This letter is very important to your admission” / “This letter is very important for your admission”
External knowledge: “Obama was named the 2009 Nobel Peace Prize laureate” / “The President of the United States was named the 2009 Nobel Peace Prize laureate”
In [
            <xref ref-type="bibr" rid="ref21">21</xref>
] a more structured classification is proposed that takes into
consideration not only the types of substitution, but divides the 24 types into 7 sub-classes
and 5 classes. The paper also draws attention to the coexistence of more
than one phenomenon in a paraphrase.
          </p>
          <p>The presence of one modification is not enough to ensure that two sentences
are linked as paraphrases, for example:</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>I like my dog</title>
        </sec>
        <sec id="sec-2-1-3">
          <title>I like dogs</title>
          <p>This is a general substitution, but the meaning is different.</p>
          <p>At the moment the classification of different paraphrases is not used for
recognition tasks; this is due to the lack of data-sets annotated with this kind of
information, and to the absence of a unique classification for paraphrase.
Understanding the phenomena that originate a paraphrase is important for two
main goals: adding features that the systems can use, and finding the boundaries
of the task in order to obtain better training sets.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Different Approaches</title>
      <p>Paraphrase recognition is used as a part, sometimes the main part, of
different NLP systems. Semantic text similarity (STS) is one of them: in
this task the system has to measure the similarity of meaning of two sentences
on a scale from 0 (nothing in common) to 5 (perfect paraphrase). Systems that
take part in this challenge sometimes use paraphrase data-sets for training and
testing, and the STS data-set itself is sometimes considered a paraphrase data-set.
An overview of the different approaches to the task will therefore also include
systems designed for STS.</p>
      <p>
Other common approaches to this task are textual entailment systems, since
paraphrase can be seen as a double entailment.
In this kind of approach, as in [
        <xref ref-type="bibr" rid="ref8">8</xref>
], the pair of sentences is mapped into a logic
form, and then a prover extracts a similarity score based on the operations needed
to satisfy both sentences. In this system background knowledge does not
affect the similarity score given by the logic prover, but it is combined with
this score for a further elaboration of the pair of sentences. This approach is
generally worse than others, but it does not need knowledge from large corpora.
Another approach is to use background knowledge inside the prover and, during
training, generate a threshold for the proofs found. An example of this approach
can be found in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] for the workshop [
        <xref ref-type="bibr" rid="ref1">1</xref>
]. This kind of approach is designed for
textual entailment but, as explained, it can also be used for paraphrase. Axioms
are introduced into the system from a source called the eXtended WordNet
Knowledge Base (XWN-KB), which grants good results; this shows that some algorithms
also need a good knowledge base to obtain notable results.
      </p>
      <sec id="sec-3-1">
        <title>Vector Space Models approach</title>
        <p>
In this approach every word is mapped to a vector that contains other words, with
a score that represents the connection between that word and the others. The
quality of the vectors depends on the corpora used to generate the model and on
the length of the vectors [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          An example of this kind of mapping is Word2Vec [
          <xref ref-type="bibr" rid="ref15">15</xref>
]; this tool is used in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
for detecting paraphrase candidates inside a parser. In this paraphrase system the
similarity score is calculated as the product of a combination of the
components of the vector representations of the sentences. This is not the only possible
way to use this model: it is also possible to sum or subtract the vectors to
obtain different similarity scores.
        </p>
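        <p>As an illustration of the vector-space idea (a minimal sketch with invented toy vectors, not real Word2Vec embeddings), sentence vectors can be built by averaging word vectors and compared with cosine similarity:</p>
        <preformat>
```python
# Sentence similarity in a toy vector space model: each word maps to
# a vector, sentences are averaged, similarity is the cosine of the
# two sentence vectors. The tiny hand-made vectors are illustrative.
import math

vectors = {                      # toy 3-d "embeddings"
    "quickly":  [0.9, 0.1, 0.0],
    "speedily": [0.8, 0.2, 0.1],
    "leaves":   [0.1, 0.9, 0.2],
    "room":     [0.0, 0.3, 0.9],
}

def sentence_vector(words):
    """Average the vectors of the known words in a sentence."""
    known = [vectors[w] for w in words if w in vectors]
    return [sum(dim) / len(known) for dim in zip(*known)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

v1 = sentence_vector("quickly leaves room".split())
v2 = sentence_vector("speedily leaves room".split())
print(round(cosine(v1, v2), 3))  # close to 1.0 for near-paraphrases
```
        </preformat>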
      </sec>
      <sec id="sec-3-2">
        <title>String Similarity approaches</title>
        <p>To overcome the problem of transforming natural language into logic
form or into other models, a variety of new approaches was developed. In this
kind of paraphrase system the decision is taken just from the analysis of the two
texts with simpler pre-processing; part-of-speech (POS) tagging and
lemmatisation are examples of the elaborations used in this kind of approach. The main
hypothesis of this kind of approach is that even if different words are used in
a paraphrase, many of them remain the same (we take this point into more
consideration during the data-sets analysis). A system may, for example, count only the
lexical overlap, or use the edit distance to score the paraphrase distance of two
sentences.</p>
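        <p>The overlap and edit-distance baselines mentioned above can be sketched as follows; the normalization and the measures are illustrative choices, not taken from any cited system:</p>
        <preformat>
```python
# Two string-similarity baselines: normalized word overlap and
# character-level (Levenshtein) edit distance.

def word_overlap(s1: str, s2: str) -> float:
    """Lexical overlap normalized by the longer sentence."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / max(len(w1), len(w2))

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

s1 = "He quickly leaves the room"
s2 = "He speedily leaves the room"
print(word_overlap(s1, s2))   # 0.8
print(edit_distance(s1, s2))  # 6
```
        </preformat>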
        <p>
String similarity is also used as a baseline for other systems, as in [
          <xref ref-type="bibr" rid="ref20">20</xref>
]; a
system can take into consideration not only the single tokens in the two sentences
but also the common n-grams [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Syntactic Similarity Approach</title>
        <p>
It is possible to detect paraphrases not only at the semantic level, with a structure
similar to bag of words, but also by analyzing the syntactic level. An example of
this kind of approach may be found in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] or in [
          <xref ref-type="bibr" rid="ref4">4</xref>
] (a textual entailment system).
In these systems the paraphrase decision is based on the number of syntactic rules
needed to transform one sentence into the other; the rules can be derived from
grammatical analysis or be based on a statistical approach.
        </p>
        <p>
The number of possible rules is usually quite large, and to detect a non-paraphrase
case all of them need to be tried, so some systems reduce the set
of possible transformations to the most probable ones.
This approach covers a large variety of systems that use the different measures
listed above as features for machine learning algorithms. For example, in [
          <xref ref-type="bibr" rid="ref9">9</xref>
] not
only syntax is taken into consideration, but also semantics, named entity
recognition, lexical overlap and other features, using the v-Support Vector Regression model
(v-SVR) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
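        <p>The feature-combination idea can be sketched as follows: each similarity measure becomes one dimension of a feature vector that a regressor or classifier (e.g. an SVR/SVM) is then trained on. The features below are illustrative assumptions, not those of any specific system:</p>
        <preformat>
```python
# Combine several similarity measures into one feature vector that a
# downstream learner would consume. Feature choices are illustrative.

def features(s1: str, s2: str) -> list[float]:
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2) / max(len(w1), len(w2))        # lexical overlap
    len_ratio = min(len(w1), len(w2)) / max(len(w1), len(w2))
    bigrams = lambda s: set(zip(s.lower().split(), s.lower().split()[1:]))
    b1, b2 = bigrams(s1), bigrams(s2)
    bigram_overlap = len(b1 & b2) / max(len(b1), len(b2)) if b1 and b2 else 0.0
    return [overlap, len_ratio, bigram_overlap]

print(features("he quickly leaves the room", "he speedily leaves the room"))
# [0.8, 1.0, 0.5]
```
        </preformat>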
        <p>
This approach focuses not only on the type of similarity relation but also on
the impact that every feature may have on the paraphrase relationship. Selecting
the right features is a hard task, and it can also be managed using heuristics taken
from operations research, as in [
          <xref ref-type="bibr" rid="ref11">11</xref>
], where a genetic algorithm is used to select
features for a Support Vector Machine (SVM).
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Machine Translation approach</title>
        <p>
This approach uses large bilingual corpora to detect words or n-grams that are
translated in the same way. This idea is also the main assumption used to
create the paraphrase database (PPDB) [
          <xref ref-type="bibr" rid="ref13">13</xref>
]. These systems apply ideas or
measures already described in the previous points, but with a different training
set: bilingual corpora. The main advantage is that these kinds of corpora are
bigger and easier to obtain than paraphrase ones.
        </p>
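        <p>The bilingual pivot idea behind PPDB-style extraction can be sketched as follows: two phrases in one language that translate to the same foreign phrase become paraphrase candidates. The toy phrase table is invented for illustration:</p>
        <preformat>
```python
# Pivot-based paraphrase candidate extraction from a toy phrase table.
from collections import defaultdict

# (english_phrase, foreign_phrase) pairs from a hypothetical aligned corpus
phrase_table = [
    ("under control", "unter kontrolle"),
    ("in check", "unter kontrolle"),
    ("the bridge", "die brücke"),
]

by_foreign = defaultdict(set)
for en, fo in phrase_table:
    by_foreign[fo].add(en)           # group English phrases by pivot

# Any pivot shared by 2+ English phrases yields a candidate pair.
candidates = {frozenset(ens) for ens in by_foreign.values() if len(ens) > 1}
print(candidates)
```
        </preformat>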
        <p>
          The system described in [
          <xref ref-type="bibr" rid="ref3">3</xref>
] shows that this approach is designed for paraphrase
extraction. We mention it because it makes it possible to create
resources that can be used as features by recognition algorithms.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Data-sets</title>
      <p>
In the previous sections of this paper we have stressed some points that still have
a vague answer: what is the meaning of a sentence, and how many words
have to be different to define a pair of sentences as a paraphrase? The answers to
these questions also lie in the data-sets: since many paraphrase recognition
algorithms are supervised, we can assume that from a computational point of view
two sentences have the same meaning when they are tagged as such in a training
set. This is quite common for NLP tasks, but it is quite
interesting to notice that, as discussed in [
        <xref ref-type="bibr" rid="ref16">16</xref>
], the data-sets for paraphrasing are quite
heterogeneous: different annotators, different scores. The main corpus for this task
is the Microsoft Research Paraphrase corpus (MSRP) [
        <xref ref-type="bibr" rid="ref12">12</xref>
]: a deep analysis of
this corpus and of the others usually used to train and evaluate the systems can
be found in [
        <xref ref-type="bibr" rid="ref16">16</xref>
]; we take into consideration only the results in Table 2.
The interesting property of the data-sets is the distribution of the lexical overlap:
it is easy to notice that simple overlap systems can obtain good results with easy
elaboration. The first letter in a method's name indicates the OpenNLP package (O)
or the Stanford NLP package (S). The second letter indicates the type of
normalization for the lexical overlap: average length (A) versus maximum length (M). The
remaining letters indicate: (1) the tokens used (P means all tokens were compared,
including punctuation; W means punctuation was excluded; C means content words only;
S means all words, excluding stop words), (2) the form of the tokens used (W
original raw form, B base form, P only words with the same POS and same
base form), (3) case sensitivity (S) or insensitivity (I), (4) unigrams (U) or
bigrams (B), (5) the type of global weight used for each token (I means IDF, E means
entropy-based, N means a weight of 1), and (6) the type of local weight used (F
means word type frequency, N means a local weighting of 1).
      </p>
      <p>If we compare the results described in Table 2 with the results of the
state-of-the-art systems described in Table 1, we can notice that the use of more
complex systems grants little improvement. This issue is due to the structure and
assumptions of the corpora used to train and evaluate systems, and not to the
systems themselves.</p>
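      <p>This effect can be checked directly. The following hypothetical sketch measures the mean lexical overlap per class in a toy labeled corpus; when the two class means are far apart, as in MSRP-style data, a plain overlap baseline is already competitive:</p>
      <preformat>
```python
# Mean lexical overlap per label class in a toy (s1, s2, label) corpus.
# The corpus entries are invented stand-ins for real annotated pairs.

def overlap(s1, s2):
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / max(len(w1), len(w2))

corpus = [
    ("the firm built the bridge", "the bridge was built by the firm", 1),
    ("he leaves the room quickly", "he speedily leaves the room", 1),
    ("i like my dog", "i like dogs", 0),
    ("she sold the car", "she bought a new car", 0),
]

for label in (1, 0):
    scores = [overlap(a, b) for a, b, l in corpus if l == label]
    print(label, round(sum(scores) / len(scores), 2))
```
      </preformat>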
      <p>
This problem is also typical of other NLP tasks, as described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
] for the
recognizing textual entailment (RTE) task, but to obtain more significant (and
challenging) data-sets this trait should be reduced or distributed between
paraphrase and non-paraphrase sentence pairs.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Paraphrase recognition is a challenging and popular research area that shares
many points with other difficult and useful tasks such as STS and RTE. In this
paper we have presented an overview not only of the approaches, but also of the
data-sets and possible applications of paraphrase recognition.</p>
      <p>Important characteristics of the task are: the absence, at the moment, of an
algorithm or approach that outperforms the others; data-sets strongly influenced
by lexical overlap; and the lack of a linguistic classification used to face the task.
We expect to see a lot of effort on this task, because every improvement in this
field can be extended also to other important (and maybe more practical) NLP systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. RTE '07:
          <article-title>Proceedings of the acl-pascal workshop on textual entailment and paraphrasing</article-title>
          , Stroudsburg, PA, USA, Association for Computational Linguistics,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Ion</given-names>
            <surname>Androutsopoulos</surname>
          </string-name>
          and
          <string-name>
            <given-names>Prodromos</given-names>
            <surname>Malakasiotis</surname>
          </string-name>
          ,
          <article-title>A survey of paraphrasing and textual entailment methods</article-title>
          ,
          <source>arXiv preprint arXiv:0912.3747</source>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Colin</given-names>
            <surname>Bannard</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          ,
          <article-title>Paraphrasing with bilingual parallel corpora</article-title>
          ,
          <source>Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>597</fpage>
          -
          <lpage>604</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Roy</given-names>
            <surname>Bar-Haim</surname>
          </string-name>
          , Ido Dagan, Iddo Greental, and Eyal Shnarch,
          <article-title>Semantic inference at the lexical-syntactic level</article-title>
          ,
          <source>PROCEEDINGS OF THE NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE</source>
          , vol.
          <volume>22</volume>
          , Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press;
          <year>1999</year>
          ,
          <year>2007</year>
          , p.
          <fpage>871</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Luisa</given-names>
            <surname>Bentivogli</surname>
          </string-name>
          , Bernardo Magnini, Ido Dagan, Hoa Trang Dang, and Danilo Giampiccolo,
          <article-title>The fifth PASCAL recognising textual entailment challenge</article-title>
          ,
          <source>Proceedings of the TAC Workshop on Textual Entailment (Gaithersburg, MD)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Berant</surname>
          </string-name>
          and
          <string-name>
            <given-names>Percy</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Semantic parsing via paraphrasing</article-title>
          ,
          <source>Proceedings of ACL</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Rahul</given-names>
            <surname>Bhagat</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eduard</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <article-title>What is a paraphrase?</article-title>
          ,
          <source>Computational Linguistics</source>
          <volume>39</volume>
          (
          <year>2013</year>
          ), no.
          <issue>3</issue>
          ,
          <fpage>463</fpage>
          -
          <lpage>472</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Eduardo</given-names>
            <surname>Blanco</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dan I.</given-names>
            <surname>Moldovan</surname>
          </string-name>
          ,
          <article-title>A logic prover approach to predicting textual similarity</article-title>
          ., FLAIRS Conference,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Davide</given-names>
            <surname>Buscaldi</surname>
          </string-name>
          , Joseph Le Roux, Jorge J. García Flores,
          <string-name>
            <surname>Adrian Popescu</surname>
          </string-name>
          , et al.,
          <article-title>Lipn-core: Semantic text similarity using n-grams, wordnet, syntactic analysis, esa and information retrieval based features</article-title>
          ,
          <source>* SEM</source>
          <year>2013</year>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Chris</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Koehn</surname>
          </string-name>
          , and Miles Osborne,
          <article-title>Improved statistical machine translation using paraphrases</article-title>
          ,
          <source>Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Association for Computational Linguistics</source>
          ,
          <year>2006</year>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Chitra</surname>
          </string-name>
          and
          <string-name>
            <given-names>Anupriya</given-names>
            <surname>Rajkumar</surname>
          </string-name>
          ,
          <article-title>Genetic algorithm based feature selection for paraphrase recognition</article-title>
          ,
          <source>International Journal on Artificial Intelligence Tools</source>
          <volume>22</volume>
          (
          <year>2013</year>
          ), no.
          <issue>02</issue>
          ,
          <fpage>1350007</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Bill</given-names>
            <surname>Dolan</surname>
          </string-name>
          , Chris Quirk, and Chris Brockett,
          <article-title>Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources</article-title>
          ,
          <source>Proceedings of the 20th international conference on Computational Linguistics, Association for Computational Linguistics</source>
          ,
          <year>2004</year>
          , p.
          <fpage>350</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Juri</given-names>
            <surname>Ganitkevitch</surname>
          </string-name>
          , Benjamin Van Durme, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          ,
          <article-title>Ppdb: The paraphrase database</article-title>
          .,
          <source>HLT-NAACL</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>758</fpage>
          -
          <lpage>764</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Philip M.</given-names>
            <surname>McCarthy</surname>
          </string-name>
          and
          <string-name>
            <given-names>Danielle S.</given-names>
            <surname>McNamara</surname>
          </string-name>
          ,
          <article-title>The user-language paraphrase challenge</article-title>
          ,
          <source>Retrieved January</source>
          <volume>10</volume>
          (
          <year>2008</year>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Efficient estimation of word representations in vector space</article-title>
          ,
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Vasile</given-names>
            <surname>Rus</surname>
          </string-name>
          , Rajendra Banjade, and Mihai Lintean,
          <article-title>On paraphrase identification corpora</article-title>
          ,
          <source>Proceedings of the International Conference on Language Resources and Evaluation (LREC
          <year>2014</year>
          ),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Scholkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter L.</given-names>
            <surname>Bartlett</surname>
          </string-name>
          , Alex J. Smola, and Robert Williamson,
          <article-title>Shrinking the tube: a new support vector regression algorithm</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          (
          <year>1999</year>
          ),
          <fpage>330</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Marta</given-names>
            <surname>Tatu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Moldovan</surname>
          </string-name>
          ,
          <article-title>COGEX at RTE 3</article-title>
          ,
          <source>Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing</source>
          , Association for Computational Linguistics,
          <year>2007</year>
          , pp.
          <fpage>22</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Özlem</given-names>
            <surname>Uzuner</surname>
          </string-name>
          , Boris Katz, and Thade Nahnsen,
          <article-title>Using syntactic information to identify plagiarism</article-title>
          ,
          <source>Proceedings of the second workshop on Building Educational Applications Using NLP, Association for Computational Linguistics</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>V</given-names>
            <surname>Vaishnavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Saritha</surname>
          </string-name>
          , and RS Milton,
          <article-title>Paraphrase identification in short texts using grammar patterns</article-title>
          ,
          <source>Recent Trends in Information Technology (ICRTIT)</source>
          ,
          <source>2013 International Conference on, IEEE</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>472</fpage>
          -
          <lpage>477</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Marta</given-names>
            <surname>Vila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Antònia</given-names>
            <surname>Martí</surname>
          </string-name>
          , and Horacio Rodríguez,
          <article-title>Is this a paraphrase? What kind? Paraphrase boundaries and typology</article-title>
          ,
          <source>Open Journal of Modern Linguistics</source>
          <year>2014</year>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>