KS_JU@DPIL-FIRE2016: Detecting Paraphrases in Indian
Languages Using Multinomial Logistic Regression Model
                                                             Kamal Sarkar
                                                Department of Computer Science and
                                                           Engineering
                                                 Jadavpur University, Kolkata, India
                                                     jukamal2001@yahoo.com

ABSTRACT
In this work, we describe a system that detects paraphrases in Indian languages as part of our participation in the shared task on Detecting Paraphrases in Indian Languages (DPIL) organized by the Forum for Information Retrieval Evaluation (FIRE) in 2016. Our paraphrase detection method uses a multinomial logistic regression model trained with a variety of features which are basically lexical and semantic level similarities between the two sentences in a pair. The performance of the system has been evaluated against the test set released for the FIRE 2016 shared task on DPIL. Our system achieves the highest F-measure of 0.95 on task1 in the Punjabi language, and an F-measure of 0.90 on task1 in the Hindi language. Out of the 11 teams that participated in the shared task, only four teams participated in all four languages (Hindi, Punjabi, Malayalam and Tamil); the remaining 7 teams each participated in only one of the four languages. We participated in both task1 and task2 for all four Indian languages. The overall average performance of our system, over task1 and task2 and all four languages, is an F1-score of 0.81, which is the second highest score among the four systems that participated in all four languages.

Keywords
Paraphrasing; Multinomial logistic regression model; Sentence similarity; Hindi language; Punjabi language; Malayalam language; Tamil language

1. INTRODUCTION
The concept of paraphrasing is defined in [1] as follows: "The concept of paraphrasing is most generally defined on the basis of the principle of semantic equivalence: A paraphrase is an alternative surface form in the same language expressing the same semantic content as the original form." Paraphrases may occur at various levels: lexical paraphrases (synonyms, hypernymy, etc.), phrasal paraphrases (phrasal fragments sharing the same semantic content) and sentential paraphrases (for example, "I finished my work" / "I completed my assignment") [1].

The task of paraphrasing can be of two types based on its applications: paraphrase generation and paraphrase recognition. In a broader context, paraphrase generation has various applications. One of the most common applications of paraphrasing is the automatic generation of query variants for submission to information retrieval systems. Culicover (1968) [2] describes an early approach to query keyword expansion using paraphrases. The approach in [3] generates several simple variants for compound nouns present in queries to enhance a technical information retrieval system. In fact, the information retrieval community has extensively explored the task of query expansion by applying paraphrasing techniques to generate similar or related queries [4][5][6][7][8].

Ravichandran and Hovy (2002) [9] use semi-supervised learning to generate several paraphrase patterns for each question type and use them in an open-domain question answering (QA) system. Riezler et al. (2007) [10] expand a query by generating n-best paraphrases for the query and then using any novel words in the paraphrases to expand the original query.

In NLP applications such as machine translation and multi-document summarization, system performance is evaluated by comparing the system-generated output against references created by humans. Manual creation of references is a laborious task, so many researchers have suggested using paraphrase generation techniques to produce variants of the references for evaluating summarization and machine translation output [11][12]. Callison-Burch, Koehn, and Osborne (2006) [13] use automatically induced paraphrases to improve a statistical phrase-based machine translation system. Such a system works by dividing the given sentence into phrases, translating each phrase individually by looking up its translation in a table, and using the translation of a paraphrase of any source phrase that does not have a translation in the table.

Like paraphrase generation, paraphrase recognition is also an important task: it assigns a quantitative measurement to the semantic similarity of two phrases [14] or even two given pieces of text [15][16]. In other words, the paraphrase recognition task is to detect or recognize which sentences in two texts are paraphrases of each other [17][18][19][20][21][22][23]. The latter formulation of the task has become popular in recent years [24], and paraphrase generation techniques can benefit immensely from it. In general, paraphrase recognition can be very helpful for several NLP applications such as text-to-text generation and information extraction. Plagiarism detection is another important application area which needs paraphrase identification to detect sentences that are paraphrases of others.

Detecting redundancy is a very important issue for a multi-document summarization system because two sentences from different documents may convey the same semantic content, and to make the summary more informative, redundant sentences should not be selected for the summary. Barzilay and McKeown (2005) [25] exploit the redundancy present in a given set of sentences by fusing into a single coherent sentence the sentence segments which are paraphrases of each other. Sekine (2006) [26] shows how to use paraphrase recognition to cluster together extraction patterns to improve the cohesion of the extracted information.
Another recently proposed natural language processing task is that of recognizing textual entailment: a piece of text T is said to entail a hypothesis H if humans reading T will infer that H is most likely true [27][28][29][30].

One of the important requirements for initiating research in paraphrase detection is the creation of annotated corpora. The most commonly used corpus for paraphrase detection is the MSRP corpus (https://www.microsoft.com/en-us/download/confirmation.aspx?id=52398), which contains 5,801 English sentence pairs from news articles manually labelled with 67% paraphrases and 33% non-paraphrases. The shared task on Semantic Textual Similarity (https://www.cs.york.ac.uk/semeval-2012/task6/index.html) also produced benchmark datasets for a similar kind of task, but its main focus was to develop systems that can examine the degree of semantic equivalence between two sentences, unlike paraphrase detection, which makes a yes/no decision for a given pair of sentences.

However, there are at present no annotated corpora or automated semantic interpretation systems available for Indian languages, so creating benchmark data for paraphrases is necessary. With this motivation, creating annotated corpora for paraphrase detection and utilizing that data in open shared task competitions is a commendable effort which will motivate the research community towards further research in Indian languages. On this note, the shared task on Detecting Paraphrases in Indian Languages (DPIL)@FIRE 2016 is a good effort towards creating benchmark data for paraphrases in Indian languages. In this shared task, there were two sub-tasks: task1 is to classify a given pair of sentences in a given language as paraphrases (P) or not paraphrases (NP), and task2 is to identify whether a given pair of sentences is completely equivalent (E), roughly equivalent (RE) or not equivalent (NE). Four Indian languages, Hindi, Punjabi, Malayalam and Tamil, were considered in this shared task. In the subsequent sections we describe the methodology used to implement the system we entered in the shared task, and we also present performance comparisons of our system with the other systems that participated in the competition.
2. OUR PROPOSED METHODOLOGY
We view the paraphrase detection problem as a classification problem. Given a pair of sentences, task1 is to classify whether the pair is a paraphrase (P) or not a paraphrase (NP). While task1 is a two-class problem, task2 is a three-class problem: it classifies a given pair of sentences into one of three categories, completely equivalent (E), roughly equivalent (RE) or not equivalent (NE).

Since both tasks are basically classification problems, we have used a traditional classifier for implementing our system: a multinomial logistic regression classifier with a ridge estimator, for both task1 and task2. Each pair of sentences is considered as a training instance. Features are extracted from the training pairs, and we consider a number of features for representing sentence pairs. The features which we have used for implementing our system are described in the subsequent subsections.

2.1 Features
We have used various similarity measures as the features.

2.1.1 Cosine Similarity
To compute cosine similarity, we represent each sentence in a pair using a bag-of-words model. Then cosine similarity is computed between two vectors, where each vector corresponds to a sentence in the pair. Basically, we consider the set of distinct words in the pair as the feature vector based on which the cosine similarity between the two sentences is computed. The size of the vector is n, where n is |S1 ∪ S2|, S1 being the set of words in sentence 1 and S2 the set of words in sentence 2. Each sentence in the pair is mapped to a vector of length n. If the vector for sentence 1 is V = <v1, v2, ..., vn> and the vector for sentence 2 is U = <u1, u2, ..., un>, where vi and ui are the values of the i-th word feature in sentence 1 and sentence 2 respectively, the cosine similarity between the two vectors is computed as follows:

$$\mathrm{Sim}_1(S_1,S_2)=\cos(V,U)=\frac{v_1u_1+v_2u_2+\cdots+v_nu_n}{\sqrt{v_1^2+v_2^2+\cdots+v_n^2}\;\sqrt{u_1^2+u_2^2+\cdots+u_n^2}} \qquad (1)$$

Here the vector component vi in vector V corresponds to the value of the i-th word feature, which is basically the TF*IDF weight of the corresponding word. The vector U is constructed in the same way for sentence 2.
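The following minimal Python sketch illustrates this feature. The tokenizer and the source of the IDF weights are not specified above, so the `idf` mapping below is an assumed input, not part of the original system:

```python
import math
from collections import Counter

def cosine_sim(s1_tokens, s2_tokens, idf):
    # Vocabulary of the pair: n = |S1 U S2| distinct words.
    vocab = sorted(set(s1_tokens) | set(s2_tokens))
    tf1, tf2 = Counter(s1_tokens), Counter(s2_tokens)
    # TF*IDF component for each word feature (eq. 1); words missing
    # from the idf table fall back to 1.0, an assumption of this sketch.
    v = [tf1[w] * idf.get(w, 1.0) for w in vocab]
    u = [tf2[w] * idf.get(w, 1.0) for w in vocab]
    dot = sum(a * b for a, b in zip(v, u))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in u))
    return dot / norm if norm else 0.0
```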
2.1.2 Word Overlap - Exact Match
We also used a word overlap measure as a feature for paraphrase detection. If the two sentences in the pair are S1 and S2, the similarity based on word overlap is computed as follows:

$$\mathrm{Sim}_2(S_1,S_2)=\frac{|S_1\cap S_2|}{|S_1|+|S_2|} \qquad (2)$$

where |S1 ∩ S2| is the number of words common to the two sentences and |S| is the length of sentence S in terms of words.
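A sketch of equation (2) in Python (whitespace tokenization is assumed for the usage example):

```python
def word_overlap(s1_tokens, s2_tokens):
    # |S1 ∩ S2|: words common to both sentences (eq. 2).
    common = set(s1_tokens) & set(s2_tokens)
    # |S1| + |S2|: sentence lengths in words.
    return len(common) / (len(s1_tokens) + len(s2_tokens))

# Example: word_overlap("he won the match".split(),
#                       "he lost the match".split()) returns 3/8 = 0.375.
```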
2.1.3 Stemmed Word Overlap
Since most Indian languages are highly inflectional, stemming is an essential step when comparing words, but accurate stemmers are not available for Indian languages. So, we applied a lightweight approach to stemming: when we match two words, we find the unmatched portions of the two words. If the matched portion of the two words is greater than or equal to a threshold T1 and the minimum of the unmatched portions of word1 and word2 is less than or equal to a threshold T2, we assume that there is a match between word1 and word2. Stemmed word overlap is then computed using equation (2), the only difference being this word matching criterion. We set T1 to 3 and T2 to 2. We denote this similarity between two sentences S1 and S2 as Sim3(S1, S2).
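A possible reading of this matching rule in Python follows. The text does not say how the "matched portion" is located; a common prefix is assumed here, which suits suffix-inflecting Indian languages:

```python
def stem_match(w1, w2, t1=3, t2=2):
    # Length of the matched (shared) portion; a common prefix is assumed.
    prefix = 0
    for a, b in zip(w1, w2):
        if a != b:
            break
        prefix += 1
    # The matched portion must be long enough (>= T1), and at least one
    # word may have only a short unmatched remainder (<= T2), e.g. an
    # inflectional suffix.
    return prefix >= t1 and min(len(w1), len(w2)) - prefix <= t2

def stemmed_word_overlap(s1_tokens, s2_tokens):
    # Eq. (2) with stem_match() as the word matching criterion.
    common = sum(1 for w1 in set(s1_tokens)
                 if any(stem_match(w1, w2) for w2 in s2_tokens))
    return common / (len(s1_tokens) + len(s2_tokens))
```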
2.1.4 N-gram Based Similarity
The similarity measures mentioned above compare sentences based on individual word matching, but the bag-of-words model does not take into account the context in which words occur. We therefore consider n-gram based sentence similarity as one of the features for paraphrase detection. We compute n-gram based similarity as follows:

$$\mathrm{Sim}_4(S_1,S_2)=\frac{a}{b+c} \qquad (3)$$

where a = # of n-grams common to the two sentences, b = # of n-grams in sentence 1 and c = # of n-grams in sentence 2. We have only considered bigrams (n = 2) in our present system.
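A minimal sketch of equation (3), with n = 2 as used in our system:

```python
def ngram_sim(s1_tokens, s2_tokens, n=2):
    # Sets of word n-grams for each sentence.
    g1 = {tuple(s1_tokens[i:i + n]) for i in range(len(s1_tokens) - n + 1)}
    g2 = {tuple(s2_tokens[i:i + n]) for i in range(len(s2_tokens) - n + 1)}
    # Eq. (3): a / (b + c).
    a, b, c = len(g1 & g2), len(g1), len(g2)
    return a / (b + c) if (b + c) else 0.0
```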
2.1.5 Semantic Similarity
We have used the semantic similarity between two sentences as one of the features for paraphrase detection. To compute the semantic similarity between sentences, we determine whether the words in the sentences are semantically similar or not. To decide whether two words are semantically similar, we compute the cosine similarity between the word vectors of the two words. The vector representations of words learned by word2vec models [31] have been used to capture semantic meaning. Word2vec is a group of related models used to produce word embeddings [32][33]. Word2vec takes as its input a large corpus of text and produces a high-dimensional vector space in which each unique word in the corpus is assigned a corresponding vector. Such a representation positions the words in the vector space so that words sharing common contexts are located in close proximity to one another.

We have used the word2vec implementation available in Python for computing word vectors: the gensim word2vec model, with the dimension set to 50 and min_count set to 5 (ignore all words with total frequency lower than this). The training algorithm used for developing the word2vec model is CBOW (continuous bag of words); the other parameters of the word2vec model are set to their default values. If the cosine similarity between the word vectors of two words is greater than a threshold value, we consider the two words semantically similar. We set the threshold value to 0.8. We combine a small amount of additional news data with the training data for each language to create the corpus used for computing word vectors. The sizes of the corpora used to compute word vectors for the different languages are as follows: for Hindi, 1.93 MB (8,752 sentences); for Punjabi, 1.5 MB (5,848 sentences); for Tamil, 2.20 MB (7,847 sentences); and for Malayalam, 2.12 MB (7,448 sentences).

We compute the semantic similarity between two sentences as follows:

$$\mathrm{Sim}_5(S_1,S_2)=\frac{e}{f+g} \qquad (4)$$

where e = # of word pairs matched semantically between the two sentences, f = # of words in sentence 1 and g = # of words in sentence 2.
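A sketch of the word2vec training call and of equation (4) follows. The toy corpus is a stand-in for the training pairs plus additional news text described above (it is repeated so that min_count = 5 is satisfied), and parameter names follow the 2016-era gensim API; newer gensim releases rename `size` to `vector_size` and `model.wv.vocab` to `model.wv.key_to_index`:

```python
from gensim.models import Word2Vec

# Toy stand-in corpus of tokenized sentences, repeated to satisfy min_count.
corpus = [["राम", "ने", "खाना", "खाया"], ["राम", "ने", "भोजन", "किया"]] * 10

# Dimension 50, min_count 5, CBOW (sg=0), other parameters left at defaults.
model = Word2Vec(corpus, size=50, min_count=5, sg=0)

def sem_match(w1, w2, model, threshold=0.8):
    # Two words are taken as semantically similar when the cosine
    # similarity of their word2vec vectors exceeds the 0.8 threshold.
    if w1 == w2:
        return True
    if w1 not in model.wv.vocab or w2 not in model.wv.vocab:
        return False
    return model.wv.similarity(w1, w2) > threshold

def semantic_sim(s1_tokens, s2_tokens, model):
    # Eq. (4): e / (f + g), where e counts semantically matched word pairs.
    e = sum(1 for w1 in s1_tokens
            if any(sem_match(w1, w2, model) for w2 in s2_tokens))
    return e / (len(s1_tokens) + len(s2_tokens))
```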
2.2 Our Used Classifier
We have used multinomial logistic regression as the classifier for the paraphrase detection task. We view paraphrase detection as a pattern classification problem in which each pair of sentences under consideration is mapped to a pattern vector based on the features discussed in section 2.1. We have chosen the multinomial logistic regression classifier from WEKA, where it is available under the name "Logistic". WEKA is a machine learning workbench consisting of many machine learning algorithms for data mining tasks [34]. We set the "ridge" parameter to 0.4 for all our experiments; the other parameters of the classifier are set to their default values.
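To make the pipeline concrete, the sketch below (reusing the helper functions sketched in the previous subsections) assembles the five features for a pair and writes WEKA's standard ARFF format; the exact file layout and tokenization used in our system are not specified above, so this is illustrative only. WEKA's Logistic classifier accepts a ridge parameter via its -R option:

```python
def pair_features(s1, s2, idf, w2v_model):
    # Feature vector Sim1..Sim5 of section 2.1 for one sentence pair;
    # whitespace tokenization is assumed here.
    t1, t2 = s1.split(), s2.split()
    return [cosine_sim(t1, t2, idf),
            word_overlap(t1, t2),
            stemmed_word_overlap(t1, t2),
            ngram_sim(t1, t2, n=2),
            semantic_sim(t1, t2, w2v_model)]

def write_arff(feature_rows, labels, path, classes=("P", "NP")):
    # ARFF file that WEKA's "Logistic" classifier can consume, e.g.:
    #   java weka.classifiers.functions.Logistic -R 0.4 \
    #        -t train.arff -T test.arff
    # (-R sets the ridge parameter, 0.4 in our experiments; for task2
    #  pass classes=("E", "RE", "NE")).
    with open(path, "w", encoding="utf-8") as f:
        f.write("@relation dpil\n")
        for i in range(1, 6):
            f.write("@attribute sim%d numeric\n" % i)
        f.write("@attribute class {%s}\n@data\n" % ",".join(classes))
        for row, label in zip(feature_rows, labels):
            f.write(",".join("%.6f" % x for x in row) + ",%s\n" % label)
```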
3. EVALUATION AND RESULTS
3.1 Description of Datasets
We obtained the datasets from the organizers of the shared task on Detecting Paraphrases in Indian Languages (DPIL) held in conjunction with FIRE 2016 @ ISI Kolkata. The datasets were released for four Indian languages: (1) Hindi, (2) Punjabi, (3) Tamil and (4) Malayalam. For each language, two paraphrase detection tasks were defined: Task1, to classify a given pair of sentences as paraphrases (P) or not paraphrases (NP), and Task2, to identify whether a given pair of sentences is completely equivalent (E), roughly equivalent (RE) or not equivalent (NE). The training dataset for task1 contains a collection of sentence pairs labelled as P (paraphrase) or NP (not a paraphrase), and the training dataset for task2 contains a collection of sentence pairs labelled as completely equivalent (E), roughly equivalent (RE) or not equivalent (NE). The description of the datasets is shown in Table 1 and Table 2.

Table 1: Description of datasets for Task1
Language     Training Data Size    Test Data Size
Hindi        2500                  900
Punjabi      1700                  500
Malayalam    2500                  900
Tamil        2500                  900

Table 2: Description of datasets for Task2
Language     Training Data Size    Test Data Size
Hindi        3500                  1400
Punjabi      2200                  750
Malayalam    3500                  1400
Tamil        3500                  1400
3.2 Evaluation
For evaluating system performance, two evaluation metrics, accuracy and F-measure, have been used. Accuracy is defined as follows:

$$\mathrm{Accuracy}=\frac{\#\ \text{of correctly classified pairs}}{\text{total}\ \#\ \text{of pairs}} \qquad (5)$$

Though the same formula was used to calculate accuracy for both Task1 and Task2, the formula used to calculate F-measure for Task1 is not the same as that for Task2. The F-measure used for evaluating task1 is defined as:

F1-Score = F1 measure of detecting paraphrases = F1-score over the P class only.

The F-measure for task2 is defined as:

F1-Score = Macro F1 score, i.e., the average of the F1 scores over all three task2 classes (E, RE and NE).
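For illustration, both metrics can be computed with scikit-learn as below; the label lists are toy examples, not data from the shared task:

```python
from sklearn.metrics import accuracy_score, f1_score

# Task1: accuracy (eq. 5) and F1 over the P class only.
y_true1 = ["P", "NP", "P", "P"]
y_pred1 = ["P", "NP", "NP", "P"]
print(accuracy_score(y_true1, y_pred1))
print(f1_score(y_true1, y_pred1, pos_label="P", average="binary"))

# Task2: macro F1, the unweighted average of the per-class F1 scores.
y_true2 = ["E", "RE", "NE", "RE"]
y_pred2 = ["E", "NE", "NE", "RE"]
print(f1_score(y_true2, y_pred2, labels=["E", "RE", "NE"], average="macro"))
```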
3.3 Results
For system development, we used the training data [35] released for the shared task. In the first stage of the shared task, participants were given the training datasets for system development. In the second stage, the unlabeled test datasets [35] were supplied, and the participants were asked to submit the labeled files to the organizers of the contest within a short period of time. The organizers then evaluated the system outputs and announced the results. The official results of the various systems that participated in Task1 and Task2 of the contest are shown in Table 3 and Table 4 respectively. As we can see from the tables, no system performs equally well in both tasks across all languages: some systems performed best in some languages on task1, while other systems performed best in other languages on the same task. The same holds for task2.

Table 3. Official results obtained for Task 1 @ DPIL 2016
Team Name     Language     Task     Accuracy    F1 Measure / Macro F1 Measure
KS_JU         Hindi        Task1    0.90666     0.90
KS_JU         Malayalam    Task1    0.81        0.79
KS_JU         Punjabi      Task1    0.946       0.95
KS_JU         Tamil        Task1    0.78888     0.75
NLP-NITMZ     Hindi        Task1    0.91555     0.91
NLP-NITMZ     Malayalam    Task1    0.83444     0.79
NLP-NITMZ     Punjabi      Task1    0.942       0.94
NLP-NITMZ     Tamil        Task1    0.83333     0.79
HIT2016       Hindi        Task1    0.89666     0.89
HIT2016       Malayalam    Task1    0.83777     0.81
HIT2016       Punjabi      Task1    0.944       0.94
HIT2016       Tamil        Task1    0.82111     0.79
JU-NLP        Hindi        Task1    0.8222      0.74
JU-NLP        Malayalam    Task1    0.59        0.16
JU-NLP        Punjabi      Task1    0.942       0.94
JU-NLP        Tamil        Task1    0.57555     0.09
BITS-PILANI   Hindi        Task1    0.89777     0.89
DAVPBI        Punjabi      Task1    0.938       0.94
CUSAT_TEAM    Malayalam    Task1    0.80444     0.76
ASE           Hindi        Task1    0.35888     0.34
NLP@KEC       Tamil        Task1    0.82333     0.79
Anuj          Hindi        Task1    0.92        0.91
CUSAT_NLP     Malayalam    Task1    0.76222     0.75

Table 4. Official results obtained for Task 2 by the various participating teams @ DPIL 2016
Team Name     Language     Task     Accuracy    F1 Measure / Macro F1 Measure
KS_JU         Hindi        Task2    0.85214     0.84816
KS_JU         Malayalam    Task2    0.66142     0.65774
KS_JU         Punjabi      Task2    0.896       0.896
KS_JU         Tamil        Task2    0.67357     0.66447
NLP-NITMZ     Hindi        Task2    0.78571     0.76422
NLP-NITMZ     Malayalam    Task2    0.62428     0.60677
NLP-NITMZ     Punjabi      Task2    0.812       0.8086
NLP-NITMZ     Tamil        Task2    0.65714     0.63067
HIT2016       Hindi        Task2    0.9         0.89844
HIT2016       Malayalam    Task2    0.74857     0.74597
HIT2016       Punjabi      Task2    0.92266     0.923
HIT2016       Tamil        Task2    0.755       0.73979
JU-NLP        Hindi        Task2    0.68571     0.6841
JU-NLP        Malayalam    Task2    0.42214     0.3078
JU-NLP        Punjabi      Task2    0.88666     0.88664
JU-NLP        Tamil        Task2    0.55071     0.4319
BITS-PILANI   Hindi        Task2    0.71714     0.71226
DAVPBI        Punjabi      Task2    0.74666     0.7274
CUSAT_TEAM    Malayalam    Task2    0.50857     0.46576
ASE           Hindi        Task2    0.35428     0.3535
NLP@KEC       Tamil        Task2    0.68571     0.66739
Anuj          Hindi        Task2    0.90142     0.90001
CUSAT_NLP     Malayalam    Task2    0.52071     0.51296
As we can see from the tables, only 4 of the 11 participating teams submitted systems for all four languages (Hindi, Punjabi, Malayalam and Tamil); the remaining 7 teams participated in only one of the four languages. We have shown in bold font in the tables the performance scores that are highest for a particular task in a particular language. It is also evident from the tables that most systems perform well on the Punjabi and Hindi languages but show relatively poor performance on Tamil and Malayalam. We think that the main reason for the better performance on Punjabi and Hindi is the nature of the training and testing datasets supplied for those languages; most likely, that is why most systems perform almost equally well on Punjabi and Hindi. Another reason for the poor performance on Tamil and Malayalam may be the complex morphology of these languages.

We have computed the relative rank order of the participating teams based on the overall average performance on task1 and task2 in all four languages (the simple average of the F1-scores obtained by a team on task1 and task2 over all four languages). Since only four teams participated in all four languages, we show the rank order of only these four teams in Table 5. As we can see from Table 5, our system (team code: KS_JU) obtains the second best overall average F1-score among the four systems that participated in all four languages.

Table 5. Overall average performance of the systems on task1 and task2 over all four languages (Hindi, Punjabi, Malayalam and Tamil)
Team Name     Overall Average F1-Score
HIT2016       0.84
KS_JU         0.81
NLP-NITMZ     0.78
JU-NLP        0.53
4. CONCLUSION
In this work, we implement a paraphrase detection system that can detect paraphrases in four Indian languages: Hindi, Punjabi, Tamil and Malayalam. We use various lexical and semantic level similarity measures as features for the paraphrase detection task. We view paraphrase detection as a classification problem and use a multinomial logistic regression model as the classifier. Our model performs relatively better on task1 than on task2.

Our system has scope for further improvement in the following ways:

- Word2vec models require a large corpus for proper representation of word meaning, but in our present implementation we have used relatively small corpora for computing word vectors. Using a larger corpus for computing word vectors may improve the semantic similarity measure, leading to improved system performance.
- Since we have only used a multinomial logistic regression model as the classifier, there is also scope to improve system performance using other classifiers or combinations of classifiers.
- Most Indian languages are highly inflectional, so the use of a morphological analyzer/stemmer/lemmatizer may improve system performance.

5. ACKNOWLEDGMENTS
This research work has received support from the project entitled "Design and Development of a System for Querying, Clustering and Summarization for Bengali" funded by the Department of Science and Technology, Government of India under the SERB scheme.

6. REFERENCES
[1] Madnani, N. and Dorr, B. J. 2010. Generating phrasal and sentential paraphrases: A survey of data-driven methods. Computational Linguistics. 36(3):341-387.
[2] Culicover, P. W. 1968. Paraphrase generation and information retrieval from stored text. Mechanical Translation and Computational Linguistics. 11(1-2):78-88.
[3] Spärck-Jones, K. and Tait, J. I. 1984. Automatic search term variant generation. Journal of Documentation. 40(1):50-66.
[4] Beeferman, D. and Berger, A. 2000. Agglomerative clustering of a search engine query log. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 407-416, Boston, MA.
[5] Jones, R., Rey, B., Madani, O. and Greiner, W. 2006. Generating query substitutions. In Proceedings of the World Wide Web Conference, pages 387-396, Edinburgh.
[6] Sahami, M. and Heilman, T. D. 2006. A Web-based kernel function for measuring the similarity of short text snippets. In Proceedings of the World Wide Web Conference, pages 377-386, Edinburgh.
[7] Metzler, D., Dumais, S. and Meek, C. 2007. Similarity measures for short segments of text. In Proceedings of the European Conference on Information Retrieval (ECIR), pages 16-27, Rome.
[8] Shi, X. and Yang, C. C. 2007. Mining related queries from Web search engine query logs using an improved association rule mining model. JASIST. 58(12):1871-1883.
[9] Ravichandran, D. and Hovy, E. 2002. Learning surface text patterns for a question answering system. In Proceedings of ACL, pages 41-47, Philadelphia, PA.
[10] Riezler, S., Vasserman, A., Tsochantaridis, I., Mittal, V. O. and Liu, Y. 2007. Statistical machine translation for query expansion in answer retrieval. In Proceedings of ACL, pages 464-471, Prague.
[11] Owczarzak, K., Groves, D., van Genabith, J. and Way, A. 2006. Contextual bitext-derived paraphrases in automatic MT evaluation. In Proceedings of the Workshop on Statistical Machine Translation, pages 86-93, New York, NY.
[12] Zhou, L., Lin, C.-Y. and Hovy, E. 2006. Re-evaluating machine translation results with paraphrase support. In Proceedings of EMNLP, pages 77-84, Sydney.
[13] Callison-Burch, C., Koehn, P. and Osborne, M. 2006. Improved statistical machine translation using paraphrases. In Proceedings of NAACL, pages 17-24, New York, NY.
[14] Fujita, A. and Sato, S. 2008. A probabilistic model for measuring grammaticality and similarity of automatically generated paraphrases of predicate phrases. In Proceedings of COLING, pages 225-232, Manchester.
[15] Corley, C. and Mihalcea, R. 2005. Measuring the semantic similarity of texts. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 13-18, Ann Arbor, MI.
[16] Uzuner, Ö. and Katz, B. 2005. Capturing expression using linguistic information. In Proceedings of AAAI, pages 1124-1129, Pittsburgh, PA.
[17] Brockett, C. and Dolan, W. B. 2005. Support vector machines for paraphrase identification and corpus construction. In Proceedings of the Third International Workshop on Paraphrasing, pages 1-8, Jeju Island.
[18] Marsi, E. and Krahmer, E. 2005. Explorations in sentence fusion. In Proceedings of the European Workshop on Natural Language Generation, pages 109-117, Aberdeen.
[19] Wu, D. 2005. Recognizing paraphrases and textual entailment using inversion transduction grammars. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 25-30, Ann Arbor, MI.
[20] Cordeiro, J., Dias, G. and Brazdil, P. 2007a. A metric for paraphrase detection. In Proceedings of the Second International Multi-Conference on Computing in the Global Information Technology, page 7, Guadeloupe.
[21] Cordeiro, J., Dias, G. and Brazdil, P. 2007b. New functions for unsupervised asymmetrical paraphrase detection. Journal of Software. 2(4):12-23.
[22] Das, D. and Smith, N. A. 2009. Paraphrase identification as probabilistic quasi-synchronous recognition. In Proceedings of ACL/IJCNLP, pages 468-476, Singapore.
[23] Malakasiotis, P. 2009. Paraphrase recognition using machine learning to combine similarity measures. In Proceedings of the ACL-IJCNLP 2009 Student Research Workshop, pages 27-35, Singapore.
[24] Dolan, B. and Dagan, I., editors. 2005. Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, Ann Arbor, MI.
[25] Barzilay, R. and McKeown, K. R. 2005. Sentence fusion for multidocument news summarization. Computational Linguistics. 31(3):297-328.
[26] Sekine, S. 2006. On-demand information extraction. In Proceedings of COLING-ACL, pages 731-738, Sydney.
[27] Dagan, I., Glickman, O. and Magnini, B. 2006. The PASCAL Recognising Textual Entailment Challenge. In Machine Learning Challenges, Lecture Notes in Computer Science, Volume 3944, Springer-Verlag, pages 177-190.
[28] Bar-Haim, R., Dagan, I., Dolan, B., Ferro, L., Giampiccolo, D., Magnini, B. and Szpektor, I., editors. 2007. Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment, Venice.
[29] Sekine, S., Inui, K., Dagan, I., Dolan, B., Giampiccolo, D. and Magnini, B., editors. 2007. Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Association for Computational Linguistics, Prague.
[30] Giampiccolo, D., Dang, H., Dagan, I., Dolan, B. and Magnini, B., editors. 2008. Proceedings of the Text Analysis Conference (TAC): Recognizing Textual Entailment Track, Gaithersburg, MD.
[31] "Gensim - Deep learning with word2vec", https://radimrehurek.com/gensim/models/word2vec.html, retrieved in 2016.
[32] Mikolov, T., Chen, K., Corrado, G. S. and Dean, J. 2013. Efficient estimation of word representations in vector space. In ICLR Workshop Papers.
[33] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111-3119.
[34] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I. H. 2009. The WEKA data mining software: An update. SIGKDD Explorations, Volume 11.
[35] Anand Kumar, M., Singh, S., Kavirajan, B. and Soman, K. P. 2016. DPIL@FIRE2016: Overview of the shared task on Detecting Paraphrases in Indian Languages. Working notes of FIRE 2016 - Forum for Information Retrieval Evaluation, Kolkata, India, December 7-10, CEUR Workshop Proceedings, CEUR-WS.org.