KS_JU@DPIL-FIRE2016: Detecting Paraphrases in Indian Languages Using Multinomial Logistic Regression Model

Kamal Sarkar
Department of Computer Science and Engineering
Jadavpur University, Kolkata, India
jukamal2001@yahoo.com

ABSTRACT
In this work, we describe a system that detects paraphrases in Indian languages as part of our participation in the shared task on Detecting Paraphrases in Indian Languages (DPIL) organized by the Forum for Information Retrieval Evaluation (FIRE) in 2016. Our paraphrase detection method uses a multinomial logistic regression model trained with a variety of features which are basically lexical and semantic level similarities between the two sentences in a pair. The performance of the system has been evaluated against the test set released for the FIRE 2016 shared task on DPIL. Our system achieves the highest f-measure of 0.95 on task1 in the Punjabi language. The performance of our system on task1 in the Hindi language is an f-measure of 0.90. Out of the 11 teams that participated in the shared task, only four teams participated in all four languages (Hindi, Punjabi, Malayalam and Tamil); the remaining 7 teams participated in only one of the four languages. We participated in both task1 and task2 for all four Indian languages. The overall average performance of our system, including task1 and task2 over all four languages, is an F1-score of 0.81, which is the second highest score among the four systems that participated in all four languages.

Keywords
Paraphrasing; Multinomial logistic regression model; Sentence similarity; Hindi language; Punjabi language; Malayalam language; Tamil language

1. INTRODUCTION
The concept of paraphrasing is defined in [1] as follows: "The concept of paraphrasing is most generally defined on the basis of the principle of semantic equivalence: A paraphrase is an alternative surface form in the same language expressing the same semantic content as the original form." Paraphrases may occur at various levels, such as lexical paraphrases (synonyms, hyperonymy etc.), phrasal paraphrases (phrasal fragments sharing the same semantic content) and sentential paraphrases (for example, "I finished my work" and "I completed my assignment") [1].

The task of paraphrasing can be of two types based on its applications: paraphrase generation and paraphrase recognition. In a broader context, paraphrase generation has various applications. One of the most common applications of paraphrasing is the automatic generation of query variants for submission to information retrieval systems. Culicover (1968) [2] describes an early approach to query keyword expansion using paraphrases. The approach in [3] generates several simple variants for compound nouns present in queries to enhance a technical information retrieval system. In fact, the information retrieval community has extensively explored the task of query expansion by applying paraphrasing techniques to generate similar or related queries [4][5][6][7][8]. Ravichandran and Hovy (2002) [9] use semi-supervised learning to generate several paraphrase patterns for each question type and use them in an open-domain question answering (QA) system. Riezler et al. (2007) [10] expand a query by generating n-best paraphrases for the query and then using any novel words in the paraphrases to expand the original query.

In NLP applications such as machine translation and multi-document summarization, system performance is evaluated by comparing the system-generated output with references created by humans. Manual creation of references is a laborious task, so many researchers have suggested using paraphrase generation techniques to generate variants of the references for evaluating summarization and machine translation output [11][12]. Callison-Burch, Koehn, and Osborne (2006) [13] use automatically induced paraphrases to improve a statistical phrase-based machine translation system. Such a system works by dividing the given sentence into phrases and translating each phrase individually by looking up its translation in a table, using the translation of one of the paraphrases of any source phrase that does not have a translation in the table.

Like paraphrase generation, paraphrase recognition is also an important task, which is to assign a quantitative measurement to the semantic similarity of two phrases [14] or even two given pieces of text [15][16]. In other words, the paraphrase recognition task is to detect or recognize which sentences in two texts are paraphrases of each other [17][18][19][20][21][22][23].
The latter formulation of the task has become popular in recent years [24], and paraphrase generation techniques can benefit immensely from this task. In general, paraphrase recognition can be very helpful for several NLP applications such as text-to-text generation and information extraction. Plagiarism detection is another important application area, which needs paraphrase identification techniques to detect the sentences which are paraphrases of others.

Detecting redundancy is a very important issue for a multi-document summarization system because two sentences from different documents may convey the same semantic content, and to make the summary more informative, the redundant sentences should not be selected in the summary. Barzilay and McKeown (2005) [25] exploit the redundancy present in a given set of sentences by fusing into a single coherent sentence the sentence segments which are paraphrases of each other. Sekine (2006) [26] shows how to use paraphrase recognition to cluster together extraction patterns to improve the cohesion of the extracted information.

Another recently proposed natural language processing task is that of recognizing textual entailment: a piece of text T is said to entail a hypothesis H if humans reading T will infer that H is most likely true [27][28][29][30].

One of the important requirements for initiating research in paraphrase detection is the creation of an annotated corpus. The most commonly used corpus for paraphrase detection is the MSRP corpus (https://www.microsoft.com/en-us/download/confirmation.aspx?id=52398), which contains 5,801 English sentence pairs from news articles manually labelled with 67% paraphrases and 33% non-paraphrases. The shared task on Semantic Textual Similarity (https://www.cs.york.ac.uk/semeval-2012/task6/index.html) also provides benchmark datasets for a similar kind of task, but its main focus was to develop systems that can examine the degree of semantic equivalence between two sentences, unlike paraphrase detection, which makes a yes/no decision for a given pair of sentences.

However, there are at present no annotated corpora or automated semantic interpretation systems available for Indian languages, so creating benchmark data for paraphrases is necessary. With this motivation, creating annotated corpora for paraphrase detection and utilizing that data in open shared task competitions is a commendable effort which will motivate the research community towards further research in Indian languages. On this note, the shared task on Detecting Paraphrases in Indian Languages (DPIL)@FIRE 2016 is a good effort towards creating benchmark data for paraphrases in Indian languages. In this shared task, there were two sub-tasks: task1 is to classify a given pair of sentences in a given language as paraphrases (P) or not paraphrases (NP), and task2 is to identify whether a given pair of sentences is completely equivalent (E), roughly equivalent (RE) or not equivalent (NE). Four Indian languages, Hindi, Punjabi, Malayalam and Tamil, were considered in this shared task. In the subsequent sections, we describe the methodology used to implement the system with which we participated in the shared task, and we present performance comparisons of our system with the other systems that participated in the competition.

2. OUR PROPOSED METHODOLOGY
We view the paraphrase detection problem as a classification problem. Given a pair of sentences, task1 is to classify whether the pair of sentences is a paraphrase (P) or not a paraphrase (NP). While task1 is a two-class problem, task2 is a three-class problem: to classify a given pair of sentences into one of three categories, completely equivalent (E), roughly equivalent (RE) or not equivalent (NE).

Since the problems are basically classification problems, we have used a traditional classifier for implementing our system. We have used a multinomial logistic regression classifier with a ridge estimator for both task1 and task2. Each pair of sentences is considered as a training instance, and features are extracted from the training pairs. We consider a number of features for representing sentence pairs. The features which we have used for implementing our system are described in the following subsections.

2.1 Features
We have used various similarity measures as the features.

2.1.1 Cosine Similarity
To compute cosine similarity, we represent each sentence in a pair using a bag-of-words model. Then cosine similarity is computed between two vectors, where each vector corresponds to a sentence in the pair. Basically, we consider the set of distinct words in the pair as the vector of features based on which the cosine similarity between the two sentences is computed. The size of the vector is n, where n is |S1 U S2|, S1 is the set of words in sentence 1 and S2 is the set of words in sentence 2. Each sentence in the pair is mapped to a vector of length n. If the vector for sentence 1 is V = <v1, v2, ..., vn> and the vector for sentence 2 is U = <u1, u2, ..., un>, where vi and ui are the values of the i-th word feature in sentence 1 and sentence 2 respectively, the cosine similarity between the two vectors is computed as follows:

    Sim_cos(S1, S2) = cos(V, U) = (v1*u1 + v2*u2 + ... + vn*un) / (sqrt(v1^2 + v2^2 + ... + vn^2) * sqrt(u1^2 + u2^2 + ... + un^2))    (1)

Here the vector component vi in vector V is the value of the i-th word feature, which is basically the TF*IDF weight of the corresponding word. The vector U is constructed in the same way for sentence 2.
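To make the feature concrete, the following is a minimal Python sketch of this TF*IDF cosine similarity. The function name and the idf lookup table (assumed to be estimated from the training corpus) are ours, not from the paper; the paper does not specify its exact TF or IDF variants.

    import math
    from collections import Counter

    def tfidf_cosine(s1_tokens, s2_tokens, idf):
        """Cosine similarity (equation 1) between the TF*IDF vectors of two
        tokenized sentences. `idf` maps a word to its inverse document
        frequency; unseen words fall back to a default weight of 1.0."""
        vocab = sorted(set(s1_tokens) | set(s2_tokens))   # n = |S1 U S2| features
        tf1, tf2 = Counter(s1_tokens), Counter(s2_tokens)
        v = [tf1[w] * idf.get(w, 1.0) for w in vocab]     # vector V for sentence 1
        u = [tf2[w] * idf.get(w, 1.0) for w in vocab]     # vector U for sentence 2
        dot = sum(vi * ui for vi, ui in zip(v, u))
        norm = math.sqrt(sum(vi * vi for vi in v)) * math.sqrt(sum(ui * ui for ui in u))
        return dot / norm if norm else 0.0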
2.1.2 Word Overlap - Exact Match
We also used the word overlap measure as a feature for paraphrase detection. If the two sentences in the pair are S1 and S2, the similarity based on word overlap is computed as follows:

    Sim_overlap(S1, S2) = |S1 ∩ S2| / (|S1| + |S2|)    (2)

where |S1 ∩ S2| is the number of words common between the two sentences and |S| is the length of sentence S in terms of words.

2.1.3 Stemmed Word Overlap
Since most Indian languages are highly inflectional, stemming is an essential step while comparing words. However, accurate stemmers are not available for Indian languages, so we applied a lightweight approach to stemming. In this approach, when we match two words, we find the unmatched portions of the two words. If the matched portion of the two words is greater than or equal to a threshold T1 and the minimum of the unmatched portions of word1 and word2 is less than or equal to a threshold T2, we assume that there exists a match between word1 and word2. Stemmed word overlap is computed using equation (2), the only difference being the word matching criteria. We set T1 to 3 and T2 to 2. We indicate such similarity between two sentences S1 and S2 as Sim_stem(S1, S2).
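A minimal sketch of these two overlap features follows. The paper does not state whether the "matched portion" is a common prefix; since the languages concerned are mostly suffixing, we assume prefix matching here, and the function names are ours.

    def word_overlap(s1_tokens, s2_tokens):
        """Equation (2): |S1 ∩ S2| / (|S1| + |S2|) with exact word matching."""
        common = set(s1_tokens) & set(s2_tokens)
        return len(common) / (len(s1_tokens) + len(s2_tokens))

    def stem_match(w1, w2, t1=3, t2=2):
        """Lightweight stemming: two words match if their common prefix is at
        least t1 characters and the shorter unmatched tail is at most t2."""
        prefix = 0
        for a, b in zip(w1, w2):
            if a != b:
                break
            prefix += 1
        return prefix >= t1 and min(len(w1), len(w2)) - prefix <= t2

    def stemmed_word_overlap(s1_tokens, s2_tokens):
        """Equation (2) again, but a word counts as common when it has a
        stem-level match in the other sentence (Sim_stem)."""
        common = sum(1 for w1 in set(s1_tokens)
                     if any(stem_match(w1, w2) for w2 in set(s2_tokens)))
        return common / (len(s1_tokens) + len(s2_tokens))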
2.1.4 N-gram Based Similarity
The similarity measures mentioned above compare sentences based on individual word matching, but the bag-of-words model does not take into account the context in which words occur. We therefore consider n-gram based sentence similarity as one of the features for paraphrase detection. We compute n-gram based similarity as follows:

    Sim_ngram(S1, S2) = c / (a + b)    (3)

where c = the number of n-grams that match between sentence 1 and sentence 2, a = the number of n-grams in sentence 1 and b = the number of n-grams in sentence 2. We have only considered bigrams (n=2) for implementing our present system.

2.1.5 Semantic Similarity
We have used the semantic similarity between two sentences as one of the features for paraphrase detection. To compute the semantic similarity between sentences, we determine whether the words in the sentences are semantically similar or not. To decide whether two words are semantically similar, we compute the cosine similarity between the word vectors of the two words. The vector representations of words learned by word2vec models [31] have been used to carry semantic meaning. Word2vec is a group of related models used to produce word embeddings [32][33]. Word2vec takes as its input a large corpus of text and produces a high-dimensional space where each unique word in the corpus is assigned a corresponding vector. Such a representation positions the words in the vector space such that words that share common contexts are positioned in close proximity to one another.

We have used the word2vec implementation available in Python for computing word vectors. We have used the gensim word2vec model under the Python platform with the dimension set to 50 and min_count set to 5 (ignore all words with total frequency lower than this). The training algorithm used for developing the word2vec model is CBOW (Continuous Bag of Words). The other parameters of the word2vec model are set to default values. If the cosine similarity between the word vectors of two words is greater than a threshold value, we consider the two words semantically similar; we set the threshold value to 0.8. We combine a small amount of additional news data with the training data for each language to create the corpus used for computing word vectors. The sizes of the corpora used to compute word vectors for the different languages are as follows: for Hindi, 1.93 MB (8752 sentences); for Punjabi, 1.5 MB (5848 sentences); for Tamil, 2.20 MB (7847 sentences); and for Malayalam, 2.12 MB (7448 sentences).

We compute the semantic similarity between two sentences as follows:

    Sim_sem(S1, S2) = m / (f + g)    (4)

where m = the number of words in sentence 1 that semantically match with words in sentence 2, f = the number of words in sentence 1 and g = the number of words in sentence 2.
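The sketch below illustrates the bigram feature of equation (3) and the word2vec-based semantic similarity of equation (4), assuming the same sum-of-counts normalization as equation (2). The gensim calls follow the gensim 3.x API (the `size` parameter is renamed `vector_size` in gensim 4), and the handling of out-of-vocabulary words is our assumption, as the paper does not discuss it.

    from gensim.models import Word2Vec

    def ngram_similarity(s1_tokens, s2_tokens, n=2):
        """Equation (3): n-grams matching between the sentences,
        normalized by the total n-gram count of both (c / (a + b))."""
        g1 = {tuple(s1_tokens[i:i + n]) for i in range(len(s1_tokens) - n + 1)}
        g2 = {tuple(s2_tokens[i:i + n]) for i in range(len(s2_tokens) - n + 1)}
        a, b, c = len(g1), len(g2), len(g1 & g2)
        return c / (a + b) if a + b else 0.0

    def train_word2vec(corpus):
        """CBOW word2vec trained on the tokenized training sentences plus
        the additional news text (`corpus` is a list of token lists)."""
        return Word2Vec(corpus, size=50, min_count=5, sg=0)   # sg=0 selects CBOW

    def words_match(w1, w2, model, threshold=0.8):
        """Two words are semantically similar if the cosine similarity of
        their word vectors exceeds the threshold (0.8 in our experiments)."""
        try:
            return model.wv.similarity(w1, w2) > threshold
        except KeyError:              # out-of-vocabulary: exact match only
            return w1 == w2

    def semantic_similarity(s1_tokens, s2_tokens, model):
        """Equation (4): words of sentence 1 with a semantic match in
        sentence 2, normalized by the summed sentence lengths."""
        m = sum(1 for w1 in s1_tokens
                if any(words_match(w1, w2, model) for w2 in s2_tokens))
        return m / (len(s1_tokens) + len(s2_tokens))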
2.2 Our Used Classifier
We have used multinomial logistic regression as the classifier for the paraphrase detection task. We view the paraphrase detection problem as a pattern classification problem where each pair of sentences under consideration for paraphrase checking is mapped to a pattern vector based on the features discussed in section 2.1. We have chosen the multinomial logistic regression classifier from WEKA, where it is present under the name "Logistic". WEKA is a machine learning workbench which consists of many machine learning algorithms for data mining tasks [34]. We set the "ridge" parameter to 0.4 for all our experiments. The other parameters of the classifier are set to default values.
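As an illustration of how the pieces fit together, the following sketch assembles the five features of section 2.1 into one pattern vector per sentence pair and trains a multinomial logistic regression model. It uses scikit-learn as a stand-in for WEKA's "Logistic" classifier, which is what we actually used; scikit-learn's inverse regularization strength C only roughly plays the role of WEKA's ridge parameter, so C=2.5 below is an assumption, not an exact equivalent of ridge=0.4.

    from sklearn.linear_model import LogisticRegression

    def pair_features(s1_tokens, s2_tokens, idf, w2v_model):
        """One pattern vector per sentence pair (features of section 2.1)."""
        return [
            tfidf_cosine(s1_tokens, s2_tokens, idf),
            word_overlap(s1_tokens, s2_tokens),
            stemmed_word_overlap(s1_tokens, s2_tokens),
            ngram_similarity(s1_tokens, s2_tokens, n=2),
            semantic_similarity(s1_tokens, s2_tokens, w2v_model),
        ]

    # X: feature vectors for all training pairs; y: "P"/"NP" for task1 or
    # "E"/"RE"/"NE" for task2. C is only a rough analogue of WEKA's ridge.
    clf = LogisticRegression(multi_class="multinomial", solver="lbfgs", C=2.5)
    # clf.fit(X_train, y_train)
    # predictions = clf.predict(X_test)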
3. EVALUATION AND RESULTS

3.1 Description of Datasets
We have obtained the datasets from the organizers of the shared task on Detecting Paraphrases in Indian Languages (DPIL) held in conjunction with FIRE 2016 @ ISI Kolkata. The datasets were released for four Indian languages: (1) Hindi, (2) Punjabi, (3) Tamil and (4) Malayalam. For each language, two paraphrase detection tasks were defined: Task1, to classify a given pair of sentences in a given language as paraphrases (P) or not paraphrases (NP), and Task2, to identify whether a given pair of sentences is completely equivalent (E), roughly equivalent (RE) or not equivalent (NE). The training dataset for task1 contains a collection of sentence pairs labelled as P (paraphrase) or NP (not a paraphrase), and the training dataset for task2 contains a collection of sentence pairs labelled as completely equivalent (E), roughly equivalent (RE) or not equivalent (NE). The description of the datasets is shown in Table 1 and Table 2.

Table 1: Description of datasets for Task1

    Language     Training Data Size    Test Data Size
    Hindi        2500                  900
    Punjabi      1700                  500
    Malayalam    2500                  900
    Tamil        2500                  900

Table 2: Description of datasets for Task2

    Language     Training Data Size    Test Data Size
    Hindi        3500                  1400
    Punjabi      2200                  750
    Malayalam    3500                  1400
    Tamil        3500                  1400

3.2 Evaluation
For evaluating system performance, two evaluation metrics, accuracy and F-measure, have been used. Accuracy is defined as follows:

    Accuracy = (# of correctly labelled pairs) / (total # of pairs)    (5)

Though the same formula was used to calculate accuracy for both tasks, the formula used to calculate the F-measure for Task1 was not the same as for Task2. The F-measure used for evaluating task1 is defined as follows:

    F1-Score = F1 measure of detecting paraphrases = F1-score over the P class only.

The F-measure for task2 is defined as:

    F1-Score = Macro F1 score, which is the average of the F1 scores of all three classes: P, NP and SP.
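A small sketch of both metrics follows, assuming the gold and predicted labels are given as parallel lists; the function names are ours.

    def f1_for_class(gold, pred, cls):
        """F1 score for a single class label."""
        tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
        fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
        fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
        if tp == 0:
            return 0.0
        precision, recall = tp / (tp + fp), tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    def accuracy(gold, pred):
        """Equation (5): correctly labelled pairs over all pairs."""
        return sum(g == p for g, p in zip(gold, pred)) / len(gold)

    def macro_f1(gold, pred, classes):
        """Task2 metric: unweighted average of the per-class F1 scores."""
        return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)

    # Task1 is scored with f1_for_class(gold, pred, "P");
    # Task2 with macro_f1 over the three task2 classes.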
3.3 Results
For system development, we used the training data [35] released for the shared task. In the first stage of the shared task, participants were given the training datasets for system development. In the second stage, the unlabeled test datasets [35] were supplied and the participants were asked to submit the labeled files to the organizers of the contest within a short period of time. Thereafter the organizers evaluated the system outputs and announced the results. The official results of the various systems that participated in Task1 and Task2 of the contest are shown in Table 3 and Table 4 respectively. As we can see from the tables, no system performs equally well in both tasks across all languages: some systems have performed best in some languages on task1, while other systems have performed best in other languages on the same task. This is also true for task2.

Table 3. Official results obtained for Task 1 @ DPIL 2016

    Team Name     Language     Task     Accuracy    F1 Measure
    KS_JU         Hindi        Task1    0.90666     0.9
    KS_JU         Malayalam    Task1    0.81        0.79
    KS_JU         Punjabi      Task1    0.946       0.95
    KS_JU         Tamil        Task1    0.78888     0.75
    NLP-NITMZ     Hindi        Task1    0.91555     0.91
    NLP-NITMZ     Malayalam    Task1    0.83444     0.79
    NLP-NITMZ     Punjabi      Task1    0.942       0.94
    NLP-NITMZ     Tamil        Task1    0.83333     0.79
    HIT2016       Hindi        Task1    0.89666     0.89
    HIT2016       Malayalam    Task1    0.83777     0.81
    HIT2016       Punjabi      Task1    0.944       0.94
    HIT2016       Tamil        Task1    0.82111     0.79
    JU-NLP        Hindi        Task1    0.8222      0.74
    JU-NLP        Malayalam    Task1    0.59        0.16
    JU-NLP        Punjabi      Task1    0.942       0.94
    JU-NLP        Tamil        Task1    0.57555     0.09
    BITS-PILANI   Hindi        Task1    0.89777     0.89
    DAVPBI        Punjabi      Task1    0.938       0.94
    CUSAT_TEAM    Malayalam    Task1    0.80444     0.76
    ASE           Hindi        Task1    0.35888     0.34
    NLP@KEC       Tamil        Task1    0.82333     0.79
    Anuj          Hindi        Task1    0.92        0.91
    CUSAT_NLP     Malayalam    Task1    0.76222     0.75

Table 4. Official results obtained for Task 2 by the various participating teams @ DPIL 2016

    Team Name     Language     Task     Accuracy    Macro F1 Measure
    KS_JU         Hindi        Task2    0.85214     0.84816
    KS_JU         Malayalam    Task2    0.66142     0.65774
    KS_JU         Punjabi      Task2    0.896       0.896
    KS_JU         Tamil        Task2    0.67357     0.66447
    NLP-NITMZ     Hindi        Task2    0.78571     0.76422
    NLP-NITMZ     Malayalam    Task2    0.62428     0.60677
    NLP-NITMZ     Punjabi      Task2    0.812       0.8086
    NLP-NITMZ     Tamil        Task2    0.65714     0.63067
    HIT2016       Hindi        Task2    0.9         0.89844
    HIT2016       Malayalam    Task2    0.74857     0.74597
    HIT2016       Punjabi      Task2    0.92266     0.923
    HIT2016       Tamil        Task2    0.755       0.73979
    JU-NLP        Hindi        Task2    0.68571     0.6841
    JU-NLP        Malayalam    Task2    0.42214     0.3078
    JU-NLP        Punjabi      Task2    0.88666     0.88664
    JU-NLP        Tamil        Task2    0.55071     0.4319
    BITS-PILANI   Hindi        Task2    0.71714     0.71226
    DAVPBI        Punjabi      Task2    0.74666     0.7274
    CUSAT_TEAM    Malayalam    Task2    0.50857     0.46576
    ASE           Hindi        Task2    0.35428     0.3535
    NLP@KEC       Tamil        Task2    0.68571     0.66739
    Anuj          Hindi        Task2    0.90142     0.90001
    CUSAT_NLP     Malayalam    Task2    0.52071     0.51296

As we can see from the tables, only 4 of the 11 participating teams submitted their systems for all four languages (Hindi, Punjabi, Malayalam and Tamil); the remaining 7 teams participated in only one of the four languages. Comparing the best scores for each task and language across the tables, it is evident that most systems perform well on the Punjabi and Hindi languages but show relatively poor performance on the Tamil and Malayalam languages. We think that the main reason for the better performance in the Punjabi and Hindi language domains is the nature of the training and test datasets supplied for those languages; most likely, that is why most systems perform almost equally well on Punjabi and Hindi. Another reason for the poor performance on Tamil and Malayalam may be the complex morphology of these languages.

We have computed the relative rank order of the participating teams based on the overall average performance on task1 and task2 in all four languages (the simple average of the F1-scores obtained by a team on task1 and task2 over all four languages). Since only four teams participated in all four languages, we show the rank order of only these four teams in Table 5. As we can see from Table 5, our system (team code: KS_JU) obtains the second best score among the four systems which participated in all four languages.

Table 5. Overall average performance of systems on task1 and task2 over all four languages (Hindi, Punjabi, Malayalam and Tamil)

    Team Name     Overall Average F1-Score
    HIT2016       0.84
    KS_JU         0.81
    NLP-NITMZ     0.78
    JU-NLP        0.53

4. CONCLUSION
In this work, we implement a paraphrase detection system that can detect paraphrases in four Indian languages: Hindi, Punjabi, Tamil and Malayalam. We use various lexical and semantic level similarity measures to compute features for the paraphrase detection task. We view the paraphrase detection problem as a classification problem and use a multinomial logistic regression model as the classifier. Our model performs relatively better on task1 than on task2.

Our system has scope for further improvement in the following ways:

• Word2vec models require a large corpus for proper representation of word meaning, but for our present implementation we have used a relatively small corpus for computing word vectors. Use of a larger corpus for computing word vectors may improve the semantic similarity measure, leading to improved system performance.
• Since we have only used the multinomial logistic regression model as the classifier, there is also scope to improve system performance using other classifiers or combinations of classifiers.
• Most Indian languages are highly inflectional, so the use of a morphological analyzer/stemmer/lemmatizer may improve system performance.

5. ACKNOWLEDGMENTS
This research work has received support from the project entitled "Design and Development of a System for Querying, Clustering and Summarization for Bengali" funded by the Department of Science and Technology, Government of India under the SERB scheme.

6. REFERENCES
[1] Madnani, N. and Dorr, B. J. 2010. Generating phrasal and sentential paraphrases: A survey of data-driven methods. Computational Linguistics, 36(3):341–387.
[2] Culicover, P. W. 1968. Paraphrase generation and information retrieval from stored text. Mechanical Translation and Computational Linguistics, 11(1–2):78–88.
[3] Spärck Jones, K. and Tait, J. I. 1984. Automatic search term variant generation. Journal of Documentation, 40(1):50–66.
[4] Beeferman, D. and Berger, A. 2000. Agglomerative clustering of a search engine query log. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 407–416, Boston, MA.
[5] Jones, R., Rey, B., Madani, O. and Greiner, W. 2006. Generating query substitutions. In Proceedings of the World Wide Web Conference, pages 387–396, Edinburgh.
[6] Sahami, M. and Heilman, T. D. 2006. A Web-based kernel function for measuring the similarity of short text snippets. In Proceedings of the World Wide Web Conference, pages 377–386, Edinburgh.
[7] Metzler, D., Dumais, S. and Meek, C. 2007. Similarity measures for short segments of text. In Proceedings of the European Conference on Information Retrieval (ECIR), pages 16–27, Rome.
[8] Shi, X. and Yang, C. C. 2007. Mining related queries from Web search engine query logs using an improved association rule mining model. JASIST, 58(12):1871–1883.
[9] Ravichandran, D. and Hovy, E. 2002. Learning surface text patterns for a question answering system. In Proceedings of ACL, pages 41–47, Philadelphia, PA.
[10] Riezler, S., Vasserman, A., Tsochantaridis, I., Mittal, V. O. and Liu, Y. 2007. Statistical machine translation for query expansion in answer retrieval. In Proceedings of ACL, pages 464–471, Prague.
[11] Owczarzak, K., Groves, D., van Genabith, J. and Way, A. 2006. Contextual bitext-derived paraphrases in automatic MT evaluation. In Proceedings of the Workshop on Statistical Machine Translation, pages 86–93, New York, NY.
[12] Zhou, L., Lin, C.-Y. and Hovy, E. 2006. Re-evaluating machine translation results with paraphrase support. In Proceedings of EMNLP, pages 77–84, Sydney.
[13] Callison-Burch, C., Koehn, P. and Osborne, M. 2006. Improved statistical machine translation using paraphrases. In Proceedings of NAACL, pages 17–24, New York, NY.
[14] Fujita, A. and Sato, S. 2008. A probabilistic model for measuring grammaticality and similarity of automatically generated paraphrases of predicate phrases. In Proceedings of COLING, pages 225–232, Manchester.
[15] Corley, C. and Mihalcea, R. 2005. Measuring the semantic similarity of texts. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 13–18, Ann Arbor, MI.
[16] Uzuner, Ö. and Katz, B. 2005. Capturing expression using linguistic information. In Proceedings of AAAI, pages 1124–1129, Pittsburgh, PA.
[17] Brockett, C. and Dolan, W. B. 2005. Support vector machines for paraphrase identification and corpus construction. In Proceedings of the Third International Workshop on Paraphrasing, pages 1–8, Jeju Island.
[18] Marsi, E. and Krahmer, E. 2005. Explorations in sentence fusion. In Proceedings of the European Workshop on Natural Language Generation, pages 109–117, Aberdeen.
[19] Wu, D. 2005. Recognizing paraphrases and textual entailment using inversion transduction grammars. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 25–30, Ann Arbor, MI.
[20] Cordeiro, J., Dias, G. and Brazdil, P. 2007a. A metric for paraphrase detection. In Proceedings of the Second International Multi-Conference on Computing in the Global Information Technology, page 7, Guadeloupe.
[21] Cordeiro, J., Dias, G. and Brazdil, P. 2007b. New functions for unsupervised asymmetrical paraphrase detection. Journal of Software, 2(4):12–23.
[22] Das, D. and Smith, N. A. 2009. Paraphrase identification as probabilistic quasi-synchronous recognition. In Proceedings of ACL-IJCNLP, pages 468–476, Singapore.
[23] Malakasiotis, P. 2009. Paraphrase recognition using machine learning to combine similarity measures. In Proceedings of the ACL-IJCNLP 2009 Student Research Workshop, pages 27–35, Singapore.
[24] Dolan, B. and Dagan, I., editors. 2005. Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, Ann Arbor, MI.
[25] Barzilay, R. and McKeown, K. R. 2005. Sentence fusion for multidocument news summarization. Computational Linguistics, 31(3):297–328.
[26] Sekine, S. 2006. On-demand information extraction. In Proceedings of COLING-ACL, pages 731–738, Sydney.
[27] Dagan, I., Glickman, O. and Magnini, B. 2006. The PASCAL Recognising Textual Entailment Challenge. In Machine Learning Challenges, Lecture Notes in Computer Science, Volume 3944, Springer-Verlag, pages 177–190.
[28] Bar-Haim, R., Dagan, I., Dolan, B., Ferro, L., Giampiccolo, D., Magnini, B. and Szpektor, I., editors. 2007. Proceedings of the Second PASCAL Challenges Workshop on Recognizing Textual Entailment, Venice.
[29] Sekine, S., Inui, K., Dagan, I., Dolan, B., Giampiccolo, D. and Magnini, B., editors. 2007. Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Association for Computational Linguistics, Prague.
[30] Giampiccolo, D., Dang, H., Dagan, I., Dolan, B. and Magnini, B., editors. 2008. Proceedings of the Text Analysis Conference (TAC): Recognizing Textual Entailment Track, Gaithersburg, MD.
[31] "Gensim - Deep learning with word2vec", https://radimrehurek.com/gensim/models/word2vec.html, retrieved in 2016.
[32] Mikolov, T., Chen, K., Corrado, G. S. and Dean, J. 2013. Efficient estimation of word representations in vector space. In ICLR Workshop Papers.
[33] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119.
[34] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I. H. 2009. The WEKA data mining software: An update. SIGKDD Explorations, Volume 11.
[35] Anand Kumar, M., Singh, S., Kavirajan, B. and Soman, K. P. 2016. DPIL@FIRE2016: Overview of shared task on Detecting Paraphrases in Indian Languages. Working notes of FIRE 2016 - Forum for Information Retrieval Evaluation, Kolkata, India, December 7-10, CEUR Workshop Proceedings, CEUR-WS.org.