=Paper= {{Paper |id=Vol-1737/T6-6 |storemode=property |title=HIT2016@DPIL-FIRE2016: Detecting Paraphrases in Indian Languages based on Gradient Tree Boosting |pdfUrl=https://ceur-ws.org/Vol-1737/T6-6.pdf |volume=Vol-1737 |authors=Leilei Kong,Kaisheng Chen,Liuyang Tian,Zhenyuan Hao,Zhongyuan Han,Haoliang Qi |dblpUrl=https://dblp.org/rec/conf/fire/KongCTHHQ16 }} ==HIT2016@DPIL-FIRE2016: Detecting Paraphrases in Indian Languages based on Gradient Tree Boosting== https://ceur-ws.org/Vol-1737/T6-6.pdf
HIT2016@DPIL-FIRE2016: Detecting Paraphrases in Indian Languages based on Gradient Tree Boosting

Leilei Kong*
  1 College of Information and Communication Engineering, Harbin Engineering University, Harbin, China
  2 School of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
  +86 451 88028910
  kongleilei1979@gmail.com

Kaisheng Chen
  School of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
  +86 451 88028910
  kaishengchen1997@outlook.com

Liuyang Tian
  College of Information and Communication Engineering, Harbin Engineering University, Harbin, China
  +86 451 88028910
  tianliuyang2016@outlook.com

Zhenyuan Hao
  School of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
  +86 451 88028910
  zhenyuan_hao@163.com

Zhongyuan Han
  School of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
  +86 451 88028910
  Hanzhongyuan@gmail.com

Haoliang Qi
  School of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
  +86 451 88028910
  haoliang.qi@gmail.com


ABSTRACT
Paraphrase detection is an important and challenging task. It is used in paraphrase generation and extraction, machine translation, question answering, and plagiarism detection. Because a paraphrase expresses the meaning of a sentence in another sentence using different words, traditional methods based on lexical similarity become ineffective. In this paper, we describe our approach to Detecting Paraphrases in Indian Languages, a workshop track proposed by the Forum for Information Retrieval Evaluation 2016. We formalize this task as a classification problem and use a supervised learning method based on Gradient Tree Boosting to classify the types of paraphrase plagiarism. Inspired by the METEOR evaluation metric for machine translation, we use METEOR-like features for the classifier. In the evaluation, our approach achieved the highest Overall Score (0.77), the highest F1 measure for both Task1 and Task2 on Malayalam and Tamil, and the highest F1 measure on Punjabi Task2 in the 2016 FIRE Detecting Paraphrases in Indian Languages task.

CCS Concepts
• Information systems➝Information retrieval

Keywords
Paraphrase; Classification; Indian Languages; Gradient Tree Boosting.

1. INTRODUCTION
Paraphrase detection has attracted the attention of researchers in recent years. It is widely used in paraphrase generation and extraction, machine translation, question answering, and plagiarism detection.

In the task description of Detecting Paraphrases in Indian Languages of the Forum for Information Retrieval Evaluation 2016 (FIRE 2016)¹, a paraphrase is defined as “the same meaning of a sentence is expressed in another sentence using different words”. The proposed task focuses on sentence-level paraphrase identification for Indian languages (Tamil, Malayalam, Hindi and Punjabi). FIRE proposes two subtasks. The first subtask is: given a pair of sentences from the newspaper domain, classify them as paraphrases (P) or not paraphrases (NP). The second is: given two sentences from the newspaper domain, identify whether they are completely equivalent (E), roughly equivalent (RE)¹ or not equivalent (NE) [6].

Paraphrased sentences always retain the semantic meaning and are usually obfuscated by manipulating the text and changing most of its appearance. The words in the original sentence are replaced with synonyms/antonyms, and short phrases are inserted to change the appearance, but not the idea, of the text (Alzahrani et al., 2012). In addition, sentence reduction, combination, restructuring, paraphrasing, concept generalization, and concept specification are also used to paraphrase a sentence. All of these operations make paraphrase identification difficult, because it involves semantic similarity, lexical comprehension, syntactical identification, morphological analysis, and so on.

Since the appearance of a paraphrased sentence may have changed beyond recognition, methods that rely only on term matching or on a single feature may become ineffective in detecting paraphrases. More features should be integrated into the model. We therefore consider a machine learning method based on classification to address this problem.

Intuitively, the first subtask can be viewed as a two-category classification and the second as a multi-category classification. If we formalize the task of detecting paraphrases as a classification problem, our objectives focus on answering the following two questions: (1) which classification-based methods can effectively be applied to the paraphrase detection problem, and (2) which features should be used in the classifier.

For the first question, we choose Gradient Tree Boosting to learn the classifier [2,3]. Regarding the second question, inspired by the METEOR evaluation metric for machine translation [4], we design METEOR-like features for our classifier. We combine them with some classical similarity measures to build the feature set. Using the training and testing corpora of Detecting Paraphrases in Indian Languages proposed by FIRE, we evaluate various aspects of our classification method for detecting paraphrases. Experimental results show that the proposed method can effectively classify the paraphrase pairs.

The rest of this paper is organized as follows. In Section 2, we analyze the problem of Detecting Paraphrases in Indian Languages, introduce the model we use, and describe the features that the classifier uses. In Section 3, we report the experimental results and performance comparisons with the other detection methods. In the last section we conclude our study.

¹ http://nlp.amrita.edu/dpil_cen/
* Corresponding author

2. CLASSIFICATION FOR DPIL
We now explore machine-learning methods for Detecting Paraphrases in Indian Languages. In this section, we first analyze the main issues of the task, then propose a classification method based on boosting trees, and finally describe the features that the classifier uses.

2.1 Problem Analysis
As discussed in the previous section, paraphrases are difficult to detect. Traditional similarity measures, such as Cosine Distance, Jaccard Coefficient and Dice Distance, may be ineffective for paraphrases. Figure 1 exemplifies a paraphrase case.

Figure 1. A paraphrase case

From Figure 1, we can see that two sentences in a paraphrasing relationship differ in their appearance. Furthermore, we analyze 1000 randomly selected cases with a paraphrase relationship from the Malayalam sub corpus and from the corpora of all four languages. Figure 2 displays the distribution with Jaccard Coefficient and METEOR-F1 as y-coordinate.

Figure 2. Score distribution of Jaccard coefficient on Malayalam (up) sub corpora and all four languages corpora (down)

Figure 2 shows that the Jaccard coefficient scores are all very low; the average score is only 0.1332. Since the two sentences share few terms, considering term similarity alone may be inadequate. Our analysis suggests that more features should be considered to identify the relationship between the two sentences.

2.2 Problem Definition
According to the description of paraphrase detection, we formalize the problem as follows. Denote a pair of sentences as $s_i = (o_i, p_i)$, where $o_i$ is the original sentence and $p_i$ is the paraphrased sentence. Note that for a pair $(o_i, p_i)$ in the training data we can get its label, which makes it possible to learn a classification model. Let the training corpus be $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$, where $x_i \in R^N$ is the feature vector of $s_i$ and $x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(N)})^T$, $i = 1, 2, \ldots, n$. We use a function to get each $x_i$, defined as follows.

    $$ x_i = \Phi(o_i, p_i) $$    (1)

where $\Phi(o_i, p_i)$ is a mapping onto features that describes the paraphrase between the i-th original sentence $o_i$ and the paraphrased sentence $p_i$.

$y_i$ is the label of $x_i$ and denotes its category. For task 1, we define $y_i \in \{P, NP\}$, and for task 2, we define $y_i \in \{E, RE, NE\}$.

The framework of the learning problem is depicted in Figure 3.

Figure 3. The framework of Detection Paraphrase

Given D as training data, the learning system learns a conditional probability P(Y|X). Then, given a new input $x_{n+1}$, the classification system outputs the corresponding label $y_{n+1}$ according to the learned classifier.

2.3 Classification Model: Gradient Tree Boosting
Boosting trees are one of the best methods to improve the performance of statistical learning [2,3]. In this experiment, we use Gradient Tree Boosting as the classification algorithm to learn the classifier. Gradient boosting is typically used with decision trees (especially CART trees) of a fixed size as base learners.

2.4 Features
Two groups of features, the similarity-based features and the METEOR-like features, are used to define $x_i = \Phi(o_i, p_i)$. The similarity-based features capture the matching degree of $o_i$ and $p_i$, and the METEOR-like features describe the semantic similarity. Specifically, the METEOR-like features are
inspired by METEOR, the evaluation metric for machine translation, which is used to evaluate the performance of a translator. Table 1 lists these features in detail.

Table 1. Features for detecting paraphrases

Features             Computing methods                                                              Description
Jaccard Coefficient  $JC(s_i, r_j) = \frac{|s_i \cap r_j|}{|s_i \cup r_j|}$                         The ratio of the number of shared terms to the total number of terms.
Cosine Similarity    $CS(x_i, y_i) = \frac{x_i \cdot y_i}{\|x_i\| \times \|y_i\|}$                  $x_i \cdot y_i$ is the inner product of x and y, and $\|x\|$ represents the length of the vector.
Dice Coefficient     $DC(s, r) = \frac{2 \times common(s, r)}{len(s) + len(r)}$                     common(s, r) is the total number of common unigrams in s and r, and len(r) and len(s) are the total numbers of unigrams in r and s.
METEOR Precision     $P = \frac{common(s, r)}{len(r)}$                                              common(s, r) is the total number of common unigrams in s and r, and len(r) is the total number of unigrams in r.
METEOR Recall        $R = \frac{common(s, r)}{len(s)}$                                              len(s) is the total number of unigrams in s.
METEOR F1            $F1 = \frac{2PR}{R + P}$                                                       Combines the precision and recall.
METEOR Fmean         $Fmean = \frac{10PR}{R + 9P}$                                                  Combines the precision and recall.
METEOR Penalty       $Penalty = 0.5 \times \left(\frac{len(chunks)}{common(s, r)}\right)^3$         len(chunks) is the number of the longer matches in each chunk.
METEOR score         $Score = Fmean \times (1 - Penalty)$                                           The overall METEOR score.

3. Experiments
3.1 Dataset
The evaluation dataset is the Detecting Paraphrases in Indian Languages (DPIL) corpus, which is mainly obtained from newspapers. The details of this corpus can be found at http://nlp.amrita.edu/dpil_cen/.

The corpora are divided into two subsets, Task1-set and Task2-set, and each subset covers four Indian languages: Tamil, Malayalam, Hindi and Punjabi. The Task1-set contains 12400 samples, including 9200 training samples and 3200 test samples, and the Task2-set contains 17650 samples, including 12700 training samples and 4950 test samples. The statistics of the training and testing data are shown in Table 2 and Table 3.

Table 2. Corpus statistics of DPIL 2016 on Task1

                    Train                               Test
Language            Hin    Mal    Pun    Tam    all     Hin    Mal    Pun    Tam    all
SampleNumber        2500   2500   1700   2500   9200    900    900    500    900    3200
Avg terms (blank)   32     18     39     24     27      32     19     43     23     28
Avg terms (4gram)   126    166    150    175    155     120    181    164    176    160

Table 3. Corpus statistics of DPIL 2016 on Task2

                    Train                               Test
Language            Hin    Mal    Pun    Tam    all     Hin    Mal    Pun    Tam    all
SampleNumber        3500   3500   2200   3500   12700   1400   1400   750    1400   4950
Avg terms (blank)   34     18     41     24     28      42     19     41     28     31
Avg terms (4gram)   131    164    156    178    158     154    177    157    207    176

3.2 Experimental Settings
3.2.1 Pre-processing
For each sentence pair in the training and test data, we first remove numbers, punctuation and blank spaces. Then we adopt two types of word segmentation: one takes each word as a term unit, and the other is based on n-grams, in which the words of a sentence are segmented in the form of n-grams. Figure 4 shows an example of 4-gram segmentation. In the experiments, n is set empirically.

Figure 4. The example of 4-gram

3.2.2 Parameter Tuning
On the training corpus, the classifier is trained using the sklearn Gradient Boosting Classifier². The learning rate (which shrinks the contribution of each tree) is set to 1.0, max_depth (the maximum depth, which limits the number of nodes in the tree) is set to 1, and the random state (the seed used by the random number generator) is set to 0. All other parameters keep their default values except n_estimators (the number of boosting stages to perform).

The other settings, including the word segmentation method, the pre-processing method and the value of n for the n-grams, are set experimentally.

We use cross validation to tune the parameter n_estimators. The training corpus is randomly divided into two equal parts; one is used as the training data and the other as the validation data.

² http://scikit-learn.org/stable/

3.3 Performance Measures
In this evaluation experiment, the experimental results are evaluated according to [5].
1) TP: The sample is true, and the result obtained is positive.
2) FP: The sample is false, and the result obtained is positive.
3) FN: The sample is true, and the result obtained is negative.
4) TN: The sample is false, and the result obtained is negative.

According to the above measures, Precision and Recall are defined as follows:

    $$ precision = \frac{TP}{TP + FP} $$    (5)

    $$ recall = \frac{TP}{TP + FN} $$    (6)

The main evaluation metrics adopted by DPIL are Accuracy and F1 measure, defined as follows:
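    $$ accuracy = \frac{TP + TN}{TP + FN + FP + TN} $$    (7)

    $$ F1 = \frac{2 \times precision \times recall}{precision + recall} $$    (8)

Table 4 (b), Task 2 sub corpus (continued rows)

              Accuracy                        F1 Measure
TEAM          Mal     Tam     Hin     Pun     Mal     Tam     Hin     Pun
CUSAT TEAM    0.5086  --      --      --      0.4658  --      --      --
CUSAT NLP     0.5207  --      --      --      0.5130  --      --      --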
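As an illustration of Eqs. (5) to (8), the following minimal sketch computes the four measures from confusion counts in a two-class setting such as Task 1. The function name `classification_scores`, the zero-denominator convention and the example counts are assumptions made for illustration; they are not taken from the DPIL evaluation.

```python
def classification_scores(tp, fp, fn, tn):
    """Compute precision, recall, accuracy and F1 from confusion counts.

    Follows Eqs. (5)-(8). A ratio whose denominator is zero is reported
    as 0.0, which is a convention chosen here, not one stated in the paper.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)                       # Eq. (5)
    recall = ratio(tp, tp + fn)                          # Eq. (6)
    accuracy = ratio(tp + tn, tp + fn + fp + tn)         # Eq. (7)
    f1 = ratio(2 * precision * recall, precision + recall)  # Eq. (8)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas.
scores = classification_scores(tp=80, fp=20, fn=10, tn=90)
print(scores)
```

With these counts the precision is 80/100 and the accuracy is 170/200, so the two measures can differ noticeably even on the same predictions.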

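The feature definitions of Table 1 and the settings of Section 3.2 can be sketched together as below. This is a sketch under stated assumptions, not the authors' implementation: the n-grams are read as character n-grams (spaces are removed first, and Tables 2 and 3 report far more 4-gram terms than space-separated terms); the cosine similarity is taken over term-count vectors; the METEOR Penalty and score are omitted because the chunk alignment is not described; all function names are hypothetical; and `n_estimators=100` is only the scikit-learn default, since the tuned value is not reported.

```python
import math
import unicodedata
from collections import Counter

FEATURE_NAMES = ("jaccard", "cosine", "dice", "precision", "recall", "f1", "fmean")


def strip_noise(sentence):
    """Remove numbers, punctuation and whitespace; keep letters and combining marks."""
    return "".join(
        ch for ch in sentence
        if not ch.isspace() and unicodedata.category(ch)[0] not in "PN"
    )


def char_ngrams(sentence, n=4):
    """Character n-grams of the cleaned sentence (assumed reading of Section 3.2.1)."""
    cleaned = strip_noise(sentence)
    if len(cleaned) <= n:
        return [cleaned] if cleaned else []
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


def pair_features(s, r, n=4):
    """Similarity-based and METEOR-like features of Table 1 for one sentence pair."""
    s_counts, r_counts = Counter(char_ngrams(s, n)), Counter(char_ngrams(r, n))
    len_s, len_r = sum(s_counts.values()), sum(r_counts.values())
    common = sum((s_counts & r_counts).values())  # common(s, r) with multiplicity

    def ratio(num, den):
        return num / den if den else 0.0

    jaccard = ratio(len(s_counts.keys() & r_counts.keys()),
                    len(s_counts.keys() | r_counts.keys()))
    dot = sum(s_counts[t] * r_counts[t] for t in s_counts)
    norm = (math.sqrt(sum(c * c for c in s_counts.values()))
            * math.sqrt(sum(c * c for c in r_counts.values())))
    cosine = ratio(dot, norm)
    dice = ratio(2 * common, len_s + len_r)
    p = ratio(common, len_r)            # METEOR Precision
    rec = ratio(common, len_s)          # METEOR Recall
    f1 = ratio(2 * p * rec, rec + p)
    fmean = ratio(10 * p * rec, rec + 9 * p)
    return [jaccard, cosine, dice, p, rec, f1, fmean]


def build_classifier(n_estimators=100):
    """Gradient boosting with the settings of Section 3.2.2 (n_estimators is a placeholder)."""
    # Imported lazily so the feature functions work without scikit-learn installed.
    from sklearn.ensemble import GradientBoostingClassifier
    return GradientBoostingClassifier(learning_rate=1.0, max_depth=1,
                                      random_state=0, n_estimators=n_estimators)


# Toy English pairs, used only to show the shape of the feature matrix.
pairs = [("the cat sat on the mat", "a cat was sitting on the mat"),
         ("stocks fell sharply today", "the weather is sunny")]
X = [pair_features(o, p) for o, p in pairs]
print(dict(zip(FEATURE_NAMES, X[0])))
```

With scikit-learn installed, `build_classifier().fit(X, y)` would train the model on a feature matrix `X` and a label list `y` (P/NP for Task 1, or E/RE/NE for Task 2).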
3.4 Experimental Results
3.4.1 Experimental results on sub corpora
Table 4 shows the experimental results released by FIRE.

Table 4. Experimental results on DPIL@FIRE2016

(a) Task 1 sub corpus

              Accuracy                        F1 Measure
TEAM          Mal     Tam     Hin     Pun     Mal     Tam     Hin     Pun
HIT2016       0.8377  0.8211  0.8966  0.9440  0.8100  0.7900  0.8900  0.9400
KS_JU         0.8100  0.7888  0.9066  0.9460  0.7900  0.7500  0.9000  0.9500
NLP-NITMZ     0.8344  0.8333  0.9155  0.9420  0.7900  0.7900  0.9100  0.9400
JU-NLP        0.5900  0.5755  0.8222  0.9420  0.1600  0.0900  0.7400  0.9400
Anuj          --      --      0.9200  --      --      --      0.9100  --
DAVPBI        --      --      --      0.9380  --      --      --      0.9400
BITS-PILANI   --      --      0.8977  --      --      --      0.8900  --
NLP@KEC       --      0.8233  --      --      --      0.7900  --      --
ASE           --      --      0.3588  --      --      --      0.3400  --
CUSAT TEAM    0.8044  --      --      --      0.7600  --      --      --
CUSAT NLP     0.7622  --      --      --      0.7500  --      --      --

(b) Task 2 sub corpus

              Accuracy                        F1 Measure
TEAM          Mal     Tam     Hin     Pun     Mal     Tam     Hin     Pun
HIT2016       0.7486  0.7550  0.9000  0.9226  0.7460  0.7398  0.8984  0.9230
KS_JU         0.6614  0.6735  0.8521  0.8960  0.6578  0.6645  0.8482  0.8960
NLP-NITMZ     0.6243  0.6571  0.7857  0.8120  0.6068  0.6307  0.7642  0.8086
JU-NLP        0.4221  0.5507  0.6857  0.8866  0.3078  0.4319  0.6841  0.8866
Anuj          --      --      0.9014  --      --      --      0.9000  --
DAVPBI        --      --      --      0.7466  --      --      --      0.7274
BITS-PILANI   --      --      0.7171  --      --      --      0.7123  --
NLP@KEC       --      0.6857  --      --      --      0.6674  --      --
ASE           --      --      0.3543  --      --      --      0.3535  --

The experimental results show that the proposed method achieves the best Accuracy on Malayalam in Task 1 and on Malayalam, Tamil and Punjabi in Task 2. It also achieves the highest F1 measure for both Task1 and Task2 on Malayalam and Tamil, and the highest F1 measure on Punjabi Task2 in the 2016 FIRE Detecting Paraphrases in Indian Languages task.

3.4.2 Effect of word segmentation
For word segmentation, we use two processing methods. One segments words on spaces, and the other is based on n-grams. We compare the two kinds of word segmentation methods in Table 5.

Table 5. Comparison of two different preprocessing

            4-gram                            space
Task1       Mal     Tam     Hindi   Pun       Mal     Tam     Hindi   Pun
Precision   0.8993  0.9587  0.9235  0.9884    0.8771  0.9543  0.9340  0.9911
Recall      0.9301  0.9606  0.9187  0.9921    0.9279  0.9574  0.9289  0.9921
Accuracy    0.8957  0.9517  0.9054  0.9885    0.8785  0.9469  0.9178  0.9901
F1          0.9143  0.9596  0.9210  0.9902    0.9017  0.9558  0.9314  0.9916

            4-gram                            space
Task2       Mal     Tam     Hindi   Pun       Mal     Tam     Hindi   Pun
Precision   0.7298  0.7873  0.8499  0.9810    0.7135  0.7917  0.8553  0.9814
Recall      0.7370  0.7918  0.8484  0.9808    0.7227  0.7949  0.8545  0.9813
Accuracy    0.7370  0.7918  0.8484  0.9808    0.7227  0.7949  0.8545  0.9813
F1          0.7309  0.7878  0.8483  0.9808    0.7134  0.7923  0.8541  0.9813

From the experimental results, we can see that 4-gram segmentation achieves a higher F1 measure than space segmentation on Malayalam in both tasks and on Tamil in Task 1, and stays within about 0.01 of it in the remaining cases, so we use the n-gram method in the following experiments on the Indian-language corpora.

3.4.3 Effects of pre-processing
In our experiment, we consider several pre-processing methods. To investigate the contribution of each pre-processing method on each language, we analyze the effects of pre-processing. Taking 4-gram word segmentation as an example, Table 6 gives the experimental results, where "remove all" means removing the punctuation, the numbers and the spaces, and "reserving *" means keeping * and removing all others. For example, "reserving punctuation" means that the punctuation is kept and the numbers and spaces are removed.

Table 6. Effects of pre-processing

Mal                   Reserved punctuation   Reserved number   Reserved space   Remove all
Task1   Precision     0.9013                 0.8995            0.8992           0.8988
        Recall        0.9280                 0.9276            0.9325           0.9335
        Accuracy      0.8956                 0.8944            0.8966           0.8968
        F1 Measure    0.9144                 0.9133            0.9154           0.9157
Task2   Precision     0.7304                 0.7258            0.7253           0.7289
            Recall        0.7380      0.7340     0.7321      0.7362                      (a) The experimental results on Task 1
          Accuracy        0.7380      0.7340     0.7321      0.7362
         F1 Measure       0.7316      0.7273     0.7264      0.7299
                       Reserved     Reserved   Reserved
  Tam                                                   Remove all
                      punctuation   number      space
          Precision       0.9585      0.9591     0.9535      0.9570
            Recall        0.9593      0.9590     0.9558      0.9607
 Task1
          Accuracy        0.9506      0.9507     0.9455      0.9506
         F1 Measure       0.9589      0.9590     0.9546      0.9588
          Precision       0.7855      0.7874     0.7864      0.7871
            Recall        0.7901      0.7915     0.7897      0.7917
 Task2
          Accuracy        0.7901      0.7915     0.7897      0.7917                      (b) The experimental results on Task 2
         F1 Measure       0.7861      0.7880     0.7866      0.7880                         Figure 5. The effects of n-gram
                       Reserved     Reserved   Reserved                According to the above experimental results, 4-gram achieves the
 Hindi                                                  Remove all
                      punctuation   number      space                  best results. So we set n=4 in the testing corpora of DPIL 2016.
          Precision       0.9218      0.9242     0.9310      0.9230
                                                                       3.4.5 Effects of n_estimators
            Recall        0.9136      0.9151     0.9244      0.9195
 Task1                                                                 The parameter n_estimators is the number of iterations of
          Accuracy        0.9018      0.9039     0.9133      0.9054    boosting stage when the classification model trained. It is set
         F1 Measure       0.9176      0.9195     0.9275      0.9211    empirically. Figure 6 shows the results on training datasets.
          Precision       0.8490      0.8502     0.8495      0.8500
                                                                                                    dpil-mal-train-Task1
            Recall        0.8477      0.8481     0.8487      0.8486          0.915
 Task2
          Accuracy        0.8477      0.8481     0.8487      0.8486           0.91

         F1 Measure       0.8475      0.8480     0.8484      0.8484          0.905

                      Reserved      Reserved   Reserved                        0.9
  Pun                                                     Remove all
                      punctuation   number      space                        0.895

          Precision       0.9909      0.9904     0.9867      0.9903           0.89

            Recall        0.9914      0.9908     0.9895      0.9905          0.885
 Task1                                                                               0         20         40          60       80   100
          Accuracy        0.9895      0.9889     0.9859      0.9887
                                                                                                       Accuracy   F1 Measure
         F1 Measure       0.9911      0.9906     0.9881      0.9904
          Precision       0.9810      0.9774     0.9812      0.9812         (a) The experimental results of Malayalam on Task1
            Recall        0.9808      0.9772     0.9810      0.9811
 Task2
          Accuracy        0.9808      0.9772     0.9810      0.9811
         F1 Measure       0.9808      0.9772     0.9810      0.9811


According to the experimental results shown in Table 6, even
thoughwe find that there are few differences when we removing
punctuation, numbers and spaces, we still accept the best pre-
processing method on the test dataset.
3.4.4 Effects of n-gram
For analyze the effects of n, we carry out the experiments from 1-
gram to 10-gram, and with Precision, Recall and F1 measure as                 (b) The experimental results of Tamil on Task1
evaluation indicators. The experimental results are shown in
Figure 5.
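The role of n_estimators in Section 3.4.5 can be illustrated with a minimal sketch. It is not the system used in the paper: it assumes a single numeric feature, one-split regression trees (stumps) fitted by least squares to the logistic-loss residuals, and a fixed learning rate. All function names, the toy data and the learning rate are hypothetical. The sketch records the training accuracy after every boosting stage and picks the smallest number of stages that reaches the best accuracy, which mirrors reading a value off a curve such as those in Figure 6.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:
            continue  # a split must leave samples on both sides
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all feature values identical: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1:]


def boost(xs, ys, n_estimators, learning_rate=0.5):
    """Run n_estimators boosting stages; return the stumps and per-stage accuracy.

    ys are labels in {0, 1}. Each stage fits a stump to the negative gradient
    of the logistic loss (y - sigmoid(score)) and adds it to the running score.
    """
    scores = [0.0] * len(xs)
    stumps, accuracies = [], []
    for _ in range(n_estimators):
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        t, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((t, left_value, right_value))
        scores = [s + learning_rate * (left_value if x <= t else right_value)
                  for x, s in zip(xs, scores)]
        correct = sum((s > 0) == (y == 1) for s, y in zip(scores, ys))
        accuracies.append(correct / len(ys))
    return stumps, accuracies


def pick_n_estimators(accuracies):
    """Smallest number of stages that reaches the best observed accuracy."""
    return accuracies.index(max(accuracies)) + 1


if __name__ == "__main__":
    # Hypothetical toy data: one similarity-like feature per sentence pair.
    xs = [0.10, 0.25, 0.30, 0.45, 0.50, 0.65, 0.80, 0.90]
    ys = [0, 0, 1, 0, 1, 1, 1, 1]
    _, accs = boost(xs, ys, n_estimators=20)
    print("selected n_estimators:", pick_n_estimators(accs))
```

In practice the curve would be computed per language and per task, and on held-out data where available, since accuracy measured on the training data alone tends to favor larger values.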




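(a) The experimental results of Malayalam on Task1
(b) The experimental results of Tamil on Task1
(c) The experimental results of Hindi on Task1
(d) The experimental results of Punjabi on Task1
(e) The experimental results of Malayalam on Task2
(f) The experimental results of Tamil on Task2
(g) The experimental results of Hindi on Task2
(h) The experimental results of Punjabi on Task2
Figure 6. Effects of n_estimators (plots omitted; panel (a), titled
dpil-mal-train-Task1, shows Accuracy and F1 Measure for n_estimators
from 0 to 100)

According to Figure 6, we obtain the value of the parameter
n_estimators for each language. The values are listed in Table 7
and are used on the testing datasets of DPIL.

           Table 7. n_estimators setting
              Task1     Task2
Malayalam       55        40
Tamil           20        20
Hindi           45        45
Punjabi         10        25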
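Together with the n_estimators values, the test configuration also fixes the pre-processing variant (Section 3.4.3) and n=4 (Section 3.4.4). The input side of that configuration can be sketched as follows. This is only an illustration under stated assumptions: it treats n-gram segmentation as overlapping character n-grams over Unicode code points, and it defines punctuation and numbers through Unicode general categories (P* and N*). The paper does not specify either detail, and the function names are hypothetical.

```python
import unicodedata


def preprocess(text, keep_punctuation=False, keep_numbers=False, keep_spaces=False):
    """Drop punctuation, numbers and spaces unless the matching flag keeps them.

    With all flags False this corresponds to "remove all"; setting exactly one
    flag corresponds to the "reserved punctuation / number / space" variants.
    """
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("P") and not keep_punctuation:
            continue
        if category.startswith("N") and not keep_numbers:
            continue
        if ch.isspace() and not keep_spaces:
            continue
        kept.append(ch)
    return "".join(kept)


def char_ngrams(text, n=4):
    """Overlapping character n-grams; empty list if the text is shorter than n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    sentence = "Detecting paraphrases, 2016 edition."
    cleaned = preprocess(sentence)  # "remove all" variant
    print(char_ngrams(cleaned, n=4)[:5])
```

For Indic scripts a code-point n-gram may split a consonant from its dependent vowel sign; segmenting by grapheme cluster would be an alternative design, at the cost of an extra dependency.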
4. CONCLUSIONS
We describe an approach to the problem of detecting paraphrases in
Indian languages that makes use of Gradient Tree Boosting.
Overall, the approach was very competitive and achieved the
highest Accuracy and F1 measure among all task participants.

5. ACKNOWLEDGMENTS
This work is supported by the Youth National Social Science Fund of
China (No. 14CTQ032), the National Natural Science Foundation of
China (No. 61370170), and the Research Project of Heilongjiang
Provincial Department of Education (No. 12541677, 12541649).
6. REFERENCES
[1] Alzahrani, S. M., Salim, N., and Abraham, A. 2012.
    Understanding plagiarism linguistic patterns, textual features,
    and detection methods. IEEE Transactions on Systems, Man,
    and Cybernetics, Part C (Applications and Reviews), 42(2),
    133-149.
[2] Friedman, J. H. 2001. Greedy function approximation: a
    gradient boosting machine. Annals of Statistics, 1189-1232.
[3] Friedman, J. H. 2002. Stochastic gradient boosting.
    Computational Statistics & Data Analysis, 38(4), 367-378.
[4] Banerjee, S., and Lavie, A. 2005, June. METEOR: An
    automatic metric for MT evaluation with improved
    correlation with human judgments. In Proceedings of the ACL
    Workshop on Intrinsic and Extrinsic Evaluation Measures for
    Machine Translation and/or Summarization. 29: 65-72.
[5] Li, Hang. 2012. Statistical learning methods. Tsinghua
    University Press (in Chinese).
[6] Anand Kumar, M., Singh, S., Kavirajan, B., and Soman, K.P.
    2016, December. DPIL@FIRE2016: Overview of shared
    task on Detecting Paraphrases in Indian Languages. Working
    notes of FIRE 2016 - Forum for Information Retrieval
    Evaluation, Kolkata, India, CEUR Workshop Proceedings,
    CEUR-WS.org.