IIIT-H @ CLScisumm-18

Kritika Agrawal and Aakash Mittal

International Institute of Information Technology, Hyderabad
kritika.agrawal@research.iiit.ac.in, aakash.mittal@students.iiit.ac.in

Abstract. In this report, we present our system for the 4th Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2018), which poses the challenge of identifying the spans of text in a reference paper (RP) that most accurately reflect a citation (i.e., citance) from another paper to the RP, and of identifying the discourse facet of the corresponding text. We modeled Task 1A as a classification problem that classifies a pair of sentences as paraphrase or non-paraphrase, using a convolutional neural network along with transfer learning. Task 1B is solved as a multi-label classification problem.

Keywords: Paraphrase Detection · Classification · Transfer Learning

1 Methodology

1.1 Task Description

CL-SciSumm explores summarization of scientific research for the computational linguistics research domain. An ideal summary of computational linguistics research papers would summarize previous research by drawing comparisons and contrasts between their goals, methods and results, as well as distil the overall trends in the state of the art and their place in the larger academic discourse. There are two tasks in CL-SciSumm 2018. The training dataset contains 40 topics of documents. A topic consists of a Reference Paper (RP) and Citing Papers (CPs) that all contain citations to the RP. In each CP, the text spans (citances) that pertain to a particular citation to the RP have been identified. In Task 1A, for each citance, we need to identify the spans of text (cited text spans) in the RP that most accurately reflect the citance. In Task 1B, for each cited text span, we need to identify which facet of the paper it belongs to, from five predefined facets: Aim, Method, Results, Implication and Hypothesis. In Task 2, we need to generate a structured summary of the RP from the cited text spans of the RP.

1.2 Task1A

In this task, for each citance, we need to identify the spans of text (cited text spans) in the RP that most accurately reflect the citance. We modeled this task as a classification problem in which pairs of sentences are classified as paraphrase or non-paraphrase. We used the method described in "Text Matching as Image Recognition" (Pang et al., 2016), which applies a CNN, a model class typically used for images, to the text matching problem. First, a matching matrix whose entries represent the similarities between words is constructed and viewed as an image. Then a convolutional neural network is utilized to capture rich matching patterns in a layer-by-layer way. By resembling the compositional hierarchies of patterns in image recognition, the model can successfully identify salient signals such as n-gram and n-term matchings. We train the MatchPyramid model introduced in that paper on the Microsoft Research Paraphrase Detection dataset, and then retrain it on the given training dataset using transfer learning.
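As an illustration of this pipeline, the following is a minimal PyTorch sketch of a MatchPyramid-style network, using a dot-product matching matrix and the kernel and pooling sizes reported in Section 2. The vocabulary size, embedding dimension and the dot-product similarity are illustrative assumptions, not the exact submitted system.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatchPyramid(nn.Module):
    """Sketch of the Task 1A model; reported parameters are in Section 2."""

    def __init__(self, vocab_size=20000, emb_dim=50):  # sizes are assumptions
        super().__init__()
        # Two embedding layers, one for the citance and one for the
        # reference sentence (Section 2).
        self.emb_citance = nn.Embedding(vocab_size, emb_dim)
        self.emb_reference = nn.Embedding(vocab_size, emb_dim)
        # Reported parameters: kernel count 32, kernel size [3, 3].
        self.conv = nn.Conv2d(1, 32, kernel_size=(3, 3))
        # Dynamic pooling to a fixed [3, 10] map, so variable-length
        # sentence pairs yield a fixed-size feature vector.
        self.pool = nn.AdaptiveMaxPool2d((3, 10))
        self.fc = nn.Linear(32 * 3 * 10, 2)  # paraphrase / non-paraphrase

    def forward(self, citance_ids, reference_ids):
        c = self.emb_citance(citance_ids)      # (B, Lc, D)
        r = self.emb_reference(reference_ids)  # (B, Lr, D)
        # The "cross" of the two embeddings: word-by-word dot products,
        # viewed as a one-channel matching image.
        match = torch.bmm(c, r.transpose(1, 2)).unsqueeze(1)  # (B, 1, Lc, Lr)
        h = F.relu(self.conv(match))
        h = self.pool(h).flatten(1)
        # Softmax output layer, as described in Section 2.
        return F.softmax(self.fc(h), dim=-1)
```

For the transfer-learning (TL) runs, this network would first be fit on MSRP sentence pairs and then retrained end to end on the shared-task pairs starting from the MSRP-learnt weights, as described in Section 2.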
1.3 Task1B

In this task, for each cited text span, we need to identify which facet of the paper it belongs to. There are five predefined facets: Aim, Hypothesis, Method, Result, Implication. We modeled this as a multi-label classification problem because, for many sentences in the annotations, a text span belongs to more than one facet. We use term frequency-inverse document frequency (tf-idf) features and train NBSVM on them; NBSVM is an SVM trained on naive Bayes features.

2 Experiments and Results

We created three datasets from the given training set for our approach:

a. N4 dataset: for each citance sentence, we create a positive sample by pairing the citance sentence with the corresponding reference sentence, and four negative samples by pairing the citance sentence with the two sentences before and the two sentences after the reference sentence.
b. N2 dataset: the same positive sample, and two negative samples by pairing the citance sentence with the sentence immediately before and the sentence immediately after the reference sentence.
c. N1 dataset: the same positive sample, and one negative sample by pairing the citance sentence with a sentence selected at random from the first two sections of the paper that is not a reference sentence.

1. Preprocessing: all words are converted to lower case and stemmed so as to decrease the vocabulary size.

2. Model: our model consists of two embedding layers, one for the citance sentence and one for the reference sentence. A cross is taken between the two layers to generate a matrix (the matrix is given as an image to the convolutional network). A convolutional layer with ReLU activation is applied to the matrix, followed by a dynamic pooling layer. The output of the dynamic pooling layer is flattened and fed to a fully connected layer with softmax activation (cf. the sketch in Section 1.2).

We first trained the model on the Microsoft Research Paraphrase Detection dataset. We followed the fine-tuning approach to transfer learning: we retrain all the layers of the model, with weights initialized to those learnt on the Microsoft paraphrase dataset.

We performed experiments on two models:

1. TL: the model initially trained on the MSR Paraphrase Detection dataset and then trained on our dataset using transfer learning, as explained above.
2. Non-TL: the same model, trained directly on our dataset.

Parameters used: Kernel Count: 32, Kernel Size: [3, 3], Dynamic Pooling Size: [3, 10].

Table 1. Results.

Dataset | Unicode Encoding (No TL) | Unicode Encoding (TL) | ASCII Encoding (No TL) | ASCII Encoding (TL)
N4      | 0.767980                 | 0.642020              | 0.771263               | 0.648687
N2      | 0.685290                 | 0.552935              | 0.694420               | 0.583804
N1      | 0.671408                 | 0.599155              | 0.682160               | 0.557793

2.1 Task1B

First we created a bag-of-words representation, as a term-document matrix, for the citances and the reference text in the annotations. We used n-grams, as suggested in the NBSVM paper (Wang & Manning, 2012). Then we find the log-count ratio for each class and fit a model for one class at a time. NBSVM is identical to the SVM except that we use $x^{(k)} = \tilde{f}^{(k)}$, where $\tilde{f}^{(k)} = \hat{r} \circ f^{(k)}$ is the elementwise product of the log-count ratio $\hat{r}$ with the original feature vector $f^{(k)}$; $x^{(k)}$ is the feature vector given to the SVM.

We had a total of 748 samples in the training set, which we split into training and validation sets in the ratio 4:1. Both the citance and the reference text are used as features. We obtained a best accuracy of 90.303% on the validation dataset.
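The report does not name the SVM implementation used; as an illustration only, here is a minimal sketch of the transform above for a single facet, using scikit-learn. The binarized bigram features and smoothing constant alpha follow the common configuration of Wang & Manning (2012) rather than the submitted system, and train_nbsvm / predict_nbsvm are hypothetical helper names.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

def train_nbsvm(texts, labels, alpha=1.0):
    """texts: citance + reference strings; labels: 0/1 for one facet."""
    y = np.asarray(labels)
    vec = CountVectorizer(ngram_range=(1, 2), binary=True)  # binarized n-grams
    f = vec.fit_transform(texts)  # term-document matrix, shape (n, V)

    # Smoothed count vectors for the positive and the negative class.
    p = alpha + np.asarray(f[y == 1].sum(axis=0)).ravel()
    q = alpha + np.asarray(f[y == 0].sum(axis=0)).ravel()

    # Log-count ratio r = log((p / |p|_1) / (q / |q|_1)).
    r = np.log((p / p.sum()) / (q / q.sum()))

    # NBSVM: a linear SVM fit on x = r ∘ f (elementwise product).
    clf = LinearSVC(C=1.0).fit(f.multiply(r), y)
    return vec, r, clf

def predict_nbsvm(vec, r, clf, texts):
    # Apply the same scaling at test time before classifying.
    return clf.predict(vec.transform(texts).multiply(r))
```

For the multi-label setting described in Section 1.3, one such binary model would be fit per facet, and a text span assigned every facet whose classifier predicts positive.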
3 Conclusion

Submitted system runs for the following models:

1. Model trained on the N4 dataset without transfer learning.
2. Model trained on the N4 dataset with transfer learning.
3. Model trained on the N2 dataset without transfer learning.
4. Model trained on the N1 dataset with transfer learning.

References

Pang, L., Lan, Y., Guo, J., Xu, J., Wan, S., & Cheng, X. (2016). Text Matching as Image Recognition. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (pp. 2793-2799).

Wang, S., & Manning, C. D. (2012). Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2 (pp. 90-94). Association for Computational Linguistics.