=Paper= {{Paper |id=Vol-3004/paper10 |storemode=property |title=Automatic Generation of Research Highlights from Scientific Abstracts |pdfUrl=https://ceur-ws.org/Vol-3004/paper10.pdf |volume=Vol-3004 |authors=Tohida Rehman,Debarshi Kumar Sanyal,Samiran Chattopadhyay,Plaban Kumar Bhowmick,Partha Pratim Das |dblpUrl=https://dblp.org/rec/conf/jcdl/RehmanSCBD21 }} ==Automatic Generation of Research Highlights from Scientific Abstracts== https://ceur-ws.org/Vol-3004/paper10.pdf
                      EEKE 2021 – Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents


Automatic Generation of Research Highlights from Scientific Abstracts

Tohida Rehman (Jadavpur University, Kolkata, India) tohida.rehman@gmail.com
Debarshi Kumar Sanyal (Indian Association for the Cultivation of Science, Kolkata, India) debarshisanyal@gmail.com
Samiran Chattopadhyay (TCG CREST; Jadavpur University, Kolkata, India) samirancju@gmail.com
Plaban Kumar Bhowmick (IIT Kharagpur, India) plaban@cet.iitkgp.ac.in
Partha Pratim Das (IIT Kharagpur, India) ppd@cse.iitkgp.ac.in

ABSTRACT
The huge growth in scientific publications makes it difficult for researchers to keep track of new research even in narrow sub-fields. While an abstract is the traditional way to present a high-level view of a paper, it is increasingly being supplemented with research highlights that explicitly identify the important findings of the paper. In this poster, we aim to automatically construct research highlights given the abstract of a paper. We use deep neural network-based models for this purpose and achieve high ROUGE and METEOR scores on a large corpus of computer science papers.

CCS CONCEPTS
• Information systems → Information extraction; Summarization.

KEYWORDS
Pointer-generator network, Deep learning, Natural language generation

1 INTRODUCTION
The count of scientific publications doubles roughly every 9 years [10], making it hard for researchers to track even their own fields. One recent trend is to provide research highlights – a bulleted list of the main contributions of the paper – along with the abstract and the main text. They are potentially easier to read than abstracts, especially on mobile devices, and focus more on findings than on background. Additionally, research highlights could be useful for other tasks like finding surrogates for access-restricted papers [5, 7] and keyphrase extraction [6]. We use a pointer-generator network with coverage mechanism to automatically generate highlights given the abstract of a research paper. Distinct from a prior work [2] that classifies sentences in the full text as highlights or not, our focus is on the generation of highlights.

2 METHODOLOGY
We use a dataset released by Collins et al. [2] containing URLs of 10142 computer science publications from ScienceDirect (https://www.sciencedirect.com/). Each example in the dataset is organized as a pair (abstract, author-written research highlights): 8115 pairs are used for training, 1014 pairs for validation, and 1013 pairs for testing. In this dataset, the average abstract size is 186 words while that of the highlights is 52; for 98% of the papers, the abstract is at least 1.5 times longer than the highlights.

We have used three deep learning-based models to generate research highlights. Model 1 is the sequence-to-sequence (seq2seq) model with attention [3]. Each abstract is tokenized and the tokens are converted to 128-dimensional GloVe vectors [4] that are sequentially fed into the encoder, a single-layer bidirectional Long Short-Term Memory (BiLSTM) network. The decoder is a single-layer unidirectional LSTM. The model uses neural attention [1] to attend to the words in the source document while generating the target words of the summary. Model 2 is a pointer-generator network [8], which augments the above seq2seq model with a special copying mechanism. When generating words, the decoder probabilistically decides between generating new words from the vocabulary (i.e., from the training corpus) and copying words from the input abstract (by sampling from the attention distribution). While the generator helps in novel paraphrasing, copying helps to tackle out-of-vocabulary (OOV) words. Model 3 augments the second model with the coverage mechanism of Tu et al. [9] to avoid erroneously repeating the same words during decoding. For all the models, we used the same vocabulary of around 50K tokens, beam search in the decoder with beam size 4, a maximum input size of 400 tokens, and a maximum output size of 100 tokens.

3 RESULTS & ANALYSIS
Results are shown in Table 1 for ROUGE-1, ROUGE-2, ROUGE-L and METEOR as (R)ecall, (P)recision and (F1)-score. Author-written highlights are used as the gold-standard output. Model 3 (the pointer-generator model with coverage mechanism) always achieved the highest F1-score. In the case study in Fig. 1, Model 1 generated many OOV words and factual errors. Model 2 generates more meaningful research highlights and even relevant novel words that capture the context of the paper much better. Model 2 sometimes outputs repeating words but Model 3 reduces them. The first sentence from Model 3

EEKE '21, September 30, 2021, Online
Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
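The generate-versus-copy mixing of Model 2 and the coverage penalty of Model 3 can be sketched for a single decoding step as below. This is an illustrative reconstruction in NumPy, not the authors' code; the function names and array layout are our own, following the formulations in See et al. [8] and Tu et al. [9].

```python
import numpy as np

def pointer_generator_step(p_vocab, attention, src_ids, p_gen):
    """One decoding step of a pointer-generator network in the style of
    See et al. [8]: mix the decoder's vocabulary distribution (weight
    p_gen) with the attention distribution scattered onto the source
    token ids (weight 1 - p_gen)."""
    p_final = p_gen * np.asarray(p_vocab, dtype=float)
    for a_i, token_id in zip(attention, src_ids):
        # Copying lets source tokens (including OOV ids appended to the
        # vocabulary) receive probability mass directly from attention.
        p_final[token_id] += (1.0 - p_gen) * a_i
    return p_final

def coverage_loss(attention, coverage):
    """Coverage penalty of Tu et al. [9]: attending again to source
    positions that have already accumulated attention (coverage = sum
    of past attention distributions) is penalized."""
    return float(np.minimum(attention, coverage).sum())
```

Because both the vocabulary and attention distributions sum to one, the mixed distribution also sums to one, so no renormalization is needed after the copy step.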




Table 1: Comparison of models for research highlight generation.

             ROUGE-1              ROUGE-2             ROUGE-L              METEOR (synonym/paraphrase/stem)
           R      P      F1     R     P     F1      R      P      F1     R      P      F1    Final Score
Model 1  20.90  20.47  19.90   2.02  2.02  1.93   19.49  19.16  18.58  17.86  17.69  17.78      7.39
Model 2  30.99  32.07  30.90   7.48  8.06  7.55   28.66  30.34  28.62  25.53  26.61  26.06     11.04
Model 3  31.60  33.32  31.46   8.52  9.20  8.57   29.20  30.90  29.14  27.64  29.26  28.43     12.01
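As a reminder of what the scores in Table 1 measure, ROUGE-1 reduces to unigram overlap between a candidate and a reference. The following is a minimal sketch of our own; the official ROUGE toolkit additionally supports stemming and other options omitted here.

```python
from collections import Counter

def rouge1(candidate, reference):
    """ROUGE-1 recall, precision and F1 from unigram overlap.
    A toy sketch: official ROUGE also applies stemming and
    multi-reference handling that are omitted here."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Clipped overlap: each reference token can be matched at most once.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    recall = overlap / len(ref)
    precision = overlap / len(cand)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1
```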

 Title: Multiscale computation for transient heat conduction problem with radiation boundary condition in porous materials
 Abstract: This paper reports a multiscale asymptotic analysis and computation for predicting heat transfer performance of periodic
 porous materials with radiation boundary condition. In these porous materials thermal radiation effect at micro-scale have an important
 impact on the macroscopic temperature field, which is our particular interest in this study. The multiscale asymptotic expansions for
 computing temperature field of the problem are constructed, and associated explicit convergence rates are obtained on some regularity
 hypothesis. Finally, the corresponding finite element algorithms based on the multiscale method are brought forward and some numerical
 results are given in details. The numerical tests indicate that the developed method is feasible and valid for predicting the heat transfer
 performance of periodic porous materials, and support the approximate convergence results proposed in this paper.
 Author-written highlights: A novel multiscale analysis and computation is proposed. Heat transfer problem of periodic porous materials
 with radiation boundary condition are considered. Error estimates of the multiscale approximate solution are derived on some regularity
 hypothesis. Some numerical results are given in details to validate the multiscale method.
 Output of Model 1: A non-intrusive numerical tool is developed for solar artery supply planning. The results were analyzed, based on the fe modeling of finite element model. The approaches provides practical advantages of the cohort and accuracy of concrete equipment. Agent-based fe technology neural network procedures are used for assessment assessment. Results obtained from a real composite sample are considered and discussed.
 Output of Model 2: This paper reports a multiscale asymptotic analysis and developed protocol. The proposed approach is based on regularity hypothesis expansions. The proposed method is robust and can achieve predicting heat transfer performance. The proposed method is robust and efficient for given bone microstructure samples.
 Output of Model 3: Reports a multiscale asymptotic analysis without object propagation using minimal porous properties. Predicting heat transfer performance of periodic porous materials with radiation boundary condition. Finite element algorithms and computation of approximate convergence results.

Figure 1: Original abstract, author-written research highlights and model-generated research highlights. The meaning of the colors (e.g., green = correct) is explained in the main text. Abstract taken from https://www.sciencedirect.com/science/article/abs/pii/S0168874X15000621


contains words (‘without object ... properties’) that do not fit the context, but its other highlights are meaningful.

4 CONCLUSION
We applied three different deep neural models to generate research highlights from the abstract of a research paper. The pointer-generator network with coverage mechanism achieved the best performance. But the predicted research highlights are not yet perfect. A simple post-processing operation could be to remove sentences that contain entities that are absent in the given abstract. We are currently exploring this and other techniques to improve the system.

ACKNOWLEDGMENTS
This work is supported by a research grant from the Department of Science and Technology, Government of India at the Indian Association for the Cultivation of Science, Kolkata, and by the National Digital Library of India Project sponsored by the Ministry of Education, Government of India at IIT Kharagpur.

REFERENCES
[1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
[2] Ed Collins, Isabelle Augenstein, and Sebastian Riedel. 2017. A supervised approach to extractive summarisation of scientific papers. In CoNLL.
[3] Ramesh Nallapati, Bowen Zhou, Caglar Gulcehre, Bing Xiang, et al. 2016. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In CoNLL. 280–290.
[4] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In EMNLP. 1532–1543.
[5] TYSS Santosh, Debarshi Kumar Sanyal, Plaban Kumar Bhowmick, and Partha Pratim Das. 2018. Surrogator: A tool to enrich a digital library with open access surrogate resources. In JCDL. 379–380.
[6] Tokala Yaswanth Sri Sai Santosh, Debarshi Kumar Sanyal, Plaban Kumar Bhowmick, and Partha Pratim Das. 2020. DAKE: Document-level attention for keyphrase extraction. In ECIR.
[7] Debarshi Kumar Sanyal, Plaban Kumar Bhowmick, Partha Pratim Das, Samiran Chattopadhyay, and TYSS Santosh. 2019. Enhancing access to scholarly publications with surrogate resources. Scientometrics 121, 2 (2019), 1129–1164.
[8] Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In ACL.
[9] Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016. Modeling coverage for neural machine translation. In ACL.
[10] Richard Van Noorden. 2014. Global scientific output doubles every nine years. Nature news blog (2014).
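The post-processing operation suggested in the conclusion, dropping generated sentences that mention entities absent from the abstract, could be approximated as below. This is our illustrative sketch: it uses a crude content-word overlap heuristic in place of a real named-entity recognizer, and the function name and length threshold are our own choices.

```python
def filter_highlights(highlights, abstract):
    """Keep only generated highlight sentences whose content words
    (crudely approximated here as words longer than 3 characters)
    all occur in the abstract. A real system would use a named-entity
    recognizer instead of this token-overlap heuristic."""
    vocab = set(abstract.lower().split())
    return [s for s in highlights
            if all(w in vocab for w in s.lower().split() if len(w) > 3)]
```

On the Fig. 1 example, such a filter would discard Model 1's hallucinated sentence about "solar artery supply planning" while keeping sentences grounded in the abstract.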



