<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Automatic Generation of Research Highlights from Scientific Abstracts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tohida Rehman</string-name>
          <email>tohida.rehman@gmail.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Debarshi Kumar Sanyal</string-name>
          <email>debarshisanyal@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samiran Chattopadhyay</string-name>
          <email>samirancju@gmail.com</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Plaban Kumar Bhowmick</string-name>
          <email>plaban@cet.iitkgp.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Partha Pratim Das</string-name>
          <email>ppd@cse.iitkgp.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IIT Kharagpur</institution>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Indian Association for the Cultivation of Science</institution>
          ,
          <addr-line>Kolkata</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Jadavpur University</institution>
          ,
          <addr-line>Kolkata</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>TCG CREST; Jadavpur University</institution>
          ,
          <addr-line>Kolkata</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>30</volume>
      <issue>2021</issue>
      <fpage>69</fpage>
      <lpage>70</lpage>
      <abstract>
        <p>The huge growth in scientific publications makes it difficult for researchers to keep track of new research even in narrow sub-fields. While an abstract is the traditional way to present a high-level view of a paper, it is increasingly supplemented with research highlights that explicitly identify the important findings of the paper. In this poster, we aim to automatically construct research highlights given the abstract of a paper. We use deep neural network-based models for this purpose and achieve high ROUGE and METEOR scores on a large corpus of computer science papers.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Information extraction;
Summarization.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        The count of scientific publications doubles roughly every 9 years [<xref ref-type="bibr" rid="ref10">10</xref>], making it hard for researchers to track even their own fields. One recent trend is to provide research highlights – a bulleted list of the main contributions of the paper – along with the abstract and the main text. They are potentially easier to read than abstracts, especially on mobile devices, and focus more on findings than on background. Additionally, research highlights could be useful for other tasks like finding surrogates for access-restricted papers [<xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>] and keyphrase extraction [<xref ref-type="bibr" rid="ref6">6</xref>]. We use a pointer-generator network with a coverage mechanism to automatically generate highlights given the abstract of a research paper. Distinct from a prior work [<xref ref-type="bibr" rid="ref2">2</xref>] that classifies sentences in the full text as highlights or not, our focus is on generating highlights.
      </p>
    </sec>
    <sec id="sec-3">
      <title>METHODOLOGY</title>
      <p>
        We use a dataset released by Collins et al. [<xref ref-type="bibr" rid="ref2">2</xref>] containing URLs of 10142 computer science publications from ScienceDirect. Each example in the dataset is organized as (abstract, author-written research highlights): 8115 pairs are used for training, 1014 pairs for validation and 1013 pairs for testing. In this dataset, the average abstract length is 186 words while that of the highlights is 52; for 98% of the papers, the abstract is at least 1.5 times longer than the highlights.
      </p>
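      <p>The split and the length statistic above can be sketched in a few lines of Python; the helper names and the toy pair are illustrative assumptions, not the dataset's actual loading code.</p>
      <preformat>
```python
# Sketch: organizing (abstract, author-written highlights) pairs and
# reproducing the 8115/1014/1013 train/validation/test split.
def split_pairs(pairs):
    """Split the 10142 pairs into train, validation and test sets."""
    return pairs[:8115], pairs[8115:9129], pairs[9129:]

def compression_ratio(abstract, highlights):
    """How many times longer the abstract is than the highlights, in words."""
    return len(abstract.split()) / max(1, len(highlights.split()))

# Toy pair mirroring the dataset's (abstract, highlights) organization.
pair = ("This paper studies X. We propose Y. Results show Z.",
        "Proposes Y for X.")
ratio = compression_ratio(*pair)  # 10 words / 4 words = 2.5
```
      </preformat>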
      <p>
        We have used three deep learning-based models to generate research highlights. Model 1 is the sequence-to-sequence (seq2seq) model with attention [<xref ref-type="bibr" rid="ref3">3</xref>]. Each abstract is tokenized and the tokens are converted to 128-dimensional GloVe vectors [<xref ref-type="bibr" rid="ref4">4</xref>] that are sequentially fed into the encoder, a single-layer bidirectional Long Short-Term Memory (BiLSTM) network. The decoder is a single-layer unidirectional LSTM. The model uses neural attention [<xref ref-type="bibr" rid="ref1">1</xref>] to attend to the words in the source document while generating the target words for the summary. Model 2 is a pointer-generator network [<xref ref-type="bibr" rid="ref8">8</xref>], which augments the above seq2seq model with a special copying mechanism. When generating words, the decoder probabilistically decides between generating new words from the vocabulary (i.e., from the training corpus) and copying words from the input abstract (by sampling from the attention distribution). While the generator helps in novel paraphrasing, copying helps to tackle out-of-vocabulary (OOV) words. Model 3 augments the second model with the coverage mechanism of Tu et al. [<xref ref-type="bibr" rid="ref9">9</xref>] to avoid erroneously repeating the same words during decoding. For all the models, we used the same vocabulary of around 50K tokens, beam search in the decoder with beam size 4, a maximum input size of 400 tokens, and a maximum output size of 100 tokens.
      </p>
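      <p>A minimal numeric sketch, with made-up toy distributions, of the two additions described above: the pointer-generator's mix of generating and copying (Model 2) and the coverage penalty (Model 3). Real implementations operate on tensors over whole batches; this only illustrates the arithmetic of one decoding step.</p>
      <preformat>
```python
# Toy illustration (plain Python) of the copy/generate mix used by the
# pointer-generator network and of the coverage penalty; numbers are made up.
def final_distribution(p_gen, vocab_dist, attention, source_tokens, vocab):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on w)."""
    dist = {w: p_gen * p for w, p in zip(vocab, vocab_dist)}
    for a, tok in zip(attention, source_tokens):
        # copying: move attention mass onto the source token, even if OOV
        dist[tok] = dist.get(tok, 0.0) + (1.0 - p_gen) * a
    return dist

def coverage_loss(attention, coverage):
    """Coverage penalty sum_i min(a_i, c_i): large when the decoder
    re-attends to source positions it has already covered."""
    return sum(min(a, c) for a, c in zip(attention, coverage))

vocab = ["the", "heat", "transfer"]
vocab_dist = [0.2, 0.5, 0.3]                  # decoder generation probs
source = ["multiscale", "heat", "transfer"]   # "multiscale" is OOV
attention = [0.6, 0.3, 0.1]
dist = final_distribution(0.8, vocab_dist, attention, source, vocab)
# dist["multiscale"] == 0.12: the OOV word is reachable only by copying
```
      </preformat>
      <p>Note how the OOV source word receives probability mass only through the copy term, which is how the pointer mechanism tackles out-of-vocabulary words.</p>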
    </sec>
    <sec id="sec-4">
      <title>RESULTS &amp; ANALYSIS</title>
      <p>
        Results are shown in Table 1 for ROUGE-2, ROUGE-L and METEOR as (R)ecall, (P)recision and (F1)-score. Author-written highlights are used as the gold output. Model 3 (the pointer-generator model with coverage mechanism) always achieved the highest F1-score.
      </p>
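      <p>As a reminder of what the R, P and F1 columns measure, here is a toy re-implementation of the overlap statistics behind ROUGE-1 (unigram overlap) and ROUGE-L (longest common subsequence). The reported scores come from the standard ROUGE and METEOR tools, not from this sketch.</p>
      <preformat>
```python
# Toy versions of the statistics behind ROUGE-1 and ROUGE-L; illustration only.
from collections import Counter

def unigram_overlap(cand, ref):
    """Clipped unigram overlap between candidate and reference tokens."""
    ref_counts = Counter(ref)
    return sum(min(c, ref_counts.get(w, 0)) for w, c in Counter(cand).items())

def lcs_len(a, b):
    # classic dynamic program for the longest common subsequence
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    return table[len(a)][len(b)]

def prf(match, cand_len, ref_len):
    """Turn a match count into (R)ecall, (P)recision and F1."""
    r = match / ref_len if ref_len else 0.0
    p = match / cand_len if cand_len else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return r, p, f1

ref = "a novel multiscale analysis is proposed".split()
cand = "a multiscale analysis is proposed".split()
rouge1 = prf(unigram_overlap(cand, ref), len(cand), len(ref))
rougeL = prf(lcs_len(cand, ref), len(cand), len(ref))
```
      </preformat>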
      <p>
        In the case study in Fig. 1, Model 1 generated many OOV words and factual errors. Model 2 generates more meaningful research highlights and even relevant novel words that capture the context of the paper much better. Model 2 sometimes outputs repeating words, but Model 3 reduces them. The first sentence from Model 3 contains words (‘without object ... properties’) that do not fit into the context, but its other highlights are meaningful.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>ROUGE-2, ROUGE-L and METEOR (synonym/paraphrase/stem matching) scores of the three models as (R)ecall, (P)recision and (F1)-score.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Model</th>
              <th>ROUGE-2 R</th><th>ROUGE-2 P</th><th>ROUGE-2 F1</th>
              <th>ROUGE-L R</th><th>ROUGE-L P</th><th>ROUGE-L F1</th>
              <th>METEOR R</th><th>METEOR P</th><th>METEOR F1</th><th>METEOR Final Score</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>Model 1</td><td>2.02</td><td>2.02</td><td>1.93</td><td>19.49</td><td>19.16</td><td>18.58</td><td>17.86</td><td>17.69</td><td>17.78</td><td>7.39</td></tr>
            <tr><td>Model 2</td><td>7.48</td><td>8.06</td><td>7.55</td><td>28.66</td><td>30.34</td><td>28.62</td><td>25.53</td><td>26.61</td><td>26.06</td><td>11.04</td></tr>
            <tr><td>Model 3</td><td>8.52</td><td>9.20</td><td>8.57</td><td>29.20</td><td>30.90</td><td>29.14</td><td>27.64</td><td>29.26</td><td>28.43</td><td>12.01</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Figure 1: Abstract, author-written research highlights, and model-generated research highlights.</p>
      <p>Title: Multiscale computation for transient heat conduction problem with radiation boundary condition in porous materials</p>
      <p>Abstract: This paper reports a multiscale asymptotic analysis and computation for predicting heat transfer performance of periodic porous materials with radiation boundary condition. In these porous materials thermal radiation effect at micro-scale have an important impact on the macroscopic temperature field, which is our particular interest in this study. The multiscale asymptotic expansions for computing temperature field of the problem are constructed, and associated explicit convergence rates are obtained on some regularity hypothesis. Finally, the corresponding finite element algorithms based on the multiscale method are brought forward and some numerical results are given in details. The numerical tests indicate that the developed method is feasible and valid for predicting the heat transfer performance of periodic porous materials, and support the approximate convergence results proposed in this paper.</p>
      <p>Author-written highlights: A novel multiscale analysis and computation is proposed. Heat transfer problem of periodic porous materials with radiation boundary condition are considered. Error estimates of the multiscale approximate solution are derived on some regularity hypothesis. Some numerical results are given in details to validate the multiscale method.</p>
      <p>Output of Model 1: A non-intrusive numerical tool is developed for solar artery supply planning. The results were analyzed, based on the fe modeling of finite element model. The approaches provides practical advantages of the cohort and accuracy of concrete equipment. Agent-based fe technology neural network procedures are used for assessment assessment. Results obtained from a real composite sample are considered and discussed.</p>
      <p>Output of Model 2: This paper reports a multiscale asymptotic analysis and developed protocol. The proposed approach is based on regularity hypothesis expansions. The proposed method is robust and can achieve predicting heat transfer performance. The proposed method is robust and efficient for given bone microstructure samples.</p>
      <p>Output of Model 3: Reports a multiscale asymptotic analysis without object propagation using minimal porous properties. Predicting heat transfer performance of periodic porous materials with radiation boundary condition. Finite element algorithms and computation of approximate convergence results.</p>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSION</title>
      <p>
        We applied three different deep neural models to generate research highlights from the abstract of a research paper. The pointer-generator network with coverage mechanism achieved the best performance. But the predicted research highlights are not yet perfect. A simple post-processing operation could be to remove sentences that contain entities absent from the given abstract. We are currently exploring this and other techniques to improve the system.
      </p>
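      <p>The post-processing idea above can be sketched as a filter over generated sentences. A real system would use a proper named-entity tagger; as a loudly simplified stand-in, this sketch treats capitalized non-sentence-initial tokens as entities.</p>
      <preformat>
```python
# Sketch of the proposed post-processing: drop generated highlight sentences
# that mention an entity never appearing in the abstract. Entity detection
# here (capitalized, non-initial tokens) is a crude stand-in for real NER.
def entities(text):
    toks = text.split()
    return {t.strip(".,") for i, t in enumerate(toks)
            if i > 0 and t[:1].isupper()}

def filter_highlights(sentences, abstract):
    allowed = entities(abstract)
    # keep a sentence only if every entity it mentions occurs in the abstract
    return [s for s in sentences if not entities(s) - allowed]

abstract = "We study heat conduction in porous materials using the GloVe method."
generated = ["The method uses GloVe vectors.",
             "Results on ImageNet are strong."]   # "ImageNet" is unsupported
kept = filter_highlights(generated, abstract)
```
      </preformat>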
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENTS</title>
      <p>
        This work is supported by a research grant from the Department of Science and Technology, Government of India at the Indian Association for the Cultivation of Science, Kolkata, and the National Digital Library of India Project sponsored by the Ministry of Education, Government of India at IIT Kharagpur.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Dzmitry</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          , Kyunghyun Cho, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>In ICLR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] Ed Collins, Isabelle Augenstein, and
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Riedel</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A supervised approach to extractive summarisation of scientific papers</article-title>
          .
          <source>In CoNLL.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Ramesh</given-names>
            <surname>Nallapati</surname>
          </string-name>
          , Bowen Zhou, Caglar Gulcehre,
          <string-name>
            <given-names>Bing</given-names>
            <surname>Xiang</surname>
          </string-name>
          , et al.
          <year>2016</year>
          .
          <article-title>Abstractive text summarization using sequence-to-sequence RNNs and beyond</article-title>
          .
          <source>In CoNLL</source>
          .
          <fpage>280</fpage>
          -
          <lpage>290</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In EMNLP</source>
          .
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>TYSS</given-names>
            <surname>Santosh</surname>
          </string-name>
          , Debarshi Kumar Sanyal, Plaban Kumar Bhowmick, and
          <string-name>
            <surname>Partha Pratim Das</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Surrogator: A tool to enrich a digital library with open access surrogate resources</article-title>
          .
          <source>In JCDL</source>
          .
          <fpage>379</fpage>
          -
          <lpage>380</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Tokala</given-names>
            <surname>Yaswanth Sri Sai Santosh</surname>
          </string-name>
          , Debarshi Kumar Sanyal, Plaban Kumar Bhowmick, and
          <string-name>
            <surname>Partha Pratim Das</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>DAKE: Document-Level Attention for Keyphrase Extraction</article-title>
          .
          <source>In ECIR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Debarshi Kumar</given-names>
            <surname>Sanyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Plaban Kumar</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Partha Pratim</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Samiran</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          , and
          <string-name>
            <given-names>TYSS</given-names>
            <surname>Santosh</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Enhancing access to scholarly publications with surrogate resources</article-title>
          .
          <source>Scientometrics</source>
          <volume>121</volume>
          ,
          <issue>2</issue>
          (
          <year>2019</year>
          ),
          <fpage>1129</fpage>
          -
          <lpage>1164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Abigail</given-names>
            <surname>See</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter J</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Get to the point: Summarization with pointer-generator networks</article-title>
          .
          <source>In ACL.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Zhaopeng</given-names>
            <surname>Tu</surname>
          </string-name>
          , Zhengdong Lu, Yang Liu, Xiaohua Liu, and
          <string-name>
            <given-names>Hang</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Modeling coverage for neural machine translation</article-title>
          .
          <source>In ACL.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] Richard Van Noorden.
          <year>2014</year>
          .
          <article-title>Global scientific output doubles every nine years</article-title>
          .
          <source>Nature news blog</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>