=Paper=
{{Paper
|id=Vol-3004/paper10
|storemode=property
|title=Automatic Generation of Research Highlights from Scientific Abstracts
|pdfUrl=https://ceur-ws.org/Vol-3004/paper10.pdf
|volume=Vol-3004
|authors=Tohida Rehman,Debarshi Kumar Sanyal,Samiran Chattopadhyay,Plaban Kumar Bhowmick,Partha Pratim Das
|dblpUrl=https://dblp.org/rec/conf/jcdl/RehmanSCBD21
}}
==Automatic Generation of Research Highlights from Scientific Abstracts==
EEKE 2021 – Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents
Tohida Rehman, Jadavpur University, Kolkata, India, tohida.rehman@gmail.com
Debarshi Kumar Sanyal, Indian Association for the Cultivation of Science, Kolkata, India, debarshisanyal@gmail.com
Samiran Chattopadhyay, TCG CREST; Jadavpur University, Kolkata, India, samirancju@gmail.com
Plaban Kumar Bhowmick, IIT Kharagpur, India, plaban@cet.iitkgp.ac.in
Partha Pratim Das, IIT Kharagpur, India, ppd@cse.iitkgp.ac.in
EEKE ’21, September 30, 2021, Online

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===ABSTRACT===
The huge growth in scientific publications makes it difficult for researchers to keep track of new research even in narrow sub-fields. While an abstract is the traditional way to present a high-level view of a paper, it is increasingly supplemented with research highlights that explicitly identify the important findings in the paper. In this poster, we aim to automatically construct research highlights given the abstract of a paper. We use deep neural network-based models for this purpose and achieve high ROUGE and METEOR scores on a large corpus of computer science papers.

===CCS CONCEPTS===
• Information systems → Information extraction; Summarization.

===KEYWORDS===
Pointer-generator network, Deep learning, Natural language generation

===1 INTRODUCTION===
The count of scientific publications doubles roughly every 9 years [10], making it hard for researchers to track even their own fields. One recent trend is to provide research highlights, a bulleted list of the main contributions of the paper, along with the abstract and the main text. They are potentially easier to read than abstracts, especially on mobile devices, and focus more on findings than on background. Additionally, research highlights could be useful for other tasks like finding surrogates for access-restricted papers [5, 7] and keyphrase extraction [6]. We use a pointer-generator network with a coverage mechanism to automatically generate highlights given the abstract of a research paper. Unlike prior work [2], which classifies sentences in the full text as highlights or not, we focus on generating highlights.

===2 METHODOLOGY===
We use a dataset released by Collins et al. [2] containing URLs of 10142 computer science publications from ScienceDirect (https://www.sciencedirect.com/). Each example in the dataset is organized as a pair (abstract, author-written research highlights): 8115 pairs are used for training, 1014 pairs for validation and 1013 pairs for testing. In this dataset, the average abstract length is 186 words, while the average length of the highlights is 52 words; for 98% of the papers, the highlights are at least 1.5 times shorter than the abstract.

We have used three deep learning-based models to generate research highlights.

Model 1 is the sequence-to-sequence (seq2seq) model with attention [3]. Each abstract is tokenized and the tokens are converted to 128-dimensional GloVe vectors [4] that are sequentially fed into the encoder, which is a single-layer bidirectional Long Short-Term Memory (BiLSTM). The decoder is a single-layer unidirectional LSTM. The model uses neural attention [1] to attend to the words in the source document while generating the target words for the summary.

Model 2 is a pointer-generator network [8], which augments the above seq2seq model with a special copying mechanism. When generating words, the decoder probabilistically decides between generating new words from the vocabulary (i.e., from the training corpus) and copying words from the input abstract (by sampling from the attention distribution). While the generator helps in novel paraphrasing, copying helps to tackle out-of-vocabulary (OOV) words.

Model 3 augments the second model with the coverage mechanism of Tu et al. [9] to avoid erroneously repeating the same words during decoding.

For all the models, we used the same vocabulary of around 50K tokens, beam search with beam size 4 in the decoder, a maximum input size of 400 tokens and a maximum output size of 100 tokens.

===3 RESULTS & ANALYSIS===
Results are shown in Table 1 for ROUGE-1, ROUGE-2, ROUGE-L and METEOR as (R)ecall, (P)recision and (F1)-score. Author-written highlights are used as the gold-standard output. Model 3 (pointer-generator model with coverage mechanism) achieved the highest F1-score on every metric. In the case study in Fig. 1, Model 1 generates many OOV words and factual errors. Model 2 generates more meaningful research highlights and even relevant novel words that capture the context of the paper much better. Model 2 sometimes outputs repeated words, but Model 3 reduces them. The first sentence from Model 3
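{| class="wikitable"
|+ Table 1: Comparison of models for research highlight generation.
! rowspan="2" | Model
! colspan="3" | ROUGE-1
! colspan="3" | ROUGE-2
! colspan="3" | ROUGE-L
! colspan="4" | METEOR (synonym/paraphrase/stem)
|-
! R !! P !! F1 !! R !! P !! F1 !! R !! P !! F1 !! R !! P !! F1 !! Final score
|-
| Model 1 || 20.90 || 20.47 || 19.90 || 2.02 || 2.02 || 1.93 || 19.49 || 19.16 || 18.58 || 17.86 || 17.69 || 17.78 || 7.39
|-
| Model 2 || 30.99 || 32.07 || 30.90 || 7.48 || 8.06 || 7.55 || 28.66 || 30.34 || 28.62 || 25.53 || 26.61 || 26.06 || 11.04
|-
| Model 3 || 31.60 || 33.32 || 31.46 || 8.52 || 9.20 || 8.57 || 29.20 || 30.90 || 29.14 || 27.64 || 29.26 || 28.43 || 12.01
|}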
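As a rough illustration of how the R, P and F1 columns of Table 1 are defined for ROUGE-N, the sketch below computes clipped n-gram overlap between a reference and a candidate. The paper does not state which ROUGE implementation or preprocessing was used, so the lowercased whitespace tokenization and the function names `ngrams` and `rouge_n` are assumptions made only for this example; the example strings are fragments taken from Figure 1.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N: recall, precision and F1 from clipped n-gram overlap.

    Assumption: lowercased whitespace tokenization, no stemming.
    Scores are fractions in [0, 1]; Table 1 reports percentages.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # per n-gram minimum of the two counts
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1


# Fragments of the author-written and Model 3 highlights from Figure 1.
reference = "a novel multiscale analysis and computation is proposed"
candidate = "reports a multiscale asymptotic analysis"

r, p, f1 = rouge_n(reference, candidate, n=1)
print(f"ROUGE-1  R={r:.3f}  P={p:.3f}  F1={f1:.3f}")
```

Here three of the eight reference unigrams ("a", "multiscale", "analysis") appear among the five candidate unigrams, so R = 3/8 and P = 3/5. Note that the F1 values in Table 1 are lower than the harmonic mean of the reported R and P (for example, Model 1 on ROUGE-1), which would be consistent with scores being averaged over test examples rather than computed from the averaged R and P.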
Title: Multiscale computation for transient heat conduction problem with radiation boundary condition in porous materials

Abstract: This paper reports a multiscale asymptotic analysis and computation for predicting heat transfer performance of periodic porous materials with radiation boundary condition. In these porous materials thermal radiation effect at micro-scale have an important impact on the macroscopic temperature field, which is our particular interest in this study. The multiscale asymptotic expansions for computing temperature field of the problem are constructed, and associated explicit convergence rates are obtained on some regularity hypothesis. Finally, the corresponding finite element algorithms based on the multiscale method are brought forward and some numerical results are given in details. The numerical tests indicate that the developed method is feasible and valid for predicting the heat transfer performance of periodic porous materials, and support the approximate convergence results proposed in this paper.

Author-written highlights: A novel multiscale analysis and computation is proposed. Heat transfer problem of periodic porous materials with radiation boundary condition are considered. Error estimates of the multiscale approximate solution are derived on some regularity hypothesis. Some numerical results are given in details to validate the multiscale method.

Output of Model 1: A non-intrusive numerical tool is developed for solar artery supply planning. The results were analyzed, based on the fe modeling of finite element model. The approaches provides practical advantages of the cohort and accuracy of concrete equipment. Agent-based fe technology neural network procedures are used for assessment assessment. Results obtained from a real composite sample are considered and discussed.

Output of Model 2: This paper reports a multiscale asymptotic analysis and developed protocol. The proposed approach is based on regularity hypothesis expansions. The proposed method is robust and can achieve predicting heat transfer performance. The proposed method is robust and efficient for given bone microstructure samples.

Output of Model 3: Reports a multiscale asymptotic analysis without object propagation using minimal porous properties. Predicting heat transfer performance of periodic porous materials with radiation boundary condition. Finite element algorithms and computation of approximate convergence results.

Figure 1: Original abstract, author-written research highlights and model-generated research highlights. The meaning of the colors (e.g., green = correct) is explained in main text. Abstract taken from https://www.sciencedirect.com/science/article/abs/pii/S0168874X15000621
contains words (‘without object ... properties’) that do not fit into the context, but its other highlights are meaningful.

===4 CONCLUSION===
We applied three different deep neural models to generate research highlights from the abstract of a research paper. The pointer-generator network with coverage mechanism achieved the best performance. However, the predicted research highlights are not yet perfect. A simple post-processing step could remove sentences that contain entities absent from the given abstract. We are currently exploring this and other techniques to improve the system.

===ACKNOWLEDGMENTS===
This work is supported by a research grant from the Department of Science and Technology, Government of India, at the Indian Association for the Cultivation of Science, Kolkata, and by the National Digital Library of India Project sponsored by the Ministry of Education, Government of India, at IIT Kharagpur.

===REFERENCES===
[1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.

[2] Ed Collins, Isabelle Augenstein, and Sebastian Riedel. 2017. A supervised approach to extractive summarisation of scientific papers. In CoNLL.

[3] Ramesh Nallapati, Bowen Zhou, Caglar Gulcehre, Bing Xiang, et al. 2016. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In CoNLL. 280–290.

[4] Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. GloVe: Global vectors for word representation. In EMNLP. 1532–1543.

[5] TYSS Santosh, Debarshi Kumar Sanyal, Plaban Kumar Bhowmick, and Partha Pratim Das. 2018. Surrogator: A tool to enrich a digital library with open access surrogate resources. In JCDL. 379–380.

[6] Tokala Yaswanth Sri Sai Santosh, Debarshi Kumar Sanyal, Plaban Kumar Bhowmick, and Partha Pratim Das. 2020. DAKE: Document-Level Attention for Keyphrase Extraction. In ECIR.

[7] Debarshi Kumar Sanyal, Plaban Kumar Bhowmick, Partha Pratim Das, Samiran Chattopadhyay, and TYSS Santosh. 2019. Enhancing access to scholarly publications with surrogate resources. Scientometrics 121, 2 (2019), 1129–1164.

[8] Abigail See, Peter J Liu, and Christopher D Manning. 2017. Get to the point: Summarization with pointer-generator networks. In ACL.

[9] Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016. Modeling coverage for neural machine translation. In ACL.

[10] Richard Van Noorden. 2014. Global scientific output doubles every nine years. Nature news blog (2014).
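The post-processing idea mentioned in the conclusion (dropping generated sentences that mention entities absent from the abstract) could be sketched roughly as follows. The paper does not say how entities would be detected, so this example substitutes a crude stand-in: any non-stopword token counts as an "entity". The stopword list, the `max_unseen` threshold and the function names are all hypothetical; a real system would more likely use a named-entity or keyphrase extractor. The inputs are an excerpt of the abstract and the Model 3 output from Figure 1.

```python
import re

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "and", "for", "with", "in", "on",
             "is", "are", "to", "by", "using", "without"}


def content_words(text):
    """Lowercased alphabetic tokens minus stopwords: a crude proxy for entities."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def filter_highlights(abstract, highlights, max_unseen=0):
    """Keep highlights with at most `max_unseen` content words missing from the abstract."""
    seen = content_words(abstract)
    return [h for h in highlights if len(content_words(h) - seen) <= max_unseen]


# Excerpt (first, fourth and last sentences) of the abstract in Figure 1.
ABSTRACT_EXCERPT = (
    "This paper reports a multiscale asymptotic analysis and computation for "
    "predicting heat transfer performance of periodic porous materials with "
    "radiation boundary condition. Finally, the corresponding finite element "
    "algorithms based on the multiscale method are brought forward and some "
    "numerical results are given in details. The numerical tests indicate that "
    "the developed method is feasible and valid for predicting the heat transfer "
    "performance of periodic porous materials, and support the approximate "
    "convergence results proposed in this paper."
)

# Output of Model 3 from Figure 1, one highlight per list item.
MODEL3_OUTPUT = [
    "Reports a multiscale asymptotic analysis without object propagation "
    "using minimal porous properties.",
    "Predicting heat transfer performance of periodic porous materials with "
    "radiation boundary condition.",
    "Finite element algorithms and computation of approximate convergence results.",
]

kept = filter_highlights(ABSTRACT_EXCERPT, MODEL3_OUTPUT)
for highlight in kept:
    print(highlight)
```

Under these assumptions the first Model 3 highlight is dropped, because "object", "propagation", "minimal" and "properties" do not occur in the abstract, while the other two are kept. A strict threshold of zero unseen words would also remove legitimate paraphrases, so `max_unseen` would need tuning in practice.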