<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Slide Generation for Scientific Papers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Athar Sefid</string-name>
          <email>azs5955@psu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prasenjit Mitra</string-name>
          <email>pmitra@ist.psu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jian Wu</string-name>
          <email>jwu@cs.odu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>C. Lee Giles</string-name>
          <email>giles@ist.psu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Old Dominion University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Pennsylvania State University</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>19</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>We describe our approach for automatically generating presentation slides for scientific papers using deep neural networks. Such slides can give authors a starting point for their slide generation process. Extractive summarization techniques are applied to rank and select important sentences from the original document. Previous work identified important sentences based only on a limited number of features extracted from the position and structure of sentences in the paper. Our method extends previous work by (1) extracting a more comprehensive list of surface features, (2) considering the semantics, or meaning, of each sentence, and (3) using the context around the current sentence to rank the sentences. Once the sentences are ranked, salient sentences are selected using Integer Linear Programming (ILP). Our results show the efficacy of our model for the summarization and slide generation tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Text Mining</kwd>
        <kwd>Slide Generation</kwd>
        <kwd>Summarization</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Scientific results are usually disseminated through publications and
research presentations. The latter have become a convention in almost
all scientific domains because they provide an efficient way for others
to grasp the major contributions of a paper. Currently, PowerPoint
automates slide formatting and thus significantly reduces the effort
needed to make professional and useful slides. However, PowerPoint
cannot automatically generate slides based on the content of research
papers. Automatically generating slides would not only save
presenters’ time in preparing presentations but would also provide
a way to summarize papers, including papers that do not have
publicly available slides.</p>
      <p>
        The two major approaches for summarization are abstractive and
extractive methods. Abstractive approaches summarize text using
words not necessarily appearing in the original document.
Extractive summarization focuses on identifying important constituents
usually at the sentence level and connecting them to generate a
text snippet, which is usually significantly shorter than the original
document. This method has been applied in slide generation tasks,
e.g., [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This is because (1) the selected sentences have correct
grammatical structure and (2) they are scientifically/technically
consistent with the original article.
      </p>
      <p>The two major components of extractive summarization are
sentence scoring and sentence selection. The goal is to keep
salient sentences while excluding redundant information. Sentence
scoring is usually cast as a regression problem. The scores
depend on features extracted from the current sentence and its
contextual sentences.</p>
      <p>The main contributions of this paper are the following.</p>
      <p>(1) To the best of our knowledge, we are the first to use deep
neural models to encode sentences and their context as new
features in sentence ranking to build slides.</p>
      <p>(2) We combine regression with integer linear programming
(ILP) to select salient sentences. Then, noun phrases are
extracted from the selected sentences to build the first-level
bullet points in slides.</p>
      <p>Here, we focus on the textual components. This approach can later
be enhanced with visual effects, e.g., figures and tables, in order to
make richer, more complex slides.</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>There has been much work on text summarization using both
abstractive and extractive methods. However, automatic generation
of slides, which can be seen as a specific form of summarization of
scholarly papers, has not been well studied.</p>
    </sec>
    <sec id="sec-3">
      <title>Summarization</title>
      <p>
        Unsupervised Models. Identifying important sentences for
generating a limited length summary can be formalized as an optimization
problem which can be NP-hard. Maximum Marginal Relevance
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] has been used to heuristically select salient sentences while
keeping redundancy with the already selected summary to a minimum.
Sentence selection could be converted to the Knapsack problem
by maximizing the total scores of the selected sentences given the
length limit and can be solved by Integer Linear Programming (ILP).
ILP is an optimization method with linear constraints and
objective functions [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Graph-based summarization methods model a
document as a graph in which vertices are sentences and edges
are weighted by the similarity between them. One example is TextRank [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] which
extends Google’s PageRank. TextRank selects important sentences
that have a high degree of overlapping tokens with other sentences.
LexRank [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a stochastic graph-based method, measures the
sentence importance by eigenvector centrality of the sentences in the
graph.
      </p>
      <p>
        Feature-Based Models. Many traditional extractive summarization
approaches use feature-based models; term frequency, for example, is an
effective feature. Other features include the length and position of
sentences [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The Support Vector Regressor was applied to score
and extract salient sentences [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Others have modeled a document
as a sequence of sentences and labeled each sentence with 1 or 0,
where 1 indicates that the sentence belongs to the
summary and 0 that it does not [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Hidden Markov models (HMM) have
been adopted to solve such sequence labeling problems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Deep Neural Network Models. State-of-the-art models now use deep
neural networks which benefit from both the semantics of the
sentences and their structure [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. Zhou et al. used multiple layers
of bidirectional Gated Recurrent Units (GRU) to jointly learn the
scoring process and the selection of sentences [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Their model
applies a bi-GRU to build sentence embeddings and then feeds the
sentence vectors to another bi-GRU layer to build document-level
representations.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Slide Generation</title>
      <p>
        Previously, a method was proposed to generate slides from
documents by identifying important topics and then adding their related
sentences to the summary [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. A tool called PPSGen applies a
Support Vector Regressor for sentence ranking and then ILP to
select important sentences with a set of sophisticated constraints
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Another method generates slides from phrases extracted from
papers [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. The model learns the saliency of each phrase and the
hierarchical relationship between a pair of phrases to make the
bullet points and to determine their place in the slide. However,
their model was tested on a limited set of only 175 paper-slide pairs.
      </p>
      <p>Compared with previous work, we use a combination of feature-based
and deep neural network methods to score sentences. Then,
ILP is applied to build the summary. To transform the summary
into a slide format, first-level bullet points are generated using key
phrases extracted from the selected sentences.</p>
    </sec>
    <sec id="sec-5">
      <title>MODEL</title>
      <p>The summarization system consists of two steps. The first is to train a
model to calculate scores used to identify important sentences. The
second is to extract salient sentences under constraints. The model
is trained on a corpus of aligned pairs of papers and slides. In the
training process, we investigate Convolutional Neural Networks
(CNN), Gated Recurrent Units (GRU), and Long Short-Term Memory
(LSTM) networks, using pre-trained word embeddings (WE) to represent the
semantic features of sentences. This semantic representation is
enhanced by combining it with other surface features of the current sentence
and its contextual features. The high-level architecture is shown in
Figure 1.
</p>
      <p>In this section, we describe how to generate the ground truth by
assigning scores to the sentences of a paper given a paper-slide pair.
An academic paper D can be represented as a sequence of sentences
D = {s<sub>1</sub>, s<sub>2</sub>, ..., s<sub>n</sub>}. Each sentence is represented by the set of its
unigrams and bigrams, s<sub>i</sub> = {t<sub>1</sub>, t<sub>2</sub>, ..., t<sub>k</sub>, t<sub>1</sub>t<sub>2</sub>, t<sub>2</sub>t<sub>3</sub>, ..., t<sub>k−1</sub>t<sub>k</sub>},
where t<sub>i</sub> is the i-th token in the sentence. The corresponding
presentation of the document D can be represented as a sequence of
slides P = {p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>m</sub>}. The salience score of each sentence
is its highest Jaccard similarity, computed over unigrams and
bigrams, with any slide in the corresponding presentation:</p>
      <disp-formula id="eq1"><label>(1)</label><tex-math><![CDATA[\forall s_i \in D, \quad \mathit{score}_{s_i} = \max_{p_j \in P} \mathrm{Jaccard}(s_i, p_j) ]]></tex-math></disp-formula>
      <p>The Jaccard similarity of the sets s<sub>i</sub> and p<sub>j</sub> is defined as</p>
      <disp-formula><tex-math><![CDATA[\mathrm{Jaccard}(s_i, p_j) = \frac{|s_i \cap p_j|}{|s_i \cup p_j|} ]]></tex-math></disp-formula>
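      <p>The labeling step can be illustrated with a short Python sketch. It assumes that sentences and slides arrive as already tokenized lists of strings; the function names (ngram_set, jaccard, salience_scores) are illustrative and are not taken from the released implementation.</p>
      <preformat><![CDATA[
```python
from typing import List, Set


def ngram_set(tokens: List[str]) -> Set[str]:
    """Return the set of unigrams and bigrams of a token list."""
    unigrams = set(tokens)
    bigrams = {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}
    return unigrams | bigrams


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def salience_scores(sentences: List[List[str]], slides: List[List[str]]) -> List[float]:
    """Score each sentence by its best Jaccard match over all slides (Eq. 1)."""
    slide_sets = [ngram_set(p) for p in slides]
    return [
        max((jaccard(ngram_set(s), p) for p in slide_sets), default=0.0)
        for s in sentences
    ]
```
]]></preformat>
      <p>A sentence copied verbatim onto a slide that contains nothing else receives a score of 1, while a sentence sharing no unigram or bigram with any slide receives 0.</p>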
      <p>Having labeled the sentences with their scores, we can start
training different models to learn the scoring function.</p>
    </sec>
    <sec id="sec-6">
      <title>Sentence Ranking</title>
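      <p>The goal of the sentence ranking module is to train a model f(s<sub>i</sub> | ϕ),
with 0 ≤ f(s<sub>i</sub> | ϕ) ≤ 1 for every sentence s<sub>i</sub>, that minimizes the
Mean Square Error (MSE) defined as below:</p>
      <disp-formula><tex-math><![CDATA[\mathrm{MSE} = \sum_{s_i} \left( f(s_i \mid \phi) - \mathit{score}_{s_i} \right)^2 ]]></tex-math></disp-formula>
      <p>in which ϕ is the sentence embedding combined with syntactic and
contextual features.</p>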
      <p>The sentence ranking model consists of three modules. The first
models the meaning of a sentence using a deep neural network
that embeds the sentence itself into a vector. The second
evaluates the capability of the sentence to summarize its contextual
sentences. The last extracts a variety of surface features from the
sentence structure and its position in the paper. The results from
all three modules are combined to build the input for the
final Multi-Layer Perceptron (MLP) with two hidden layers, which
acts as a regressor to predict the final score for a sentence. Figure
2 depicts the embedding layers for the construction of sentence
representation vectors.</p>
      <p>3.2.1 Semantic Embedding. In this layer, the semantics of a sentence
are encoded into a vector. We compare the performance of
convolutional and recurrent networks in order to choose the best
system for semantic embedding.</p>
      <p>Convolutional Neural Network (CNN): In this layer, the
semantic representation of a sentence si is generated by concatenating
embeddings of bi-grams. The intuition is that bi-grams preserve
the sequential information of the text. Convolutional layers and
element-wise max-pooling on top of the bigram embeddings
capture important patterns in the sentence that match similar patterns
in the reference slides.</p>
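      <p>The bigram representations are calculated by concatenating the
pre-trained word embeddings of the tokens in the bigram:</p>
      <disp-formula><tex-math><![CDATA[\mathit{bigram}_j = [v_j, v_{j+1}] ]]></tex-math></disp-formula>
      <p>in which v<sub>j</sub> and v<sub>j+1</sub> stand for the vectors of the current and the next
word in sentence s<sub>i</sub>. Each bigram is then mapped to a feature vector:</p>
      <disp-formula><tex-math><![CDATA[V_{\mathit{bigram}_j} = \tanh(\mathit{bigram}_j * W_c + b) ]]></tex-math></disp-formula>
      <p>where the filter <inline-formula><tex-math><![CDATA[W_c \in \mathbb{R}^{2|v_j| \times |v_j|} ]]></tex-math></inline-formula> is applied to the vector of two words
to make new features. The activation function is the hyperbolic
tangent and b is the bias term. The sentence embedding is obtained by pooling over the bigram features:</p>
      <disp-formula><tex-math><![CDATA[V_{s_i} = \max_{0 < j < |s_i| - 1} V_{\mathit{bigram}_j} ]]></tex-math></disp-formula>
      <p>
        where the max function is an element-wise max pooling and V<sub>s<sub>i</sub></sub> is
the semantic embedding of sentence s<sub>i</sub> [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>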
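      <p>The convolution and pooling steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions: vectors are Python lists, the filter W_c is a nested list with 2d rows and d columns, and the function names are hypothetical. The sketch pools over every consecutive bigram of the sentence, and the parameters would be learned during training rather than supplied by hand.</p>
      <preformat><![CDATA[
```python
import math
from typing import List

Vector = List[float]


def bigram_feature(v_j: Vector, v_next: Vector, W_c: List[Vector], b: Vector) -> Vector:
    """Compute tanh([v_j, v_{j+1}] * W_c + b), with W_c of shape (2d, d)."""
    bigram = v_j + v_next  # list concatenation gives a vector of length 2d
    d = len(b)
    return [
        math.tanh(sum(bigram[k] * W_c[k][col] for k in range(len(bigram))) + b[col])
        for col in range(d)
    ]


def sentence_embedding(word_vectors: List[Vector], W_c: List[Vector], b: Vector) -> Vector:
    """Element-wise max over the features of all consecutive bigrams."""
    if len(word_vectors) < 2:
        raise ValueError("need at least two tokens to form a bigram")
    features = [
        bigram_feature(word_vectors[j], word_vectors[j + 1], W_c, b)
        for j in range(len(word_vectors) - 1)
    ]
    return [max(column) for column in zip(*features)]
```
]]></preformat>
      <p>Because the pooling is element-wise, the embedding has the same dimension as a single word vector regardless of sentence length.</p>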
      <sec id="sec-6-1">
        <title>Recurrent Neural Network (RNN)</title>
        <p>The convolutional layers above can model sequential patterns of up to
two consecutive tokens. However, RNNs are able to capture the
dependencies between tokens in long sentences. Both LSTM and GRU have been
widely used to remember long-term dependencies by keeping memory/state from
previous activations instead of replacing the entire activation as
a vanilla RNN does.</p>
        <p>LSTM: The LSTM tracks long term dependencies via input (it ),
forget (ft ), and output (ot ) gates; the input gate regulates how much
of the new cell state to keep, the forget gate forgets a portion of
existing memory and the output gate sends important information
in the cell state to the next layers of the network.</p>
        <p>GRU: The GRU operates using a reset gate and an update gate.
The reset gate decides how much of the past information to forget,
and the update gate helps the model to determine how much of the
past information needs to be passed to the future.</p>
        <p>The inputs to the recurrent units are pre-trained word vectors,
and the unit outputs are combined to build the sentence semantic
embeddings.</p>
        <p>
          3.2.2 Context Embedding. The context of a sentence is defined as
the sentences before and after it. Intuitively, a
sentence that summarizes its context is more important
and should have a higher chance of being selected for the summary of
the paper. For instance, some paragraphs end with a sentence
that summarizes the whole paragraph. Such a sentence probably has
a high word overlap with the other sentences of the paragraph.
Contextual sentence relations have been used to model attention weights
of sentences and then to identify important sentences [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The
context information is embedded as the following vectors.
        </p>
        <disp-formula id="eq7"><label>(7)</label><tex-math><![CDATA[\mathit{prevS}_i = \begin{bmatrix} \cos(V_{s_i}, V_{s_{i-1}}) \\ \cos(V_{s_i}, V_{s_{i-2}}) \\ \cos(V_{s_i}, V_{s_{i-3}}) \end{bmatrix}, \quad \mathit{nextS}_i = \begin{bmatrix} \cos(V_{s_i}, V_{s_{i+1}}) \\ \cos(V_{s_i}, V_{s_{i+2}}) \\ \cos(V_{s_i}, V_{s_{i+3}}) \end{bmatrix} ]]></tex-math></disp-formula>
        <p>
          3.2.3 Syntactic Embedding. The models described above
focus on representing the meaning of the sentences and their
context as vectors. Although semantic features are strong indicators for
the identification of salient sentences, surface features have also proved
to be important [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The surface features we extracted are tabulated
in Table 1.
        </p>
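        <p>To make the context vectors of Eq. (7) concrete, the following Python sketch computes the two vectors for one sentence from a list of sentence embeddings. Two points are assumptions of the sketch and are not stated in the text: neighbours that fall outside the document contribute a similarity of 0, and a zero vector has similarity 0 with everything.</p>
        <preformat><![CDATA[
```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def context_vectors(embeddings: List[Vector], i: int, window: int = 3) -> Tuple[Vector, Vector]:
    """Return (prevS_i, nextS_i) as in Eq. (7) for the sentence at index i."""

    def sim(j: int) -> float:
        # Assumption: positions before the first or after the last sentence give 0.0.
        return cosine(embeddings[i], embeddings[j]) if 0 <= j < len(embeddings) else 0.0

    prev_s = [sim(i - k) for k in range(1, window + 1)]
    next_s = [sim(i + k) for k in range(1, window + 1)]
    return prev_s, next_s
```
]]></preformat>
        <p>The two three-dimensional vectors are then concatenated with the semantic embedding and the surface features before being passed to the regressor.</p>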
        <p>The “section” feature is a one-hot vector representing the section
that the sentence belongs to. Sentences in certain sections, such as the conclusion,
tend to be more important than sentences in
other sections, such as the acknowledgments. Table 2 shows the average
scores of the sentences in a set of pre-defined sections, calculated
from the labeled paper-slide pairs. The table indicates that the
Introduction and Conclusion sections are relatively more important than
the other sections.</p>
        <p>The second feature is the relative position of the sentence in the
section in units of sentences.</p>
        <p>The other features #NP, #VP, #S, and “height” are obtained from
the parse tree of the sentence. #NP is the number of noun
phrases in the sentence, #VP counts verb phrases, #S indicates the
number of sub-sentences, and “height” is the height of the parse tree.
We used Stanford CoreNLP, which outputs tree structures consisting
of constituent noun and verb phrases.</p>
        <p>The stop_word%, #tokens, and “length” features are the ratio of stop
words among the tokens, the number of tokens, and the number of characters
in the sentence, respectively.</p>
        <p>The “paper_title_similarity” is the Jaccard similarity between
the tokens in the title of the paper and the sentence. The
“section_title_similarity” is the Jaccard similarity of the section title
and the current sentence.</p>
        <p>The “average_IDF” feature is the average of the IDF scores of the tokens
of the sentence and “average_TF” is the average of their TF scores.</p>
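        <p>Several of the surface features that need no parser can be sketched in a few lines of Python. The stop-word list below is a small hypothetical placeholder, lower-casing before comparison is an assumption, and “length” is computed here as the number of characters when tokens are joined by single spaces; none of these details are specified in the text.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List, Set

# Hypothetical, deliberately tiny stop-word list used only for illustration.
STOP_WORDS: Set[str] = {"the", "a", "an", "of", "in", "and", "is", "to"}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def surface_features(tokens: List[str], paper_title: List[str],
                     section_title: List[str]) -> Dict[str, float]:
    """Compute the parser-free surface features of one tokenized sentence."""
    lowered = [t.lower() for t in tokens]
    token_set = set(lowered)
    stop_ratio = sum(t in STOP_WORDS for t in lowered) / len(tokens) if tokens else 0.0
    return {
        "#tokens": len(tokens),
        "length": len(" ".join(tokens)),  # characters, single spaces between tokens
        "stop_word%": stop_ratio,
        "paper_title_similarity": jaccard(token_set, {t.lower() for t in paper_title}),
        "section_title_similarity": jaccard(token_set, {t.lower() for t in section_title}),
    }
```
]]></preformat>
        <p>The parse-tree features (#NP, #VP, #S, and height) and the TF and IDF averages would be appended to this dictionary from the parser output and corpus statistics.</p>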
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Sentence Selection</title>
      <p>The model above generates a salience score for each sentence in
the paper. Next, we compare a greedy method and ILP for
sentence selection.</p>
      <p>3.3.1 Greedy Method. In this intuitive approach, sentences are
ranked by their salience scores and are then added to the
summary as long as the bigram overlap is less than a threshold θ and the
summary size does not exceed a maximum limit. The threshold
parameter is tuned to θ = 0.7, as higher or lower values result
in worse performance.</p>
      <p>3.3.2 Integer Linear Programming (ILP). ILP is an optimization
problem with a linear objective function and linear constraints in which
some of the variables are restricted to integers; it is NP-complete.
We adopted the IBM CPLEX Optimizer<sup>1</sup>, which has been used to
solve ILP problems efficiently.</p>
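      <p>The greedy method of Section 3.3.1 can be sketched as follows. The text does not define how the bigram overlap is measured, so the sketch assumes it is the share of a candidate's bigrams that already occur in the summary; it also assumes that a sentence failing either test is skipped rather than ending the selection, and that length is counted in characters. The function names are illustrative.</p>
      <preformat><![CDATA[
```python
from typing import List, Sequence


def bigrams(tokens: Sequence[str]) -> set:
    return set(zip(tokens, tokens[1:]))


def greedy_select(sentences: List[List[str]], scores: List[float],
                  max_len: int, theta: float = 0.7) -> List[int]:
    """Return indices of selected sentences, visited in descending score order."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    selected, summary_bigrams, used = [], set(), 0
    for i in order:
        length = len(" ".join(sentences[i]))
        cand = bigrams(sentences[i])
        # Assumption: overlap = share of the candidate's bigrams already in the summary.
        overlap = len(cand & summary_bigrams) / len(cand) if cand else 0.0
        if overlap < theta and used + length < max_len:
            selected.append(i)
            summary_bigrams |= cand
            used += length
    return selected
```
]]></preformat>
      <p>With θ = 0.7, a near-duplicate of an already selected sentence is rejected even if its score is high.</p>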
      <p>We aim to maximize the following objective function:</p>
      <disp-formula id="eq8"><label>(8)</label><tex-math><![CDATA[\max \sum_{i \in N_s} l_i W_i x_i ]]></tex-math></disp-formula>
      <p>subject to:</p>
      <disp-formula id="eq9"><label>(9)</label><tex-math><![CDATA[\sum_{i} l_i x_i < \mathit{maxLen}, \qquad \forall i, \; x_i \in \{0, 1\} ]]></tex-math></disp-formula>
      <p>where W<sub>i</sub> is the score of sentence i generated by the deep model
described above, x<sub>i</sub> is a binary variable indicating whether
sentence i is selected for the summary, and l<sub>i</sub> is the number
of characters in sentence i. Eq. (9) limits the
size of the summary to a maximum length maxLen. The presence of l<sub>i</sub>
in Eq. (8) helps to penalize short sentences.</p>
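      <p>For small inputs, the program in Eqs. (8) and (9) can be solved exactly without a commercial solver, because it has the form of a 0/1 knapsack problem. The sketch below uses dynamic programming over integer character budgets as a stand-in for CPLEX; it is not the authors' implementation, and the function name is hypothetical.</p>
      <preformat><![CDATA[
```python
from typing import List


def select_sentences(lengths: List[int], weights: List[float], max_len: int) -> List[int]:
    """Maximise sum(l_i * W_i * x_i) subject to sum(l_i * x_i) < max_len, x_i in {0, 1}."""
    capacity = max_len - 1  # lengths are integers, so "< max_len" means "<= max_len - 1"
    if capacity < 0:
        return []
    n = len(lengths)
    # best[i][c]: best objective value using the first i sentences within c characters
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        l = lengths[i - 1]
        gain = l * weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if l <= c and best[i - 1][c - l] + gain > best[i][c]:
                best[i][c] = best[i - 1][c - l] + gain
    # Walk back through the table to recover which sentences were chosen.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)
```
]]></preformat>
      <p>For example, with lengths 50, 30, 30, scores 0.5, 0.45, 0.45, and maxLen = 61, the exact solution picks the two shorter sentences (objective 27), whereas a score-ordered greedy pass would take the longest sentence first (objective 25) and have no room left.</p>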
    </sec>
    <sec id="sec-8">
      <title>Slide Generation</title>
      <p>Typical presentation slides include a limited number of bullet points,
which often form the first level of the slide structure. These bullet
points are usually noun phrases or shortened versions of sentences.
Some slides may contain second-level bullet points for further
breakdowns. Less than 8% of the content of the presentations in the
ground truth corpus is covered in third-level bullets. Therefore, we
generate slides with up to 2 bullet levels. Slide titles contain
4 words on average, and both Level 1 and Level 2 bullets contain
8 words on average. Each slide is on average made of 36 words in 5
bullets and each level-1 bullet includes 2 second-level bullets.</p>
      <p>The deep neural model first ranks the sentences, and then ILP
selects salient sentences by considering the length of the
sentences and the length limit for the generated presentation. The
selected sentences become the second-level bullets. The
first-level bullets are the noun phrases extracted from these sentences.
A noun phrase is removed from the candidate set if
(1) it has more than 10 tokens, (2) it has only 1 token, or (3)
its document frequency (DF) is higher than 10. Noun phrases with a high
document frequency, such as “the model”, are not informative enough to be added as a
bullet point.</p>
      <p><sup>1</sup> https://www.ibm.com/products/ilog-cplex-optimization-studio</p>
      <p>Once a set of candidate noun phrases is selected, they are sorted
based on the average frequency of their tokens. The noun phrases
and the sentences containing them are added to the slides in order of
their frequency until all of the sentences selected by the model are
covered in the slides. The slide generation process is demonstrated
in Algorithm 1.</p>
      <p>In order to set the titles of the slides, the section of the first
sentence in the slide is determined and then the heading of the
section is borrowed from the paper as the title of the slide. The
heading is truncated to the first 5 tokens. We allow a maximum
of 4 sentences per slide. If a topic has more than 4 related sentences,
the slide is split into two distinct ones.</p>
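      <p>The grouping procedure described above can be sketched in Python. The sketch assumes that noun phrases have already been extracted by a parser and are passed in as tuples of tokens, and that document and token frequencies are supplied as dictionaries; the function name and the decision to return uncovered sentences to the caller are choices of the sketch, not of the paper.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def build_slides(selected, doc_freq, token_freq, max_sents=4):
    """selected: list of (sentence, [noun phrases]) pairs; a phrase is a tuple of tokens.

    Returns (slides, uncovered): slides is a list of (first-level bullet, sentences).
    """
    by_np = defaultdict(list)
    for sent, phrases in selected:
        for np in phrases:
            # Candidate filter from the text: 2 to 10 tokens and DF of at most 10.
            if 2 <= len(np) <= 10 and doc_freq.get(np, 0) <= 10:
                by_np[np].append(sent)

    def avg_token_freq(np):
        return sum(token_freq.get(t, 0) for t in np) / len(np)

    remaining = [sent for sent, _ in selected]
    slides = []
    for np in sorted(by_np, key=avg_token_freq, reverse=True):
        bullets = []
        for s in by_np[np]:
            if s in remaining:  # each selected sentence is placed only once
                remaining.remove(s)
                bullets.append(s)
        # Topics with more than max_sents sentences are split across several slides.
        for start in range(0, len(bullets), max_sents):
            slides.append((" ".join(np), bullets[start:start + max_sents]))
    return slides, remaining
```
]]></preformat>
      <p>Slide titles, taken from the section heading of the first sentence on each slide, would be attached in a separate pass.</p>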
      <sec id="sec-8-1">
        <title>Algorithm 1: Bullet Point Generation</title>
        <preformat>Data: sentences
Result: slides
ranked_sents = rank(sentences);
selected_sents = ILP(ranked_sents);
NPs = {};                      // dictionary of NPs and a list of its sentences
for s ∈ selected_sents do
    for np ∈ noun_phrases(s) do
        NPs[np].add(s);
    end
end
freq_rank(NPs);                // rank noun phrases based on frequency of tokens
slides = {};
while selected_sents ≠ ∅ do
    for np, sents ∈ NPs do
        slides[np] = [];
        for s ∈ sents do
            if s ∈ selected_sents then
                slides[np].add(s);
                selected_sents = selected_sents \ {s};
            end
        end
    end
end
distribute_sents(slides);      // allow a maximum of 4 sentences per slide</preformat>
        <p>
The dataset is a collection of 1200 scientific papers and their
corresponding presentation slides made by their authors. This dataset
is adopted from a previous project called PPSGen [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The dataset
contains conference proceedings in computer science in PDF
format. To extract the header metadata and content of the paper from
PDF files, GROBID [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is used and the information is converted
into an XML file in the TEI (Text Encoding Initiative) format, a
comprehensive encoding standard for machine-readable texts.
        </p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Surface features extracted for each sentence.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Feature</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>section</td><td>Section (Intro, Related, Model, Result, Conclusion, Acknowledge) of the sentence</td></tr>
              <tr><td>position</td><td>Position of the sentence in the section</td></tr>
              <tr><td>height</td><td>Height of the parse tree of the sentence</td></tr>
              <tr><td>#NP</td><td>Number of noun phrases in the sentence</td></tr>
              <tr><td>#VP</td><td>Number of verb phrases in the sentence</td></tr>
              <tr><td>#S</td><td>Number of sub-sentences in the sentence</td></tr>
              <tr><td>length</td><td>Number of characters in the sentence</td></tr>
              <tr><td>stop_word%</td><td>Ratio of the stop words in the sentence</td></tr>
              <tr><td>#tokens</td><td>Number of tokens in the sentence</td></tr>
              <tr><td>paper_title_similarity</td><td>Jaccard similarity of the sentence to the title of the paper</td></tr>
              <tr><td>section_title_similarity</td><td>Jaccard similarity of the sentence to the title of the section</td></tr>
              <tr><td>average_IDF</td><td>Average of the IDF scores of tokens</td></tr>
              <tr><td>average_TF</td><td>Average of the frequency of tokens in the paper</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The slides are either in PDF or PPT format. The PPT files are
first converted to PDFs. Then, the text in the presentations is extracted
by the LibreOffice pdftotext tool.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>Model Configuration</title>
      <p>
        Stanford CoreNLP 3.9.2 is used to tokenize and lemmatize sentences
to the constituent words and also to build the parse trees. The
word embeddings are initialized with GloVe 1.2 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] 50-dimensional
vectors trained on Wikipedia. The pre-trained GloVe vectors contain
400,000 words and cover 98.9% of our model vocabulary. The rest
of the word embeddings are randomly initialized using a Gaussian
distribution. Dropout layers with probability p = 0.3 are added to
the recurrent model to randomly remove some of the neurons from
the network in order to prevent over-fitting [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The model is trained with the Adam optimizer,
L2-norm regularization, and mini-batches of size 100.
The learning rate is set to 0.0001 and the model
is trained for 50 epochs. Sentences containing more
than 200 words are truncated.
      </p>
    </sec>
    <sec id="sec-10">
      <title>Evaluation Metric</title>
      <p>
        The Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is a standard metric for automatic evaluation of
machine-generated summaries. ROUGE-N measures the n-gram overlap between
a candidate summary and a reference summary. The recall-oriented
score (Re) and the precision-oriented score (Pr) are computed as
follows:
      </p>
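      <disp-formula id="eq10"><label>(10)</label><tex-math><![CDATA[\mathit{Re} = \frac{\sum_{g \in \mathit{Reference}} \mathit{Count}_{\mathit{match}}(g)}{\sum_{g \in \mathit{Reference}} \mathit{Count}(g)} ]]></tex-math></disp-formula>
      <disp-formula id="eq11"><label>(11)</label><tex-math><![CDATA[\mathit{Pr} = \frac{\sum_{g \in \mathit{Reference}} \mathit{Count}_{\mathit{match}}(g)}{\sum_{g \in \mathit{Candidate}} \mathit{Count}(g)} ]]></tex-math></disp-formula>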
      <p>where g stands for an n-gram in the reference summary Reference,
n is its length, Count<sub>match</sub>(g) is the number of n-grams shared
between the candidate summary Candidate and the reference summary, and
Count(g) is the number of n-grams in the reference or candidate
summary. The F-measure score (F) is the harmonic mean of Pr and Re.</p>
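      <p>Eqs. (10) and (11) can be illustrated with a compact Python sketch. It assumes pre-tokenized input and clipped matching, in which an n-gram is counted at most as many times as it occurs in both summaries; it is an illustration and not a replacement for the official ROUGE toolkit, which applies further preprocessing options.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: List[str], reference: List[str], n: int = 2) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of the n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    match = sum((cand & ref).values())
    recall = match / sum(ref.values()) if ref else 0.0
    precision = match / sum(cand.values()) if cand else 0.0
    f = 2 * precision * recall / (precision + recall) if match else 0.0
    return precision, recall, f
```
]]></preformat>
      <p>For the candidate “the cat sat on the mat” and the reference “the cat lay on the mat”, three of the five bigrams match, so precision, recall, and F for ROUGE-2 are all 0.6.</p>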
      <p>
        ROUGE-L [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is similarly defined based on the Longest Common
Sub-sequence (LCS), and ROUGE-W is the Weighted Longest Common
Sub-sequence, developed to differentiate LCSs of different spatial
relations. ROUGE-W gives more weight to the sequences that are
spatially closer to each other [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Empirically, ROUGE-2 is more suitable for single document
summaries compared with ROUGE-1 [
        <xref ref-type="bibr" rid="ref10 ref15">10, 15</xref>
        ].
      </p>
      <p>
        We compare our model with the following baselines.
(1) AvgTFIDF: This model ranks sentences by the product Avg(TF) × Avg(IDF)
of the TF and IDF scores of the tokens in the sentence, referred to as the AvgTFIDF score.
The model greedily adds sentences to the summary until
it hits the size limit.
(2) TextRank: TextRank is a graph-based algorithm where
vertices are sentences and the edge weights are the cosine
similarity of the sentence embeddings [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
(3) MEAD: MEAD is a publicly available platform for
multilingual summarization. The platform implements a
combination of position-based and centroid-based algorithms to
score the sentences [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>The results for the ROUGE scores are shown in Table 3. Our
model with the CNN neural structure and ILP sentence selector
achieved the best F scores in terms of ROUGE-2, ROUGE-L, and
ROUGE-W compared with the baselines.</p>
      <p>Intuitively, LSTMs should have shown a better performance in
representing sentences with long dependencies. However, they are
prone to over-fitting since there are many parameters to be tuned.</p>
      <p>ILP always outperforms the simple greedy algorithm. The reason
is that it is able to recognize when adding multiple shorter and less
important sentences will result in better ROUGE scores than
selecting fewer long and more important sentences.</p>
      <p>[Table 3: precision (Pr%) and recall (Re%) under ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-W for the baselines and for the GRU, LSTM, and CNN models combined with each sentence selector.]</p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>Although summarization has been well studied, the task of
automatic slide generation is relatively new. This paper uses
neural network models to encode syntax, semantics, and context
of sentences as features to automatically generate slide content.
A multilayer perceptron on top of sentence embeddings acts as a
regressor that ranks the sentences. Our model then selects salient
sentences with the constraint of a limit on summary length. Our
method shows higher ROUGE scores compared with the introduced
baselines which we interpret to mean that the slides generated have
a high overlap with manually made slides.</p>
      <p>The software implementation of the model is available at GitHub:
https://github.com/atharsefid/Automatic-Slide-Generation.</p>
    </sec>
    <sec id="sec-12">
      <title>ACKNOWLEDGEMENTS</title>
      <p>The National Science Foundation is gratefully acknowledged for
partial support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Federico</given-names>
            <surname>Barrios</surname>
          </string-name>
          , Federico López, Luis Argerich, and
          <string-name>
            <given-names>Rosa</given-names>
            <surname>Wachenchauzer</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Variations of the Similarity Function of TextRank for Automated Summarization</article-title>
          .
          <source>CoRR</source>
          abs/1602.03606 (
          <year>2016</year>
          ). arXiv:1602.03606 http://arxiv.org/abs/1602.03606
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Jaime G</given-names>
            <surname>Carbonell</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jade</given-names>
            <surname>Goldstein</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>The Use of MMR and Diversity-Based Reranking for Reordering Documents and Producing Summaries</article-title>
          . (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Ronan</given-names>
            <surname>Collobert</surname>
          </string-name>
          , Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Kuksa</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Natural language processing (almost) from scratch</article-title>
          .
          <source>Journal of machine learning research 12</source>
          ,
          Aug
          (
          <year>2011</year>
          ),
          <fpage>2493</fpage>
          -
          <lpage>2537</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>John</surname>
            <given-names>M</given-names>
          </string-name>
          <string-name>
            <surname>Conroy and Dianne P O'leary</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Text summarization via hidden markov models</article-title>
          .
          <source>In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. ACM</source>
          ,
          <fpage>406</fpage>
          -
          <lpage>407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Günes</given-names>
            <surname>Erkan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dragomir R</given-names>
            <surname>Radev</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Lexrank: Graph-based lexical centrality as salience in text summarization</article-title>
          .
          <source>Journal of artificial intelligence research 22</source>
          (
          <year>2004</year>
          ),
          <fpage>457</fpage>
          -
          <lpage>479</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Yue</given-names>
            <surname>Hu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Xiaojun</given-names>
            <surname>Wan</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>PPSGen: Learning-based presentation slides generation for academic papers</article-title>
          .
          <source>IEEE transactions on knowledge and data engineering 27</source>
          ,
          <issue>4</issue>
          (
          <year>2015</year>
          ),
          <fpage>1085</fpage>
          -
          <lpage>1097</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Hayato</given-names>
            <surname>Kobayashi</surname>
          </string-name>
          , Masaki Noguchi, and
          <string-name>
            <given-names>Taichi</given-names>
            <surname>Yatsuka</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Summarization based on embedding distributions</article-title>
          .
          <source>In Proceedings of the 2015 conference on empirical methods in natural language processing</source>
          .
          <fpage>1984</fpage>
          -
          <lpage>1989</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Chen</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xian</given-names>
            <surname>Qian</surname>
          </string-name>
          , and Yang Liu.
          <year>2013</year>
          .
          <article-title>Using supervised bigram-based ILP for extractive summarization</article-title>
          .
          <source>In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , Vol.
          <volume>1</volume>
          .
          <fpage>1004</fpage>
          -
          <lpage>1013</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Sujian</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>You</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wei</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bin</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Multi-document summarization using support vector regression</article-title>
          .
          <source>In Proceedings of DUC. Citeseer.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Chin-Yew</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Rouge: A package for automatic evaluation of summaries</article-title>
          .
          <source>Text Summarization Branches Out</source>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Patrice</given-names>
            <surname>Lopez</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications</article-title>
          .
          <source>In Proceedings of the 13th European Conference on Research and Advanced Technology for Digital Libraries (ECDL'09)</source>
          .
          <fpage>473</fpage>
          -
          <lpage>474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Utiyama</surname>
            <given-names>Masao</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hasida</surname>
            <given-names>Kôiti</given-names>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Automatic slide presentation from semantically annotated documents</article-title>
          .
          <source>In Proceedings of the Workshop on Coreference and its Applications</source>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          ,
          <fpage>25</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Ryan</given-names>
            <surname>McDonald</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>A study of global inference algorithms in multi-document summarization</article-title>
          .
          <source>In European Conference on Information Retrieval</source>
          .
          <fpage>557</fpage>
          -
          <lpage>564</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Rada</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Tarau</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Textrank: Bringing order into text</article-title>
          .
          <source>In Proceedings of the 2004 conference on empirical methods in natural language processing.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Karolina</given-names>
            <surname>Owczarzak</surname>
          </string-name>
          , John M Conroy, Hoa Trang Dang, and
          <string-name>
            <given-names>Ani</given-names>
            <surname>Nenkova</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>An assessment of the accuracy of automatic evaluation in summarization</article-title>
          .
          <source>In Proceedings of Workshop on Evaluation Metrics and System Comparison for Automatic Summarization. Association for Computational Linguistics</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Glove: Global vectors for word representation</article-title>
          .
          <source>In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          .
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Pengjie</given-names>
            <surname>Ren</surname>
          </string-name>
          , Zhumin Chen, Zhaochun Ren, Furu Wei, Jun Ma, and Maarten de Rijke.
          <year>2017</year>
          .
          <article-title>Leveraging contextual sentence relations for extractive summarization using a neural attention model</article-title>
          .
          <source>In Proceedings of the 40th International SIGIR Conference on Research and Development in Information Retrieval</source>
          .
          <fpage>95</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Pengjie</given-names>
            <surname>Ren</surname>
          </string-name>
          , Furu Wei, Zhumin Chen, Jun Ma, and
          <string-name>
            <given-names>Ming</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>A redundancy-aware sentence regression framework for extractive summarization</article-title>
          .
          <source>In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers</source>
          .
          <fpage>33</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Dou</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jian-Tao</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hua</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Zheng</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Document summarization using conditional random fields</article-title>
          .
          <source>In IJCAI</source>
          , Vol.
          <volume>7</volume>
          .
          <fpage>2862</fpage>
          -
          <lpage>2867</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Nitish</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          <volume>15</volume>
          ,
          <issue>1</issue>
          (
          <year>2014</year>
          ),
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>DR</given-names>
            <surname>Timothy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T</given-names>
            <surname>Allison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S</given-names>
            <surname>Blair-Goldensohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J</given-names>
            <surname>Blitzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Çelebi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E</given-names>
            <surname>Drabek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Hakim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D</given-names>
            <surname>Liu</surname>
          </string-name>
          , et al.
          <year>2004</year>
          .
          <article-title>MEAD a platform for multidocument multilingual text summarization</article-title>
          .
          <source>In International Conference on Language Resources and Evaluation.</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Sida</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiaojun</given-names>
            <surname>Wan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Shikang</given-names>
            <surname>Du</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Phrase-based presentation slides generation for academic papers</article-title>
          .
          <source>In 31st AAAI Conference.</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Qingyu</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Nan Yang, Furu Wei, Shaohan Huang,
          <string-name>
            <given-names>Ming</given-names>
            <surname>Zhou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tiejun</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Neural document summarization by jointly learning to score and select sentences</article-title>
          . arXiv preprint arXiv:1807.02305
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>