<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Generation of Review Matrices as Multi-document Summarization of Scientific Papers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hayato Hashimoto</string-name>
          <email>hayat.hashimoto@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kazutoshi Shinoda</string-name>
          <email>kazutoshi.shinoda0516@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hikaru Yokono</string-name>
          <email>yokono.hikaru@jp.fujitsu.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akiko Aizawa</string-name>
          <email>aizawa@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fujitsu Laboratories Ltd</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Informatics, The University of Tokyo</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The University of Tokyo</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>A synthesis matrix is a table that summarizes various aspects of multiple documents. In our work, we specifically examine the problem of automatically generating a synthesis matrix for scientific literature review. As described in this paper, we first formulate the task as multi-document summarization and question-answering tasks given a set of aspects of the review, based on an investigation of system summary tables of NLP tasks. Next, we present a method to address the former type of task. Our system consists of two steps: sentence ranking and sentence selection. In the sentence ranking step, the system ranks sentences in the input papers by regarding aspects as queries. We use LexRank and also incorporate query expansion and word embedding to compensate for tersely expressed queries. In the sentence selection step, the system selects sentences that remain in the final output. Specifically emphasizing the summarization type aspects, we regard this step as an integer linear programming problem with a special type of constraint imposed to make summaries comparable. We evaluated our system using a dataset we created from the ACL Anthology. The results of manual evaluation demonstrated that our selection method using comparability improved performance.</p>
      </abstract>
      <kwd-group>
        <kwd>multi-document summarization</kwd>
        <kwd>review matrix</kwd>
        <kwd>scientific paper mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Literature surveys are a fundamentally important part of research. Nevertheless, the increasing amount of scientific literature demands a great deal of time for finding and reading all relevant papers. Although survey articles are often available for major topics, they are not always available for new or small topics. To address and mitigate these issues in surveying, scientific summarization has been widely studied. In scientific summarization, the input is a set of scientific papers related to a certain topic; the goal is to generate a summary of them.
⋆ Currently at Google.</p>
      <p>
        A synthesis matrix, or a (literature) review matrix, is a table showing a summary of multiple sources across different aspects. Synthesis matrices, which are regarded as effective tools for literature review [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], allow readers to analyze and compare source documents from different points of view. For example, an overview paper for a shared task typically includes a table that presents a comparison of the systems participating in the task (e.g., [16]).
      </p>
      <p>Table 1 presents an example of a synthesis matrix. In this matrix, each row corresponds to a paper and each column corresponds to an aspect. For instance, the Approach column shows roughly what type of approach each system uses by categorizing approaches into four types, whereas the Description of Approach column presents details of the approaches.</p>
      <p>Our goal is to generate such matrices automatically. We formulate the task of
automatic synthesis matrix generation as text summarization. Then we propose
a model for the task. What makes synthesis matrix generation different from
general summarization is that documents are mutually compared in summaries.
We propose a system that is designed to capture this characteristic.</p>
      <p>Our system is based on query-focused summarization (QFS), a variant of text summarization in which the generated summary provides an answer or support for a query. A QFS-based approach alone, however, cannot achieve the characteristic described above because it processes only a single document at a time. To make summaries comparable, we incorporate the idea of comparative summarization, which aims to clarify and emphasize differences among documents. The proposed system consists of two steps: sentence ranking and sentence selection. The former step ranks sentences using the query-focused version of LexRank [17]. The latter selects sentences using integer linear programming (ILP) with an objective function that reflects comparability.</p>
      <p>For evaluation, we created a dataset consisting of synthesis matrices taken
from overview papers of shared tasks in the ACL Anthology, a database of
papers on NLP. We conducted automatic evaluation using the evaluation metric
ROUGE as well as manual evaluation by comparing the system output to the
references. We experimented with various combinations of query relevance and
query expansion to see the effectiveness of these techniques. We also compared
our ILP-based sentence selection method with multiple greedy baseline methods.
Results showed that our method is effective for synthesis matrix generation.</p>
      <p>Our contributions can be summarized as follows. (1) We analyzed synthesis matrices in NLP and formulated the task of synthesis matrix generation. (2) We proposed a system based on LexRank and ILP for the task. (3) We showed that considering comparability between papers improves the performance of the proposed system.</p>
    </sec>
    <sec id="sec-2">
      <title>Analysis of Synthesis Matrices and Task Formulation</title>
      <sec id="sec-2-1">
        <title>Dataset Construction</title>
        <p>We created a dataset from papers on the ACL Anthology5, a full-text archive of papers on natural language processing6. In the construction of the dataset, we first selected the eight shared tasks listed in Table 1. For each shared task, we extracted (i) a summary table of the participating systems and (ii) the corresponding system description papers. Here, we consider the summary table as a gold synthesis matrix for the description papers.</p>
        <p>
          Next, we extracted sentences from the system description papers. We used XML format files that had been converted automatically from their original PDF versions using the SideNoter Project [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Because the XML files include the section structure of the papers, all extracted sentences were associated with the section titles under which they appear. Text that appears in specific regions, such as captions, footnotes, or references, was excluded. The Genia sentence splitter (GeniaSS)7 was used for sentence splitting. Table 1 shows the fundamental statistics of our dataset. We also used the text corpus obtained from the entire ACL Anthology to calculate the word embeddings used in Equations 5 and 8.
        </p>
        <p>Aspect Phrasing
In the synthesis matrices we analyzed in this paper, an aspect is always phrased as a noun phrase, e.g., System Architecture and Verb. Because aspects are used in the header of a synthesis matrix, they are often very brief and ambiguous</p>
        <sec id="sec-2-1-1">
          <title>Footnotes</title>
          <p>5 http://aclanthology.info/
6 Because our method does not rely on any external knowledge related to the domain, we expect that the proposed framework is applicable to other domains as well.
7 http://www.nactem.ac.uk/y-matsu/geniass/</p>
          <p>(e.g., Syntax and Error). Such aspects are sometimes extremely difficult to understand, even for humans, when presented with no context. We can regard these phrases as shortened, condensed versions of the actual aspects, which can be expressed precisely in longer phrases or sentences: Error in the previous example is actually a short version of Error types the system handles or, more specifically, Grammatical error types the system attempts to detect and correct.</p>
          <p>
            When considering a system that generates a synthesis matrix, users would give more specific aspects rather than such header-style aspects. In fact, questions or queries in datasets for query-focused summarization are worded much more clearly and in greater detail, as an example from the DUC 2006 dataset [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] shows: Describe theories related to the causes and effects of global warming and arguments against these theories. If brief, unclear aspects are the only clue to what a system is expected to find, it is safe to say that synthesis matrix generation is a considerably difficult task to address. In this work, however, we use header-style aspects in the dataset we created for experiments because it is not trivial to decide how the original aspects should be elaborated.
          </p>
          <p>Aspect Types
First, we analyzed the synthesis matrices to ascertain what kinds of aspects synthesis matrices typically have and what kinds of answers they expect. We categorized aspects into the following four types:
1. Description: Sentences or phrases are anticipated as an answer. (36%)
e.g., Description of approach – Phrase-based translation optimized for...
2. Item: Identify entities or concepts given a factoid-type question. This includes numerical entities such as performance scores. (31%)
e.g., Learning method [used in the system] – Naive Bayes, MaxEnt
3. Choice: Selection from a predefined vocabulary set. Multiple choices are often allowed. (24%)
e.g., Error [types that the system handles] – SVA, Vform, Wform
4. Binary: The answer is yes or no. (9%)
e.g., [Whether the system uses] external resources – No
The examples presented above are actual aspect–answer pairs from the matrices. Words in brackets are added for clarification.</p>
          <p>Description and Item, the two most frequent types, can be handled within a summarization framework. Description can naturally be regarded as abstractive summarization. For Item, sentences that provide information about the answer can be extracted in a summarization approach. For instance, if the expected answer to the aspect external resources used is Wikipedia, then a sentence including the information that Wikipedia is used as an external resource can also be regarded as an answer. Based on this observation, we specifically examine Description and Item type aspects in this paper.</p>
          <p>In total, we collected 218 summaries, which we divided into a development set (4 matrices, 101 summaries) for parameter tuning and a test set (4 matrices, 117 summaries). The development set has four Description queries and three Item queries. The test set has seven Description queries and five Item queries. The average length of a query is 1.7 words for the development set and 2.2 words for the test set. The average length of a reference summary is 5.9 words for the development set and 8.9 words for the test set.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
      <sec id="sec-3-1">
        <title>Overview of the Proposed Method</title>
        <p>Assuming that aspects are mutually independent, we define the task of synthesis matrix generation as described below.</p>
        <p>– Input: K documents {D_i} (1 ≤ i ≤ K) and an aspect a_j
– Output: K summaries S_i of the input documents D_i based on a_j (1 ≤ i ≤ K)</p>
        <p>Our method is based on extractive summarization, where the objective is to select a set of sentences in a document given the maximum length of the summary.</p>
        <p>Figure 2 presents an overview of the proposed framework. Our system consists of two steps: sentence ranking and sentence selection. In the sentence-ranking step, the system ranks sentences in the input papers by regarding aspects as queries. In the sentence-selection step, the system selects from the rankings the sentences that remain in the final output.</p>
        <p>
          Sentence Ranking
Query-Focused LexRank LexRank, a graph-based sentence ranking method presented by Erkan and Radev [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], is widely used for summarization. This method first constructs a graph in which each node represents a sentence. To each edge, it assigns the similarity between the sentences that the adjacent nodes represent. It then ranks the nodes by considering a random walk on the graph and finding its stationary distribution.
        </p>
        <p>Actually, LexRank was demonstrated to be useful for query-focused summarization with a small modification to the algorithm [17], which we call Q-LexRank. Q-LexRank adds query relevance to edge weights to value sentences that are related to the query. The score p(s | q) of a sentence s given a query q is defined as</p>
        <p>p(s | q) = d · rel(s | q) / Σ_{s′∈D} rel(s′ | q) + (1 − d) · Σ_{s′∈D} [ sim(s, s′) / Σ_{s″∈D} sim(s′, s″) ] · p(s′ | q), (1)</p>
        <p>where D is the input document. The first term represents how relevant the sentence s is to the query q. The second term represents how similar s is to the other sentences. Here, d functions as a query bias, which balances these terms.</p>
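        <p>To make the update concrete, the stationary scores of Equation 1 can be approximated by power iteration over the sentence graph. The sketch below is illustrative only, not the authors' implementation, and the rel and sim values are toy stand-ins for the measures defined next in this section:

```python
# Power-iteration sketch of the Q-LexRank update (Equation 1).
# rel[i] stands in for rel(s_i | q); sim[i][j] stands in for sim(s_i, s_j).

def q_lexrank(rel, sim, d=0.95, iters=100):
    """Return approximate stationary scores p(s | q) for each sentence."""
    n = len(rel)
    rel_total = sum(rel)
    p = [1.0 / n] * n                        # uniform initial distribution
    for _ in range(iters):
        new_p = []
        for i in range(n):
            # query-bias term: normalized relevance of sentence i
            score = d * rel[i] / rel_total
            # similarity term: mass flowing into i from every sentence j
            for j in range(n):
                row_total = sum(sim[j])
                if row_total > 0:
                    score += (1 - d) * sim[j][i] / row_total * p[j]
            new_p.append(score)
        p = new_p
    return p

# Toy example: three sentences, the first being most query-relevant.
scores = q_lexrank(rel=[0.9, 0.1, 0.5],
                   sim=[[1.0, 0.3, 0.0],
                        [0.3, 1.0, 0.4],
                        [0.0, 0.4, 1.0]])
ranked = sorted(range(3), key=lambda i: -scores[i])
```

With d = 0.95, as in the paper, the ranking is dominated by query relevance, so the most relevant sentence ends up first.
        </p>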
        <p>We use the cosine measure defined in the original LexRank to compute sentence similarity as</p>
        <p>sim0(x, y) = Σ_{w∈x,y} tf_{w,x} tf_{w,y} idf_w² / ( √(Σ_{w∈x} (tf_{w,x} idf_w)²) · √(Σ_{w∈y} (tf_{w,y} idf_w)²) ), (2)</p>
        <p>where x and y are sentences, tf_{w,x} is the number of times w appears in x, and</p>
        <p>idf_w = log( (n + 1) / (0.5 + |{s ∈ D | w ∈ s}|) ). (3)</p>
        <p>When the model constructs a graph, this similarity value is set to zero when it is less than a similarity threshold t. Using the Iverson bracket, sim(x, y) = [sim0(x, y) ≥ t] · sim0(x, y). We used the query bias d = 0.95 and the similarity threshold t = 0.2 following the original Q-LexRank.</p>
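        <p>A minimal sketch of Equations 2 and 3, with the Iverson-bracket threshold applied; the tokenized sentences below are toy inputs, not data from our corpus:

```python
# tf-idf cosine similarity between tokenized sentences (Equations 2-3).
import math
from collections import Counter

def idf(word, docs):
    # docs: the set of sentences D; n is its size, df the document frequency
    n = len(docs)
    df = sum(1 for s in docs if word in s)
    return math.log((n + 1) / (0.5 + df))

def sim0(x, y, docs):
    tfx, tfy = Counter(x), Counter(y)
    num = sum(tfx[w] * tfy[w] * idf(w, docs) ** 2 for w in set(x) & set(y))
    nx = math.sqrt(sum((tfx[w] * idf(w, docs)) ** 2 for w in tfx))
    ny = math.sqrt(sum((tfy[w] * idf(w, docs)) ** 2 for w in tfy))
    return num / (nx * ny) if nx and ny else 0.0

def sim(x, y, docs, t=0.2):
    s = sim0(x, y, docs)
    return s if s >= t else 0.0           # Iverson-bracket thresholding

docs = [["we", "use", "an", "svm"],
        ["we", "apply", "a", "crf"],
        ["results", "improve"]]
```

A sentence is maximally similar to itself (cosine 1.0), and sentences sharing no words get similarity zero.
        </p>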
        <p>Query Expansion Query expansion is a commonly used information retrieval technique. It is expected to help the system find related sentences that have low relevance to the original query. We test two query expansion methods:
– Add words that frequently co-occur with the query words in the document D_i the system is processing (cooccur).
– In addition to the words added in cooccur, add words that frequently co-occur with the query words in the entire document set D_1, …, D_K (cooccur+).</p>
        <p>We add the five most frequently co-occurring words for cooccur. For cooccur+, we add five words for the current document and five words for the entire document set.</p>
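        <p>The cooccur expansion can be sketched as follows. The sentence-level co-occurrence window and the toy sentences are our own illustrative assumptions:

```python
# Sketch of the cooccur expansion: add the k words that most frequently
# co-occur (here, within the same sentence) with the query words.
from collections import Counter

def expand_query(query, sentences, k=5):
    counts = Counter()
    qset = set(query)
    for sent in sentences:
        if qset & set(sent):                  # sentence mentions the query
            counts.update(w for w in sent if w not in qset)
    return list(query) + [w for w, _ in counts.most_common(k)]

sentences = [["error", "correction", "with", "svm"],
             ["svm", "features", "for", "error", "detection"],
             ["unrelated", "sentence"]]
expanded = expand_query(["error"], sentences, k=2)
```

In this toy example, "svm" co-occurs with "error" in two sentences, so it is the first word added to the query.
        </p>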
        <p>Use of Word Embedding in Query Relevance The query relevance of a sentence s to a query q is defined as follows in the original Q-LexRank paper [17] using tf-idf values:</p>
        <p>rel_tfidf(s | q) = Σ_{w∈q} log(tf_{w,s} + 1) · log(tf_{w,q} + 1) · idf_w. (4)</p>
        <p>One problem with this measure is that it is non-zero only when s includes at least one word of q, which yields a very small number of sentences with a non-zero query relevance value.</p>
        <p>We use a query relevance measure based on word embedding to address this problem. We define the query relevance measure using word vectors as</p>
        <p>rel_emb_n(s | q) = (1/n) · sumLargest_n { cos(v_w, v_u) | w ∈ s, u ∈ q }, (5)</p>
        <p>where sumLargest_n is a function that returns the sum of the n largest values. We use only the largest values because smaller cosine values do not usually convey precise information about word similarity.</p>
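        <p>Equation 5 can be sketched as below; the toy two-dimensional vectors stand in for the word2vec embeddings, and the helper names are our own:

```python
# Embedding-based query relevance (Equation 5): average of the n largest
# word-pair cosine similarities between sentence and query words.
import heapq
import math

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rel_emb(sentence, query, vec, n=8):
    # collect cosines for every (sentence word, query word) pair in the vocab
    sims = [cos(vec[w], vec[u]) for w in sentence for u in query
            if w in vec and u in vec]
    top = heapq.nlargest(n, sims)             # sumLargest_n
    return sum(top) / n if top else 0.0

# Toy vectors: "svm" is close to "classifier", far from "error".
vec = {"svm": (1.0, 0.0), "classifier": (0.9, 0.1), "error": (0.0, 1.0)}
score = rel_emb(["we", "use", "an", "svm"], ["classifier"], vec, n=2)
```

Unlike Equation 4, this measure is non-zero even when the sentence shares no surface word with the query.
        </p>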
        <p>
          Sentence Selection
Integer Linear Programming Based Sentence Selection In the sentence selection step, the system selects sentences from the rankings computed in the ranking step to reduce the redundancy of the resulting summaries. We use an ILP-based model proposed by McDonald [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. This method selects sentences by maximizing the sum of the scores of the selected sentences while minimizing the similarity between them:</p>
        <p>maximize Σ_i λ · score(s_i) x_i − Σ_{i&lt;j} (1 − λ) · sim(s_i, s_j) x_{ij}
subject to Σ_i len(s_i) x_i ≤ L; x_{ij} ≤ x_i, x_{ij} ≤ x_j, x_i + x_j − x_{ij} ≤ 1, (6)</p>
        <p>for 1 ≤ i &lt; j ≤ N, where x_i, x_{ij} ∈ {0, 1}. Here, len(s_i) is the length of the sentence s_i, and L is the maximum length of the resulting summary. The importance bias λ ∈ [0, 1] is tuned in the experiments. We designate this method as ilp. To reduce the number of variables, we keep only the top 20 sentences in the rankings, i.e., N = 20 in Equation 6.</p>
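        <p>For illustration, the objective and constraints of Equation 6 can be checked by brute force on a tiny candidate set; a real system would pass the same formulation to an ILP solver (e.g., PuLP with GLPK), and the scores, similarities, and lengths below are invented for the example:

```python
# Brute-force check of the McDonald-style objective (Equation 6) on a tiny
# candidate set: maximize weighted scores minus pairwise-similarity penalties
# subject to a summary length limit.
from itertools import product

def select(scores, sims, lens, L, lam):
    n = len(scores)
    best_val, best_x = float("-inf"), None
    for x in product([0, 1], repeat=n):       # all 2^n selections
        if sum(l * xi for l, xi in zip(lens, x)) > L:
            continue                          # violates the length constraint
        val = sum(lam * scores[i] * x[i] for i in range(n))
        val -= sum((1 - lam) * sims[i][j] * x[i] * x[j]
                   for i in range(n) for j in range(i + 1, n))
        if val > best_val:
            best_val, best_x = val, x
    return best_x

# Sentences 0 and 1 are near-duplicates (similarity 0.9), so the optimum
# pairs the best sentence with the dissimilar one instead.
chosen = select(scores=[1.0, 0.9, 0.2],
                sims=[[0, 0.9, 0], [0.9, 0, 0], [0, 0, 0]],
                lens=[10, 10, 10], L=20, lam=0.5)
```

Enumeration is only feasible here because n is tiny; with N = 20 candidates, the ILP formulation is what makes the problem tractable.
        </p>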
        <p>Comparative Summarization The goal of comparative summarization is to
highlight differences among given documents. Most earlier studies treat
comparative summarization as an optimization problem with an objective function
that measures comparability. Comparability is typically measured as similarity
between a summary pair.</p>
        <p>Because summaries for the input papers must specifically address the given aspect, we can expect them to have structurally and semantically similar sentences. For example, if the aspect is Approach, then summaries are likely to include sentences describing what is used or what is applied. Even though what is used differs for each paper, it is true for all papers that something is used. We propose an action-based similarity and incorporate it into the objective function to capture this nature of comparability and to align the topics of summaries for a given aspect.</p>
        <p>Although it might not be readily apparent, we can identify the action of a sentence. We adopt a simple heuristic using dependency trees to ascertain which words describe the action. In the Universal Dependency Treebank for English8, a dataset of dependency trees, 57% of sentence heads are verbs, 17% are nouns, 10% are adjectives, and 9% are proper nouns. Exploiting this knowledge, we use the sentence head head(s) of a sentence s in our system. We define the action-based similarity simact using word embedding as</p>
        <p>simact(x, y) = cos(v_head(x), v_head(y)). (7)</p>
        <p>Naively incorporating the action-based similarity into the objective function of Equation 6 would greatly increase the number of variables, because it would require assigning variables to sentences in all input documents and optimizing the summaries for all documents simultaneously. We therefore optimize a single summary at a time. The system processes the input documents D_1, …, D_K in that order. For document D_l, we modify Equation 6 by adding a term that rewards the action-based similarity between candidate sentences and the already-generated summaries S̄_l = S_1 ∪ … ∪ S_{l−1}, with weights α, β ∈ [0, 1] (α + β ≤ 1). This model maximizes similarity between the summary for the current document and the summaries for the already-processed documents. We designate this method as ilp+.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <sec id="sec-4-1">
        <title>Experimental Setup</title>
        <p>The objectives of the experiments are the following. The first is to identify the best combination of query expansion (no expansion, cooccur, and cooccur+) and relevance measure calculation (rel_tfidf and rel_emb8). The second is to investigate the applicability of the proposed comparative summarization method (ilp+) by comparing its results with ilp and with two baseline methods:
– best: Select sentences from the top of the rankings until the summary length reaches L, skipping a sentence if adding it makes the summary exceed the limit.
– greedy: Select sentences as in best but skip sentences similar to any of the already-selected sentences within the summary. We set the threshold for this to 0.6, i.e., sentences s and s′ are similar when sim0(s, s′) &gt; 0.6.
8 http://universaldependencies.org/</p>
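        <p>The two baselines can be sketched as follows; sim_fn stands in for the tf-idf cosine similarity from the ranking step, and the toy ranking, lengths, and similarity function are illustrative assumptions:

```python
# Sketches of the two baseline selection methods described above.

def best_select(ranked, lens, L):
    """best: fill the summary from the top of the ranking."""
    chosen, used = [], 0
    for i in ranked:
        if used + lens[i] > L:
            continue                  # adding i would exceed the limit: skip
        chosen.append(i)
        used += lens[i]
    return chosen

def greedy_select(ranked, lens, L, sim_fn, thresh=0.6):
    """greedy: like best, but also skip near-duplicates of chosen sentences."""
    chosen, used = [], 0
    for i in ranked:
        if used + lens[i] > L:
            continue
        if any(sim_fn(i, j) > thresh for j in chosen):
            continue                  # too similar to an already-selected one
        chosen.append(i)
        used += lens[i]
    return chosen

# Toy run: sentences 0 and 1 are near-duplicates.
sim_fn = lambda i, j: 0.9 if {i, j} == {0, 1} else 0.0
b = best_select([0, 1, 2], lens=[10, 10, 10], L=20)
g = greedy_select([0, 1, 2], lens=[10, 10, 10], L=20, sim_fn=sim_fn)
```

On the toy input, best keeps the two top-ranked sentences while greedy swaps the duplicate out for the third sentence.
        </p>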
        <p>Before the ranking step, input sentences and queries are tokenized, lowercased, and stemmed. Stopwords are removed from both sentences and queries. We set the maximum summary length L to 30. Word vectors were learned on the entire ACL Anthology using word2vec9 with the default parameters.</p>
        <p>
          Evaluation Methods
For performance evaluation, we first applied ROUGE10 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], a metric set for the evaluation of summarization. This report describes ROUGE-2 and ROUGE-SU4 scores.
        </p>
        <p>
          We also evaluated the system manually, similarly to the pyramid method [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. We first reviewed the reference summaries manually and identified summary content units (SCUs) for each summary. SCUs are semantically cohesive text units that are no longer than a sentence. Each item in the list is considered an SCU in an Item-type summary. For the Description type, we made SCUs as small as possible, provided that they make sense alone, because a system summary may include only part of the information that the reference summary has. We believe that such summaries should be evaluated positively.
        </p>
        <p>We evaluated system summaries using SCUs by manually counting how many SCUs each system summary includes. This report describes the macro-average and micro-average of coverage for the entire test set (SCUmacro and SCUmicro, respectively). However, judging whether a summary covers an SCU is not trivial. We determined that a summary covers an SCU when the summary implies what the SCU indicates in context. Words of the SCU appearing in the summary but in a different context were not counted.</p>
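        <p>The two coverage averages can be computed as below; the counts are invented for the example:

```python
# SCU coverage: micro-average pools counts over all summaries, while
# macro-average averages the per-summary coverage ratios.

def scu_scores(covered, totals):
    """covered[i]: SCUs matched in summary i; totals[i]: SCUs in reference i."""
    micro = sum(covered) / sum(totals)
    macro = sum(c / t for c, t in zip(covered, totals)) / len(totals)
    return micro, macro

# Two summaries: one covers 1 of 2 SCUs, the other 4 of 4.
micro, macro = scu_scores([1, 4], [2, 4])   # micro = 5/6, macro = 0.75
```

The two averages diverge when reference summaries have different numbers of SCUs, which is why the paper reports both.
        </p>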
        <p>Results: Sentence Ranking Parameters
Table 2 presents the system performance under different combinations of a query relevance measure and a query expansion method. Here, best is used as the selection method so as to examine the results of sentence ranking specifically. The parameter n in rel_emb_n was tuned in terms of ROUGE scores on the development set using grid search and set to 8.</p>
        <sec id="sec-4-1-1">
          <title>Footnotes</title>
          <p>9 https://code.google.com/archive/p/word2vec/
10 http://www.berouge.com/</p>
          <p>The two best-performing combinations in terms of ROUGE scores were rel_tfidf with cooccur+, and rel_emb8 with no query expansion. These two combinations also performed best in both micro-average and macro-average coverage.</p>
          <p>As presented in Table 2, rel_tfidf worked well with query expansion, whereas rel_emb8 worked best without query expansion. Both word-embedding-based query relevance and query expansion aim to overcome the simplicity of queries: word-embedding-based query relevance measures attempt to find sentences that contain no query words but are nevertheless relevant by assigning high scores to them. In contrast, query expansion strives to do the same thing by adding words related to the query itself. When both are used, sentences including words similar to the added query words are deemed relevant by the model, which leads to not-so-relevant sentences being ranked highly.</p>
          <p>We tuned the importance biases λ and (α, β) in terms of ROUGE scores for ilp and ilp+ using the development set. We used λ = 1.0 for ilp and (α, β) = (0.8, 0.2) for ilp+ in the following experiments. We picked the best two combinations from the previous section: (A) rel_tfidf &amp; cooccur+ and (B) rel_emb8 alone.</p>
          <p>Table 3 presents the performance of the systems using different sentence selection methods. Unlike the ROUGE scores, the manual evaluation suggests that ilp+ is the best of all methods. Results show that ilp came between ilp+ and best/greedy for combination B but performed the worst for combination A.</p>
          <p>Effect of Comparative Summarization Results showed that the term added for redundancy prevention was not effective. Redundancy reduction might not be necessary in this task because input documents typically have more than a hundred sentences and few redundant sentences.</p>
          <p>
            Results show that ilp+ performed better than the baseline methods in the manual evaluation. Unlike the other three methods, ilp+ uses information from the summaries for the other input documents. Such information might help the system generate a cohesive set of summaries. However, ilp+ does not consider comparability globally: it relies on already-generated summaries for other documents, which means the output depends on the order in which the documents are processed. A fast global optimization algorithm might provide better performance. ilp+ picks sentences with similar actions: the selected sentences include verbs such as use and apply, which are often used to describe approaches.
          </p>
          <p>
            Related Work
Earlier studies of scientific paper summarization have used citation networks [
            <xref ref-type="bibr" rid="ref2">20, 2</xref>
            ], which are based on the idea that sentences describing a cited paper have crucial information related to the cited paper. Some other works specifically examine surmounting the incoherence of summaries generated from multiple documents. Surveyor [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] combines content and discourse models to generate coherent summaries. Parveen et al. [18] proposed a graph-based approach that extracts coherence patterns from a corpus and uses them.
          </p>
          <p>
            QFS was a shared task at the Document Understanding Conferences in 2005-2006, and a number of methods have been proposed for the task. The BayeSum [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] algorithm is based on a Bayesian statistical model. Liu et al. [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] proposed an unsupervised deep learning architecture and demonstrated its effectiveness. Fisher and Roark [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] used feature similarity and centrality metrics as well as query relevance and applied machine learning. Although most QFS approaches are extractive, Wang et al. [23] proposed an abstractive QFS framework using sentence compression.
          </p>
          <p>
            Relatively little research has been done on comparative summarization. Huang et al. [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] proposed a linear-programming-based approach to comparative news summarization. Wang et al. [22] formulated a task of comparative summarization that aims to highlight the differences between multiple document groups and proposed a discriminative sentence selection approach. Although contrastive summarization refers mainly to opinion summarization, similar ideas can be found in it. We found a limited number of studies of contrastive summarization for product reviews [
            <xref ref-type="bibr" rid="ref11">11, 21</xref>
            ] and for controversial topics [
            <xref ref-type="bibr" rid="ref8">19, 8</xref>
            ].
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We analyzed synthesis matrices in NLP-related papers, formulated the task of synthesis matrix generation, and proposed a system for the task using query-focused and comparative summarization techniques. For sentence ranking, we adopted query-focused LexRank with modifications that compensate for tersely expressed queries. For sentence selection, we incorporated the idea of comparability into an ILP-based sentence selection framework. By measuring sentence similarity, we attempted to align summaries for different papers to make them mutually contrastive. The results of automatic and manual evaluation suggest that our selection method, which considers comparability, is effective for the task.</p>
      <p>We believe that our task formulation of automatic review matrix generation is worthy of additional effort. In our framework, an aspect is expressed only as a short noun phrase. To compensate, we used frequently co-occurring words or word embeddings in our query-sentence relevance calculation. We observed that using such techniques sometimes produces an unexpected sentence ranking. The introduction of more descriptive aspects or domain ontologies is one avenue that demands further investigation. In addition, this paper does not consider Choice and Binary type aspects (Sec. 2.3). How to formalize these types as question-answering tasks is another issue to address.</p>
      <p>This work was supported by JSPS KAKENHI Grant Numbers 16K12546 and 16H01756.</p>
      <p>16. Hwee Tou Ng, Siew Mee Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto, and Christopher Bryant. The CoNLL-2014 shared task on grammatical error correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, 2014.
17. Jahna Otterbacher, Gunes Erkan, and Dragomir R. Radev. Using random walks for question-focused sentence retrieval. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT-EMNLP '05, pages 915–922, 2005.
18. Daraksha Parveen, Mohsen Mesgar, and Michael Strube. Generating coherent summaries of scientific articles using coherence patterns. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, November 2016.
19. Michael J. Paul, ChengXiang Zhai, and Roxana Girju. Summarizing contrastive viewpoints in opinionated text. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP '10, pages 66–76, 2010.
20. Vahed Qazvinian and Dragomir R. Radev. Scientific paper summarization using citation summary networks. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, COLING '08, pages 689–696, 2008.
21. Ruben Sipos and Thorsten Joachims. Generating comparative summaries from reviews. In Proceedings of the 22nd ACM International Conference on Information &amp; Knowledge Management, CIKM '13, 2013.
22. Dingding Wang, Shenghuo Zhu, Tao Li, and Yihong Gong. Comparative document summarization via discriminative sentence selection. ACM Trans. Knowl. Discov. Data, 6(3):12:1–12:18, October 2012.
23. Lu Wang, Hema Raghavan, Vittorio Castelli, Radu Florian, and Claire Cardie. A sentence compression based framework to query-focused multi-document summarization. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Takeshi</given-names>
            <surname>Abekawa</surname>
          </string-name>
          and
          <string-name>
            <given-names>Akiko</given-names>
            <surname>Aizawa</surname>
          </string-name>
          .
          <article-title>SideNoter: Scholarly paper browsing system based on PDF restructuring and text annotation</article-title>
          .
          <source>In COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference System Demonstrations</source>
          , December 11-16, Osaka, Japan, pages
          <fpage>136</fpage>
          –
          <lpage>140</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Arman</given-names>
            <surname>Cohan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nazli</given-names>
            <surname>Goharian</surname>
          </string-name>
          .
          <article-title>Scientific article summarization using citation-context and article's discourse structure</article-title>
          .
          <source>In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Hoa Trang</given-names>
            <surname>Dang</surname>
          </string-name>
          .
          <article-title>Overview of DUC 2006</article-title>
          .
          <source>In Proceedings of DUC 2006: Document Understanding Workshop</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Hal</given-names>
            <surname>Daume III</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Marcu</surname>
          </string-name>
          .
          <article-title>Bayesian query-focused summarization</article-title>
          .
          <source>In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, ACL-44</source>
          , pages
          <fpage>305</fpage>
          –
          <lpage>312</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Gunes Erkan and
          <string-name>
            <surname>Dragomir R Radev.</surname>
          </string-name>
          <article-title>LexRank: Graph-based lexical centrality as salience in text summarization</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>22</volume>
          :
          <fpage>457</fpage>
          –
          <lpage>479</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Seeger</given-names>
            <surname>Fisher</surname>
          </string-name>
          and
          <string-name>
            <given-names>Brian</given-names>
            <surname>Roark</surname>
          </string-name>
          .
          <article-title>Query-focused summarization by supervised sentence ranking and skewed word distributions</article-title>
          .
          <source>In Proceedings of the Document Understanding Conference, DUC-2006</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Judith</given-names>
            <surname>Garrard</surname>
          </string-name>
          .
          <article-title>Health sciences literature review made easy: the matrix method</article-title>
          .
          <source>Aspen Publishers</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Jinlong</given-names>
            <surname>Guo</surname>
          </string-name>
          , Yujie Lu, Tatsunori Mori, and
          <string-name>
            <given-names>Catherine</given-names>
            <surname>Blake</surname>
          </string-name>
          .
          <article-title>Expert-guided contrastive opinion summarization for controversial issues</article-title>
          .
          <source>In Proceedings of the 24th International Conference on World Wide Web, WWW '15 Companion</source>
          , pages
          <fpage>1105</fpage>
          –
          <lpage>1110</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Xiaojiang</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiaojun</given-names>
            <surname>Wan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jianguo</given-names>
            <surname>Xiao</surname>
          </string-name>
          .
          <article-title>Comparative news summarization using linear programming</article-title>
          .
          <source>In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Rahul</given-names>
            <surname>Jha</surname>
          </string-name>
          , Reed Coke, and
          <string-name>
            <given-names>Dragomir</given-names>
            <surname>Radev</surname>
          </string-name>
          .
          <article-title>Surveyor: A system for generating coherent survey articles for scientific topics</article-title>
          .
          <source>In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence</source>
          ,
          <source>AAAI'15</source>
          , pages
          <fpage>2167</fpage>
          –
          <lpage>2173</lpage>
          . AAAI Press,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Lerman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ryan</given-names>
            <surname>McDonald</surname>
          </string-name>
          .
          <article-title>Contrastive summarization: An experiment with consumer reviews</article-title>
          .
          <source>In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics</source>
          ,
          <year>June 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Chin-Yew</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>ROUGE: A package for automatic evaluation of summaries</article-title>
          .
          <source>In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop</source>
          ,
          <year>July 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Yan</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sheng-hua</given-names>
            <surname>Zhong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Wenjie</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Query-oriented multi-document summarization via unsupervised deep learning</article-title>
          .
          <source>In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence</source>
          ,
          <source>AAAI'12</source>
          , pages
          <fpage>1699</fpage>
          –
          <lpage>1705</lpage>
          . AAAI Press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Ryan</given-names>
            <surname>McDonald</surname>
          </string-name>
          .
          <article-title>A study of global inference algorithms in multi-document summarization</article-title>
          .
          <source>In Proceedings of the 29th European Conference on IR Research</source>
          , ECIR'07,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Ani</given-names>
            <surname>Nenkova</surname>
          </string-name>
          , Rebecca Passonneau, and
          <string-name>
            <given-names>Kathleen</given-names>
            <surname>McKeown</surname>
          </string-name>
          .
          <article-title>The pyramid method: Incorporating human content selection variation in summarization evaluation</article-title>
          .
          <source>ACM Trans. Speech Lang. Process.</source>
          ,
          <volume>4</volume>
          (
          <issue>2</issue>
          ), May
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>