<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Redundancy reduction for multi-document summaries using A* search and discriminative training</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ahmet Aker</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Trevor Cohn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Gaizauskas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Sheffield</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper we address the problem of optimizing global multi-document summary quality using A* search and discriminative training. Different search strategies have been investigated to find the globally best summary. In these, the search is usually guided by an existing prediction model which can distinguish between good and bad summaries. However, this is problematic because the model is not trained to optimize summary quality but some other peripheral objective. In this work we tackle the global optimization problem using A* search with the training of the prediction model kept integrated with search, and demonstrate our method by reducing redundancy within a summary. We use the framework proposed by Aker et al. [1] as a baseline and adapt it to globally improve the summary quality. Our results show significant improvements over the baseline.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Extractive multi-document summarization (MDS) aims to present the most important
parts of multiple documents to the user in a condensed form [
        <xref ref-type="bibr" rid="ref13 ref9">9, 13</xref>
        ]. This is achieved by
identifying a subset of sentences from the document collection which are concatenated
to form the summary. Two common challenges in extractive MDS are: search – finding
the best scoring summary from the documents – and training – learning the system
parameters to best describe a training set consisting of pairs of documents and reference
summaries.
      </p>
      <p>
        In previous work the search problem is typically decoupled from the training
problem. McDonald [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], for example, addresses the search problem by using Integer
Linear Programming (ILP). In his ILP problem formulation he adopts the idea of Maximal
Marginal Relevance (MMR) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to maximize the amount of relevant information in the
summary and at the same time to reduce the redundancy within it. Others have also
addressed the search problem using a variation of ILP [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], as well as using different
approaches such as stack decoding algorithms [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], genetic algorithms [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and
submodular set function optimisation [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>By separating search from training these approaches assume the existence of a
predictive model which can distinguish between good and bad summaries. This is
problematic because the model is not trained to optimize the summary quality but some
other peripheral objective. The disconnect between the training and prediction settings
compromises the predictive performance of the approach.</p>
      <p>
        An exception is the work of Aker et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which proposes an integrated
framework that trains the full prediction model directly with the search algorithm intact.
Their training algorithm learns parameters such that the best scoring whole summary
under the model has a high score under an evaluation metric. However they only
optimize the summary quality locally and do not take into account global features such as
redundancy within the summary.
      </p>
      <p>
        This paper addresses the redundancy problem within the integrated framework
proposed by Aker et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and thus presents a novel approach to global optimization of
summary quality. We present and evaluate our approach for incorporating a redundancy
criterion into the framework. Our approach adapts the A* search to global optimization.
The core idea of this approach is that redundant sentences are excluded from the
summary if their redundancy with respect to the summary created so far exceeds a threshold.
In our experiments this threshold is learned automatically from the data instead of being
set manually as proposed in previous work.
      </p>
      <p>
        The paper is structured as follows. Section 2 presents the work of Aker et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
in detail. In Section 3 we describe our modifications to the framework proposed by
Aker et al. and our proposed approach to address redundancy in extractive
summarization. Section 4 describes our experimental setup to evaluate the proposed approach, and
Section 5 presents the results. Finally, we conclude in Section 6.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        In this section we first review the work of Aker et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] in detail, which is essential for
understanding our modifications to their framework.
      </p>
      <sec id="sec-2-1">
        <title>Summarization Model</title>
        <p>A summarization model is used to score summaries. Summaries are ranked according
to these scores, so that in search the summary with the highest score can be selected.
Aker et al. use the summarization model s to score a summary:</p>
        <p>s(y|x) = Σ_{i∈y} θ · φ(x_i)   (1)</p>
        <p>where x is the document set, composed of k sentences, y ⊆ {1, ..., k} is the set of
indexes of the sentences selected for the summary, φ(·) is a feature function that returns
a set of feature values for each candidate summary, and θ is the weight vector associated
with the set of features. In search we use the summarization model to find the
maximum-scoring summary ŷ:</p>
        <p>ŷ = arg max_y s(y|x)   (2)</p>
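As a concrete illustration, the linear scoring model of Equations 1 and 2 can be sketched as follows; the feature names and all numeric values are invented for the example and are not from the paper.

```python
# Sketch of the linear summarization model s(y|x) = sum_{i in y} theta . phi(x_i).
# `features[i]` plays the role of phi(x_i); `theta` is the weight vector.

def score_summary(selected, features, theta):
    """s(y|x): sum of theta . phi(x_i) over the selected sentence indexes."""
    return sum(
        sum(theta[name] * value for name, value in features[i].items())
        for i in selected
    )

# Toy document set: feature vectors phi(x_i) for three sentences.
features = [
    {"sentencePosition": 1.0, "inFirst5": 1.0},
    {"sentencePosition": 0.5, "inFirst5": 1.0},
    {"sentencePosition": 0.25, "inFirst5": 0.0},
]
theta = {"sentencePosition": 2.0, "inFirst5": 1.0}

print(score_summary({0, 2}, features, theta))  # 3.0 + 0.5 = 3.5
```

Finding ŷ (Equation 2) then amounts to maximizing this score over candidate index sets, which is what the search in the next section does.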
      </sec>
      <sec id="sec-2-2">
        <title>Search</title>
        <p>In Aker et al. the creation of a multi-document summary is formulated as a search
problem in which the aim is to find a subset of sentences from the entire set to form
a summary. The search is also constrained so that the subset of sentences does not
exceed the summary length threshold. In search, a search graph is constructed with
edges representing the connections between the sentences and states representing partial summaries.
Each node is associated with information about the summary length and summary
score. The authors start with an empty summary (start state) with length 0 and score 0
and follow an outgoing edge to expand it. A new state is created when a new sentence
is added to the summary. The new state’s length is updated with the number of words
of the new sentence. The score of the state is computed under the summarization model
described in the previous section. A goal state is any state or summary where it is not
possible to add another sentence without exceeding the summary length threshold. The
summarization problem is then finding the best scoring path (sum over the sentence
scores on this path) between the start state and a goal state.</p>
        <p>
          Aker et al. use the A* search algorithm [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] to efficiently traverse the search graph
and accurately find the best scoring path. In A* search a best-first strategy is applied
to traverse the graph from a starting state to a goal state. The search requires a scoring
function for each state, here s(yjx) from Equation 1, and a heuristic function that
estimates the additional score to get from a given state to a goal state. The search algorithm
is guaranteed to converge to the optimal solution if the heuristic function is admissible,
that is, if the function used to estimate the cost from the current node to the goal never
overestimates the actual cost. The authors propose different heuristics with different
run-time performances. The best performing heuristic reported is the “final aggregated
heuristic”. We use this heuristic as our baseline and as the basis for our modifications.
        </p>
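The search procedure described above can be sketched as follows. This is a minimal A* implementation under assumptions of our own: per-sentence scores are additive as in Equation 1, and the heuristic here, an optimistic sum of the positive scores of sentences that still fit, is a simple admissible stand-in for the paper's "final aggregated heuristic", whose exact form is not reproduced here.

```python
# A* search for the best-scoring sentence subset under a word-length limit.
import heapq
from itertools import count

def astar_summary(scores, lengths, limit):
    """Return (score, selected indexes) of the best summary within `limit` words."""
    n = len(scores)

    def h(selected, used):
        # Admissible (never-underestimating) heuristic: counts every
        # positive-scoring unselected sentence that individually still fits.
        return sum(scores[i] for i in range(n)
                   if i not in selected and scores[i] > 0 and used + lengths[i] <= limit)

    tie = count()  # tie-breaker so the heap never compares states directly
    heap = [(-h(frozenset(), 0), next(tie), frozenset(), 0, 0.0)]
    while heap:
        _, _, selected, used, score = heapq.heappop(heap)
        fits = [i for i in range(n) if i not in selected and used + lengths[i] <= limit]
        if not fits:  # goal state: no further sentence can be added
            return score, selected
        for i in fits:
            s, u, sc = selected | {i}, used + lengths[i], score + scores[i]
            heapq.heappush(heap, (-(sc + h(s, u)), next(tie), s, u, sc))

scores = [3.0, 2.0, 2.5]   # model scores of three candidate sentences
lengths = [5, 4, 6]        # word counts
print(astar_summary(scores, lengths, limit=10))  # picks sentences 0 and 1
```

Because the heuristic never overestimates, the first goal state popped from the queue is guaranteed to be the best-scoring summary.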
      </sec>
      <sec id="sec-2-3">
        <title>Training</title>
          <p>
            In Aker et al. the training problem is formulated as one of finding model parameters, θ,
such that the predicted output ŷ closely matches the gold standard r. The quality of the
match is measured using ROUGE [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. In training, the standard machine learning
terminology of loss functions, which measure the degree of error in the prediction,
Δ(ŷ, r), is adopted. The loss is formulated as 1 − R, with R being the ROUGE score.
The training problem is to solve</p>
          <p>θ* = arg min_θ Δ(ŷ, r)   (3)</p>
          <p>where ŷ and r are taken to range over the corpus of many document sets and
summaries. The prediction model is trained using the minimum error rate training (MERT)
technique [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. MERT is a first order optimization method using Powell search to find
the parameters which minimize the loss on the training data [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. MERT requires
n-best lists, which it uses to approximate the full space of possible outcomes. A* search is
used to construct these n-best lists, and MERT optimizes the objective metric, such as
ROUGE, that is used to measure the summary quality.
        </p>
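The MERT objective over n-best lists (Equation 3) can be sketched as follows. The Powell search over weight vectors is replaced here by brute force over a few hand-picked candidates, and all feature vectors and ROUGE scores are illustrative, not from the paper.

```python
# MERT objective sketch: for a candidate weight vector, the model picks the
# argmax-scoring summary from each document set's n-best list, and the loss
# is the average 1 - ROUGE of those picks.

def mert_loss(theta, nbest_lists):
    """Average loss 1 - R; each candidate is (feature vector, ROUGE score)."""
    total = 0.0
    for candidates in nbest_lists:
        best = max(candidates, key=lambda c: sum(t * f for t, f in zip(theta, c[0])))
        total += 1.0 - best[1]
    return total / len(nbest_lists)

# Two document sets, each with a 2-best list of (feature vector, ROUGE score).
nbest = [
    [([1.0, 0.0], 0.40), ([0.0, 1.0], 0.25)],
    [([0.5, 0.5], 0.30), ([1.0, 1.0], 0.45)],
]
# Brute force over candidate weight vectors stands in for Powell search.
candidates = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
best_theta = min(candidates, key=lambda th: mert_loss(th, nbest))
print(best_theta, mert_loss(best_theta, nbest))
```

The point of the n-best approximation is that `mert_loss` is piecewise constant in θ, so it can be evaluated cheaply for many weight settings without re-running the full search.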
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Addressing redundancy</title>
      <p>
        To address redundancy within a summary we adopt the framework of Aker et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
described in the previous section, in that we re-use their summarization model and their
training of the prediction model.
      </p>
      <sec id="sec-3-1">
        <title>A* search with redundancy reduction</title>
        <p>In this section we present our approach to dealing with redundancy within multi-document
summaries, which implements the idea of omitting or jumping over redundant sentences
when selecting summary-worthy sentences from the input documents. When sentences
from the input documents are merged and sorted into a list according to their
summary-worthiness, the generation of a summary starts by first including the most summary-worthy
sentence in the summary, then the next one, until the desired summary length is reached.
If a sentence from the list is found to be similar to the ones already included in the
summary (i.e. to be redundant), then this sentence should not be included in the summary,
but rather jumped over. We integrate the idea of jumping over redundant sentences into
the A* search algorithm described by Aker et al. The difference between our
implementation and that of Aker et al. is the integration of a function jump(y, xi) into the
search process. We use this function to jump over a sentence xi when it
is redundant with respect to the summary y. Thus, compared to Aker et al., we do not
only skip a sentence if it is too long, as is the case in Aker et al., but also when it is
redundant with respect to the summary created so far. In our work we replace the jump
conditions of Aker et al. with:</p>
        <p>lengthConstraintsOK ∧ jump(y, xi) == F   (4)</p>
        <p>where lengthConstraintsOK represents the situation when the next sentence does not
violate the summary length constraint of Aker et al., and jump(y, xi) == F the case where the next
sentence is not redundant and therefore not to be jumped over.</p>
        <p>Jump based on redundancy threshold (JRT): We use the similarity score of a sentence
xi with respect to the summary y and a similarity or redundancy threshold R to decide
whether to jump over the sentence or not. In general we jump over a sentence xi if its
similarity score is above R (see Algorithm 1). The similarity scores are computed
using the sim(·, ·) function shown in Equation 5.</p>
        <p>Algorithm 1 Jump when the similarity score is above a threshold R: jump(y, xi)
Require: a similarity or redundancy threshold R
1: if sim(y, xi) ≤ R then
2:   return F
3: end if
4: return T</p>
        <p>
          sim(y, xj) = (1/n) Σ_{l=1}^{n} |ngrams(y, l) ∩ ngrams(xj, l)| / |ngrams(xj, l)|   (5)
        </p>
        <p>
          where ngrams(y, l) is the set of l-grams in summary y and ngrams(xj, l) that in sentence
xj, respectively. This method returns 0 if y and xj do not share any n-grams. When
all n-grams of xj are found among the n-grams of y the method returns 1. Note that
we use this function only to see how many n-grams of xj are found in y; the other
direction is less important for our purpose. The idea of omitting redundant sentences
if their redundancy score exceeds a threshold has already been introduced in previous
work [
          <xref ref-type="bibr" rid="ref11 ref18 ref19 ref4">4, 11, 18, 19</xref>
          ]. However, in contrast to these studies, in which the redundancy
threshold is set manually, we learn it automatically.
        </p>
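A minimal sketch of Equation 5 and the jump test of Algorithm 1, assuming whitespace tokenization and n = 2 (uni-grams and bi-grams); the example sentences are invented.

```python
# sim averages, over n-gram orders l = 1..n, the fraction of the sentence's
# l-grams already covered by the summary (Equation 5).

def ngrams(tokens, l):
    return {tuple(tokens[i:i + l]) for i in range(len(tokens) - l + 1)}

def sim(summary_tokens, sentence_tokens, n=2):
    total = 0.0
    for l in range(1, n + 1):
        sent = ngrams(sentence_tokens, l)
        if sent:
            total += len(ngrams(summary_tokens, l) & sent) / len(sent)
    return total / n

def jump(summary_tokens, sentence_tokens, R, n=2):
    """True -> the sentence is redundant w.r.t. the summary, so jump over it."""
    return sim(summary_tokens, sentence_tokens, n) > R

summary = "the castle was built in 1299".split()
sentence = "the castle was built to protect Oslo".split()
print(sim(summary, sentence), jump(summary, sentence, R=0.5))
```

Here sim is (4/7 + 3/6)/2 ≈ 0.536: four of the sentence's seven uni-grams and three of its six bi-grams already occur in the summary, so with R = 0.5 the sentence is jumped over.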
        <p>To learn the redundancy threshold R we make use of the entire framework (search
and training) and proceed as shown in Figure 1. In the beginning (the top left of the
figure) we create a random R ∈ (0, 1]. In addition to this R we generate two further
values: R + 0.1 ≤ 1 and R − 0.1 &gt; 0. These two additional values are used to move
R towards its optimum value. All three Rs are used to generate n-best summaries using
A* search. The A* search also requires a prediction model to score the sentences;
for this we start with an initial prediction model (initial feature weights W). For each
of the R values (denoted r in the figure) we then create an n-best list using A*
search, leading to 3 × n summaries. If there are summaries from a previous step we
extend the new n-best list with them, so that in training the entire history of n-best lists
is provided. For each summary its corresponding R value is known. Next, these n-best
summaries are input to MERT to train new weights W′, i.e. a new prediction model.
After obtaining W′ we pick, for each document set, the summary from the n-best
summaries that MERT has used to come up with W′. We sum the R values of those
summaries (in total m, for m document sets) and divide the sum by m to obtain the new
R′. We replace R with R′ and W with W′ and repeat the entire process until no new
summaries are added to the n-best list, at which point the process stops. Depending on
which R was used to generate the best summaries (R, R + 0.1 or R − 0.1), the optimal
value of R (the R that leads to the best summaries under the ROUGE metric) moves
towards either 0 or 1.</p>
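A deliberately simplified sketch of the hill-climbing behavior of this procedure: the interleaved MERT re-training of the weights W is omitted, and a fixed `quality` function stands in for the average ROUGE achieved with a given threshold.

```python
# At each step three thresholds R, R+0.1 and R-0.1 are tried, the summaries
# they produce are scored, and R moves toward the best-performing value.

def learn_threshold(quality, R=0.5, step=0.1, iters=20):
    for _ in range(iters):
        candidates = [r for r in (R - step, R, R + step) if 0.0 < r <= 1.0]
        best = max(candidates, key=quality)
        if best == R:   # no neighbor improves: stop
            break
        R = best
    return round(R, 10)

# Toy quality surface peaking near R = 0.8.
quality = lambda r: -(r - 0.8) ** 2
print(learn_threshold(quality))
```

With this surface the threshold walks 0.5 → 0.6 → 0.7 → 0.8 and then stops, mirroring how in the paper R drifts toward whichever neighboring value produced the best ROUGE scores.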
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental settings</title>
      <p>In this section we describe the data used in the experiments, our summarization system
and the training and testing procedure.</p>
      <sec id="sec-4-1">
        <title>Data</title>
        <p>
          For training and testing we use the freely available image corpus described in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The
corpus contains 296 images of statically located objects (e.g. Eiffel Tower, Mont Blanc), each
with a manually assigned place name and object type category (e.g. church, mountain).
For each place name there are up to four model summaries that were extracted manually
from existing image descriptions taken from the VirtualTourist travel community
website. Each summary contains a minimum of 190 and a maximum of 210 words.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Summarization system</title>
        <p>
          To generate summaries for each of the 296 document sets we use an extractive,
query-based multi-document summarization system. It is given three inputs: a query (place
name, e.g. Westminster Abbey), the object type associated with an image (e.g. church)
and a set of web-documents retrieved using the place name as query. The summarizer
uses the following features described in [
          <xref ref-type="bibr" rid="ref1 ref2">2, 1</xref>
          ]:
– sentencePosition: position of the sentence within its document. The first sentence
in the document gets the score 1 and the last one gets 1/n, where n is the number of
sentences in the document.
– inFirst5: binary feature indicating whether the sentence is one of the first 5
sentences of the document.
– isStarter: a sentence gets a binary score if it starts with the query term (e.g.
Westminster Abbey) or with the object type, e.g. The church.
– LMProb: the probability of the sentence under a bi-gram language model. We
trained a separate language model on Wikipedia articles about locations for each
object type, e.g. church, bridge, etc. When we generate a summary about a location
of type church, for instance, we apply the church language model to the related
input documents.1
– DepSim: similar to LMProb, we trained a separate dependency pattern model using
Wikipedia articles about locations for each object type. As with LMProb we use these
models to score the input sentences. A sentence is scored based on the number of
patterns it contains from the model.
– sentenceCount: each sentence gets assigned a value of 1. This feature is used to
learn whether summaries with many sentences are better than summaries with few
sentences or vice versa.
– wordCount: number of words in the summary, used to decide whether the model
should favor long summaries or short ones.

1 For our training and testing sets we manually assigned each location to its corresponding
object type.
        </p>
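A sketch of how a few of the listed features might be computed; the exact definitions (for instance how sentencePosition interpolates between 1 and 1/n) are assumptions of this sketch, as are the example sentences.

```python
# Illustrative extraction of a subset of the surface features listed above.

def extract_features(sentences, index, query, object_type):
    sent = sentences[index]
    return {
        # Assumed reading: score 1 for the first sentence, decaying toward 1/n.
        "sentencePosition": 1.0 / (index + 1),
        "inFirst5": 1.0 if index < 5 else 0.0,
        "isStarter": 1.0 if sent.startswith(query)
                     or sent.startswith("The " + object_type) else 0.0,
        "wordCount": float(len(sent.split())),
    }

sentences = [
    "Westminster Abbey is a large, mainly Gothic church.",
    "The church is located in the City of Westminster.",
]
print(extract_features(sentences, 1, "Westminster Abbey", "church"))
```

Each sentence's feature vector is what φ(x_i) returns in Equation 1, and the learned weights θ decide how much each feature contributes to its score.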
        <p>Example summary for the query Akershus Castle (Table 2): Norwegian Royalty have been buried in the Royal Mausoleum in the castle. During the 17th and 18th century the castle
fell into decay, and restoration work only started in 1899. The Akershus castle and fortress are located on the eastern
side of the Oslo harbor. The fortress was first used in battle in 1306. The original Akershus Castle is located inside the
fortress. Akershus Fortress (Norwegian: Akershus Festning) is the old castle built to protect Oslo, the capital of Norway.
The fortress was built in 1299, and the meaning of the name is ’the (fortified) house of (the district) Aker’. In the 1600s a
castle (or in norsk, “slott”) was built. In the reign of Christian IV the medieval stronghold was converted into a Renaissance
castle and the fortifications were extended. Guided tours of the fortress in the summer, all year on request. The services are
announced in the newspapers and are open to all. During World War II, several people were executed here by the German
occupiers. The fortress was reconstructed several times to withstand increasing fighting power. The castle is well positioned
overlooking Oslo’s harbour. The fortress was strategically important for Oslo and therefore for Norway as well.
</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>We use 191 document sets for training and 105 for testing. When training the prediction
model we use ROUGE as the metric to maximize because it is also used for automatic
summary evaluation in DUC and TAC. In particular, following DUC and TAC, we use
ROUGE 2 (R-2) and ROUGE SU4 (R-SU4) both in training and testing. R-2
computes the number of bi-gram overlaps between the automatic and model summaries.
R-SU4 measures uni-gram overlaps between two text units but also bi-grams composed
of non-contiguous words, with a maximum of four words between the words. The
results of our experiments are shown in Table 1.</p>
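The difference between the two overlap counts can be sketched as follows. This is not the full ROUGE computation (no recall denominators, stemming or stop-word handling), and the skip bi-gram gap handling is one plausible reading of "a maximum of four words between the words".

```python
# R-2 counts contiguous bi-gram matches; R-SU4 counts uni-grams plus skip
# bi-grams with at most four intervening words.

def bigrams(tokens):
    return {(a, b) for a, b in zip(tokens, tokens[1:])}

def skip_bigrams(tokens, max_gap=4):
    # Ordered pairs (tokens[i], tokens[j]) with j - i - 1 <= max_gap.
    return {(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in range(i + 1, min(i + max_gap + 2, len(tokens)))}

peer = "the castle overlooks the harbour".split()
model = "the old castle overlooks the busy harbour".split()

r2_overlap = len(bigrams(peer) & bigrams(model))
su4_overlap = len(set(peer) & set(model)) + len(skip_bigrams(peer) & skip_bigrams(model))
print(r2_overlap, su4_overlap)
```

Because insertions between words break contiguous bi-grams but not skip bi-grams, R-SU4 credits far more of the peer summary here than R-2 does, which is why the two metrics favor different redundancy thresholds.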
      <p>As shown in Table 1, the results achieved with the JRT method, where we learn the
redundancy threshold R automatically, are better than the ones obtained using the setting
without the idea of jump. The JRT method significantly (p &lt; 0.001) outperforms the
method of Aker et al.</p>
      <p>The values of the learnt redundancy threshold R differ for different ROUGE
metrics: for R-2 it is 0.5338 and for R-SU4 0.4675. The different R values are expected
given the different properties of R-2 and R-SU4. Compared to R-2, the redundancy
threshold for R-SU4 is stricter, which reflects the way R-SU4 works. As mentioned
above, R-SU4 measures the uni-gram overlap between two text units but also bi-grams
where gaps of up to four words are allowed between the words. This means that R-SU4
is able to capture more similarities between sentences than R-2, where single word
overlaps are not captured. In R-SU4 gaps within a bi-gram are allowed: for example, the
bi-grams AB and A??B (where ? stands for an intervening word) are identical in R-SU4,
but not in R-2. Consequently, a stricter redundancy threshold is required in R-SU4 than
in R-2. This fact also illustrates that there cannot be a single R for every ROUGE metric
and highlights the importance of learning it for each of the ROUGE metrics separately.</p>
      <p>Footnotes:
2 http://duc.nist.gov/
3 http://www.nist.gov/tac/
4 We use a two-tail paired T-test to compute the significance test.
5 We have also studied alternative methods to JRT for use in the jump(·, ·) function, such as
favoring the following sentence over the current one if it is less redundant than the current one,
or combining the redundancy scores with the actual raw scores of the sentences and jumping
over the current sentence only if its combined score is less than the combined score of the
following sentence. However, these alternative methods led only to moderate improvements
over the baseline. For this reason we do not report those results.</p>
      <p>From the example summary about the query Akershus Castle shown in Table 2 we
can see that the summary does capture a variety of facts about the castle such as when
the castle was built, where it is located, etc. This type of essential information about the
castle occurs only once in the summary. What is repeated in most of the sentences are
referring expressions such as the name of the place (Akershus Castle) or the object type
(the castle or the fortress). Sentences containing referring expressions are more likely
to contain relevant information about the castle in the model summaries than sentences
which do not contain such expressions. The redundancy thresholds are set to allow
some repetition in the summary, which means that MERT learned to allow referring
expressions to be repeated in the summary, so it can maximize the ROUGE metrics.</p>
      <p>
        We also evaluated our summaries using a readability assessment as in DUC and
TAC. DUC and TAC manually assess the quality of automatically generated summaries
by asking human subjects to score each summary using five criteria – grammaticality,
redundancy, clarity, focus and structure. Each criterion is scored on a five point scale
with high scores indicating a better result [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In the evaluation we asked three people to
assess the summaries. Each person was shown 100 summaries (50 of each summary
type, selected randomly from the entire test set of 105 places). The summaries were
shown in random order. The results of the manual evaluation are shown in Table 3.
Table 4 shows the percentage of summaries which achieved scores at level four or
above.
      </p>
      <p>We see from Table 3 that JRT summaries perform much better than those in the
Aker et al. setting, where summaries are generated without redundancy detection. The
percentage values at levels 5 and 4 (see Table 4) show that the JRT summaries have
more clarity (95.9% of the summaries), are more coherent (71.5% of the summaries),
have better focus (87.7% of the summaries) and grammar (79.5% of the summaries)
and contain less redundant information (69.4% of the summaries) than the ones
generated in the wordLimit setting (47.9%, 25%, 39.5%, 30.2% and 12.5%). The substantial
improvement in redundancy from the Aker et al. setting to JRT demonstrates that
incorporating a jump into a summarization system not only reduces redundancy but also
improves other quality aspects of the summary.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>
        In this paper we proposed and evaluated an automatic method for improving the global
quality of extractive multi-document summaries by means of reducing the redundancy
within summaries. We used the framework proposed by Aker et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as a baseline
because it uses a combined search and training approach to maximize the summary
quality locally, and adapted it for global optimization. We demonstrated that our
proposed method, JRT, for redundancy reduction improves the quality of the summary
over the baseline, as indicated by the ROUGE metrics and manual evaluation. In JRT
we jump over sentences whose similarity to the summary created so far exceeds a
similarity threshold R learnt automatically. We have seen that the properties of different
ROUGE metrics require different redundancy thresholds, so that R must be learned for
each ROUGE metric separately. The automatically determined R values appeared to be
neither too strict nor too generous, as they allow referring expressions to be repeated in
the output summary but not whole factual assertions. This reflects the fact that in the
model summaries the sentences containing referring expressions are also those which
contain the most relevant information about a query.
      </p>
      <p>In future work we intend to address several issues arising from this work. First, we
intend to incorporate semantic knowledge into the computation of the redundancy scores.
Currently, when learning the R value we use purely surface-level comparison and
compute the redundancy score between a sentence and a summary using uni-gram and bi-gram
lexical overlaps. By doing this we can only capture the repetition of information units if
they are expressed in the same way. We believe that the results can be further improved
if techniques to detect semantic overlaps are also used. Second, we aim to address the
issue of information flow, which is currently missing in the output summaries. From
the example summary we can see that the summary reads like a bag of sentences. By
integrating flow into the A* search algorithm we hope to improve the readability of the
summaries.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aker</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaizauskas</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>: Multi-document summarization using A* search and discriminative training</article-title>
          .
          <source>In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>482</fpage>
          -
          <lpage>491</lpage>
          . Association for Computational Linguistics (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aker</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaizauskas</surname>
          </string-name>
          , R.:
          <article-title>Generating image descriptions using dependency relational patterns</article-title>
          .
          <source>Proc. of the ACL</source>
          <year>2010</year>
          , Upsala, Sweden (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Aker</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaizauskas</surname>
          </string-name>
          , R.:
          <article-title>Model Summaries for Location-related Images</article-title>
          .
          <source>In: Proc. of the LREC-2010 Conference</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Barzilay</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McKeown</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elhadad</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Information fusion in the context of multidocument summarization</article-title>
          .
          <source>In: Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics</source>
          . pp.
          <fpage>550</fpage>
          -
          <lpage>557</lpage>
          . Association for Computational Linguistics (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Carbonell, J.,
          <string-name>
            <surname>Goldstein</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The use of MMR, diversity-based reranking for reordering documents and producing summaries</article-title>
          .
          <source>In: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval</source>
          . pp.
          <fpage>335</fpage>
          -
          <lpage>336</lpage>
          . ACM (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dang</surname>
          </string-name>
          , H.:
          <article-title>Overview of DUC 2005</article-title>
          . DUC 05 Workshop at HLT/EMNLP (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gillick</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Favre</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A scalable global model for summarization</article-title>
          .
          <source>In: Proceedings of the Workshop on Integer Linear Programming for Natural Langauge Processing</source>
          . pp.
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          . Association for Computational Linguistics (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gillick</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedhammer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Favre</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hakkani-Tür</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A global optimization framework for meeting summarization</article-title>
          .
          <source>In: Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on</source>
          . pp.
          <fpage>4769</fpage>
          -
          <lpage>4772</lpage>
          . IEEE (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Automatic summarizing: factors and directions</article-title>
          .
          <source>Advances in Automatic Text Summarization</source>
          pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.Y.</given-names>
          </string-name>
          :
          <article-title>ROUGE: A package for automatic evaluation of summaries</article-title>
          .
          <source>Text Summarization Branches Out: Proc. of the ACL-04 Workshop</source>
          pp.
          <fpage>74</fpage>
          -
          <lpage>81</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>From single to multi-document summarization: A prototype system and its evaluation</article-title>
          .
          <source>In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics</source>
          . pp.
          <fpage>457</fpage>
          -
          <lpage>464</lpage>
          . Association for Computational Linguistics (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilmes</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Multi-document summarization via budgeted maximization of submodular functions</article-title>
          .
          <source>In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics</source>
          . pp.
          <fpage>912</fpage>
          -
          <lpage>920</lpage>
          . Association for Computational Linguistics (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mani</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maybury</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Advances in automatic text summarization</article-title>
          . The MIT Press (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A study of global inference algorithms in multi-document summarization</article-title>
          .
          <source>Advances in Information Retrieval</source>
          pp.
          <fpage>557</fpage>
          -
          <lpage>564</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Och</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Minimum error rate training in statistical machine translation</article-title>
          .
          <source>Proc. of the 41st Annual Meeting on Association for Computational Linguistics-Volume</source>
          <volume>1</volume>
          p.
          <fpage>167</fpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Riedhammer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gillick</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Favre</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hakkani-Tür</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Packing the meeting summarization knapsack</article-title>
          .
          <source>Proc. Interspeech</source>
          , Brisbane, Australia (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norvig</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Canny</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malik</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Edwards</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Artificial intelligence: a modern approach</article-title>
          . Prentice Hall, Englewood Cliffs, NJ (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Saggion</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>A robust and adaptable summarization tool</article-title>
          .
          <source>Traitement Automatique des Langues</source>
          <volume>49</volume>
          (
          <issue>2</issue>
          ) (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Sauper</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barzilay</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Automatically generating wikipedia articles: A structure-aware approach</article-title>
          .
          <source>In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1</source>
          . pp.
          <fpage>208</fpage>
          -
          <lpage>216</lpage>
          . Association for Computational Linguistics (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Yih</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderwende</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzuki</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Multi-document summarization by maximizing informative content-words</article-title>
          .
          <source>In: Proceedings of IJCAI</source>
          . vol.
          <volume>7</volume>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>