<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Redundancy reduction for multi-document summaries using A* search and discriminative training</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ahmet</forename><surname>Aker</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Sheffield</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Trevor</forename><surname>Cohn</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Sheffield</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Robert</forename><surname>Gaizauskas</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Sheffield</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Redundancy reduction for multi-document summaries using A* search and discriminative training</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">DAB9B55684B7E7E55A704843563058F1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T09:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we address the problem of optimizing global multi-document summary quality using A* search and discriminative training. Various search strategies have been investigated to find the globally best summary. In these, the search is usually guided by an existing prediction model which can distinguish between good and bad summaries. However, this is problematic because the model is not trained to optimize the summary quality but some other peripheral objective. In this work we tackle the global optimization problem using A* search with the training of the prediction model intact, and we demonstrate our method on reducing redundancy within a summary. We use the framework proposed by Aker et al. [1] as a baseline and adapt it to globally improve summary quality. Our results show significant improvements over the baseline.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Extractive multi-document summarization (MDS) aims to present the most important parts of multiple documents to the user in a condensed form <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b12">13]</ref>. This is achieved by identifying a subset of sentences from the document collection which are concatenated to form the summary. Two common challenges in extractive MDS are: search - finding the best scoring summary from the documents - and training - learning the system parameters that best describe a training set consisting of pairs of documents and reference summaries.</p><p>In previous work the search problem is typically decoupled from the training problem. McDonald <ref type="bibr" target="#b13">[14]</ref>, for example, addresses the search problem using Integer Linear Programming (ILP). In his ILP formulation he adopts the idea of Maximal Marginal Relevance (MMR) <ref type="bibr" target="#b4">[5]</ref> to maximize the amount of relevant information in the summary while at the same time reducing the redundancy within it. Others have also addressed the search problem using variations of ILP <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>, as well as different approaches such as stack decoding algorithms <ref type="bibr" target="#b19">[20]</ref>, genetic algorithms <ref type="bibr" target="#b15">[16]</ref> and submodular set function optimisation <ref type="bibr" target="#b11">[12]</ref>.</p><p>By separating search from training, these approaches assume the existence of a predictive model which can distinguish between good and bad summaries. This is problematic because the model is not trained to optimize summary quality but some other peripheral objective. The disconnect between the training and prediction settings compromises the predictive performance of the approach.</p><p>An exception is the work of Aker et al. 
<ref type="bibr" target="#b0">[1]</ref>, which proposes an integrated framework that trains the full prediction model directly, with the search algorithm intact.</p><p>Their training algorithm learns parameters such that the best scoring whole summary under the model has a high score under an evaluation metric. However, they only optimize the summary quality locally and do not take into account global features such as redundancy within the summary.</p><p>This paper addresses the redundancy problem within the integrated framework proposed by Aker et al. <ref type="bibr" target="#b0">[1]</ref> and thus presents a novel approach to the global optimization of summary quality. We present and evaluate our approach for incorporating a redundancy criterion into the framework, adapting A* search for global optimization. The core idea of this approach is that redundant sentences are excluded from the summary if their redundancy with respect to the summary created so far exceeds a threshold. In our experiments this threshold is learned automatically from the data instead of being set manually as proposed in previous work.</p><p>The paper is structured as follows. Section 2 presents the work of Aker et al. <ref type="bibr" target="#b0">[1]</ref> in detail. In Section 3 we describe our modifications to their framework and our proposed approach to addressing redundancy in extractive summarization. Section 4 describes our experimental setup, and Section 5 the results. Finally, we conclude in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Background</head><p>In this section we review the work of Aker et al. <ref type="bibr" target="#b0">[1]</ref> in detail, as it is essential for understanding our modifications to their framework.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Summarization Model</head><p>A summarization model is used to score summaries. Summaries are ranked according to these scores, so that in search the summary with the highest score can be selected. Aker et al. use the summarization model s to score a summary:</p><formula xml:id="formula_0">s(y|x) = i∈y φ(x i )λ<label>(1)</label></formula><p>where x is the document set, composed of k sentences, y ⊆ {1 . . . k} is the set of indexes of the sentences selected for the summary, φ(•) is a feature function that returns a vector of feature values for each candidate sentence, and λ is the weight vector associated with these features. In search we use the summarization model to find the highest-scoring summary ŷ:</p><formula xml:id="formula_1">ŷ = arg max y s(y|x)<label>(2)</label></formula></div>
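The linear model of Equations 1 and 2 can be sketched as follows; the feature names and values below are illustrative assumptions, not the authors' actual feature set.

```python
def score_summary(selected, sentence_features, weights):
    """s(y|x): sum of phi(x_i) . lambda over the selected sentences (Eq. 1).

    selected: set of sentence indexes y
    sentence_features: one feature dict per sentence, phi(x_i)
    weights: feature name -> weight, the vector lambda
    """
    return sum(value * weights.get(name, 0.0)
               for i in selected
               for name, value in sentence_features[i].items())

# hypothetical two-sentence document set
feats = [{"sentencePosition": 1.0, "wordCount": 12.0},
         {"sentencePosition": 0.5, "wordCount": 8.0}]
w = {"sentencePosition": 2.0, "wordCount": 0.1}
print(score_summary({0, 1}, feats, w))  # ≈ 5.0
```

Equation 2 then amounts to maximizing this score over all length-feasible subsets y, which is what the search described next performs.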
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Search</head><p>In Aker et al. the creation of a multi-document summary is formulated as a search problem in which the aim is to find a subset of sentences from the entire set to form a summary. The search is constrained so that the subset of sentences does not exceed the summary length threshold. A search graph is constructed in which states represent partial summaries and edges represent the addition of a sentence to a summary.</p><p>Each state is associated with its summary length and summary score. The authors start with an empty summary (the start state) with length 0 and score 0 and follow an outgoing edge to expand it. A new state is created when a new sentence is added to the summary. The new state's length is increased by the number of words in the new sentence, and its score is computed under the summarization model described in the previous section. A goal state is any state whose summary cannot be extended by another sentence without exceeding the summary length threshold. The summarization problem is then one of finding the best-scoring path (the sum of the sentence scores along the path) between the start state and a goal state. Aker et al. use the A* search algorithm <ref type="bibr" target="#b16">[17]</ref> to efficiently traverse the search graph and exactly find the best-scoring path. A* search applies a best-first strategy to traverse the graph from the start state to a goal state. The search requires a scoring function for each state, here s(y|x) from Equation <ref type="formula" target="#formula_0">1</ref>, and a heuristic function that estimates the additional score to get from a given state to a goal state. The search algorithm is guaranteed to converge to the optimal solution if the heuristic function is admissible, that is, if the estimate of the cost from the current state to the goal never overestimates the actual cost. 
The authors propose different heuristics with different run-time performances. The best performing heuristic reported is the "final aggregated heuristic". We use this heuristic both as our baseline and as the basis for our modifications.</p></div>
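The search just described can be sketched as follows. This is a simplified illustration that assumes per-sentence scores independent of the summary; the heuristic is a naive admissible bound, not the authors' "final aggregated heuristic".

```python
import heapq

def astar_summary(scores, lengths, budget):
    """Best-first A* search for the best-scoring sentence subset under a
    word budget. States are (selected indexes, next candidate, score, words).
    Sentences are added in index order, so each subset is visited once."""

    def h(nxt, used):
        # admissible (optimistic) bound on the score still achievable:
        # every remaining positive-score sentence that fits on its own
        return sum(s for j, s in enumerate(scores[nxt:], nxt)
                   if s > 0 and used + lengths[j] <= budget)

    # max-priority queue via negated f = g + h
    pq = [(-h(0, 0), (), 0, 0.0, 0)]
    while pq:
        _, sel, nxt, g, used = heapq.heappop(pq)
        succ = [j for j in range(nxt, len(scores))
                if used + lengths[j] <= budget]
        if not succ:              # goal: no further sentence fits
            return sel, g         # first goal popped is optimal
        for j in succ:
            g2, used2 = g + scores[j], used + lengths[j]
            heapq.heappush(pq, (-(g2 + h(j + 1, used2)),
                                sel + (j,), j + 1, g2, used2))
    return (), 0.0
```

For example, with sentence scores [3.0, 2.0, 1.0], word lengths [5, 4, 3] and a budget of 8 words, the search returns the subset (0, 2) with score 4.0.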
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Training</head><p>In Aker et al. the training problem is formulated as one of finding model parameters, λ, such that the predicted output ŷ closely matches the gold standard r. The quality of the match is measured using ROUGE <ref type="bibr" target="#b9">[10]</ref>. Training adopts the standard machine learning notion of a loss function, ∆(ŷ, r), which measures the degree of error in the prediction. The loss is formulated as 1 − R, where R is the ROUGE score.</p><p>The training problem is to solve</p><formula xml:id="formula_2">λ = arg min λ ∆(ŷ, r)<label>(3)</label></formula><p>where ŷ and r are taken to range over a corpus of many document sets and summaries. The prediction model is trained using the minimum error rate training (MERT) technique <ref type="bibr" target="#b14">[15]</ref>. MERT is a first-order optimization method which uses Powell search to find the parameters that minimize the loss on the training data <ref type="bibr" target="#b14">[15]</ref>. MERT requires n-best lists, which it uses to approximate the full space of possible outcomes. A* search is used to construct these n-best lists, and MERT optimizes the objective metric, such as ROUGE, used to measure summary quality.</p></div>
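The training loop can be illustrated with a toy stand-in: the corpus loss 1 − ROUGE is computed over n-best lists, and an exhaustive grid search replaces MERT's Powell search. All data values here are invented for illustration.

```python
import itertools

def nbest_loss(weights, nbest_lists):
    """Corpus loss: for each document set, score every candidate in its
    n-best list under the model, take the model-best one, and accumulate
    its loss 1 - ROUGE. Candidates are (feature_vector, rouge) pairs."""
    loss = 0.0
    for cands in nbest_lists:
        model_best = max(
            cands, key=lambda c: sum(w * f for w, f in zip(weights, c[0])))
        loss += 1.0 - model_best[1]
    return loss

def train(nbest_lists, grid=(-1.0, 0.0, 1.0)):
    # crude stand-in for MERT's Powell search: try every weight
    # combination from a tiny grid and keep the loss minimizer
    dim = len(nbest_lists[0][0][0])
    return min(itertools.product(grid, repeat=dim),
               key=lambda w: nbest_loss(w, nbest_lists))

# two document sets, two candidates each: (features, ROUGE)
nbest = [[((1.0, 0.2), 0.30), ((0.1, 0.9), 0.45)],
         [((0.8, 0.1), 0.25), ((0.2, 0.7), 0.40)]]
w = train(nbest)
print(nbest_loss(w, nbest))  # minimal loss: both higher-ROUGE candidates win
```

In the actual framework the n-best lists come from A* search, the loss is 1 − ROUGE over whole summaries, and Powell search over continuous weights replaces the grid.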
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Addressing redundancy</head><p>To address redundancy within a summary we adopt the framework of Aker et al. <ref type="bibr" target="#b0">[1]</ref> described in the previous section, re-using their summarization model and their training of the prediction model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">A* search with redundancy reduction</head><p>In this section we present our approach to dealing with redundancy within multi-document summaries, which implements the idea of omitting, or jumping over, redundant sentences when selecting summary-worthy sentences from the input documents. When sentences from the input documents are merged and sorted in a list according to their summary-worthiness, the generation of a summary starts by including the top summary-worthy sentence in the summary, then the next one, until the desired summary length is reached. If a sentence from the list is found to be similar to the ones already included in the summary (i.e. to be redundant), then this sentence should not be included in the summary, but rather jumped over. We integrate the idea of jumping over redundant sentences into the A* search algorithm described by <ref type="bibr">Aker et al.</ref>  </p><p>where lengthConstraintsOK holds when adding the next sentence does not violate the summary length constraint of Aker et al., and jump(y, y′) == F covers the case where the next sentence y′ is not redundant and therefore is not jumped over.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Jump based on redundancy threshold (JRT):</head><p>We use the similarity score of a sentence x i with respect to the summary y and a similarity or redundancy threshold R to decide whether to jump over the sentence or not. In general we jump over a sentence x i if its similarity score is above R (see Algorithm 1). The similarity scores are computed using the sim(., .) function shown in Equation <ref type="formula" target="#formula_4">5</ref>.</p><p>Algorithm 1 Jump when similarity score is above a threshold R, jump(y, x i )</p><p>Require: a similarity or redundancy threshold R</p><formula xml:id="formula_4">1: if sim(y, x i ) ≤ R then 2: return F 3: end if 4: return T sim(y, x j ) = (1/n) Σ n l=1 |ngrams(y, l) ∩ ngrams(x j , l)| / |ngrams(x j , l)|<label>(5)</label></formula><p>where ngrams(y, l) is the set of l-grams in summary y and ngrams(x j , l) the set of l-grams in sentence x j. The function returns 0 if y and x j share no n-grams, and 1 when all n-grams of x j are found among the n-grams of y. Note that we use this function only to measure how many n-grams of x j are found in y; the other direction is less important for our purpose. The idea of omitting redundant sentences if their redundancy score exceeds a threshold has already been introduced in previous work <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b18">19]</ref>. However, in contrast to these studies, in which the redundancy threshold is set manually, we learn it automatically. To learn the redundancy threshold R we make use of the entire framework (search and training) and proceed as shown in Figure <ref type="figure" target="#fig_0">1</ref>. At the beginning (the top left of the figure) we create a random R ∈ (0, 1]. In addition to this R we generate two further values: R + 0.1 ≤ 1 and R − 0.1 &gt; 0. 
These two additional values are used to move R towards its optimum. All three Rs are used to generate n-best summaries using A* search. The A* search also requires a prediction model to score the sentences; for this we start with an initial prediction model (initial feature weights W ). For each of the R values (denoted with r in the figure) we then create an n-best list using A* search, leading to 3 × n summaries. If there are summaries from a previous step we extend the new n-best list with them, so that in training the entire history of n-best lists is provided. For each summary its corresponding R value is known. Next, these n-best summaries are input to MERT to train new weights W′, i.e. a new prediction model. After obtaining W′ we pick, for each document set, the best summary under W′ from the n-best summaries MERT used to come up with W′. We sum the R values of those summaries (m in total, for m document sets) and divide the sum by m to obtain the new value R′. We replace R with R′ and W with W′ and repeat the entire process until no new summaries are added to the n-best list, at which point the process stops. Depending on which value (R, R + 0.1 or R − 0.1) was used to generate the best summaries under the ROUGE metric, the estimate of R moves towards one end or the other of the interval (0, 1].</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experimental setup</head><p>In this section we describe the data used in the experiments, our summarization system, and the training and testing procedure.</p></div>
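Equation 5 and Algorithm 1 can be sketched directly; the whitespace tokenization below is an assumption for illustration.

```python
def ngrams(tokens, l):
    """Set of l-grams (as tuples) in a token sequence."""
    return {tuple(tokens[i:i + l]) for i in range(len(tokens) - l + 1)}

def sim(summary, sentence, max_n=2):
    """Eq. 5: average, over n-gram orders 1..max_n, of the fraction of the
    sentence's n-grams already contained in the summary. Deliberately
    asymmetric: it asks how much of the sentence is already covered."""
    total = 0.0
    for l in range(1, max_n + 1):
        sent_grams = ngrams(sentence, l)
        if sent_grams:
            total += len(ngrams(summary, l) & sent_grams) / len(sent_grams)
    return total / max_n

def jump(summary, sentence, R):
    """Algorithm 1: jump over (skip) the sentence iff sim exceeds R."""
    return sim(summary, sentence) > R

summary = "the castle overlooks the harbour".split()
sentence = "the castle overlooks oslo".split()
print(round(sim(summary, sentence), 3))  # 0.708
print(jump(summary, sentence, R=0.5))    # True
```

With the learned thresholds reported in Section 5 (around 0.5), a sentence this heavily covered by the summary so far would be jumped over.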
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Data</head><p>For training and testing we use the freely available image corpus described in <ref type="bibr" target="#b2">[3]</ref>. The corpus contains 296 images of static located objects (e.g. Eiffel Tower, Mont Blanc), each with a manually assigned place name and object type category (e.g. church, mountain). For each place name there are up to four model summaries that were extracted manually from existing image descriptions taken from the VirtualTourist travel community website. Each summary contains a minimum of 190 and a maximum of 210 words.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Summarization system</head><p>To generate summaries for each of the 296 document sets we use an extractive, query-based multi-document summarization system. It is given three inputs: a query (a place name, e.g. Westminster Abbey), the object type associated with an image (e.g. church) and a set of web documents retrieved using the place name as query. The summarizer uses the following features described in <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b0">1]</ref>:</p><p>-sentencePosition: Position of the sentence within its document. The first sentence in the document gets the score 1 and the last one gets 1/n, where n is the number of sentences in the document.</p><p>-inFirst5: Binary feature indicating whether the sentence is one of the first 5 sentences of the document. -isStarter: Binary feature indicating whether the sentence starts with the query term (e.g. Westminster Abbey) or with the object type, e.g. The church. -LMProb: The probability of the sentence under a bi-gram language model. We trained a separate language model on Wikipedia articles about locations for each object type, e.g., church, bridge, etc. When we generate a summary about a location of type church, for instance, we apply the church language model to the related input documents.<ref type="foot" target="#foot_0">1</ref> -DepSim: Similar to LMProb, we trained a separate dependency pattern model using Wikipedia articles about locations for each object type. As with LMProb we use these models to score the input sentences. A sentence is scored based on the number of patterns it contains from the model. -sentenceCount: Each sentence gets assigned a value of 1. This feature is used to learn whether summaries with many sentences are better than summaries with few sentences or vice versa. -wordCount: Number of words in the summary, to decide whether the model should favor long summaries or short ones. 
The fortress was built in 1299, and the meaning of the name is 'the (fortified) house of (the district) Aker'. In the 1600s a castle (or in norsk, "slott") was built. In the reign of Christian IV the medieval stronghold was converted into a Renaissance castle and the fortifications were extended. Guided tours of the fortress in the summer, all year on request. The services are announced in the newspapers and are open to all. During World War II, several people were executed here by the German occupiers. The fortress was reconstructed several times to withstand increasing fighting power. The castle is well positioned overlooking Oslo's harbour. The fortress was strategically important for Oslo and therefore for Norway as well.</p></div>
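A few of the surface features listed in Section 4.2 can be sketched as follows; the feature names come from the paper, but the implementations are illustrative assumptions.

```python
def surface_features(sentences, idx, query, obj_type):
    """Compute some per-sentence surface features from Section 4.2 for
    sentence `idx` of a document given as a list of sentence strings."""
    sent = sentences[idx]
    n = len(sentences)
    return {
        # first sentence scores 1, last scores 1/n
        "sentencePosition": (n - idx) / n,
        "inFirst5": 1.0 if idx < 5 else 0.0,
        # starts with the query or with "The <object type>"
        "isStarter": 1.0 if sent.startswith((query, "The " + obj_type)) else 0.0,
        "sentenceCount": 1.0,
        "wordCount": float(len(sent.split())),
    }

doc = ["Westminster Abbey is a large church in London.",
       "The church hosts coronations."]
print(surface_features(doc, 0, "Westminster Abbey", "church"))
```

The LMProb and DepSim features would additionally require trained language and dependency pattern models, which are beyond this sketch.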
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results</head><p>We use 191 document sets for training and 105 for testing. When training the prediction model we use ROUGE as the metric to maximize because it is also used for automatic summary evaluation in DUC<ref type="foot" target="#foot_1">2</ref> and TAC.<ref type="foot" target="#foot_2">3</ref> In particular, following DUC and TAC we use ROUGE 2 (R-2) and ROUGE SU4 (R-SU4) both in training and testing. R-2 computes the number of bi-gram overlaps between the automatic and model summaries. R-SU4 measures uni-gram overlaps between two text units but also bi-grams composed of non-contiguous words, with a maximum of four words between the words. The results of our experiments are shown in Table <ref type="table" target="#tab_1">1</ref>.</p><p>As shown in Table <ref type="table" target="#tab_1">1</ref>, the results achieved with the JRT method, where we learn a redundancy threshold R automatically, are better than those obtained in the setting without the jump mechanism. The JRT method significantly <ref type="foot" target="#foot_3">4</ref> (p &lt; 0.001) outperforms the method of Aker et al.<ref type="foot" target="#foot_4">5</ref> The values of the learnt redundancy threshold R differ for the different ROUGE metrics: for R2 it is 0.5338 and for RSU4 0.4675. The different R values are expected given the different properties of R2 and RSU4. Compared to R2, the redundancy threshold for RSU4 is stricter, which reflects the way RSU4 works. As mentioned above, RSU4 measures the uni-gram overlap between two text units but also bi-grams where gaps of up to four words are allowed between the words. This means that RSU4 is able to capture more similarities between sentences than R2, where single word overlaps are not captured. In RSU4 gaps within a bi-gram are allowed: for example, the bi-grams AB and A??B are identical under RSU4, but not under R2. 
Consequently, a stricter redundancy threshold is required for RSU4 than for R2. This also illustrates that there cannot be a single R for every ROUGE metric, and highlights the importance of learning it for each ROUGE metric separately.</p><p>From the example summary for the query Akershus Castle shown in Table <ref type="table" target="#tab_2">2</ref> we can see that the summary captures a variety of facts about the castle, such as when it was built, where it is located, etc. This type of essential information about the castle occurs only once in the summary. What is repeated in most of the sentences are referring expressions such as the name of the place (Akershus Castle) or the object type (the castle or the fortress). Sentences containing referring expressions are more likely to contain relevant information about the castle in the model summaries than sentences which do not contain such expressions. The redundancy thresholds are set to allow some repetition in the summary, which means that MERT learned to allow referring expressions to be repeated, so that it can maximize the ROUGE metrics.</p><p>We also evaluated our summaries using a readability assessment as in DUC and TAC. DUC and TAC manually assess the quality of automatically generated summaries by asking human subjects to score each summary using five criteria - grammaticality, redundancy, clarity, focus and structure. Each criterion is scored on a five-point scale with high scores indicating a better result <ref type="bibr" target="#b5">[6]</ref>. In the evaluation we asked three people to assess the summaries. Each person was shown 100 summaries (50 of each summary type, selected randomly from the entire test set of 105 places). The summaries were shown in random order. The results of the manual evaluation are shown in Table <ref type="table" target="#tab_3">3</ref>. 
Table <ref type="table" target="#tab_4">4</ref> shows the percentage of summaries which achieved scores at level four or above.</p><p>We see from Table <ref type="table" target="#tab_3">3</ref> that JRT summaries perform much better than those in the Aker et al. setting, where summaries are generated without redundancy detection. The percentage values at levels 5 and 4 (see Table <ref type="table" target="#tab_4">4</ref>) show that the JRT summaries have more clarity (95.9% of the summaries), are more coherent (71.5%), have better focus (87.7%) and grammar (79.5%) and contain less redundant information (69.4%) than the ones generated in the Aker et al. setting (47.9%, 25%, 39.5%, 30.2% and 12.5% respectively). The substantial improvement in redundancy from the Aker et al. setting to JRT demonstrates that incorporating a jump mechanism into a summarization system not only reduces redundancy but also improves other quality aspects of the summary.</p></div>
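The skip-bigram counting behind R-SU4, including the AB versus A??B contrast discussed above, can be sketched as follows (a simplified illustration of the overlap units only, not a full ROUGE implementation).

```python
from itertools import combinations

def bigrams(tokens):
    """Contiguous bi-grams, the units counted by R-2."""
    return set(zip(tokens, tokens[1:]))

def skip_bigrams(tokens, max_gap=4):
    """Ordered word pairs with at most max_gap words between them, the
    skip-bi-gram units counted by R-SU4 (the 'U' additionally counts
    uni-grams, omitted here)."""
    return {(tokens[i], tokens[j])
            for i, j in combinations(range(len(tokens)), 2)
            if j - i - 1 <= max_gap}

# "AB" vs "A??B": identical under SU4 skip-bi-grams, not under R-2
print(("A", "B") in bigrams(["A", "x", "y", "B"]))       # False
print(("A", "B") in skip_bigrams(["A", "x", "y", "B"]))  # True
```

Because skip-bigrams match across small gaps, SU4-based redundancy scores rise faster than R2-based ones, which is consistent with the stricter threshold learned for RSU4.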
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>In this paper we proposed and evaluated an automatic method for improving the global quality of extractive multi-document summaries by reducing the redundancy within summaries. We used the framework proposed by Aker et al. <ref type="bibr" target="#b0">[1]</ref> as a baseline, because it uses a combined search and training approach to maximize summary quality locally, and adapted it for global optimization. We demonstrated that our proposed method, JRT, for redundancy reduction improves the quality of the summary over the baseline, as indicated by the ROUGE metrics and manual evaluation. In JRT we jump over sentences whose similarity to the summary built so far exceeds a threshold R learnt automatically. We have seen that the properties of different ROUGE metrics require different redundancy thresholds, so that R must be learned for each ROUGE metric separately. The automatically determined R values appear to be neither too strict nor too generous, as they allow referring expressions to be repeated in the output summary but not whole factual assertions. This reflects the fact that in the model summaries the sentences containing referring expressions are also those which contain the most relevant information about a query.</p><p>In future work we intend to address several issues arising from this work. First, we intend to incorporate semantic knowledge into the computation of the redundancy scores. Currently, when learning the R value, we use purely surface-level comparison and compute the redundancy score between a sentence and a summary using uni- and bi-gram lexical overlaps. By doing this we can only capture the repetition of information units if they are expressed in the same way. We believe that the results can be further improved if techniques to detect semantic overlaps are also used. Second, we aim to address the issue of information flow, which is currently missing in the output summaries. 
From the example summary we can see that the summary reads like a bag of sentences. By integrating flow into the A* search algorithm we hope to improve the readability of the summaries.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Learning the redundancy threshold R. The learning procedure starts in the box denoted with Start.</figDesc><graphic coords="5,194.29,196.01,226.77,175.02" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>al. The difference between our implementation and that of Aker et al. is the integration of a function jump(y, y′) into the search process. We use this function to jump over the sentence with index y′ when it is redundant with respect to the summary y. Thus, compared to Aker et al., we skip a sentence not only when it is too long, as in their work, but also when it is redundant with respect to the summary created so far. In our work we replace the jump conditions of Aker et al. with:</figDesc><table /><note>lengthConstraintsOK ∧ jump(y, y′) == F</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 .</head><label>1</label><figDesc>ROUGE scores. In each row the results were obtained with the prediction model trained on the metric of that row.</figDesc><table><row><cell cols="2">Recall Aker et al. [1]</cell><cell>JRT</cell></row><row><cell>R2</cell><cell>0.094</cell><cell>0.109*</cell></row><row><cell cols="2">RSU4 0.146</cell><cell>0.167*</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 .</head><label>2</label><figDesc>Example summary about the query Akershus Castle.Norwegian Royalty have been buried in the Royal Mausoleum in the castle. During the 17th and 18th century the castle fell into decay, and restoration work only started in 1899. The Akershus castle and fortress are located on the eastern side of the Oslo harbor. The fortress was first used in battle in 1306. The original Akershus Castle is located inside the fortress. Akershus Fortress (Norwegian: Akershus Festning) is the old castle built to protect Oslo, the capital of Norway.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 .</head><label>3</label><figDesc>Readability evaluation results: Each cell shows the percentage of summaries scoring the ranking score heading the column for each criterion in the row as produced by the summary method indicated by the subcolumn heading -Aker et al. (RW ) and JRT . The numbers indicate the percentage values averaged over the three people.</figDesc><table><row><cell></cell><cell>5</cell><cell>4</cell><cell>3</cell><cell>2</cell><cell>1</cell></row><row><cell>Criterion</cell><cell cols="5">RW JRT RW JRT RW JRT RW JRT RW JRT</cell></row><row><cell>clarity</cell><cell cols="3">6.2 22.4 41.7 73.5 29.2 2.0</cell><cell>20.8 0</cell><cell>2.1 2.0</cell></row><row><cell cols="6">coherence 6.2 28.6 18.8 42.9 33.3 24.5 37.5 4.1 4.2 0</cell></row><row><cell>focus</cell><cell cols="4">6.2 26.5 33.3 61.2 29.2 12.2 29.2 0</cell><cell>2.1 0</cell></row><row><cell>grammar</cell><cell cols="3">4.2 12.2 58.3 67.3 12.5 4.1</cell><cell cols="2">20.8 14.3 4.2 2.0</cell></row><row><cell cols="2">redundancy 4.2 8.2</cell><cell cols="4">8.3 61.2 2.1 12.2 41.7 18.4 43.8 0</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4 .</head><label>4</label><figDesc>Readability evaluation results: Each cell shows the percentage of summaries scoring the ranking score &gt;= 4 for each criterion in the row as produced by the summary method indicated by column heading -Aker et al. (RW ) and JRT . The numbers indicate the percentage values averaged over the three people.</figDesc><table><row><cell>Criterion</cell><cell>RW</cell><cell>JRT</cell></row><row><cell>clarity</cell><cell>47.9</cell><cell>95.9</cell></row><row><cell>coherence</cell><cell>25</cell><cell>71.5</cell></row><row><cell>focus</cell><cell>39.5</cell><cell>87.7</cell></row><row><cell>grammar</cell><cell>30.2</cell><cell>79.5</cell></row><row><cell>redundancy</cell><cell>12.5</cell><cell>69.4</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">For our training and testing sets we manually assigned each location to its corresponding object type.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://duc.nist.gov/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://www.nist.gov/tac/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">We use a two-tailed paired t-test to compute statistical significance.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">We also studied alternatives to JRT for the jump(., .) function, such as preferring the following sentence over the current one when it is less redundant, or combining the redundancy scores with the raw sentence scores and jumping over the current sentence only if its combined score is lower than that of the following sentence. However, these alternatives led only to moderate improvements over the baseline, so we do not report their results.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Multi-document summarization using A* search and discriminative training</title>
		<author>
			<persName><forename type="first">A</forename><surname>Aker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Cohn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gaizauskas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2010 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="482" to="491" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Generating image descriptions using dependency relational patterns</title>
		<author>
			<persName><forename type="first">A</forename><surname>Aker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gaizauskas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the ACL 2010</title>
				<meeting>of the ACL 2010<address><addrLine>Uppsala, Sweden</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Model Summaries for Location-related Images</title>
		<author>
			<persName><forename type="first">A</forename><surname>Aker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gaizauskas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the LREC-2010 Conference</title>
				<meeting>of the LREC-2010 Conference</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Information fusion in the context of multidocument summarization</title>
		<author>
			<persName><forename type="first">R</forename><surname>Barzilay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>McKeown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Elhadad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics</title>
				<meeting>the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="550" to="557" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The use of mmr, diversity-based reranking for reordering documents and producing summaries</title>
		<author>
			<persName><forename type="first">J</forename><surname>Carbonell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Goldstein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval</title>
				<meeting>the 21st annual international ACM SIGIR conference on Research and development in information retrieval</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="335" to="336" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Dang</surname></persName>
		</author>
		<title level="m">DUC 05 Workshop at HLT/EMNLP</title>
				<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
	<note>Overview of DUC 2005</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A scalable global model for summarization</title>
		<author>
			<persName><forename type="first">D</forename><surname>Gillick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Favre</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing</title>
				<meeting>the Workshop on Integer Linear Programming for Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="10" to="18" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A global optimization framework for meeting summarization</title>
		<author>
			<persName><forename type="first">D</forename><surname>Gillick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Riedhammer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Favre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hakkani-Tür</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Acoustics, Speech and Signal Processing</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="4769" to="4772" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Automatic summarizing: factors and directions</title>
		<author>
			<persName><forename type="first">K</forename><surname>Spärck Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Automatic Text Summarization</title>
				<imprint>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="1" to="12" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Rouge: A package for automatic evaluation of summaries</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">Y</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the ACL-04 Workshop</title>
				<meeting>of the ACL-04 Workshop</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="74" to="81" />
		</imprint>
	</monogr>
	<note>Text Summarization Branches Out</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">From single to multi-document summarization: A prototype system and its evaluation</title>
		<author>
			<persName><forename type="first">C</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hovy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th Annual Meeting on Association for Computational Linguistics</title>
				<meeting>the 40th Annual Meeting on Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="457" to="464" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Multi-document summarization via budgeted maximization of submodular functions</title>
		<author>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bilmes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics</title>
				<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="912" to="920" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Advances in automatic text summarization</title>
		<author>
			<persName><forename type="first">I</forename><surname>Mani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maybury</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">A study of global inference algorithms in multi-document summarization</title>
		<author>
			<persName><forename type="first">R</forename><surname>McDonald</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval</title>
				<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="557" to="564" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Minimum error rate training in statistical machine translation</title>
		<author>
			<persName><forename type="first">F</forename><surname>Och</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 41st Annual Meeting on Association for Computational Linguistics</title>
				<meeting>of the 41st Annual Meeting on Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">167</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Packing the meeting summarization knapsack</title>
		<author>
			<persName><forename type="first">K</forename><surname>Riedhammer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gillick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Favre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hakkani-Tür</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Interspeech</title>
				<meeting>Interspeech<address><addrLine>Brisbane, Australia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Russell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Norvig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Canny</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Malik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Edwards</surname></persName>
		</author>
		<title level="m">Artificial intelligence: a modern approach</title>
				<meeting><address><addrLine>Englewood Cliffs, NJ</addrLine></address></meeting>
		<imprint>
			<publisher>Prentice Hall</publisher>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A robust and adaptable summarization tool</title>
		<author>
			<persName><forename type="first">H</forename><surname>Saggion</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Traitement Automatique des Langues</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Automatically generating wikipedia articles: A structure-aware approach</title>
		<author>
			<persName><forename type="first">C</forename><surname>Sauper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Barzilay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP</title>
				<meeting>the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="208" to="216" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Multi-document summarization by maximizing informative content-words</title>
		<author>
			<persName><forename type="first">W</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Goodman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Vanderwende</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Suzuki</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of IJCAI</title>
				<meeting>IJCAI</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">7</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
