<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Estimation of the Factual Correctness of Summaries of a Ukrainian-language Silver Standard Corpus</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Oleksandr</forename><surname>Bauzha</surname></persName>
							<email>asbauzha@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Taras Shevchenko National University of Kyiv</orgName>
								<address>
									<addrLine>Volodymyrska Street 64/13</addrLine>
									<postCode>01601</postCode>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Artem</forename><surname>Kramov</surname></persName>
							<email>artemkramov@gmail.com</email>
							<affiliation key="aff1">
								<orgName type="institution">Seraf AI LLC</orgName>
								<address>
									<postBox>PO Box 3978</postBox>
									<postCode>60532</postCode>
									<settlement>Lisle</settlement>
									<region>Illinois</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Oleksandr</forename><surname>Yavorskyi</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Seraf AI LLC</orgName>
								<address>
									<postBox>PO Box 3978</postBox>
									<postCode>60532</postCode>
									<settlement>Lisle</settlement>
									<region>Illinois</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">Information Technology and Implementation (IT&amp;I-2023)</orgName>
								<address>
									<addrLine>November 20-21</addrLine>
									<postCode>2023</postCode>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Estimation of the Factual Correctness of Summaries of a Ukrainian-language Silver Standard Corpus</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">72E0C15954717F5182DC2E52FE706B32</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T20:01+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Natural language processing</term>
					<term>factual correctness</term>
					<term>abstractive summarization</term>
					<term>low-resource languages</term>
					<term>multilingual models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, different metrics for estimating the factual correctness of summaries of a Ukrainian-language silver standard summarization corpus have been analyzed. The different state-of-the-art methods of detecting the factually inconsistent document-summary pairs have been considered first; moreover, the types of errors in current summarization datasets have been analyzed too. It has been shown that suggested metrics can be used for the discrimination of correct/incorrect document-summary pairs that may be useful for the automatic generation of a summarization corpus. The results obtained for the ground-truth samples may indicate the availability of many erroneous summaries: more than 50% of the test subset can contain factually inconsistent samples. Further analysis of the factual correctness of model-generated summaries showed better factual consistency between documents and summaries than the ground-truth summaries. However, due to the availability of noisy ground-truth samples, the generated summaries can still contain hallucinated information; applying the suggested metrics may allow filtering out erroneous samples, which should also increase the summarization model's performance.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Abstractive text summarization falls into the category of sequence-to-sequence natural language processing (NLP) tasks. Self-supervised pre-training of language models on large corpora <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>, followed by fine-tuning the corresponding model on a summarization dataset, has achieved remarkable success in the abstractive summarization of articles <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref> and dialogues <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>. However, these advances are mostly connected with high-resource languages (English, Chinese, etc.). Unfortunately, research on the abstractive summarization of Ukrainian documents is still at an initial stage. As with other NLP tasks for low-resource languages, the lack of human-written datasets remains the key obstacle to investigating the summarization of Ukrainian corpora <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>: while the summarization models themselves can potentially be created by projecting the corresponding English models into the Ukrainian-language space (e.g., a Ukrainian GPT-2 model was recently created in this way <ref type="bibr" target="#b8">[9]</ref>), verifying the quality of the summaries generated by such models remains a challenging task. One possible way to generate a summarization dataset is web-scraping of news portals <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11]</ref>. Namely, the well-known XSum dataset <ref type="bibr" target="#b11">[12]</ref> was created by treating the headline of a news article as its summary. However, such an approach is not reliable, since headlines may contain extra information absent from the article itself, added to attract readers' attention.</p><p>To overcome this problem, the authors of the paper <ref type="bibr" target="#b12">[13]</ref> suggested extracting the summary of an article from its short description on the BBC news portal, resulting in the multilingual silver standard XL-Sum summarization dataset. The statistics of the Ukrainian-language subset (article-summary pairs) of the XL-Sum dataset are presented in Table <ref type="table">1</ref>. The corresponding Ukrainian-language summarization model was trained as well.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Number of samples for the Ukrainian-language part of the XL-Sum dataset according to the performed train-dev-test split.</p><p>The aforementioned automatic generation of document-summary pairs raises the following question: how can the quality of the collected summaries be verified automatically? While the coherency and fluency of the summaries can be assumed (the texts were written by editors), the factual consistency between each document and its summary still needs to be estimated. The authors of the XL-Sum dataset conducted a human evaluation of the summaries for 10 languages on a small subset (around 250 article-summary pairs). According to the results <ref type="bibr" target="#b12">[13]</ref>, up to 42% of the selected summaries contained extra information. The presence of such factual errors complicates the use of the dataset for verifying the quality of any summarization model; moreover, training a model on such samples can lead it to generate hallucinated summaries. Thus, the detection of factual errors in summaries is a relevant problem both for the analysis of the automatically generated dataset and for the estimation of the performance of the summarization model.</p><p>In this paper, a factual consistency metric for a Ukrainian-language document-summary pair is suggested. Namely, different cross-lingual approaches that can be applied to a wide range of languages are considered, followed by an analysis of their effectiveness. Moreover, the factual correctness of the Ukrainian-language summaries of the XL-Sum dataset is assessed with the retrieved metrics. In addition, the performance of the already trained Ukrainian summarization model in terms of the factual consistency of its generated summaries is analyzed as well.</p><p>Before creating the metric for estimating the factual consistency of a document-summary pair, we review existing approaches and open issues in this area. The next section is devoted to the analysis of different state-of-the-art methods for the detection and correction of factual mistakes in a summary given an input document.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related work</head><p>One of the key steps in factual consistency analysis is the creation of a dataset that defines the types of factual errors present in an erroneous summary. According to the paper <ref type="bibr" target="#b13">[14]</ref>, two approaches to dataset generation are mostly used: the entity-centric approach (Ent-C) and the generation-centric approach (Gen-C).</p><p>The Ent-C approach implies the transformation of a ground-truth summary into an erroneous one by applying different modification operations to its entities and noun phrases: entity swap, pronoun swap, negation, etc. The corresponding dataset (K2019) was first presented in the paper <ref type="bibr" target="#b14">[15]</ref> and was later used as a baseline for other methods. The ground-truth samples were taken from the CNN/DM dataset. The authors of the dataset also presented the FactCC method for detecting factual errors in a summary. Its main idea consists in fine-tuning the uncased BERT model <ref type="bibr" target="#b15">[16]</ref> for the binary classification of a document-summary pair (consistent/inconsistent) on the training dataset. As was shown, the FactCC method outperformed the MNLI-based approach <ref type="bibr" target="#b16">[17]</ref>, which interprets an entailment measure between a document and a summary as a factual consistency metric. In the paper <ref type="bibr" target="#b17">[18]</ref>, it was suggested to fine-tune the sequence-to-sequence BART model <ref type="bibr" target="#b3">[4]</ref> to generate the corrected version of a summary. Namely, a document and an inconsistent summary were concatenated and passed to the input of an encoder; the entire model was trained to generate the corrected consistent summary.
The authors of the paper <ref type="bibr" target="#b18">[19]</ref> proposed to mask each entity of a summary and then use the BERT model (the BertForQuestionAnswering architecture) to predict the answer spans in a source document. In contrast, the QAGS method <ref type="bibr" target="#b19">[20]</ref> automatically generates questions about the entities of a summary; a question-answering model is then used to find answers in both the source document and the summary and to verify that they match.</p><p>Unlike the Ent-C approach, the Gen-C approach <ref type="bibr" target="#b20">[21]</ref> consists in the transformation of a ground-truth summary by applying a paraphrasing model. The following assumption is made: the bottom-placed candidates of the beam search (e.g., the 10th best paraphrase) potentially contain erroneous facts. In contrast to the Ent-C-related methods, the authors <ref type="bibr" target="#b20">[21]</ref> considered the factual consistency problem at the level of dependency arcs retrieved from a syntactic parser: a dependency arc (fact) is entailed by a source document if the semantic relation between the corresponding head and child word is also entailed by the document. Elaborating on this assumption, the Dependency Arc Entailment (DAE) model was designed and trained to estimate the entailment of dependency arcs by a source document. To extend the treatment of dependency arcs as a representation of facts in a more general way, the FactGraph method <ref type="bibr" target="#b21">[22]</ref> was recently proposed. The main idea of the FactGraph method consists in decomposing the document and the summary into structured meaning representations. Such meaning representations define semantic concepts and their relations by generating a semantic graph for both the document and the summary.
Following the idea of the entailment of dependency arcs, the factual consistency was calculated based on the probability of establishing edges between the semantic concepts of a summary.</p><p>As mentioned in the papers <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b17">18]</ref>, the NLI-based models showed worse results than their counterparts. However, in the paper <ref type="bibr" target="#b22">[23]</ref>, the usage of NLI models was reconsidered with the SummaC method. Namely, while previous attempts focused on estimating the entailment of a document and a summary as a whole, the SummaC method considers their factual consistency at the level of sentences. The SummaC method outperformed FactCC, DAE, and QA-based methods, thus confirming that NLI models can be used for the estimation of the factual correctness of summaries. In parallel with our work, a factual consistency evaluation method for multilingual corpora based on an NLI model was recently suggested <ref type="bibr" target="#b23">[24]</ref>. The NLI model was created by fine-tuning the mT5-XXL model <ref type="bibr" target="#b24">[25]</ref> for the binary classification of a document-summary pair: the input data are represented as the concatenation of a document and a summary; the output binary value indicates whether the given pair is consistent or not. This classification model was later used for filtering inconsistent samples in the XL-Sum dataset and re-training the models. Such an approach allowed for better results in ROUGE scores and human scores (the Ukrainian language was not considered in those experiments). However, according to the annotators' conclusions, only 52% of the retrieved summaries (or even more for some languages) were factually consistent with the documents.
Moreover, estimating the entailment of a document-summary pair as a whole can contradict the recent results shown by the sentence-level SummaC method <ref type="bibr" target="#b22">[23]</ref>. We assume that considering the entailment of a document and a summary at the level of sentences may be crucial for the XL-Sum dataset: the collected summaries can potentially contain additional information (references, full names, positions, etc.) that may be revealed by increasing the granularity of the analysis of the document parts.</p><p>Finally, before applying the aforementioned methods or creating a new one, the following question should be answered: which types of factual errors are most expected in the XL-Sum dataset? To get insights, the corresponding statistics for the XSum dataset <ref type="bibr" target="#b11">[12]</ref>, which was also generated automatically, can be considered. In the paper <ref type="bibr" target="#b13">[14]</ref>, the authors conducted an error analysis of the summaries of XSum. Namely, the errors were classified into four main categories: entity-related (conflating two different entities, hallucinated entities); event-related (incorrect event description, agents, new event); noun-phrase-related (incorrect NP or NP modifiers, new NP, etc.); and others (grammar, noise). In addition, each category was divided into two subcategories: extrinsic (hallucination) and intrinsic (incorrect data interpretation) errors. According to the results <ref type="bibr" target="#b13">[14]</ref>, most of the errors are connected with the appearance of extrinsic errors of all categories. The ratio of intrinsic entity-related errors, which are typical for the aforementioned K2019 dataset, is relatively small. Thus, it was decided to rely on NLI-based approaches that can be useful for detecting the relevant types of errors. The next section describes the selected methods and metrics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Factual consistency estimation metrics</head><p>According to the previous section, the usage of NLI-based metrics seems useful for analyzing the different types of errors. Taking into account the findings of the SummaC zero-shot method <ref type="bibr" target="#b22">[23]</ref>, it was decided to process document-summary pairs at the level of sentences. Namely, given a pair of a document and a summary (doc, summary), let us represent both of them (D and S correspondingly) as lists of sentences:</p><formula xml:id="formula_0">D = {d_1, d_2, ..., d_n}, S = {s_1, s_2, ..., s_m} <label>(1, 2)</label></formula><p>For each summary sentence s_j, the maximum entailment (consistency) score over all document sentences is taken, forming the vector EntRed. In other words, the retrieved vector EntRed contains the best consistency score for each summary sentence. Then an output factual consistency score FactCons_Ent is calculated as the mean value of the vector EntRed:</p><formula xml:id="formula_2">FactCons_Ent = mean(EntRed) <label>(6)</label></formula><p>Aggregating the consistency scores of the summary sentences as an average value allows reducing FactCons_Ent in cases when some summary sentences are not consistent with any of the document sentences. Taking into account the potentially large ratio of hallucinated summaries in the XL-Sum dataset, such an approach may help to reveal erroneous samples. Figure <ref type="figure" target="#fig_0">1</ref> demonstrates an example of the detection of a factually inconsistent hallucinated sentence.</p><p>The summary sentence (s2), which describes the source of information in a news article, is not consistent with any of the document sentences; thus, its maximum consistency value is low. The presence of such a consistency outlier decreases the final factual consistency score FactCons_Ent.</p></div>
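The sentence-level computation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `nli_entail` scorer is assumed to return an entailment probability in [0, 1] (in practice an XNLI-tuned NLI model would provide it), and the toy overlap scorer and example sentences below are hypothetical.

```python
import numpy as np

def factcons_ent(doc_sentences, summary_sentences, nli_entail):
    # Pairwise entailment matrix: rows = document sentences,
    # columns = summary sentences.
    E = np.array([[nli_entail(d, s) for s in summary_sentences]
                  for d in doc_sentences])
    # Best consistency score for each summary sentence (the EntRed vector).
    ent_red = E.max(axis=0)
    # Final score (formula 6): mean over summary sentences.
    return float(ent_red.mean())

# Toy stand-in for an NLI model (illustrative assumption):
# unigram overlap as a fake entailment probability.
def toy_entail(premise, hypothesis):
    p = set(premise.lower().replace(".", "").split())
    h = set(hypothesis.lower().replace(".", "").split())
    return len(p & h) / max(len(h), 1)

doc = ["The president visited Kyiv on Monday.", "He met local officials."]
summ = ["The president visited Kyiv.", "Aliens landed in Lviv."]
score = factcons_ent(doc, summ, toy_entail)  # hallucinated s2 drags the score down
```

Because the aggregation is a mean of per-sentence maxima, a single hallucinated summary sentence (like the second one above) lowers the overall score even when the other sentences are well supported.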
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental part</head><head n="4.1.">Inconsistent summaries discrimination</head><p>Before calculating and analyzing the metric values for the Ukrainian part of the XL-Sum dataset, it was decided to verify the ability of the different methods to discriminate between factually consistent and inconsistent summaries. The inconsistent summaries discrimination task is the following: given two document-summary pairs sharing a common document, where one pair contains a correct summary and the other an incorrect one, it is necessary to predict which pair contains the factually consistent summary. The accuracy is calculated as the ratio of correctly processed pairs to their total number.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.1.">Dataset</head><p>The test part of the Ukrainian-language XL-Sum dataset was analyzed. In order to generate a factually inconsistent sample for each document, the following rules were applied: an inconsistent summary should belong to another document, and the ROUGE-1 F1 measure between the document and the inconsistent summary should be higher than the corresponding value between the document and the consistent summary.</p><p>These rules allowed picking inconsistent summaries that can relate to the same topic as the document but contain different information, which makes the discrimination task more challenging. Half of the test dataset was analyzed, resulting in 1619 data points. The statistics of the dataset are available in Table <ref type="table" target="#tab_2">2</ref>. The Stanza package <ref type="bibr" target="#b25">[26]</ref> was used for tokenization; stemming was performed with the Ukrainian Stemmer library <ref type="bibr" target="#b26">[27]</ref>.</p></div>
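The selection rule above can be sketched as follows. This is an illustrative sketch: the paper tokenizes with Stanza and stems with a Ukrainian stemmer, whereas the helper below assumes pre-tokenized input; the function names are hypothetical.

```python
from collections import Counter

def rouge1_f1(reference_tokens, candidate_tokens):
    # Unigram-overlap ROUGE-1 F1 between two token lists.
    if not reference_tokens or not candidate_tokens:
        return 0.0
    ref, cand = Counter(reference_tokens), Counter(candidate_tokens)
    overlap = sum((ref & cand).values())
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def is_harder_negative(doc_toks, true_summary_toks, foreign_summary_toks):
    # Selection rule from the text: a summary from another document
    # qualifies as the inconsistent sample only if its ROUGE-1 F1 with
    # the document exceeds that of the true summary.
    return (rouge1_f1(doc_toks, foreign_summary_toks)
            > rouge1_f1(doc_toks, true_summary_toks))
```

Requiring a higher ROUGE-1 F1 than the true summary ensures the negative shares surface vocabulary with the document, so the discrimination cannot be solved by word overlap alone.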
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2.">Metrics configurations</head><p>According to the previous section, it was suggested to use the NLI-based SummaC metric. The embedding-based SummaC metric (SummaC_Emb) was calculated with the sentence embedding models listed below:</p><p>paraphrase-multilingual-mpnet-base-v2 <ref type="bibr" target="#b27">[28]</ref>, a multilingual sentence embedding model based on the MPNet model <ref type="bibr" target="#b28">[29]</ref>; distiluse-base-multilingual-cased-v2 <ref type="bibr" target="#b27">[28]</ref>, a multilingual knowledge-distilled version of the multilingual Universal Sentence Encoder <ref type="bibr" target="#b29">[30]</ref>.</p><p>The NLI-based SummaC metric (SummaC_Ent) was implemented using the NLI model xlm-roberta-large-xnli, an XLM-RoBERTa model <ref type="bibr" target="#b30">[31]</ref> fine-tuned on the multilingual XNLI dataset <ref type="bibr" target="#b31">[32]</ref>.</p><p>All pre-trained models were taken from the Huggingface repository <ref type="bibr" target="#b32">[33]</ref>. The chosen multilingual models were used for the SummaC-based metrics since they were pre-trained on Ukrainian parallel data as well.</p></div>
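The embedding-based variant replaces NLI entailment with sentence-embedding cosine similarity. The sketch below assumes an `embed` function mapping sentences to vectors; in practice a model such as paraphrase-multilingual-mpnet-base-v2 (via the sentence-transformers library) would provide it, while the hashed bag-of-words `toy_embed` here is purely an illustrative stand-in.

```python
import numpy as np

def summac_emb(doc_sentences, summary_sentences, embed):
    D = embed(doc_sentences)       # (n_doc, dim) document sentence vectors
    S = embed(summary_sentences)   # (n_sum, dim) summary sentence vectors
    # L2-normalize so dot products become cosine similarities.
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    S = S / np.linalg.norm(S, axis=1, keepdims=True)
    sims = D @ S.T                 # (n_doc, n_sum) similarity matrix
    # Max over document sentences, then mean over summary sentences.
    return float(sims.max(axis=0).mean())

# Deterministic toy embedding (illustrative assumption): hashed
# bag-of-words vectors instead of a real multilingual encoder.
def toy_embed(sentences, dim=16):
    out = np.zeros((len(sentences), dim))
    for i, sent in enumerate(sentences):
        for tok in sent.lower().split():
            out[i, sum(ord(c) for c in tok) % dim] += 1.0
    return out
```

The aggregation (max per summary sentence, then mean) mirrors the NLI-based variant, so the two configurations differ only in how the pairwise consistency matrix is filled.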
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.3.">Results</head><p>Table <ref type="table">3</ref> shows the results of solving the inconsistent summaries discrimination task using the different metrics. In addition to the accuracy of discriminating incorrect/correct samples, the Pearson correlation coefficient (PCC) between the metrics and the ROUGE-1 score is also provided. As can be seen, the SummaC_Emb metric showed the best accuracy results. The SummaC_Emb metric based on the paraphrase-multilingual-mpnet-base-v2 model (the best option by accuracy) may be especially useful for the automatic construction of a summarization dataset, when it is necessary to map a document to a potential summary. For instance, the chapter-level subset of the BookSum summarization dataset <ref type="bibr" target="#b33">[34]</ref> was constructed by mapping a chapter of a book to the sentences of a summary of the entire book; we assume that the analyzed metrics can be used for the construction of a similar Ukrainian or even multilingual dataset as well.</p><p>Let us consider the Pearson correlation coefficient values between the metrics and the ROUGE-1 scores. Since a higher ROUGE-1 score should imply a lower metric value (by construction, incorrect summaries have higher ROUGE scores than correct ones), the PCC value should be low. As can be seen, the lowest (and negative) PCC value was obtained for the SummaC_Ent metric, indicating that this metric can be used for the detection of factually inconsistent summaries by setting a threshold value. Thus, this metric was later used to analyze the Ukrainian-language part of the XL-Sum dataset and the summarization model itself.</p></div>
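The PCC check above can be reproduced with a few lines. This is a generic sketch (the function name is ours); given per-sample metric values and ROUGE-1 scores, a strongly negative coefficient is the desired outcome here.

```python
import numpy as np

def pearson_corr(x, y):
    # Pearson correlation coefficient between two score lists,
    # e.g. a consistency metric vs. ROUGE-1 over the same samples.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

For example, perfectly anti-correlated score lists yield a coefficient of -1, the ideal behavior for a metric meant to flag high-ROUGE but inconsistent negatives.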
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>Results of solving the inconsistent summaries discrimination task using different metrics: accuracy of the discrimination of correct/incorrect summaries and the Pearson correlation coefficient (PCC) between the metrics and the ROUGE-1 score of samples </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">XL-Sum dataset analysis</head><p>Firstly, let us analyze the Ukrainian test part of the dataset. The value of the SummaC_Ent metric was calculated across the dataset. The density of the distribution of the retrieved metric values is shown in Figure <ref type="figure" target="#fig_1">2</ref>. As can be seen, the distribution is skewed, and the 50th percentile equals 0.845. Referring to the paper <ref type="bibr" target="#b23">[24]</ref>, where a threshold value of 0.5 for the NLI model allowed filtering out almost half of the incorrect samples (but approximately 50% of the remaining summaries were still judged by human evaluation as factually inconsistent), it can be concluded that a higher threshold value has to be taken for SummaC_Ent as well. Indeed, the probability mass peak that starts from the 70th percentile can potentially indicate the threshold for filtering out incorrect summaries; however, this hypothesis should be verified later by an appropriate human evaluation.</p></div>
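The percentile-based filtering described above can be sketched as follows. The score values below are illustrative placeholders, not the paper's data; only the threshold 0.845 (the reported 50th percentile) comes from the text.

```python
import numpy as np

def filter_by_threshold(scores, threshold):
    # Indices of samples whose consistency score exceeds the threshold;
    # everything below is treated as potentially inconsistent.
    scores = np.asarray(scores)
    return np.nonzero(scores > threshold)[0].tolist()

# Illustrative per-sample SummaC_Ent scores (not the paper's data).
scores = [0.20, 0.95, 0.60, 0.88, 0.99]
median = float(np.percentile(scores, 50))   # 50th percentile of the sample
p70 = float(np.percentile(scores, 70))      # candidate threshold region
kept = filter_by_threshold(scores, 0.845)   # indices surviving the cut
```

Sweeping the threshold over the percentiles of the score distribution and inspecting the retained samples is the natural way to test the 70th-percentile hypothesis once human judgments are available.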
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Summarization model analysis</head><p>As the test dataset may contain many erroneous samples, it is hard to rely on the estimated ROUGE metrics. Thus, it was decided to calculate the SummaC_Ent metric for the summaries generated by the summarization model on the test dataset. The summaries were taken from the set provided by the authors <ref type="bibr" target="#b12">[13]</ref>. Figure <ref type="figure" target="#fig_2">3</ref> shows the retrieved distribution. As can be seen, the distribution of SummaC_Ent scores is skewed too.</p><p>In order to compare the results between ground-truth and model-generated summaries, it was decided to take the median value as an average score and the interquartile range (IQR) as a measure of the deviation of the metric. Table <ref type="table" target="#tab_4">4</ref> demonstrates the retrieved results. The median value of the metric for the model-predicted summaries is higher; moreover, its IQR value is lower. Thus, the summaries generated by the model appear to be even more factually consistent than the ground-truth summaries.</p><p>As can be seen from Figure <ref type="figure" target="#fig_2">3</ref>, there are some document-summary pairs whose SummaC_Ent value is close to zero. Moreover, as can be expected from a noisy hallucinated dataset, the summarization model learned some pattern relations present in the dataset (e.g., the positions of persons), which led to the generation of hallucinated content (see Figure <ref type="figure" target="#fig_3">4</ref> and Figure <ref type="figure" target="#fig_4">5</ref> for examples revealed by low values of the metric). The removal of such dataset samples using the suggested metric can help avoid this situation and provide a more robust summarization model in terms of its ability to generalize the knowledge of a source document.</p></div>
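The robust statistics used in the comparison above reduce to a short helper. This is a generic sketch (the function name is ours, and the example scores are illustrative, not Table 4's data).

```python
import numpy as np

def median_iqr(scores):
    # Median and interquartile range (Q3 - Q1): robust location and
    # spread statistics, preferred here over mean/std because the
    # score distributions are skewed.
    q1, med, q3 = np.percentile(scores, [25, 50, 75])
    return float(med), float(q3 - q1)
```

A higher median together with a lower IQR, as reported for the model-predicted summaries, indicates scores that are both larger on average and more tightly clustered.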
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>In this paper, several metrics for estimating the factual consistency of documents and summaries were analyzed for processing the Ukrainian-language part of the XL-Sum corpus. Moreover, the effectiveness of the chosen SummaC metric was experimentally verified on the Ukrainian-language part of the XL-Sum corpus using different configurations and models. According to the results of the evaluation on the discrimination of factually correct/incorrect document-summary pairs, the best accuracy was achieved with the multilingual sentence embedding model. Such a result may indicate the advisability of utilizing the aforementioned model for related tasks such as the automatic construction of document-summary pairs for the generation of a silver standard Ukrainian summarization corpus. Moreover, the configuration of the SummaC metric with an NLI model showed the lowest (expected) correlation with the ROUGE score, which underlines the possibility of using this model for a further detailed analysis of factual mistakes. The analysis of the values of the chosen NLI-based metric for the ground-truth samples of the XL-Sum dataset may indicate the presence of at least 50% erroneous summaries, which matches the results of previous research. Moreover, the retrieved distribution of the metric values may indicate the presence of even more than 70% erroneous samples; however, the search for an appropriate threshold value for the considered metric still requires a more general human evaluation.</p><p>Finally, it was shown that the factual consistency metrics of model-generated summaries are higher than those of the ground-truth summaries. Nevertheless, the presence of generated summaries with an almost zero metric score may indicate the significant impact of the hallucinated dataset on the trained model. Further filtering of erroneous samples from the dataset using the considered metrics may allow the model to learn to generate more factually consistent summaries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">References</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Detection of a factually inconsistent summary sentence. The edge values indicate the maximum consistency score for each summary sentence. Since the summary sentence (s2) is not entailed by any of the document sentences, its consistency score is lower.</figDesc><graphic coords="5,147.32,84.65,300.29,211.45" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Density of the distribution of the SummaC_Ent metric values for the ground-truth summaries of the test part of the dataset</figDesc><graphic coords="7,118.88,115.95,357.25,245.42" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Density of the distribution of the SummaC_Ent metric values for the model-generated summaries</figDesc><graphic coords="8,119.75,103.30,355.05,234.72" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: A summary inconsistent with its document in terms of events: the entire summary statement contradicts the facts from the document (both are highlighted in orange)</figDesc><graphic coords="8,90.50,479.15,413.85,144.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: A summary contains two types of errors: a hallucinated entity (a person's name and position), marked in blue, and a contradiction of facts (the document states that the person suggests participating in a negotiation process, but the summary states the opposite)</figDesc><graphic coords="9,88.90,157.90,417.20,204.68" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Statistics of the generated dataset for the inconsistent summaries discrimination task: the number of samples, the average number of sentences per document, and the average number of sentences per summary</figDesc><table><row><cell>Samples number</cell><cell>Doc sentences</cell><cell>Summary sentences</cell></row><row><cell>1619</cell><cell>24.40 17.92 </cell><cell>1.43 0.65 </cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4</head><label>4</label><figDesc>Statistics of the EntSummaC metric for ground-truth and model-predicted summaries</figDesc><table><row><cell>Summaries</cell><cell>Median</cell><cell>IQR</cell></row><row><cell>Ground-truth</cell><cell>0.848</cell><cell>0.248</cell></row><row><cell>Model-predicted</cell><cell>0.958</cell><cell>0.186</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">RoBERTa: A robustly optimized BERT pretraining approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1907.11692</idno>
		<ptr target="http://arxiv.org/abs/1907.11692" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">SpanBERT: Improving pre-training by representing and predicting spans</title>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Weld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00300</idno>
		<ptr target="https://aclanthology.org/2020.tacl-1.5" />
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="64" to="77" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Saleh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Liu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1912.08777</idno>
		<title level="m">PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</title>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ghazvininejad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mohamed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.703</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.703" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="7871" to="7880" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Dialogue discourse-aware graph model and data augmentation for meeting summarization</title>
		<author>
			<persName><forename type="first">X</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Geng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Joint Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Controllable abstractive dialogue summarization with sketch supervision</title>
		<author>
			<persName><forename type="first">C.-S</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Stenetorp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.findings-acl.454</idno>
		<ptr target="https://aclanthology.org/2021.findings-acl.454" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="5108" to="5122" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Evaluation of the coherence of Ukrainian texts using a transformer architecture</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kramov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pogorilyy</surname></persName>
		</author>
		<idno type="DOI">10.1109/ATIT50783.2020.9349355</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE 2nd International Conference on Advanced Trends in Information Theory (ATIT)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="296" to="301" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Coreference resolution method using a convolutional neural network</title>
		<author>
			<persName><forename type="first">S</forename><surname>Pogorilyy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kramov</surname></persName>
		</author>
		<idno type="DOI">10.1109/ATIT49449.2019.9030596</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Advanced Trends in Information Theory (ATIT)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="397" to="401" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models</title>
		<author>
			<persName><forename type="first">B</forename><surname>Minixhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Paischer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Rekabsaz</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.naacl-main.293</idno>
		<ptr target="https://aclanthology.org/2022.naacl-main.293" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</title>
				<meeting>the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics<address><addrLine>Seattle, United States</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="3992" to="4006" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Automated extraction of structured information from a variety of web pages</title>
		<author>
			<persName><forename type="first">S</forename><surname>Pogorilyy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kramov</surname></persName>
		</author>
		<idno type="DOI">10.15407/pp2018.02.149</idno>
		<ptr target="https://pp.isofts.kiev.ua/ojs1/article/view/277" />
	</analytic>
	<monogr>
		<title level="j">Problems in Programming</title>
		<imprint>
			<biblScope unit="page" from="149" to="158" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">About the issue of algorithms formalized design for parallel computer architectures</title>
		<author>
			<persName><forename type="first">A</forename><surname>Anisimov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pogorilyy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Vitel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied and computational mathematics</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="140" to="151" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Don&apos;t give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization</title>
		<author>
			<persName><forename type="first">S</forename><surname>Narayan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">B</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lapata</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D18-1206</idno>
		<ptr target="https://aclanthology.org/D18-1206" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<meeting>the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1797" to="1807" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">XL-sum: Large-scale multilingual abstractive summarization for 44 languages</title>
		<author>
			<persName><forename type="first">T</forename><surname>Hasan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bhattacharjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Islam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mubasshir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-F</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-B</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Rahman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Shahriyar</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.findings-acl.413</idno>
		<ptr target="https://aclanthology.org/2021.findings-acl.413" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="4693" to="4703" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Annotating and modeling fine-grained factuality in summarization</title>
		<author>
			<persName><forename type="first">T</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Durrett</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.naacl-main.114</idno>
		<ptr target="https://aclanthology.org/2021.naacl-main.114" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</title>
				<meeting>the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1449" to="1462" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Evaluating the factual consistency of abstractive text summarization</title>
		<author>
			<persName><forename type="first">W</forename><surname>Kryscinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>McCann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-main.750</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-main.750" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="9332" to="9346" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
		<ptr target="https://aclanthology.org/N19-1423" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A broad-coverage challenge corpus for sentence understanding through inference</title>
		<author>
			<persName><forename type="first">A</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nangia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bowman</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N18-1101</idno>
		<ptr target="https://aclanthology.org/N18-1101" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long Papers</title>
		<meeting>the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>New Orleans, Louisiana</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1112" to="1122" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Factual error correction for abstractive summarization models</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C K</forename><surname>Cheung</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-main.506</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-main.506" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="6251" to="6258" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Multi-fact correction in abstractive text summarization</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Gan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C K</forename><surname>Cheung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-main.749</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-main.749" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="9320" to="9331" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Asking and answering questions to evaluate the factual consistency of summaries</title>
		<author>
			<persName><forename type="first">A</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.450</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.450" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="5008" to="5020" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Evaluating factuality in generation with dependency-level entailment</title>
		<author>
			<persName><forename type="first">T</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Durrett</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.findings-emnlp.322</idno>
		<ptr target="https://aclanthology.org/2020.findings-emnlp.322" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3592" to="3603" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">FactGraph: Evaluating factuality in summarization with semantic graph representations</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">F R</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dreyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bansal</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.naacl-main.236</idno>
		<ptr target="https://aclanthology.org/2022.naacl-main.236" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</title>
				<meeting>the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics<address><addrLine>Seattle, United States</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="3238" to="3253" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">SummaC: Re-visiting NLI-based models for inconsistency detection in summarization</title>
		<author>
			<persName><forename type="first">P</forename><surname>Laban</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Schnabel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Bennett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Hearst</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00453</idno>
		<ptr target="https://aclanthology.org/2022.tacl-1.10" />
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="163" to="177" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">mFACE: Multilingual summarization with factual consistency evaluation</title>
		<author>
			<persName><forename type="first">R</forename><surname>Aharoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narayan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Maynez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Herzig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lapata</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2212.10622</idno>
		<ptr target="https://arxiv.org/abs/2212.10622" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">mT5: A massively multilingual pre-trained text-to-text transformer</title>
		<author>
			<persName><forename type="first">L</forename><surname>Xue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Constant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Al-Rfou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Siddhant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.naacl-main.41</idno>
		<ptr target="https://aclanthology.org/2021.naacl-main.41" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</title>
				<meeting>the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="483" to="498" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Stanza: A python natural language processing toolkit for many human languages</title>
		<author>
			<persName><forename type="first">P</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bolton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-demos.14</idno>
		<ptr target="https://aclanthology.org/2020.acl-demos.14" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="101" to="108" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">Ukrainian stemmer</title>
		<author>
			<persName><forename type="first">V</forename><surname>Klim</surname></persName>
		</author>
		<ptr target="https://github.com/Desklop/Uk_Stemmer" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Sentence-BERT: Sentence embeddings using Siamese BERT-networks</title>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1410</idno>
		<ptr target="https://aclanthology.org/D19-1410" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3982" to="3992" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">MPNet: Masked and permuted pre-training for language understanding</title>
		<author>
			<persName><forename type="first">K</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-Y</forename><surname>Liu</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2020/file/c3a690be93aa602ee2dc0ccab5b7b67e-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Ranzato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Hadsell</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Balcan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="16857" to="16867" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Multilingual universal sentence encoder for semantic retrieval</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ahmad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Law</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Constant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hernandez Abrego</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-H</forename><surname>Sung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Strope</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kurzweil</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-demos.12</idno>
		<ptr target="https://aclanthology.org/2020.acl-demos.12" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="87" to="94" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Unsupervised cross-lingual representation learning at scale</title>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Chaudhary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wenzek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Guzmán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.747</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.747" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="8440" to="8451" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">XNLI: Evaluating cross-lingual sentence representations</title>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rinott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bowman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D18-1269</idno>
		<ptr target="https://aclanthology.org/D18-1269" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<meeting>the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="2475" to="2485" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">Hugging Face</title>
		<author>
			<persName><forename type="first">Clément</forename><surname>Delangue</surname></persName>
		</author>
		<ptr target="https://huggingface.co/models" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<title level="m" type="main">BookSum: A collection of datasets for long-form narrative summarization</title>
		<author>
			<persName><forename type="first">W</forename><surname>Kryscinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Rajani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Radev</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2105.08209</idno>
		<ptr target="https://arxiv.org/abs/2105.08209" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
