<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The Role of Text Simplification Operations in Evaluation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Laura</forename><surname>Vásquez-Rodríguez</surname></persName>
							<email>laura.vasquezrodriguez@manchester.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="department">National Centre for Text Mining</orgName>
								<orgName type="institution">The University of Manchester</orgName>
								<address>
									<settlement>Manchester</settlement>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Matthew</forename><surname>Shardlow</surname></persName>
							<email>m.shardlow@mmu.ac.uk</email>
							<affiliation key="aff1">
								<orgName type="department">Department of Computing and Mathematics</orgName>
								<orgName type="institution">Manchester Metropolitan University</orgName>
								<address>
									<settlement>Manchester</settlement>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Piotr</forename><surname>Przybyła</surname></persName>
							<email>piotr.przybyla@ipipan.waw.pl</email>
							<affiliation key="aff2">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">Polish Academy of Sciences</orgName>
								<address>
									<settlement>Warsaw</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sophia</forename><surname>Ananiadou</surname></persName>
							<email>sophia.ananiadou@manchester.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="department">National Centre for Text Mining</orgName>
								<orgName type="institution">The University of Manchester</orgName>
								<address>
									<settlement>Manchester</settlement>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
							<affiliation key="aff3">
								<orgName type="institution">The Alan Turing Institute</orgName>
								<address>
									<settlement>London</settlement>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">The Role of Text Simplification Operations in Evaluation</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">2637518D5782268DD56BD4E17B29D61C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T11:07+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Text Simplification</term>
					<term>Evaluation</term>
					<term>Edit-operations</term>
					<term>Simplification-operations</term>
					<term>Wikipedia-based datasets</term>
					<term>ORCID 0000-0002-7313-905X (L. Vásquez-Rodríguez)</term>
					<term>ORCID 0000-0003-1129-2750 (M. Shardlow)</term>
					<term>ORCID 0000-0001-9043-6817 (P. Przybyła)</term>
					<term>ORCID 0000-0001-7116-9338 (S. Ananiadou)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Research in Text Simplification (TS) has relied mostly on Wikipedia-based datasets and the SARI evaluation metric as the preferred means for creating and evaluating new simplification methods. Previous studies have pointed out the flaws of these evaluation resources, including incorrectly aligned simple/complex sentence pairs, sentence pairs with no simplification, and a lack of variety in simplification operations. However, there has been no further analysis of the impact of the original data distribution with respect to the type of simplification operations performed. In this paper, we set up a systematic benchmark of the most common TS datasets, basing our evaluation on different protocols for split selection (e.g., random or Monte Carlo selection). We perform an operation-based investigation, demonstrating in detail the limitations of existing simplification datasets. Further, we make recommendations for future standardised practices in the design, creation and evaluation of TS resources.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>TS methods transform complex text fragments into their simple variants, according to specific operations and audiences. Non-native speakers can significantly benefit from the substitution of complex words with simpler ones <ref type="bibr" target="#b0">[1]</ref>, while other audiences, such as people with aphasia, benefit more from short, simple sentences <ref type="bibr" target="#b1">[2]</ref>. Although categorising what complexity means for different audiences is useful for evaluation, TS remains a challenging task to benchmark for the following reasons: 1) the basic concept of simplicity (relying on language complexity) is vague and hard to define quantitatively, which means that proficient language users usually come up with different simplifications for a given sentence; 2) the possible uses of TS cover scenarios aimed at different target audiences (e.g., children, non-native readers, people with aphasia or dyslexia) and domains (e.g., scientific texts, medical and legal documents), which may require different simplification methods; and 3) using a gold standard for TS evaluation requires human annotation, which is time-consuming and costly. This is usually avoided, in a way similar to other Natural Language Generation (NLG) tasks (e.g., machine translation), by obtaining human-annotated reference simplifications and evaluating systems based on their similarity to these. Although this mechanism of evaluation allows an unlimited number of systems and variants to be evaluated without further human effort, there are a number of factors to consider when interpreting the results.</p><p>Firstly, there may be many equally good simplifications for a given sentence, so comparison to a single reference may penalise them unfairly. Although some TS datasets provide multiple references, these cannot capture the rich diversity possible in simplification. 
Secondly, automatic similarity measures, such as BLEU <ref type="bibr" target="#b2">[3]</ref>, ROUGE <ref type="bibr" target="#b3">[4]</ref> or SARI <ref type="bibr" target="#b4">[5]</ref>, have previously been shown to have limitations (e.g., weak correlation with human judgement, dependence on high-quality references, failure to capture task-dependent aspects such as simplicity), both in general tasks <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref> and in the context of TS <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref>. Thirdly, the way data is split between training and test sets can influence results. This is well known in general <ref type="bibr" target="#b9">[10]</ref>, but has not attracted much attention in TS. Finally, simplification operations may be unevenly distributed in TS datasets, affecting the types of simplifications that a model learns to produce. Test splits may not reflect the same simplification operations as the training split from the same dataset.</p><p>In this paper, we explore the impact of data splits (random and stratified) on English TS datasets and set up a systematic benchmark on the existing datasets with altered distributions. Our contributions are: 1) an operations-based analysis of TS datasets generated by stratification algorithms; 2) a performance evaluation on experimental operation-based datasets; and 3) recommendations towards a standardised practice for building and evaluating new TS datasets.</p></div>
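The single-reference problem described above suggests an obvious mitigation: score a system output against every available reference and keep the best match. A minimal sketch in Python, where `best_reference_score` is an illustrative helper of ours and difflib token similarity merely stands in for a task-specific metric such as SARI:

```python
import difflib

def best_reference_score(system: str, references: list[str]) -> float:
    """Score a system simplification against every available reference and
    keep the most favourable match (token-level similarity here stands in
    for a task-specific metric such as SARI)."""
    sys_tokens = system.lower().split()
    return max(
        difflib.SequenceMatcher(None, sys_tokens, ref.lower().split()).ratio()
        for ref in references
    )

references = ["The cat sat on the mat.", "A cat rested on a mat."]
single = best_reference_score("A cat sat on a mat.", references[:1])
multi = best_reference_score("A cat sat on a mat.", references)
```

Taking the maximum over references can only raise a system's score, which is why multi-reference datasets such as ASSET reduce (but do not eliminate) the unfair-penalty problem.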
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Previous studies <ref type="bibr" target="#b4">[5]</ref> have demonstrated the poor quality of the TS datasets used for establishing the state of the art. In particular, Wikipedia-based datasets <ref type="bibr" target="#b10">[11]</ref> contain incorrectly aligned complex-simple sentence pairs (e.g., sentences with no semantic similarity to each other), as well as pairs with no simplification or with unbalanced simplification operations (e.g., datasets that perform mostly deletions). In contrast, Newsela is a better-quality dataset <ref type="bibr" target="#b11">[12]</ref> created by professional translators; however, it comes with a restrictive data agreement that prohibits publishing or sharing the data, preventing research reproducibility and the sharing of splits or alignments. For these reasons, we have not included Newsela in our study.</p><p>Operations-based analysis of datasets is less common and mostly performed in specific scenarios. Alva-Manchego et al. <ref type="bibr" target="#b12">[13]</ref> performed a detailed text-feature-based analysis of the ASSET dataset, covering sentence splits, word deletions, insertions and reordering. Xu et al. <ref type="bibr" target="#b11">[12]</ref> analysed a sample of 200 sentences from the PWKP dataset <ref type="bibr" target="#b13">[14]</ref> and classified them according to whether or not they were simplifications. Real simplifications were classified into the following categories: deletion-only, paraphrasing-only and a combination of both.</p><p>Despite efforts to improve these datasets in terms of the variety of simplification operations performed and the number of gold-standard references <ref type="bibr" target="#b12">[13]</ref>, the statistical distributions of these datasets have not been explored. 
Recent work in the NLG domain has shown how the use of random splits can affect model performance <ref type="bibr" target="#b14">[15]</ref>. Further, there is also a strong argument for biased or adversarial splits <ref type="bibr" target="#b9">[10]</ref>, demonstrating that dataset distribution is relevant in NLP. Neither of these has been considered for TS.</p><p>Another important factor to consider is the unsuitability of TS evaluation metrics. Over the past few years, the TS research community has avoided the BLEU evaluation metric <ref type="bibr" target="#b7">[8]</ref> due to its low correlation with simplicity. Moreover, when simplicity is directly compared with human evaluation, it shows a negative correlation with meaning preservation <ref type="bibr" target="#b15">[16]</ref>, since building simple sentences also involves removing information from the original ones. As of today, the only widely available means of TS evaluation is SARI <ref type="bibr" target="#b4">[5]</ref>, which is not only limited as a measure of 'simplicity gain' in a lexical paraphrasing setting, but is also potentially flawed when multiple rewrite operations are present <ref type="bibr" target="#b16">[17]</ref>. As mentioned above, automatic evaluation of simplicity remains an open question in the TS domain.</p><p>Simplification operations can also play a fundamental role in the development of TS systems, in which they are explicitly identified or supplied to a TS model. The EditNTS system, a neural programmer-interpreter model <ref type="bibr" target="#b17">[18]</ref>, detects and predicts ADD, DELETE and KEEP simplification operations during training. Other systems, such as SeqLabel <ref type="bibr" target="#b18">[19]</ref>, perform an automatic identification of operations in the original parallel corpus, creating a new annotated corpus for training the model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Operation-based Simplification Experiments</head><p>We conducted a systematic analysis of the key operations we identified across all commonly available TS datasets. Initially, we analysed the number of deletions, insertions and replacements in the different subsets of each TS dataset (i.e., train, development and test, when available). We did not include the split operation, since our preliminary analysis using HSplit did not show relevant changes from an edit-distance perspective. Next, we analysed the impact of these operations on the output sentences, comparing how much a complex sentence is changed in the presence of these transformations (Section 3.1 and Section 3.2). Furthermore, we analysed the distribution of these simplification operations, proposing new scenarios to benchmark against these new distributions (Section 3.3).</p></div>
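The deletion, insertion and replacement analysis above relies on a token-level edit distance. A minimal sketch of the underlying computation, assuming a Wagner-Fischer-style dynamic program adapted from characters to whitespace-separated tokens (the function name and the normalisation by the longer sentence are our assumptions, not taken from the released code):

```python
def percent_change(complex_sent: str, simple_sent: str) -> float:
    """Token-level edit distance between a complex sentence and its
    simplification, expressed as a percentage of the longer sentence
    (0% = identical, 100% = completely different)."""
    a = complex_sent.lower().split()
    b = simple_sent.lower().split()
    # classic Wagner-Fischer dynamic-programming table, one row at a time
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        curr = [i]
        for j, tok_b in enumerate(b, 1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete token
                            curr[j - 1] + 1,      # insert token
                            prev[j - 1] + cost))  # keep or replace token
        prev = curr
    distance = prev[-1]
    return 100.0 * distance / max(len(a), len(b), 1)
```

Lowercasing before comparison mirrors the preprocessing described in Section 3.2.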
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Creating Operation-based Datasets</head><p>We performed our analysis using common Wikipedia-based TS datasets: WikiSmall and WikiLarge <ref type="bibr" target="#b10">[11]</ref>, TurkCorpus <ref type="bibr" target="#b4">[5]</ref> and ASSET <ref type="bibr" target="#b12">[13]</ref>  <ref type="foot" target="#foot_0">1</ref> . In particular, we focused on analysing the original TS datasets and our proposed experimental datasets, which are modified versions of WikiLarge and WikiSmall built using different distribution methods. We chose these resources because they provide test, development and training subsets, which are essential for our distribution experiments. We analysed these datasets under the following classifications:</p><p>Original distribution: we examined all subsets of the original TS datasets with no modification by applying the metrics defined in Section 3.2. We quantified the distribution divergence between subsets (test compared to train, and test compared to development) by calculating the Kullback-Leibler divergence (KL-divergence) <ref type="bibr" target="#b20">[21]</ref> and the Jensen-Shannon divergence (JSD) <ref type="bibr" target="#b21">[22]</ref>. The WikiLarge dataset had a KL-divergence of 0.46 and a JSD of 0.41, confirming that the split of this dataset is not truly random. This can be seen in detail by observing the distribution of these subsets in Figure <ref type="figure" target="#fig_0">1a</ref>. We also found a significant number of sentences with no operations, as well as sentences that changed 100% during the simplification process. A post-hoc manual inspection showed that these cases corresponded to inaccurate simplifications resulting from poor alignments or noise. 
Given these results, we proposed additional distributions to improve the distribution of simplification operations in the WikiSmall and WikiLarge datasets.</p><p>Random distribution: to create randomly distributed datasets, we merged all the subsets of the original dataset into a single dataset, shuffled the data using NumPy <ref type="bibr" target="#b22">[23]</ref> and recreated the subsets, keeping their original sizes. We repeated this process using 5 different random seeds (155, 324, 393, 728, 989). The seeds were randomly generated, except for 324, which comes from the original implementation of EditNTS and from the initial explorations in our previous work <ref type="bibr" target="#b19">[20]</ref>. In Figure <ref type="figure" target="#fig_0">1</ref>, we compare the original (Figure <ref type="figure" target="#fig_0">1a</ref>) and the random distribution (Figure <ref type="figure" target="#fig_0">1c</ref>) for seed 324 <ref type="foot" target="#foot_1">2</ref> .</p><p>Minimised poor-alignments distribution: we manually inspected the sentences shown at the far right of Figure <ref type="figure" target="#fig_0">1a</ref> and observed that sentences close to 100% change correspond to incorrect simplifications or alignments. Based on this, we created new datasets by removing between 2% and 20% of the worst-aligned sentences from the original dataset. These splits were not randomised, in order to isolate the effect of removing the poor alignments, and duplicates were removed. Figure <ref type="figure" target="#fig_0">1d</ref> and 1e show the decrease in the percentage of change in WikiLarge under this heuristic, including a significantly larger reduction of change in the test sets compared to the other subsets.</p><p>Stratified distribution: sentences in TS datasets can be analysed not only by the changes made from the original to the simplified sentence, but also by operation type. 
Our main goal in building new stratified splits is to have a similar number of operations of each type (e.g., deletions, insertions and replacements) in each subset. Since a single sentence simplification can involve multiple operations, it is difficult to achieve the desired distribution between subsets. Among the algorithms evaluated, we selected the Monte Carlo algorithm<ref type="foot" target="#foot_2">3</ref> as our best approach based on the resulting operations distribution. The original datasets were redistributed according to this algorithm; the dataset subsets were rebuilt and then analysed, as for the random distribution. We generated 500,000 random splits, searching for the one with the best standard deviation between the numbers of DELETE, INSERT and REPLACE operations in each subset. Every 100,000 iterations, we saved the 2 best candidates based on their standard deviation: one for the training set and one for the development and test sets, minimising the difference in their individual standard deviations. For WikiLarge, the most suitable splits were iterations 200,000 and 400,000, whereas for WikiSmall these were iterations 300,000 and 500,000. We show the latter in the Appendix.</p><p>Once the original and new experimental datasets were created and analysed, we evaluated the effect they had on the performance of the EditNTS <ref type="bibr" target="#b17">[18]</ref> model by measuring the change in SARI score when training on the redistributed datasets. We adapted the original code with some minor modifications to run in our setting, including model randomisation with fixed seeds, scripts for data preprocessing and the automation of test-set evaluation. We trained the models on the original and the experimental subsets (poor-alignments reduction, random and stratified distributions) using the same hyperparameters as the EditNTS model (batch size, epochs, dropout and learning rate). Next, we evaluated the performance of the newly trained models using ASSET as an external test subset. 
Finally, we manually inspected a sample of the model outputs for all the proposed datasets. The adaptations to the EditNTS model, the experimental subsets, the model outputs and the source code are available on GitHub<ref type="foot" target="#foot_3">4</ref> . </p></div>
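The Monte Carlo split selection described above can be sketched as follows. This is an illustrative reconstruction under our own assumptions (per-pair operation counts are precomputed, and the summed standard deviation of per-subset operation rates serves as the balance score); it is not the authors' exact implementation:

```python
import random
import statistics

def monte_carlo_split(op_counts, sizes, n_attempts=1000, seed=324):
    """Among n_attempts random shuffles, keep the train/dev/test split whose
    INSERT/DELETE/REPLACE totals are most evenly balanced across subsets,
    scored by the standard deviation of per-subset operation rates.

    op_counts: list of (insert, delete, replace) counts per sentence pair.
    sizes: (train_size, dev_size, test_size), summing to len(op_counts).
    """
    rng = random.Random(seed)
    idx = list(range(len(op_counts)))
    best_split, best_score = None, float("inf")
    for _ in range(n_attempts):
        rng.shuffle(idx)
        cut1, cut2 = sizes[0], sizes[0] + sizes[1]
        subsets = (idx[:cut1], idx[cut1:cut2], idx[cut2:])
        # normalise by subset size so the large training set does not dominate
        rates = [[sum(op_counts[i][k] for i in s) / len(s) for s in subsets]
                 for k in range(3)]
        score = sum(statistics.pstdev(r) for r in rates)
        if score < best_score:
            best_split, best_score = subsets, score
    return best_split, best_score

# toy example: 12 sentence pairs with (insert, delete, replace) counts
ops = [(1, 0, 0), (0, 1, 0), (0, 0, 1)] * 4
split, spread = monte_carlo_split(ops, (8, 2, 2), n_attempts=200)
```

At WikiLarge scale the paper runs 500,000 such attempts, checkpointing the best candidates every 100,000 iterations.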
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Quantifying Simplification Operations</head><p>Wikipedia-based TS datasets were created collaboratively by volunteers, with the main goal of supporting learning for non-native speakers. Beyond the rule of writing in Simple English <ref type="foot" target="#foot_4">5</ref> , there were no specific guidelines on how to simplify text, such as the type and amount of simplification allowed, or whether it should match the original Wikipedia article.</p><p>Except for the specific studies by Alva-Manchego et al. <ref type="bibr" target="#b12">[13]</ref> and Xu et al. <ref type="bibr" target="#b11">[12]</ref>, there is no accurate notion of which simplification operations are performed across TS datasets; these studies are less comprehensive, since they investigate specific TS datasets or a limited set of TS operations. Consequently, we analysed common TS datasets using the following metrics.</p><p>Simplification operations count: we quantified the percentage of edits required to transform a complex sentence into a simple one (henceforth, edit-distance <ref type="bibr" target="#b23">[24]</ref>). To achieve this, we calculated the edit-distance between two sentences by adapting the Wagner-Fischer algorithm <ref type="bibr" target="#b24">[25]</ref> to determine changes at the token level (e.g., words) rather than the character level. This method counts how many tokens of the complex sentence were changed in the simplified output (e.g., 2 tokens deleted from one version to another is equivalent to 2 changes). Prior to the analysis, sentences were lowercased. Values are expressed as a percentage of change, where 0% indicates sentences with no changes and 100% indicates completely different sentences. 
In Figure <ref type="figure" target="#fig_0">1</ref> we show the edit-distance analysis for WikiLarge, for the original splits (Figure <ref type="figure" target="#fig_0">1a</ref>) as well as for the randomised (Figure <ref type="figure" target="#fig_0">1c</ref>), poor-alignments-based (Figure <ref type="figure" target="#fig_0">1d</ref>, 1e) and stratified splits (Figure <ref type="figure" target="#fig_0">1f</ref>). The random and stratified experimental splits clearly show a more even distribution of sentences between subsets, according to the amount of change required to obtain a simplification from a complex sentence. On the other hand, removing poor alignments without a proper redistribution leaves the test sets with a majority of samples showing minimal or no change.</p><p>Simplification operations types: after extracting the token-level edits made between two sentences, we classified them into simplification operations: INSERT (one or more tokens added), DELETE (one or more tokens removed) and REPLACE (one or more tokens substituted). These three basic operations are performed at the lexical level <ref type="foot" target="#foot_5">6</ref> . We show in Figure <ref type="figure" target="#fig_0">1b</ref> the simplification operation types for the WikiLarge dataset. These results show not only how unbalanced these operations are between subsets, but also the predominance of DELETE operations in the development and training subsets of WikiLarge. The DELETE effect is also noticeable when manually checking the model outputs: the majority of simplifications performed deletions on the original sentence rather than substitutions or insertions. Furthermore, in Figure <ref type="figure" target="#fig_3">3</ref> we perform a more exhaustive comparison, analysing the operation counts and their distribution across all our experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Evaluating Operation-based Datasets</head><p>We evaluated the proposed datasets by retraining the EditNTS model on the development and training subsets (for both WikiSmall and WikiLarge) <ref type="foot" target="#foot_6">7</ref> . Once the models were trained, we evaluated their performance using the SARI scores provided by the model evaluation scripts. In our evaluation setting, the test subset of the ASSET dataset was used to test the trained models. We also report the average results over all ASSET references for each complex sentence, since our EditNTS-based implementation evaluated one  test reference at a time. Figure <ref type="figure" target="#fig_2">2</ref> shows a comparison of SARI scores across all the WikiLarge-based and WikiSmall-based models. We also include error bars (the standard deviation of the averaged ASSET observations) for all calculations. We observed that randomising the distribution and reducing the poor alignments helped for the WikiSmall dataset, whereas using the Monte Carlo algorithm and performing more substantial reductions contributed more for WikiLarge.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Discussion</head><p>Firstly, we conclude from our analyses (Section 3) that the original TS datasets do not follow an even distribution between subsets. We observe that the test, development and train subsets differ when measured by the amount of change from the complex to the simple sentence. Furthermore, our evaluation on these experimental subsets shows that random distributions produce significant variations in SARI scores, even though their composition is similar. For WikiSmall (Figure <ref type="figure" target="#fig_2">2a</ref>), the results between random splits showed an increase of up to 7 percentage points in SARI score, simply by randomising the dataset composition and rebuilding the dataset. For WikiLarge (Figure <ref type="figure" target="#fig_2">2b</ref>), we found a similar effect with the Monte Carlo algorithm, which produces a split randomised over 200,000 iterations. The main difference is that this algorithm selects the best-scoring split among all the generated random samples, rather than an arbitrary one. In this setting, the variation in SARI score is about 5 percentage points. The difference in SARI score should be interpreted as a measure of simplicity gain, which provides a relative comparison of correctness between simplifications. However, it cannot be interpreted as the best possible simplification, since this evaluation metric fails to measure simplicity alone, as discussed in Section 2.</p><p>Secondly, the WikiSmall and WikiLarge datasets contain a significant amount of noise and sentences that are not simplifications. Interestingly, we can see in Figure <ref type="figure" target="#fig_0">1e</ref> that aggressively removing 15% of the dataset considerably reduced the number of sentences with a percentage of change higher than 40%. Despite this, the performance difference between the original model orig_100% and its reduced version orig_85% was no more than 0.02% in both WikiSmall and WikiLarge. 
For the model orig_80% (i.e., an estimated 20% noise reduction), we observed a different scenario in WikiSmall: compared with the orig_100% model, performance dropped by 2.6%. The WikiSmall dataset is significantly smaller than WikiLarge (about 3X), so such a reduction affects a larger number of real simplifications. In contrast, the WikiLarge model orig_98% involves a minimal amount of noise reduction, keeping its composition almost unchanged. We presume that the decrease in model performance relates to having the same dataset composition but fewer sentence samples (despite their lower quality).</p><p>Thirdly, we discuss the dataset composition with respect to the operation counts (Figure <ref type="figure" target="#fig_3">3</ref>). Due to the large size of the training corpus, the counts in the train subset are similar for all the datasets. However, this is not the case for the test and development subsets, where we noticed meaningful differences. We observe a consistent decrease in all operations for the models where we removed the 'poor alignments'. Nevertheless, as mentioned earlier, orig_80%, with a minimal number of edit operations, was the only model that showed a decrease in performance. Conversely, despite the similar distribution of operations between the random datasets, we did observe performance variations between these models. It is relevant to consider that the test and development subsets are considerably smaller than the training subset (359 test / 992 dev / 296,402 train sentence pairs in the original dataset). We presume that this could minimise the effect of the new distributions on model performance. As future work, we would consider changing the original subset sizes to further explore the effect of simplification operations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Recommendations for TS datasets quality assessment</head><p>Although evaluation metrics and model outputs alone do not provide enough information about a dataset, we believe it is important to follow a structured procedure to assess the quality of a dataset. To ensure interpretable methods for dataset quality assessment, we make the following recommendations for TS dataset evaluation.</p><p>Noisy alignments detection: current TS datasets are automatically aligned; hence, they are likely to contain incorrect or unaligned sentence pairs. We propose a heuristic in which these inaccurate alignments are detected by quantifying the amount of change between the complex sentence and the gold-reference ones. This can be implemented by sorting TS datasets by edit-distance values, so that sentences with a higher amount of change are grouped together, providing a straightforward way to detect and remove noise. The ideal threshold at which sentences are removed can be determined by visually inspecting these groups.</p><p>Simplification operations distribution: depending on the audience, some simplification operations can be more useful than others. Ideally, we would expect not only a variety of simplification operations but also a similar distribution of operations between subsets, tailored to a given simplification need. There are valid scenarios in which particular operations may suffice (e.g., the REPLACE operation for complex word simplification for non-native speakers). Other areas, such as news simplification, require more elaborate constructions that involve simplification not only at the lexical level but also at the discourse level (e.g., news for the general public targeted at school children in the Newsela dataset). 
By using token-based edit distance, we can perform a global count of the simplification operations performed and evaluate their distribution, as an aid for stratifying TS datasets as needed.</p><p>Dataset stability: from our experiments, we observed that dataset distribution significantly affects TS model performance (measured by an increase or decrease in SARI score). Our recommendation is to perform dataset randomisation with different random seeds to evaluate the impact of data distribution on TS model performance. In addition, datasets of significant size, such as WikiLarge, proved more stable in this setting (less variation in SARI score between random seeds).</p></div>
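The noisy-alignment recommendation above (rank pairs by amount of change, then drop the worst-aligned tail) can be sketched as follows; `remove_noisy_alignments` is an illustrative helper of ours, difflib similarity stands in for the token-level edit distance, and the drop fraction would in practice be chosen by visually inspecting the ranked groups:

```python
import difflib

def remove_noisy_alignments(pairs, drop_fraction=0.15):
    """Drop the worst-aligned complex/simple pairs: rank all pairs by the
    amount of token-level change and remove the top drop_fraction, since
    near-100% change usually signals a misalignment rather than a real
    simplification."""
    def change(pair):
        a, b = (s.lower().split() for s in pair)
        return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()
    ranked = sorted(pairs, key=change)  # most-changed pairs end up last
    keep = int(round(len(ranked) * (1.0 - drop_fraction)))
    return ranked[:keep]

pairs = [
    ("the cat sat on the mat", "the cat sat on the mat"),
    ("a big dog ran quickly", "a dog ran"),
    ("the committee convened yesterday", "zzz qqq completely unrelated"),
]
kept = remove_noisy_alignments(pairs, drop_fraction=0.34)
```

Sorting first groups the suspect pairs together, which also makes the visual-inspection step for choosing the threshold straightforward.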
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions</head><p>In this paper, we have performed a systematic analysis of the most common TS operations, demonstrating the statistical limitations of English TS datasets. Our analysis can be reproduced through our published scripts, which can also be used to analyse any other parallel TS dataset for quality assessment. Moreover, we carried out a detailed evaluation of all our experimental settings, including distributions with poor-alignments reduction, randomisation and stratification using the Monte Carlo algorithm. Finally, we have proposed a set of recommendations for the creation of more reliable and standardised datasets, towards a better ecosystem of TS evaluation resources. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2. Random-based analysis</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: WikiLarge dataset analysis. x-axis: (a), (c) to (f) percentage of change (0% to 100%), (b) operation types; y-axis: (a), (c) to (f) sentence probability density, (b) operation types probability density.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Comparison of TS models in EditNTS model using ASSET</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Simplification operations count for WikiLarge Test/Dev subsets</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Random-based dataset analysis for WikiSmall (a-g) and WikiLarge (h-n)</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">For evaluation, we limited our study to ASSET, since it shows a wider variety of operations based on its edit distance <ref type="bibr" target="#b19">[20]</ref>. Also, due to space constraints, we have included the WikiSmall dataset analysis in the Appendix.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">To avoid multiple sources of randomness, we have adapted the EditNTS system to guarantee that our model results are deterministic. Our adaptations to the model can be found in our fork of the original GitHub repository: https://github.com/lmvasque/EditNTS-eval.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">Our implementation of the Monte Carlo algorithm runs multiple iterations of the random distribution and calculates the standard deviation of each attempt. After n attempts, it chooses the distribution with the smallest standard deviation as the best split.</note>
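The footnote's selection procedure can be sketched as follows. This is a hedged illustration, not the released implementation: the function name, the three-way split, and the `score` callback (any per-subset statistic, e.g. mean edit distance or SARI) are assumptions for the example.

```python
import random
import statistics

def best_random_split(pairs, n_attempts, score, seed=0):
    """Monte Carlo split selection: try n_attempts random shuffles and
    keep the split whose per-subset scores have the smallest standard
    deviation. `score` maps a subset of sentence pairs to a statistic."""
    rng = random.Random(seed)
    best_split, best_sd = None, float("inf")
    for _ in range(n_attempts):
        shuffled = list(pairs)
        rng.shuffle(shuffled)
        k = len(shuffled) // 3
        # Illustrative train/dev/test-like thirds; real splits would
        # use the dataset's actual proportions.
        subsets = [shuffled[:k], shuffled[k:2 * k], shuffled[2 * k:]]
        sd = statistics.stdev(score(s) for s in subsets)
        if sd < best_sd:
            best_split, best_sd = subsets, sd
    return best_split, best_sd
```

With a fixed seed the procedure is deterministic, which matches the paper's concern about removing extra sources of randomness from the evaluation.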
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://github.com/lmvasque/ts-explore</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">https://simple.wikipedia.org/wiki/Wikipedia:How_to_write_Simple_English_pages</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">We also merged DELETE and INSERT in cases where the same word or phrase is deleted and then inserted again, calling this the MOVE operation. However, since the MOVE operation count was negligible, we only report the three main operations: INSERT, DELETE and REPLACE.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">We did not retrain the model using the traditional subsets (i.e. TurkCorpus, ASSET), since our objective was to study the statistical weaknesses of the aforementioned datasets.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We would like to thank Nhung T.H. Nguyen for her valuable discussions and comments. Laura Vásquez-Rodríguez's work was funded by the Kilburn Scholarship from the University of Manchester. Piotr Przybyła's work was supported by the Polish National Agency for Academic Exchange through a Polish Returns grant number PPN/PPO/2018/1/00006.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Unsupervised lexical simplification for non-native speakers</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">H</forename><surname>Paetzold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</author>
		<ptr target="http://nlp.stanford.edu/projects/glove/" />
	</analytic>
	<monogr>
		<title level="m">30th AAAI Conference on Artificial Intelligence, AAAI 2016</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="3761" to="3767" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Simplifying text for language-impaired readers</title>
		<author>
			<persName><forename type="first">J</forename><surname>Carroll</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Minnen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pearce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Canning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tait</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/E99-1042" />
	</analytic>
	<monogr>
		<title level="m">Ninth Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting><address><addrLine>Bergen, Norway</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="269" to="270" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Bleu: a method for automatic evaluation of machine translation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Papineni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roukos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ward</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-J</forename><surname>Zhu</surname></persName>
		</author>
		<idno type="DOI">10.3115/1073083.1073135</idno>
		<ptr target="https://www.aclweb.org/anthology/P02-1040" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 40th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Philadelphia, Pennsylvania, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="311" to="318" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">ROUGE: A package for automatic evaluation of summaries</title>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Lin</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/W04-1013" />
	</analytic>
	<monogr>
		<title level="m">Text Summarization Branches Out, Association for Computational Linguistics</title>
				<meeting><address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="74" to="81" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Optimizing statistical machine translation for text simplification</title>
		<author>
			<persName><forename type="first">W</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Napoles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pavlick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Callison-Burch</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00107</idno>
		<ptr target="https://www.aclweb.org/anthology/Q16-1029" />
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="401" to="415" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks</title>
		<author>
			<persName><forename type="first">K</forename><surname>Ganesan</surname></persName>
		</author>
		<idno>arXiv</idno>
		<ptr target="http://arxiv.org/abs/1803.01937" />
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Re-evaluating the role of Bleu in machine translation research</title>
		<author>
			<persName><forename type="first">C</forename><surname>Callison-Burch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Osborne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Koehn</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/E06-1032" />
	</analytic>
	<monogr>
		<title level="m">11th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting><address><addrLine>Trento, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">BLEU is not suitable for the evaluation of text simplification</title>
		<author>
			<persName><forename type="first">E</forename><surname>Sulem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Abend</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rappoport</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D18-1081</idno>
		<ptr target="https://www.aclweb.org/anthology/D18-1081" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<meeting>the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="738" to="744" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Reference-less quality estimation of text simplification systems</title>
		<author>
			<persName><forename type="first">L</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Humeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P.-E</forename><surname>Mazaré</surname></persName>
		</author>
		<author>
			<persName><forename type="first">É</forename><surname>De La Clergerie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bordes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W18-7005</idno>
		<ptr target="https://www.aclweb.org/anthology/W18-7005" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st Workshop on Automatic Text Adaptation (ATA), Association for Computational Linguistics</title>
				<meeting>the 1st Workshop on Automatic Text Adaptation (ATA), Association for Computational Linguistics<address><addrLine>Tilburg, the Netherlands</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="29" to="38" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Søgaard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ebert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bastings</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Filippova</surname></persName>
		</author>
		<idno>arXiv</idno>
		<ptr target="http://arxiv.org/abs/2005.00636" />
		<title level="m">We need to talk about random splits</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Sentence simplification with deep reinforcement learning</title>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lapata</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D17-1062</idno>
		<ptr target="https://www.aclweb.org/anthology/D17-1062" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<meeting>the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Copenhagen, Denmark</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="584" to="594" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Problems in current text simplification research: New data can help</title>
		<author>
			<persName><forename type="first">W</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Callison-Burch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Napoles</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00139</idno>
		<ptr target="https://www.aclweb.org/anthology/Q15-1021" />
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="283" to="297" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Alva-Manchego</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bordes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Scarton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.424</idno>
		<idno>arXiv</idno>
		<ptr target="http://arxiv.org/abs/2005.00481" />
		<title level="m">ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">A monolingual tree-based translation model for sentence simplification</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bernhard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/C10-1152" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Coling 2010 Organizing Committee</title>
				<meeting>the 23rd International Conference on Computational Linguistics (Coling 2010), Coling 2010 Organizing Committee<address><addrLine>Beijing, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1353" to="1361" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">We need to talk about standard splits</title>
		<author>
			<persName><forename type="first">K</forename><surname>Gorman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bedrick</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-1267</idno>
		<ptr target="https://www.aclweb.org/anthology/P19-1267" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="2786" to="2791" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Human Evaluation for Text Simplification: The Simplicity-Adequacy Tradeoff</title>
		<author>
			<persName><forename type="first">M</forename><surname>Schwarzer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kauchak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SoCal NLP Symposium</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">ASSET: A dataset for tuning and evaluation of sentence simplification models with multiple rewriting transformations</title>
		<author>
			<persName><forename type="first">F</forename><surname>Alva-Manchego</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bordes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Scarton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.424</idno>
		<ptr target="https://www.aclweb.org/anthology/2020.acl-main.424" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="4668" to="4679" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">EditNTS: An neural programmer-interpreter model for sentence simplification through explicit editing</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rezagholizadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C K</forename><surname>Cheung</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-1331</idno>
		<ptr target="https://www.aclweb.org/anthology/P19-1331" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3393" to="3402" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Learning how to simplify from explicit labeling of complex-simplified text pairs</title>
		<author>
			<persName><forename type="first">F</forename><surname>Alva-Manchego</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bingel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Paetzold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Scarton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/I17-1030" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eighth International Joint Conference on Natural Language Processing</title>
				<meeting>the Eighth International Joint Conference on Natural Language Processing<address><addrLine>Taipei, Taiwan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="295" to="305" />
		</imprint>
	</monogr>
	<note>Long Papers), Asian Federation of Natural Language Processing</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Investigating text simplification evaluation</title>
		<author>
			<persName><forename type="first">L</forename><surname>Vásquez-Rodríguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Shardlow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Przybyła</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ananiadou</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.findings-acl.77</idno>
		<ptr target="https://aclanthology.org/2021.findings-acl.77" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="876" to="882" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">On Information and Sufficiency</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kullback</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Leibler</surname></persName>
		</author>
		<idno type="DOI">10.1214/aoms/1177729694</idno>
	</analytic>
	<monogr>
		<title level="j">The Annals of Mathematical Statistics</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page" from="79" to="86" />
			<date type="published" when="1951">1951</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Divergence Measures Based on the Shannon Entropy</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE, Transactions on Information Theory</title>
		<imprint>
			<date type="published" when="1991">1991</date>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Array programming with NumPy</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">R</forename><surname>Harris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">J</forename><surname>Millman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Van Der Walt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gommers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Virtanen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wieser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Taylor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Berg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">J</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kern</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Picus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">H</forename><surname>Van Kerkwijk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Haldane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">F</forename><surname>Del Río</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiebe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Peterson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gérard-Marchant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sheppard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Reddy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Weckesser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Abbasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gohlke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">E</forename><surname>Oliphant</surname></persName>
		</author>
		<idno type="DOI">10.1038/s41586-020-2649-2</idno>
		<ptr target="https://doi.org/10.1038/s41586-020-2649-2" />
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">585</biblScope>
			<biblScope unit="page" from="357" to="362" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Binary codes capable of correcting deletions, insertions, and reversals</title>
		<author>
			<persName><forename type="first">V</forename><surname>Levenshtein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Soviet Physics Doklady</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<date type="published" when="1966">1966</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">The String-to-String Correction Problem</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Wagner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Fischer</surname></persName>
		</author>
		<idno type="DOI">10.1145/321796.321811</idno>
		<ptr target="https://doi.org/10.1145/321796.321811" />
	</analytic>
	<monogr>
		<title level="j">Journal of the ACM (JACM)</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="168" to="173" />
			<date type="published" when="1974">1974</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
