<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Zeta &amp; Eta: An Exploration and Evaluation of Two Dispersion-based Measures of Distinctiveness</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Keli</forename><surname>Du</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Julia</forename><surname>Dudar</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Cora</forename><surname>Rok</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Christof</forename><surname>Schöch</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">University of Trier</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Zeta &amp; Eta: An Exploration and Evaluation of Two Dispersion-based Measures of Distinctiveness</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">55E139727A919342D020CAB194822119</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T19:45+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Computational Literary Studies</term>
					<term>measure of distinctiveness</term>
					<term>Zeta</term>
					<term>Eta</term>
					<term>dispersion</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In Corpus Linguistics, numerous statistical measures have been adopted to analyze large amounts of textual data in a contrastive perspective, in order to extract characteristic or "distinctive" features. While the most widely used keyness measures are based on word frequency, a number of recent research papers have suggested dispersion-based measures as a better solution. Dispersion-based measures, however, are not new to Computational Literary Studies (CLS). In 2007, John Burrows introduced Zeta, a statistical measure that is mainly based on the degree of dispersion of a feature in a text corpus. In this paper, we introduce Eta, a new measure of distinctiveness based on the deviation of proportions proposed by Stefan Gries. By comparing Eta with Zeta, we demonstrate that both measures are able to identify relevant, interpretable distinctive words in a target corpus. Additionally, we make a first attempt to identify the key differences between these two measures by interpreting the top distinctive words.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In Linguistics and Literary Studies, comparing groups of texts, e.g. texts belonging to different literary genres or written for different audiences, is a fundamental procedure [see e.g. <ref type="bibr">11</ref>]. In Corpus Linguistics, numerous statistical measures and instruments have been introduced and adopted for investigating and analyzing large amounts of textual data in a contrastive perspective [e.g. <ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b14">15]</ref>. They are usually referred to as 'keyness measures', as they operate on the lexical level and are used for extracting "key" terms or phrases. We prefer the term 'measures of distinctiveness', as it better emphasizes that this kind of analysis is about the extraction of characteristic words on the basis of a comparison [see <ref type="bibr">24]</ref>.</p><p>The most widespread keyness measures used in Corpus Linguistics are frequency-based, for example the chi-squared test or the log-likelihood-ratio test <ref type="bibr" target="#b24">[25]</ref>, implemented e.g. in AntConc <ref type="bibr" target="#b0">[1]</ref>. Recently, several research papers have suggested dispersion-based measures as a better solution for contrastive corpus analysis [e.g. <ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b6">7]</ref>. The use of dispersion in the search for important text features is, however, not new to Computational Literary Studies (CLS). In 2007, John Burrows introduced Zeta, a keyness measure that is mainly based on the degree of dispersion of a feature in a text corpus <ref type="bibr" target="#b1">[2]</ref>. Originally, it was used in the context of authorship attribution, but it later came to be used to address other questions in CLS as well, including corpus comparison [e.g. <ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b22">23]</ref>.</p><p>There are several important studies that explore and evaluate frequency-based measures [e.g. <ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b5">6]</ref>, and some studies compare dispersion-based measures to frequency-based measures [e.g. <ref type="bibr">4, 8, 12]</ref>. However, as far as we know, no attempt has been made to compare dispersion-based measures to each other. In our project "Zeta and company",<ref type="foot" target="#foot_0">1</ref> we aim to enhance the understanding of both frequency- and dispersion-based measures by implementing them in a Python framework. Based on tests with literary texts, we evaluate which measures perform best for different tasks and kinds of textual data. This article presents a pilot study within our project; its aim is a statistical analysis and a qualitative evaluation of two dispersion-based distinctiveness measures: (1) Eta, which is based on the deviation of proportions (DP) developed by Stefan Gries, and (2) Zeta, which was proposed by John Burrows.<ref type="foot" target="#foot_1">2</ref> Firstly, we explain how Eta and Zeta are calculated. After that, using a collection of 160 novels of four different subgenres published in France in the 1980s, we examine how Eta behaves in contrast to Zeta and how their relationship changes when the segment length varies. The following questions will be addressed: How useful is Eta as a basis for identifying words that are distinctive of one text group in comparison to another? What are the differences between Eta and Zeta, and what results do they display?</p><p>CHR 2021: Computational Humanities Research Conference, November 17-19, 2021, Amsterdam, The Netherlands. Contact: duk@uni-trier.de (K. Du); dudar@uni-trier.de (J. Dudar); rok@uni-trier.de (C. Rok); schoech@uni-trier.de (C. Schöch). ORCID: 0000-0001-7800-0682 (K. Du); 0000-0001-5545-9562 (J. Dudar); 0000-0001-9698-7513 (C. Rok); 0000-0002-4557-2753 (C. Schöch).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Keyness analysis: from frequency to dispersion</head><p>Despite the dominance of frequency-based keyness measures (e.g. the chi-squared test, the log-likelihood ratio test), there are several alternative measures which consider other types of information, such as the distribution of words (e.g. the t-test, the Mann-Whitney U test) and their dispersion (e.g. Zeta). A helpful overview of the frequency- and distribution-based measures can be found in <ref type="bibr" target="#b11">[12]</ref>. In addition, machine-learning approaches (e.g. weights of a linear SVM) or entropy-related approaches (e.g. Kullback-Leibler divergence, see <ref type="bibr" target="#b4">[5]</ref>) can be used to identify distinctive words in a target corpus.</p><p>As already mentioned, the most widely used keyness measures in Corpus Linguistics are frequency-based, and they do not consider how particular words are distributed within a corpus. This means that a word can be marked as distinctive for the entire target corpus even if it appears very frequently in only a small number of texts. For illustration, Figure <ref type="figure" target="#fig_0">1</ref> presents the result of an analysis carried out using AntConc's log-likelihood ratio test on our working corpus (described below): keywords were extracted from a comparison of 40 French science fiction novels (as the target corpus) with 120 French novels of other subgenres (as the comparison corpus). <ref type="foot" target="#foot_2">3</ref> It turns out that the top-ranked words are almost entirely proper names. Each of them appears, albeit very frequently, in only one novel of the target corpus and likely not at all in the comparison corpus; such words therefore cannot truly represent the entire target corpus. 
In order to obtain more meaningful results, proper names would have to be pruned from the list.</p><p>To deal with this challenge, the dispersion of a feature, i.e. the degree to which it is evenly distributed, should be considered as well (on dispersion, see <ref type="bibr" target="#b12">[13]</ref>; for the use of dispersion in keyness analysis, see <ref type="bibr" target="#b3">[4]</ref>). Gries <ref type="bibr" target="#b7">[8]</ref> gives a detailed overview of dispersion measures and proposes his own measure, called deviation of proportions (DP).</p><p>DP compares the observed and the expected relative frequency of a word in every single document of the corpus in order to quantify the dispersion of the word. It is calculated as follows: for each corpus part (e.g., a file), compute s, the fraction of the whole corpus that this part constitutes, and v, the fraction of the word's total frequency that this part contains. Then subtract each s-value from the corresponding v-value, take the absolute values of these differences, sum them up, and divide the sum by two <ref type="bibr" target="#b6">[7]</ref>.</p><formula xml:id="formula_0">DP = (Σ_{i=1}^{n} |s_i − v_i|) / 2</formula><p>The theoretical range of DP values is between 0 and 1. A value of 0 reflects a perfectly even dispersion, while a value of 1 represents a maximally uneven dispersion. This measure seems to have several advantages compared to other dispersion measures. For example, it can handle corpus parts of different lengths, and it can distinguish between slight variations in distribution without being overly sensitive. However, there is still a lack of empirical evidence supporting the use of DP.</p><p>As mentioned before, Burrows' Zeta also considers dispersion; it is calculated by comparing the document proportion (docP) of each feature in the target and in the comparison corpus. 
First, each text in each group is divided into segments of a certain length (the segment length is a key parameter of the measure). For each word w in the vocabulary, docP is established as the proportion of segments in which the word occurs at least once; docP therefore ranges between 0 and 1.</p><p>In order to find out whether a word is distinctive for the target corpus, the docP or devP<ref type="foot" target="#foot_3">4</ref> values of the word in the target and the comparison corpus must be compared. Based on docP and devP, two measures of distinctiveness can be defined. The Zeta score of a word w is obtained by subtracting its docP in the comparison corpus from its docP in the target corpus [see 21]. The theoretical range of the Zeta score is therefore between -1 and 1, and the words with the highest Zeta scores are the most distinctive words of the target corpus. By analogy, using devP instead of docP as the measure of dispersion, a new measure of distinctiveness can be defined, which we call Eta. It is obtained by subtracting the devP of a word w in the comparison corpus from the devP of the same word in the target corpus. In contrast to docP, a smaller devP reflects a more even distribution of a feature in a corpus. The devP of words that are distinctive for the target corpus is therefore expected to be smaller in the target corpus than in the comparison corpus, so the words with the lowest Eta scores are the most distinctive words of the target corpus.<ref type="foot" target="#foot_4">5</ref> Although Zeta and Eta are thus both dispersion-based measures, they rely on different mathematical definitions of dispersion. As Eta takes the ratio of document size to corpus size into account, which Zeta does not, we intend to test whether Eta performs better than Zeta in detecting distinctive words.</p></div>
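The calculations described above can be sketched in a few lines of Python. The following is a minimal illustration under stated assumptions, not the implementation from our pydistinto framework: segments are given as lists of lemmas, and a word that is entirely absent from a corpus is assigned a devP of 1 here (in our actual tests, only words occurring in both corpora are considered; see footnote 5).

```python
from collections import Counter

def doc_proportion(segments, word):
    """docP: share of segments containing the word at least once (range 0..1)."""
    return sum(1 for seg in segments if word in seg) / len(segments)

def deviation_of_proportions(segments, word):
    """devP (Gries' DP): 0 = perfectly even, 1 = maximally uneven dispersion."""
    sizes = [len(seg) for seg in segments]
    total_size = sum(sizes)
    counts = [Counter(seg)[word] for seg in segments]
    total_count = sum(counts)
    if total_count == 0:
        return 1.0  # assumption: an absent word is treated as maximally uneven
    dp = 0.0
    for size, count in zip(sizes, counts):
        s = size / total_size    # expected share: fraction of the corpus this part is
        v = count / total_count  # observed share: fraction of the word's occurrences here
        dp += abs(s - v)
    return dp / 2

def zeta(target_segs, comparison_segs, word):
    """Zeta = docP(target) - docP(comparison); range -1..1, highest = most distinctive."""
    return doc_proportion(target_segs, word) - doc_proportion(comparison_segs, word)

def eta(target_segs, comparison_segs, word):
    """Eta = devP(target) - devP(comparison); lowest = most distinctive."""
    return (deviation_of_proportions(target_segs, word)
            - deviation_of_proportions(comparison_segs, word))
```

For example, a word occurring evenly in every target segment and never in the comparison corpus gets Zeta = 1 and (under the absence convention above) Eta = -1, the extreme values of both measures.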
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Tests and results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Corpus</head><p>The corpus used in this study is a collection of 160 novels published in France between 1980 and 1989. 120 of them are lowbrow novels of three subgenres (40 novels per subgenre): sentimental novels, crime fiction and science fiction. The remaining 40 are highbrow novels.</p><p>The corpus size is approximately nine million words. All texts have been lemmatized using TreeTagger, and the units of calculation are lemmas. As our goal was to extract distinctive lemmas for each subgenre, we used a one-vs-rest strategy: the target corpus contains the 40 novels of one subgenre and the comparison corpus contains the 120 novels of the other three subgenres. This allowed us to focus on extracting distinctive features that are strongly related to the unique characteristics of the target corpus.<ref type="foot" target="#foot_5">6</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Statistical observations</head><p>The results of our comparative analysis are two lists of words, ranked by their Zeta and Eta scores, respectively. To compare the differences between Zeta and Eta, we measure the correlation between the two word lists using Spearman's rank correlation: the stronger the correlation, the less the two word lists differ. We performed tests on four comparison groups, one per subgenre (e.g. sci-fi vs. non-sci-fi). The results of these four tests were almost the same; for illustration, the results presented below are based on the comparison of sci-fi vs. non-sci-fi.</p><p>As it is common to split novels into segments when applying Zeta, we also wanted to examine the impact of the segment size on the results. We therefore ran our tests with three segmentation strategies: (1) splitting all novels into 5000-word segments, (2) splitting them into 10000-word segments, and (3) taking each novel as a single segment without chunking. (The median length of the novels is about 46800 words.) For (1) and (2), segments shorter than 5000 or 10000 words, respectively, were removed from the corpus.</p><p>Before comparing Zeta and Eta, we first compared the underlying values, docP and devP. Again, Spearman's correlation between the word rankings based on these two dispersion measures was analyzed. In both corpora, the ranking correlations of the three tests with different segment lengths are -1, -1, and -0.98, respectively. Figure <ref type="figure" target="#fig_1">2</ref> illustrates the relation between docP and devP for all words in the target corpus.<ref type="foot" target="#foot_6">7</ref> Each blue point represents a word, and the three graphs from left to right show the results of the tests on 5000-word segments, 10000-word segments and novel segments without chunking, respectively. 
Clearly, devP and docP have a strong negative correlation, but the distribution of points in the three graphs from left to right becomes increasingly dispersed. This means that the longer the novel segments are, the less similar the word-list rankings based on devP and docP are.</p><p>The comparison of Zeta and Eta leads to analogous results. The strong negative correlations between the word rankings in the three tests are -0.99, -0.99, and -0.85, respectively. Each blue point in Figure <ref type="figure" target="#fig_2">3</ref> represents a word, and the x and y axes are the Zeta and Eta scores for each word. The three graphs from left to right show the results of the tests on 5000-word segments, 10000-word segments and entire novels, respectively. We can observe that the distribution of points gradually becomes more dispersed. This means that the longer the novel segments are, the less similar the Zeta and Eta scores are.</p><p>Comparing the top distinctive words found by Zeta and Eta for each subgenre, we often observe the same words, but in a different order. To quantify these differences, we calculated the token-based Jaccard similarity and NLTK's edit distance between the top ten to 500 Zeta and Eta words for different segment lengths.<ref type="foot" target="#foot_7">8</ref> In Figure <ref type="figure" target="#fig_3">4</ref>, the first and the second row show the Jaccard similarity results and the NLTK edit distance results, respectively. The four columns show the results for each of the four subgenres (from left to right: highbrow, crime, sci-fi and sentimental) taken as a target corpus. Both the Jaccard similarity and the NLTK edit distance show an increasing trend. The increase of the Jaccard similarity indicates that, as the number of top words increases, the overlap of the Zeta and Eta word lists gradually increases. Splitting novels into shorter segments leads to a greater overlap. 
In contrast, the increase of the NLTK edit distance shows that the words are ranked more differently as the number of top words increases. These observations also support our earlier point: the shorter the segments, the more words have the same or a similar rank in both lists.  </p></div>
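The three comparison statistics used in this section can be sketched as follows. This is an illustrative stand-in in plain Python rather than the scipy/NLTK implementations we used, and the Spearman function assumes two complete rankings of the same items without ties.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman's rank correlation for two tie-free rankings of the same items."""
    n = len(ranking_a)
    pos_b = {word: i for i, word in enumerate(ranking_b)}
    d_squared = sum((i - pos_b[word]) ** 2 for i, word in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))

def jaccard(top_a, top_b):
    """Size of the intersection over size of the union; order is ignored."""
    a, b = set(top_a), set(top_b)
    return len(a & b) / len(a | b)

def edit_distance(list_a, list_b):
    """Levenshtein distance over word lists: minimum number of substitutions,
    insertions, or deletions needed to turn one ranked list into the other."""
    prev = list(range(len(list_b) + 1))
    for i, word_a in enumerate(list_a, 1):
        cur = [i]
        for j, word_b in enumerate(list_b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (word_a != word_b)))  # substitution
        prev = cur
    return prev[-1]
```

Two top lists that contain the same words in a slightly different order thus get a Jaccard similarity of 1 but a nonzero edit distance, which is exactly the pattern reported above for short segments.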
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Interpretation of the word lists</head><p>Figure <ref type="figure" target="#fig_4">5</ref> shows the top ten distinctive Zeta and Eta words of the science fiction corpus split into 5000-word segments. Both word lists contain the same genre-specific words with a slightly different ranking.</p><p>To better illustrate the results of the different tests, we assigned the words to semantic categories. Figure <ref type="figure" target="#fig_5">6</ref> shows the (heuristic) categorization of the words of the first test.</p><p>Figure <ref type="figure" target="#fig_6">7</ref> shows the results of the analysis with 10000-word segments: there are only five overlapping words among the top ten. The top 30 Zeta words, however, contain more of the highly ranked Eta words than vice versa.</p><p>If we compare the two Zeta word lists in Figures <ref type="figure" target="#fig_6">5 and 7</ref>, we notice that the Zeta words do not change much with the increased segment length: there are three new words in the top ten list, "level", "base" and "hundred", whereas the words "human", "brain", "planet", "universe", "number", "system" and "emit" can already be found in the first Zeta word list, which indicates a certain consistency. The Eta word list in turn displays more new distinctive words ("civilisation", "level", "complex", "hundred", "computer", "function", "electronic"). However, the words of both lists can be assigned to the previously defined semantic categories (Figure <ref type="figure" target="#fig_7">8</ref>).</p><p>Figure <ref type="figure" target="#fig_8">9</ref> shows the word lists of our third analysis, where a whole novel represents a segment. 
It is noticeable that the top ten words of the two lists do not overlap at all; only two of the top ten words of each list can be found in the other list, and only within the top 25 (Eta rank 14: "concept"; Eta rank 23: "nuclear" / Zeta rank 19: "chemical"; Zeta rank 14: "functioning"). While the Zeta list contains words like "humanity", "civilization", "space", "orbit", "earthly", "computer", "electronic" and "robot", which fit into the previously established semantic categories and represent rather general terms from everyday language, Eta words like "diameter" or "vertebral" are more specific and sophisticated and open up further semantic categories from the fields of science (Figure <ref type="figure" target="#fig_10">10</ref>). This tendency of Eta to extract more new, specific words becomes even stronger as the segment length increases up to novel length, while the Zeta words stay more general. As the Eta words seem more specific, our assumption is that they should be less frequent than the Zeta words in a much larger corpus. To verify this, we checked the frequencies of the top Zeta and Eta words in the French Wikipedia; if a word does not occur in the frequency table, its frequency is set to 0.<ref type="foot" target="#foot_8">9</ref> Figure <ref type="figure" target="#fig_11">11</ref> shows that the top (10, 50 and 100) Zeta words are indeed more frequent and therefore less specific than the Eta words. This effect is stronger the longer the segments are.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion and future work</head><p>This paper presents a comparison of two measures of distinctiveness, Zeta and Eta. The results show that, on the statistical level, the two measures are very strongly negatively correlated, despite their different bases of calculation. Another observation is that the correlation between Zeta and Eta is stronger when novels are divided into shorter segments; we obtain the weakest correlation when novels are not split into segments at all. This is also reflected in the word lists: the shorter the segments, the more similar the word lists, and vice versa. The calculation of the Jaccard similarity allowed us to observe the following trend: the Jaccard similarity decreases as the segment length increases.</p><p>The observed similarities concern word rankings as well: when calculating with small segments, we observe not only (almost) the same words in the top ten, but also almost the same rankings in both word lists. The calculation of the NLTK edit distance between the word lists confirmed this observation: the distance between the word rankings increases when the segment length increases.</p><p>A qualitative interpretation of the word lists confirmed the statistical observations. Both measures are able to identify relevant, interpretable distinctive words in a target corpus. There is no need to use stop word lists or to prune proper names: both dispersion-based measures mark content words as distinctive. It seems that when the segment length increases, the Zeta words remain content-related and rather general, while the Eta words also remain content-related but become more specific. We are going to investigate this phenomenon in further tests.</p><p>In the future, we plan to deepen our understanding of distinctiveness measures even further. Our next steps are to test the measures on larger and more varied corpora and to conduct further experiments with segment length. 
We are also planning to include other distinctiveness measures in our framework, such as Kullback-Leibler divergence, the Wilcoxon signed-rank test or the t-test. One point to emphasize is that the qualitative interpretation of the word lists may seem very subjective, making it look more like an exploration than an evaluation. This is inevitable because, as far as we know, a widely accepted, robust method for qualitative evaluation in this area is still lacking. We will therefore work on developing new evaluation strategies for these measures, in order to explore the advantages and disadvantages of each of them and to find out for which purposes they should be used.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Log-likelihood ratio test with AntConc.</figDesc><graphic coords="3,74.40,70.13,446.48,519.26" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Scatter plot of docP and devP of words in the target corpus.</figDesc><graphic coords="6,74.40,70.13,446.48,163.21" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Scatter plot of Zeta and Eta.</figDesc><graphic coords="6,74.40,272.67,446.48,161.92" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Jaccard similarity (top row) and NLTK's edit distance (bottom row) between the top 10 to 500 Zeta-and Eta-words, for three segment lengths.</figDesc><graphic coords="7,74.40,70.13,446.48,205.14" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Top ten Zeta (left) and Eta (right) words of a 5000-word segment analysis.</figDesc><graphic coords="7,155.90,320.76,283.48,235.66" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: A heuristic categorization of the top ten words of the 5000-word segments analysis.</figDesc><graphic coords="8,127.55,70.14,340.18,169.08" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Top ten Zeta and Eta words of a 10000-word segment analysis.</figDesc><graphic coords="8,155.90,278.56,283.48,221.39" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: A heuristic categorization of the top ten words of the 10000-word segments analysis (the words in yellow are new compared to the 5000-word segment analysis).</figDesc><graphic coords="9,74.40,70.14,446.48,133.25" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: Top ten Zeta and Eta words of the novel as a segment analysis.</figDesc><graphic coords="9,155.90,250.23,283.48,218.76" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Figure 10 :</head><label>10</label><figDesc>Figure 10: A heuristic categorization of the top ten words of the novel as a segment analysis (the categories in yellow are the 'new' ones, established for the third analysis).</figDesc><graphic coords="10,74.40,70.14,446.48,170.73" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>Figure 11 :</head><label>11</label><figDesc>Figure 11: Word frequency of top Zeta and Eta words in French Wikipedia.</figDesc><graphic coords="10,74.40,287.69,446.48,133.12" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">See: https://zeta-project.eu/en/.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">We have implemented both measures in our Python framework. See: https://github.com/Zeta-and-Company/pydistinto.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">AntConc 3.5.9 [see 1] was used with the following keyness parameters: Log-Likelihood (4-way) and a p-value cut-off of 0.001. The measure of effect size shown is DIFF.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">We use devP instead of DP to better distinguish between the two terms.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">Only words which appear at least once in both corpora will be considered here and in the following, because devP does not yield meaningful results otherwise.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">The texts contained in the corpus are in-copyright texts that we are using in the framework of the "Text and Data Mining Exception" defined in German copyright law (§ 60d UrhG), following the EU "Directive on Copyright in the Digital Single Market". While the corpus cannot be shared as it is, we plan to publish derived features [see 22] that allow others to repeat our calculations.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">The scatter plot of docP and devP of words in the comparison corpus is almost the same as that in the target corpus, so it will not be displayed again.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">The Jaccard similarity [see 16] calculates the size of the intersection divided by the size of the union of two word lists without considering the ranking of words. Larger values indicate a greater overlap between the top Zeta and Eta words. In contrast to the Jaccard similarity, the NLTK's edit distance (https://www.nltk.org/api/ nltk.metrics.html#nltk.metrics.distance.edit_distance, see Levenshtein edit-distance,<ref type="bibr" target="#b13">[14]</ref>) takes the ranking of words into consideration and counts the number of words that need to be substituted, inserted, or deleted, to transform one list into another. Larger values indicate a greater difference between the Zeta and Eta word lists.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">The frequencies of the words in Wikipedia were obtained from http://redac.univ-tlse2.fr/corpora/wikipedia_en.html.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_9">See https://casrai.org/credit.</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Author contributions</head><p>All authors contributed to the conceptualization of the research, investigation, formal analysis, writing the original draft, and editing and reviewing the text. Specific additional contributions: KD contributed to project administration, software development, visualisation and methodology. JD contributed to data curation and software development. CR contributed to validation. CS contributed to data curation, software development, funding acquisition and supervision. Author order is alphabetical. All authors gave final approval for publication and agree to be held accountable for the work performed therein. 10  </p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">AntConc: Design and development of a freeware corpus analysis toolkit for the technical writing classroom</title>
		<author>
			<persName><forename type="first">L</forename><surname>Anthony</surname></persName>
		</author>
		<idno type="DOI">10.1109/ipcc.2005.1494244</idno>
		<imprint>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="729" to="737" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">All the Way Through: Testing for Authorship in Different Frequency Strata</title>
		<author>
			<persName><forename type="first">J</forename><surname>Burrows</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/fqi067</idno>
		<ptr target="http://llc.oxfordjournals.org/content/22/1/27.abstract" />
	</analytic>
	<monogr>
		<title level="j">Literary and Linguistic Computing</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="27" to="47" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m">Shakespeare, Computers, and the Mystery of Authorship</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Craig</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">F</forename><surname>Kinney</surname></persName>
		</editor>
		<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
	<note>1st ed</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Incorporating text dispersion into keyword analyses</title>
		<author>
			<persName><forename type="first">J</forename><surname>Egbert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Biber</surname></persName>
		</author>
		<idno type="DOI">10.3366/cor.2019.0162</idno>
		<ptr target="https://www.euppublishing.com/doi/abs/10.3366/cor.2019.0162" />
	</analytic>
	<monogr>
		<title level="j">Corpora</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="77" to="104" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Exploring and Visualizing Variation in Language Resources</title>
		<author>
			<persName><forename type="first">P</forename><surname>Fankhauser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Knappen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Teich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC&apos;14)</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Choukri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Declerck</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Loftsson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Moreno</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Piperidis</surname></persName>
		</editor>
		<meeting>the Ninth International Conference on Language Resources and Evaluation (LREC&apos;14)<address><addrLine>Reykjavik, Iceland</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Keyness Analysis: nature, metrics and techniques</title>
		<author>
			<persName><forename type="first">C</forename><surname>Gabrielatos</surname></persName>
		</author>
		<ptr target="https://research.edgehill.ac.uk/en/publications/keyness-analysis-nature-metrics-and-techniques-2" />
	</analytic>
	<monogr>
		<title level="m">Corpus Approaches to Discourse: A Critical Review</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="225" to="258" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A new approach to (key) keywords analysis: Using frequency, and now also dispersion</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gries</surname></persName>
		</author>
		<idno type="DOI">10.32714/ricl.09.02.02</idno>
	</analytic>
	<monogr>
		<title level="j">Research in Corpus Linguistics</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="1" to="33" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Dispersions and adjusted frequencies in corpora</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Gries</surname></persName>
		</author>
		<idno type="DOI">10.1075/ijcl.13.4.02gri</idno>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Teasing out Authorship and Style with t-tests and Zeta</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Hoover</surname></persName>
		</author>
		<ptr target="http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/html/ab-658.html" />
	</analytic>
	<monogr>
		<title level="m">Digital Humanities Conference</title>
				<meeting><address><addrLine>London</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Comparing word frequencies across corpora: Why chi-square doesn&apos;t work, and an improved LOB-Brown comparison</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kilgarriff</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ALLC-ACH Conference</title>
				<imprint>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="169" to="172" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Vergleich als Methode? Zur Empirisierung eines philologischen Verfahrens im Zeitalter der Digital Humanities</title>
		<author>
			<persName><forename type="first">S</forename><surname>Klimek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Müller</surname></persName>
		</author>
		<ptr target="http://www.jltonline.de/index.php/articles/article/view/758" />
	</analytic>
	<monogr>
		<title level="m">JLT Articles 9</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page">1</biblScope>
		</imprint>
	</monogr>
	<note>Abstract</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Significance testing of word frequencies in corpora</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lijffijt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Nevalainen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Säily</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Papapetrou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Puolamäki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mannila</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/fqu064</idno>
		<ptr target="http://dsh.oxfordjournals.org/lookup/doi/10.1093/llc/fqu064" />
	</analytic>
	<monogr>
		<title level="j">Digital Scholarship in the Humanities</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="374" to="397" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">&quot;Dispersion&quot;. In: The Vocabulary of French Business Correspondence: Word Frequencies, Collocations and Problems of Lexicometric Method</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Lyne</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1985">1985</date>
			<publisher>Slatkine</publisher>
			<biblScope unit="page" from="101" to="124" />
			<pubPlace>Paris</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">A guided tour to approximate string matching</title>
		<author>
			<persName><forename type="first">G</forename><surname>Navarro</surname></persName>
		</author>
		<idno type="DOI">10.1145/375360.375365</idno>
		<ptr target="https://dl.acm.org/doi/10.1145/375360.375365" />
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="31" to="88" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Gender differences in language use: An analysis of 14,000 text samples</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Newman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Groom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">D</forename><surname>Handelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Pennebaker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Discourse Processes</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="211" to="236" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Using of Jaccard coefficient for keywords similarity</title>
		<author>
			<persName><forename type="first">S</forename><surname>Niwattanakul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Singthongchai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Naenudorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wanapu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the international multiconference of engineers and computer scientists</title>
				<meeting>the international multiconference of engineers and computer scientists</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="380" to="384" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Use of the Chi-Squared Test to Examine Vocabulary Differences in English Language Corpora Representing Seven Different Countries</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">P</forename><surname>Oakes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Farrow</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/fql044</idno>
		<ptr target="https://academic.oup.com/dsh/article/22/1/85/1025876" />
	</analytic>
	<monogr>
		<title level="j">Literary and Linguistic Computing</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="85" to="99" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction</title>
		<author>
			<persName><forename type="first">M</forename><surname>Paquot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bestgen</surname></persName>
		</author>
		<idno type="DOI">10.1163/9789042029101_014</idno>
		<ptr target="https://brill.com/view/book/edcoll/9789042029101/B9789042029101-s014.xml" />
	</analytic>
	<monogr>
		<title level="m">Corpora: Pragmatics and Discourse</title>
				<editor>
			<persName><forename type="first">A</forename><forename type="middle">H</forename><surname>Jucker</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Schreier</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Hundt</surname></persName>
		</editor>
		<meeting><address><addrLine>Rodopi</addrLine></address></meeting>
		<imprint>
			<publisher>Brill</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Log-likelihood and odds ratio: Keyness statistics for different purposes of keyword analysis</title>
		<author>
			<persName><forename type="first">P</forename><surname>Pojanapunya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">W</forename><surname>Todd</surname></persName>
		</author>
		<idno type="DOI">10.1515/cllt-2015-0030</idno>
		<ptr target="https://www.degruyter.com/view/journals/cllt/14/1/article-p133.xml" />
	</analytic>
	<monogr>
		<title level="j">Corpus Linguistics and Linguistic Theory</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="133" to="167" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Social differentiation in the use of English vocabulary: some analyses of the conversational component of the British National Corpus</title>
		<author>
			<persName><forename type="first">P</forename><surname>Rayson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">N</forename><surname>Leech</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hodges</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Corpus Linguistics</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="133" to="152" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Zeta für die kontrastive Analyse literarischer Texte. Theorie, Implementierung, Fallstudie</title>
		<author>
			<persName><forename type="first">C</forename><surname>Schöch</surname></persName>
		</author>
		<ptr target="https://www.degruyter.com/view/books/9783110523300/9783110523300-004/9783110523300-004.xml" />
	</analytic>
	<monogr>
		<title level="m">Quantitative Ansätze in den Literatur-und Geisteswissenschaften. Systematische und historische Perspektiven</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Bernhart</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Richter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Lepper</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Willand</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Albrecht</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin</addrLine></address></meeting>
		<imprint>
			<publisher>de Gruyter</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="77" to="94" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Abgeleitete Textformate: Text und Data Mining mit urheberrechtlich geschützten Textbeständen</title>
		<author>
			<persName><forename type="first">C</forename><surname>Schöch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Döhl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rettinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Trilcke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Leinen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Jannidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hinzmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Röpke</surname></persName>
		</author>
		<idno type="DOI">10.17175/2020_006</idno>
		<ptr target="http://www.zfdg.de/2020_006" />
	</analytic>
	<monogr>
		<title level="j">Zeitschrift für digitale Geisteswissenschaften (ZfdG)</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Burrows&apos; Zeta: Exploring and Evaluating Variants and Parameters</title>
		<author>
			<persName><forename type="first">C</forename><surname>Schöch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schlör</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zehe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gebhard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Becker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hotho</surname></persName>
		</author>
		<ptr target="https://dh2018.adho.org/burrows-zeta-exploring-and-evaluating-variants-and-parameters/" />
	</analytic>
	<monogr>
		<title level="m">Book of Abstracts of the Digital Humanities Conference</title>
				<meeting><address><addrLine>Mexico City</addrLine></address></meeting>
		<imprint>
			<publisher>ADHO</publisher>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">From Keyness to Distinctiveness - Triangulation and Evaluation in Computational Literary Studies</title>
		<author>
			<persName><forename type="first">J</forename><surname>Schröter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dudar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Rok</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Schöch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Literary Theory</title>
		<imprint>
			<publisher>JLT</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">PC Analysis of Key Words and Key Key Words</title>
		<author>
			<persName><forename type="first">M</forename><surname>Scott</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">System</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="233" to="245" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
