<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Locating the Leading Edge of Cultural Change</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sarah</forename><surname>Griebel</surname></persName>
							<email>sarahg8@illinois.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Illinois</orgName>
								<address>
									<settlement>Urbana-Champaign</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Becca</forename><surname>Cohen</surname></persName>
							<email>rscohen2@illinois.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Illinois</orgName>
								<address>
									<settlement>Urbana-Champaign</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lucian</forename><surname>Li</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Illinois</orgName>
								<address>
									<settlement>Urbana-Champaign</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jaihyun</forename><surname>Park</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Nanyang Technological University</orgName>
								<address>
									<country key="SG">Singapore</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jiayu</forename><surname>Liu</surname></persName>
							<email>jiayu13@illinois.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Illinois</orgName>
								<address>
									<settlement>Urbana-Champaign</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jana</forename><surname>Perkins</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Illinois</orgName>
								<address>
									<settlement>Urbana-Champaign</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ted</forename><surname>Underwood</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Illinois</orgName>
								<address>
									<settlement>Urbana-Champaign</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Locating the Leading Edge of Cultural Change</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">29321957FA4828E81DAD20F7DB34C9C9</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>cultural change</term>
					<term>document embeddings</term>
					<term>topic modeling</term>
					<term>fiction</term>
					<term>bibliometrics</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Measures of textual similarity and divergence are increasingly used to study cultural change. But which measures align, in practice, with social evidence about change? We apply three different representations of text (topic models, document embeddings, and word-level perplexity) to three different corpora (literary studies, economics, and fiction). In every case, works by highly-cited authors and younger authors are textually ahead of the curve. We don't find clear evidence that one representation of text is to be preferred over the others. But alignment with social evidence is strongest when texts are represented through the top quartile of passages, suggesting that a text's impact may depend more on its most forward-looking moments than on sustaining a high level of innovation throughout.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>A growing body of scholarship seeks to understand cultural change by measuring the way individual texts precede or lag corpus-level trends.</p><p>Different disciplines have framed this problem differently. Fields like bibliometrics measure novelty by comparing an article to past precedent, and ask how well novelty predicts impact as measured by citations <ref type="bibr" target="#b22">[22]</ref>. By contrast, some computational social scientists are less interested in divergence from the past than in anticipation of the future. In <ref type="bibr" target="#b20">Vicinanza et al. 2022</ref>, for instance, a text's "prescience," or anticipation of future change, is used to identify social locations where new patterns tend to emerge <ref type="bibr" target="#b20">[20]</ref>. It is also possible to combine both approaches, and study a text's relationship to past and future at once. <ref type="bibr" target="#b1">Barron et al. 2018</ref> measures a text's divergence from the past ("novelty") and subtracts divergence from the future ("transience"), producing a measure of durable innovation they call "resonance" <ref type="bibr" target="#b1">[1]</ref>.</p><p>Models of textual change have also relied on radically different representations of text, ranging from lexical topic models in <ref type="bibr" target="#b1">[1]</ref> to a deep-learning model of sentences in <ref type="bibr" target="#b20">[20]</ref>. Plausible a priori arguments can be made for all of these methods. In this paper we will try to provide empirical evidence about best practices.</p><p>To empirically assess methods of measuring textual change, of course, we need some kind of ground truth about a text's divergence from the past (or similarity to the future). This is not a topic where absolute ground truth is available.
In fact, researchers measure innovation textually because they have reason to suspect that social evidence will be unreliable here. So instead of relying on a single unimpeachable source of social evidence, we may have to combine several.</p><p>For instance, bibliometricians have repeatedly confirmed that innovation does correlate with publicity <ref type="bibr" target="#b22">[22,</ref><ref type="bibr" target="#b3">3]</ref>. Works that introduce new language, or cite new combinations of sources, tend to attract more attention and receive more citations themselves. So we could use citation frequency as one signal that a text was on the leading edge of change.</p><p>But we also have reason to suspect that using publicity as a measure of innovation will overrate already-prominent writers, who tend to receive more attention through the "Matthew effect" <ref type="bibr" target="#b13">[13]</ref>. Young writers are cited less frequently. And yet many ideas and locutions must emerge first in young writers, since cohort succession is a major driver of cultural change <ref type="bibr" target="#b12">[12,</ref><ref type="bibr" target="#b14">14,</ref><ref type="bibr" target="#b19">19]</ref>.</p><p>The tension between these two forms of social evidence gives us leverage on the problem. If we can find a measure of a text's relation to change that aligns well with youth but also with citation and prominence, we will have validated our measure against two independent variables, suggesting that it describes "the leading edge of cultural change" in a relatively broad and robust sense.</p><p>The documents we consider include journal articles drawn from literary studies and economics, as well as a collection of English-language fiction ranging from 1890 to 2000. 
In both cases, we have all or most of the documents in full text, so we can compare Transformer-based models to older strategies of lexical modeling.</p><p>Our experiment supports several inferences about best practices for measuring change. For instance, is a text's relation to the past or the future more informative? When do Transformer-based models outperform lexical ones? Should texts always be considered as wholes, or might it be more meaningful to represent them through their most innovative parts?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Data</head><p>We modeled the impact of textual innovation using three datasets. Two datasets contained academic journal articles from the fields of literary studies and economics-fields selected because we expect their rhetorical and citation practices to diverge significantly. One contained English-language fiction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Academic journals</head><p>Journals were selected for longevity and influence in the field. Journals with longer lifespans were prioritized, as this would ensure relative stability across the corpus.</p><p>The literary studies dataset contains a corpus of 40,407 full text academic articles from seven journals. The economics dataset contains 43,081 articles from eight journals. Texts were obtained through JSTOR <ref type="bibr" target="#b4">[4]</ref>. Consult Appendix E for a full list of journal titles. Citation counts were gathered from Semantic Scholar <ref type="bibr" target="#b8">[8]</ref>. Authors' years of birth were inferred through a mixture of manual checking and matching to VIAF, which gave us age at publication for 2,646 articles in literary studies (see Appendix D for our methods of inference).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Fiction</head><p>We gathered 8,918 works of English-language fiction distributed approximately evenly across time from 1890 through 2000. The first and last 10% of each book was discarded to avoid mixing fiction with introductions, advertisements, and other nonfiction paratext. Of our 8,918 books, only 7,304 are in full text; since we only produced embeddings of these books, the embedding method had a slight disadvantage on the fiction corpus <ref type="bibr" target="#b7">[7]</ref>.</p><p>We drew information about authors' years of birth from Underwood et al. 2022, which gave us author ages for 3,272 works in the period we were analyzing <ref type="bibr" target="#b19">[19]</ref>.</p><p>We also created a subset of "critically discussed" works by finding the titles and authors of our fiction corpus in our literary studies corpus. This group of 463 books was compared to a contrast set with the same distribution across time, but never mentioned in that corpus.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods</head><p>We measured both divergence from the past (which following <ref type="bibr" target="#b1">Barron et al. 2018</ref> we call "novelty") and divergence from future documents ("transience"). But most of the results below are based on the composite quantity they call "resonance" (novelty minus transience). To avoid any suggestion of causality we call this quantity "precocity." A text with high precocity simply "looks later than" peers published in the same year. We calculate these quantities using three different representations of texts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Topic models</head><p>We topic modeled our corpora using the implementation of LDA in MALLET, and divided documents into chunks of at least 512 tokens <ref type="bibr" target="#b11">[11,</ref><ref type="bibr" target="#b2">2]</ref>. For more detail, see Appendix A. We compared documents by measuring Kullback-Leibler divergence on topic distributions, following Barron et al. 2018 <ref type="bibr" target="#b1">[1]</ref>.</p></div>
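The chunk-to-chunk comparison described above can be sketched as follows. This is an illustrative sketch, not the paper's actual code: `kl_divergence` follows the standard definition, and `mean_divergence` is a hypothetical helper standing in for the comparison of one chunk against all chunks in a past (novelty) or future (transience) window.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) between two topic
    distributions, given as lists of probabilities over the same topics.
    A small epsilon guards against zero probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mean_divergence(chunk_topics, window_chunks):
    """Average divergence of one chunk's topic distribution from every
    chunk in a comparison window (past window -> novelty,
    future window -> transience)."""
    return sum(kl_divergence(chunk_topics, other) for other in window_chunks) / len(window_chunks)
```

Note that KL divergence is asymmetric, so the direction of comparison (focal chunk against the window, as here) matters.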
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Document embeddings</head><p>Topic models are limited to lexical evidence. It seemed plausible that neural document embeddings, compared via cosine distance, might capture a richer representation of text. We experimented with several different embedding strategies. Off-the-shelf embeddings performed very poorly, even if they were at the top of the leaderboard for contemporary applications. Finetuning using the sentence Transformers library was necessary to produce embeddings more suited to the specialized subject matter and temporal range (1890-2017) of this experiment <ref type="bibr" target="#b16">[16]</ref>. See Appendix B for details of our fine-tuning strategy.</p></div>
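For the embedding representation, the comparison metric is cosine distance rather than KL divergence; a minimal sketch (vectors here are plain Python lists, though in practice the embeddings would be model outputs):

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm
```

Identical directions give a distance of 0, orthogonal vectors a distance of 1, so the same novelty/transience averaging used for topic distributions can be applied unchanged.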
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Perplexity</head><p>Vicinanza et al. measure a quantity they call "prescience," which is calculated by comparing the mean perplexity of a document's sentences in two models-one trained via masked language modeling on its own period, and one trained on a future period. Sentences that have lower perplexity in the future (which become more probable in the future) will have high prescience. In bibliometrics, a loosely similar method has been used to compute novelty <ref type="bibr" target="#b20">[20,</ref><ref type="bibr" target="#b21">21,</ref><ref type="bibr" target="#b17">17]</ref>.</p><p>We tested Vicinanza's definition of prescience on our corpora, but found that we got much greater predictive power by using an expanded version of the method that included both past and future. Instead of subtracting future perplexity from a document's perplexity at time of publication, we subtract it from perplexity calculated in the past:</p><p>precocity = 2 ⋅ (perplexity_past − perplexity_future) / (perplexity_past + perplexity_future) (1)</p><p>This measures not just anticipation of a specific future period, but a quality of being "ahead of the curve," where the curve is inferred from the whole time window around publication of a text. For further details see Appendix F.</p></div>
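Equation (1) reduces to a one-line function once per-chunk perplexities under the past-trained and future-trained language models are available:

```python
def precocity(perplexity_past, perplexity_future):
    """Equation (1): difference between a chunk's perplexity under the
    past-trained and future-trained models, normalized by their mean.
    Positive values mean the chunk is less surprising to the future
    model than to the past model, i.e. it is "ahead of the curve"."""
    return 2 * (perplexity_past - perplexity_future) / (perplexity_past + perplexity_future)
```

The normalization by the mean of the two perplexities keeps the score on a common scale across periods where absolute perplexity levels differ.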
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Details of precocity calculation</head><p>Documents were divided into chunks for all three of these strategies, and chunks were characterized individually. For the first two methods this meant that each chunk was compared to all the other document chunks in the preceding (and following) 20 years. Perplexity relied on models that characterize a 12-year period, so direct chunk-to-chunk comparisons were not required. The full span from the "past" model to the "future" model was 36 years, rendering the scale of the perplexity calculation comparable to the 41-year span of the other two methods.</p><p>It is certainly possible to characterize a document through the mean precocity of its chunks. But an argument can also be made that what matters, socially, is often not the average tenor of a document, but its most surprising and forward-looking moment. For this reason we also tested an alternate strategy that characterized documents by selecting the top 25% of their chunks with highest precocity, and taking the mean of those values.</p><p>An alert reader will anticipate that questions of circularity might emerge when texts quote each other or were written by the same author. See Appendix C for our solution to these problems. In practice these effects were very small; excluding or leaving in texts that quote each other made almost no difference.</p></div>
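The alternate top-quartile aggregation described above can be sketched in a few lines (a simplified stand-in for the actual pipeline):

```python
def document_precocity(chunk_scores, top_fraction=0.25):
    """Characterize a document by the mean precocity of its most
    forward-looking chunks: sort chunk scores, keep the top fraction
    (at least one chunk), and average them. With top_fraction=1.0 this
    reduces to the ordinary mean over all chunks."""
    k = max(1, round(len(chunk_scores) * top_fraction))
    top = sorted(chunk_scores, reverse=True)[:k]
    return sum(top) / k
```

A document with one strikingly precocious passage and many ordinary ones will score much higher under this aggregation than under a whole-document mean, which is exactly the contrast the experiment tests.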
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Regression strategy</head><p>We assess the explanatory power of precocity through a multiple linear regression that includes terms for precocity, precocity squared, and novelty (which gives the regression leverage to separate the components of precocity that refer to the past or to the future). Date of publication is also present as a control variable.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>We'll begin with a quick overview of the variance explained when six different methods of text analysis are applied to predicting five social variables. As we predicted, textual innovation is associated both with prominence and with youth (even though a text's prominence is anti-correlated with youth in this data). The best-performing methods were able to explain 7-9% of the variance in prominence (e.g., citation count) simply by identifying works that were (textually) ahead of the curve-more like the future than the past.</p><p>It is difficult to say whether explaining 7-9% of social variance is good performance, because we don't know how much of a work's prominence is really determined by innovation-and how much by factors like institutional prestige. Some research suggests that the answer varies from one discipline to another <ref type="bibr" target="#b22">[22]</ref>. It nevertheless seems reasonable to take social variance explained as a heuristic to choose between methods-for while we don't know the real effect size, it is unclear how effects significantly larger than the real one could be produced.</p><p>So what did we learn about measuring precocity? The clearest lesson is that the signal tended to be strongest when we measured documents "at their most forward-looking," by averaging the 25% of passages with the highest precocity scores. In all of the tests we ran, this method aligned better with social evidence than a method that averaged all passages. One might infer that citations-and more surprisingly, critical references to fiction-are often motivated by innovations expressed in a relatively small part of a text.</p><p>Second, on examining regression coefficients, transience (difference from the future) provides slightly stronger evidence of failure than novelty (difference from the past) provides evidence of success.
The difference between these variables was not huge, however, and there was nothing to be gained by discarding information about the past. The original perplexity method in Vicinanza et al. 2022, which only included information about the future, achieved r² less than half as large as the improved method we describe in the table above. Comparing texts only to the past, or only to the future, would admittedly make it easier to use causal language. Precocity, which characterizes a text in relation to a whole time window around its publication, is hard to interpret causally. But if causal explanation is not being claimed, there is no reason not to use both time arrows at once.</p><p>All three representations of text (topic models, embeddings, and perplexity) performed well in some cases. Topic models seemed to predict prominence well, while embeddings performed well on age-but we don't have enough data points to generalize. If any conclusion can be drawn here, it might be "a dog that doesn't bark." We found no evidence that neural models of text systematically outperformed lexical models. On the contrary, lexical topic models displayed consistently strong performance across tasks and corpora.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>We found clear alignment between textual measures of precocity and two independent kinds of social evidence that we expected to align with change (prominence and authorial youth). There is no absolute ground truth in this domain, but statistically significant relationships across three corpora do increase our confidence that text analysis can locate a leading edge of cultural change.</p><p>We also consistently found a better fit with social evidence when we represented documents through the 25% of passages with highest precocity. It seems likely that significant innovations are often concentrated in a small portion of an article or work of fiction.</p><p>However, we also found that precocity can be measured in different ways, which do not always agree with each other. Since change is taking place in a space that has multiple dimensions, the same text can be at the leading edge on one axis and lagging behind on another. Measures of similarity based on a topic model seemed to excel at predicting citations and public reputation. Transformer-based embeddings may be better at catching tacit signals of generational style.</p><p>Since topic modeling is an older representation of text, its strong performance overall may require discussion. We don't have a confident answer here, but for what it's worth, topic models are explicitly designed to factor a corpus into latent variables. Document embeddings, by contrast, don't have any representational goal at the corpus level. The embeddings we used are tuned contrastively, using the Sentence Transformers library <ref type="bibr" target="#b16">[16]</ref>. 
But that process is not guaranteed to model the corpus in a principled way-which might be a disadvantage in an experiment that seeks to measure a document's relation to corpus-level trends.</p><p>At least for now, researchers wrestling with questions about textual change are well advised to evaluate the performance of a principled lexical model as a baseline and confirm that embeddings do actually improve on it before relying on embeddings alone. It is not safe to assume that a model will perform better simply because it captures information about word order.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Public data and code</head><p>Data and code for this project are available on GitHub: https://github.com/IllinoisLiteraryLab/novelty/tree/main, and will also be archived on Zenodo.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Contribution statement</head><p>Authors are listed alphabetically here.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Embedding fine-tuning</head><p>The tuning method we ultimately adopted relies on multiple negatives ranking loss, as implemented in Sentence Transformers <ref type="bibr" target="#b6">[6,</ref><ref type="bibr" target="#b16">16]</ref>. That is, the training dataset includes only positive pairs of similar passages; negative pairs are created implicitly by misaligning the passages in a batch. We created positive pairs mostly by selecting adjacent passages from the same article (or work of fiction). But we adopted several tricks to prevent the model from learning a model of similarity defined purely by vocabulary overlap. First, we used GPT-3.5 to paraphrase and condense one element of some pairs <ref type="bibr" target="#b15">[15]</ref>. Paraphrasing up to 18% of pairs seemed to improve results. Second, in training embeddings for fiction, we replaced personal names in one element of each pair-preserving first and last names, and gender signals, as much as possible. Both of these changes made the learning task more difficult and improved alignment with social evidence. We used these datasets to fine-tune RoBERTa <ref type="bibr" target="#b10">[10]</ref>.</p><p>We also explored several alternate approaches that aren't represented in the final paper. For the task of predicting citations, we experimented with embeddings that were trained specifically to identify the kind of similarity between articles that produces citation. Here, positives were sentences from articles related by citation, and negatives were pairs of sentences identified by off-the-shelf embedding methods as sharing intellectual influence, despite no documented citation existing between the two articles. Our hypothesis was that these pairs represent spurious or coincidental similarities in language not necessarily associated with the type of intellectual influence we are trying to measure.
We took these pairs and fine-tuned the GTE model, through Cosine Entropy Loss, assigning high similarity to correctly identified citation pairs and low similarity to falsely identified pairs <ref type="bibr" target="#b9">[9]</ref>.</p><p>Since we were concerned that embeddings might perform less well on long passages than on individual sentences, we also tested a strategy where we generated embeddings on single sentences, then clustered them, and took the cluster centroids as synthetic "document embeddings." This did not improve performance.</p><p>An alternate approach we have not yet checked would be to train embeddings entirely from scratch on these corpora. Some recent studies suggest that even older methods of doing that, like doc2vec, can outperform topic models on clustering tasks <ref type="bibr" target="#b18">[18]</ref>.</p><p>We embedded passages of up to 512 tokens, with the constraint that we divide passages only at sentence breaks. Note that the chunks used for topic modeling were generally combinations of two or more embedding chunks; this difference of size was permitted in order to emphasize the strengths of both methods, without hindering either one.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Text-reuse detection</head><p>We avoided comparing any papers written by the same author. We also aimed to avoid comparing chunks of text that directly quoted each other, as including these, we estimated, would create a circularity in the precocity calculation for such chunks, directly guaranteeing that it would correlate with citation.</p><p>To avoid this circularity, we looked for either the cited author's last name or a string of six or more matching words enclosed in single or double quotation marks within the citing paper. If either of these is found, the chunk is not used for comparison. Note that the whole paper is not excluded from comparison; only the offending chunk.</p></div>
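The exclusion rule described in this appendix can be sketched as follows. This is an illustrative reconstruction, not the project's actual code; the function names and the simple whitespace tokenization are assumptions:

```python
import re

def has_quoted_overlap(citing_chunk, cited_text, min_words=6):
    """True if any quoted span in the citing chunk shares a run of
    min_words or more consecutive words with the cited text."""
    for quote in re.findall(r'["\'](.+?)["\']', citing_chunk):
        words = quote.lower().split()
        for i in range(len(words) - min_words + 1):
            phrase = " ".join(words[i:i + min_words])
            if phrase in cited_text.lower():
                return True
    return False

def exclude_chunk(citing_chunk, cited_author_surname, cited_text):
    """Drop a chunk from the precocity comparison if it names the cited
    author or quotes six or more consecutive words from the cited text.
    Only the offending chunk is dropped, never the whole paper."""
    return (cited_author_surname.lower() in citing_chunk.lower()
            or has_quoted_overlap(citing_chunk, cited_text))
```

A production version would need more careful quote pairing and tokenization, but the chunk-level (rather than paper-level) exclusion is the point being illustrated.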
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Author age determination</head><p>For the fiction corpus we could rely on previously published data to determine authors' years of birth <ref type="bibr" target="#b19">[19]</ref>.</p><p>To create analogous data for literary scholars, we estimated years of birth for a sample of 1,093 authors (and 2,646 articles) through a mixture of manual research and searches on the VIAF API. A model was trained to distinguish true VIAF matches from false ones. We estimate that we achieved overall accuracy of greater than 90%; this estimate is based both on the accuracy of the VIAF model and on manually checking a sample of articles.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Corpus construction</head><p>The literary studies journals included are: Publications of the Modern Language Association, English Literary History, The Review of English Studies (1925-2016), Critical Inquiry (1974-2016), Modern Language Review (1905-2016), Modern Philology (1903-2016), and New Literary History (1969-2016). The economics journals included are: The American Economic Review (1911-2017), Econometrica (1933-2017), Journal of Economic Literature (1969-2017), Journal of Political Economy (1900-2017), The Quarterly Journal of Economics (1900-2017), The Review of Economics and Statistics (1919-2017), The Journal of Finance (1946-2017), and The Review of Economic Studies. Both the economics and literary studies datasets were originally sourced from JSTOR <ref type="bibr" target="#b4">[4]</ref>. Because we look back and forward 20 years in calculating precocity, we only directly characterize articles in the central period 1920-1996 (or 1997 for economics).</p><p>For the academic articles, citations from Semantic Scholar were collected as external evidence of impact <ref type="bibr" target="#b8">[8]</ref>. In practice, this meant that the articles sourced from JSTOR had to be aligned with articles available in Semantic Scholar. Semantic Scholar, like all bibliographic databases, is incomplete, so all citation counts referred to in this article may underrepresent the total post-publication impact of publications. Note also that we do not limit the time period for citation, so works published earlier have, in principle, more opportunities to be cited. We address this later by controlling for date of publication.</p><p>Our fiction corpus covers the period from 1890 to 2000. However, because we look back and forward 20 years in characterizing a book's relationship to the past and future, we can only directly characterize books in the central period 1910-1979. The 20-year shoulders on either side of this period are used as comparative touchstones. So our analysis directly describes 5,880 books published 1910-1979, of which we had 4,392 in full text. Since we only ran embeddings on works in full text, the smaller size of that corpus does create a slight disadvantage for the embedding method in the case of fiction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>F. Timeline for perplexity calculation</head><p>We calculated perplexity using RoBERTa on chunks of up to 512 tokens (the same ones we used for embedding) <ref type="bibr" target="#b10">[10]</ref>. We divided the timeline into overlapping 12-year periods with a 4-year offset, which means that a text published in 1968-1971, for instance, would be compared to a past model trained on 1952-63 and a future model trained on 1976-87, while a text published in 1964-67 would be compared to a past model trained on 1948-59 and a future model trained on 1972-83.</p><p>Our goal in creating 12-year models, but moving them forward 4 years at a time, was to create sufficiently large corpora for training while ensuring that texts were not greatly (dis)advantaged by their position within a time step.</p></div>
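The windowing scheme implied by the two worked examples above can be written as a small function. This is a reconstruction from those examples, assuming bin boundaries fall on years divisible by four:

```python
def perplexity_windows(year):
    """Map a publication year to the (start, end) ranges of the past and
    future 12-year training windows, following the 4-year-offset scheme.
    E.g. 1968-1971 -> past (1952, 1963), future (1976, 1987)."""
    bin_start = year - (year % 4)               # 4-year bin containing the text
    past = (bin_start - 16, bin_start - 5)      # 12-year window ending 5 years before
    future = (bin_start + 8, bin_start + 19)    # 12-year window starting 8 years after
    return past, future
```

The full span from the start of the past window to the end of the future window is 36 years, which is what makes the perplexity calculation comparable in scale to the 41-year span of the other two methods.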
<div xmlns="http://www.tei-c.org/ns/1.0"><head>G. Domain insights</head><p>Our primary goal in this paper is to validate a method. But it is also easy to see how this method could be used to illuminate substantive research questions about a genre or academic discipline. To give a quick sense of what it might reveal, we've visualized the seven journals that comprise our literary studies corpus, along with a selection of authors who have exceptionally high precocity and/or an exceptionally high number of citations.</p><p>Citation counts are already public. But precocity-vertical position in Figure <ref type="figure" target="#fig_1">1</ref>-is new information. Precocity does tend to correlate with citations, as is visible in the positive slope of the journals. But journals that attract different numbers of citations (like New Literary History and Critical Inquiry) may nevertheless be close to indistinguishable when it comes to precocity, which suggests they have substantively equal power to predict trends in the discipline. In other cases, journals that aren't distinguished by citation count can be distinguished by precocity. PMLA is the flagship journal of the Modern Language Association, and arguably the highest-prestige venue in this group. It attracts almost as many citations as New Literary History or ELH, but its position on the vertical axis suggests that editorial practices have sometimes been more conservative (as perhaps befits the journal of a large professional organization).</p><p>The apparent negative slope of author names is an artifact of the process we used to select exceptional authors, which deliberately highlights names on the periphery.
If we plotted all authors, we would get a Gaussian cloud of points with the same slope and center as the journals (but much larger, since authors are associated with fewer articles and thus aren't pulled to the origin as strongly by the law of averages).</p><p>The names of well-known critics, like Fredric Jameson and Gayatri Spivak, tend to be found in the upper right corner, suggesting that they were not only widely cited but prescient (or influential-causality is impossible to determine here). Moving up and to the left we find names that may be less familiar, but that our algorithm suggests were also ahead of the curve. Carl E. W. L. Dahlström is an early-twentieth-century critic whose articles have almost never been cited, although they anticipate subsequent trends.</p><p>On the right side of the graph we find a few widely-cited authors who aren't especially distinguished by precocity. This is not necessarily a negative reflection on their work. For instance, several authors in this region (Richard Rorty, Jacques Derrida, and Ian Hacking) are well-known philosophers who were occasionally invited to publish in literary studies journals. Since they can hardly expect to convert literary scholars into philosophers en masse, these honorific late-career publications won't stand at the beginning of a long tradition of similar work, and therefore won't have high precocity. In short, there can be more than one kind of influence. Precocity measures a text's relation to a specific corpus, and may not capture all the intellectual influences that flow between corpora. It is nevertheless easy to see how this metric could be used to pose questions about editorial practices and career arcs within a discipline.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>H. Preregistration and paths not taken</head><p>Most of the methodological details above were preregistered in Fall 2023 <ref type="bibr" target="#b5">[5]</ref>. But the experimental plan did change in some important ways afterward. In particular, our embedding strategy changed several times, after off-the-shelf GTE embeddings proved not to be competitive. Also, comparison to authorial age wasn't part of our original plan. A critical reader might (correctly) interpret these adjustments to our plan as efforts to find some method or context that would allow Transformer-based methods to outperform a topic model, as we had originally expected. If we had followed our original experimental plan exactly, the result would have been a simple endorsement of topic modeling. Evidence of our struggle to avoid or complicate that conclusion may perhaps make it even more persuasive.</p><p>There is also a question we proposed in the preregistration, and did investigate, but haven't discussed above for reasons of space. Some researchers may wonder whether it really makes sense to compare a text chunk to all the parts of all documents in the preceding and following 20 years. One could argue that mystery novels, for instance, are not really innovating relative to science fiction, but relative to other mystery novels. One way of taking this into account, which performed well in some previous work, was to compare chunks only to a subset of very similar chunks in the past and future (say the top 5%) <ref type="bibr" target="#b19">[19]</ref>. 
We also tested that strategy here, but it didn't often improve on other approaches, and so we've deferred discussion to this appendix.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>, English Literary History, The Review of English Studies (1925-2016), Critical Inquiry (1974-2016), Modern Language Review (1905-2016), Modern Philology (1903-2016), and New Literary History (1969-2016). The economics journals included are: The American Economic Review (1911-2017), Econometrica (1933-2017), Journal of Economic Literature (1969-2017), Journal of Political Economy (1900-2017), The Quarterly Journal of Economics (1900-2017), The Review of Economics and Statistics (1919-2017), The Journal of Finance (1946-2017), and The Review of Economic Studies</figDesc></figure>
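The restricted-comparison strategy described above can be sketched in a few lines. This is a hypothetical illustration rather than the paper's exact pipeline: we assume chunks are already represented as vectors (topic distributions or embeddings), use cosine distance as the divergence measure, and the function and parameter names are our own.

```python
import numpy as np

def distance_to_top_similar(chunk, other_chunks, top_frac=0.05):
    """Mean distance from `chunk` to only the top_frac most similar
    rows of `other_chunks`, instead of to all of them."""
    chunk = chunk / np.linalg.norm(chunk)
    others = other_chunks / np.linalg.norm(other_chunks, axis=1, keepdims=True)
    sims = others @ chunk                       # cosine similarities
    k = max(1, int(round(len(sims) * top_frac)))
    top = np.sort(sims)[-k:]                    # keep the most similar subset
    return float(1.0 - top.mean())              # mean cosine distance
```

Applied separately to past and future chunks, a measure like this compares each text mainly to its nearest neighbors, so a mystery novel's novelty would be assessed chiefly against other mysteries.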
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Literary studies journals and authors plotted by the average precocity and number of citations associated with their articles. Precocity is determined here by topic modeling. Since both axes are z-scores, the center of the whole corpus would be at 0, 0. We're looking mostly at the upper right quadrant.</figDesc><graphic coords="13,89.28,84.17,416.72,381.71" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>R² for different representations of text, on different social variables. 0.25 indicates that documents were represented by the quartile of passages with highest precocity; 1.0, that they were represented by all passages.</figDesc><table><row><cell></cell><cell>Topics,</cell><cell>Embeds,</cell><cell>Perplexity,</cell><cell>Topics,</cell><cell>Embeds,</cell><cell>Perplexity,</cell></row><row><cell></cell><cell>0.25</cell><cell>0.25</cell><cell>0.25</cell><cell>1.0</cell><cell>1.0</cell><cell>1.0</cell></row><row><cell>Citations, literary scholars</cell><cell>.082</cell><cell>.070</cell><cell>.057</cell><cell>.067</cell><cell>.049</cell><cell>.041</cell></row><row><cell>Age, literary scholars</cell><cell>.033</cell><cell>.035</cell><cell>.024</cell><cell>.026</cell><cell>.034</cell><cell>.024</cell></row><row><cell>Critical discussion, fiction</cell><cell>.071</cell><cell>.011</cell><cell>.013</cell><cell>.033</cell><cell>.009</cell><cell>.012</cell></row><row><cell>Age, fiction writers</cell><cell>.051</cell><cell>.083</cell><cell>.014</cell><cell>.049</cell><cell>.076</cell><cell>.014</cell></row><row><cell>Citations, economists</cell><cell>.063</cell><cell>.029</cell><cell>.096</cell><cell>.040</cell><cell>.018</cell><cell>.063</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work made use of the Illinois Campus Cluster, a computing resource that is operated by the Illinois Campus Cluster Program (ICCP) in conjunction with the National Center for Supercomputing Applications (NCSA) and which is supported by funds from the University of Illinois at Urbana-Champaign, specifically through the Illinois Computes program. This work also used the Delta system at the National Center for Supercomputing Applications through allocation xras-ncsa-72 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services &amp; Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296. Some fiction data for this project was provided by HathiTrust Digital Library <ref type="bibr" target="#b7">[7]</ref>.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendices</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Topic models</head><p>Topic granularity will vary if a corpus includes many more texts in some periods than others, and this could be problematic for a project interested in comparisons across time. So our procedure in every case was:</p><p>1. Restrict the corpus to an even distribution across time. 2. Generate a 250-topic model with MALLET, including an "inferencer." 3. Use the inferencer to generate topic distributions for documents that had to be left out of the "flat" distribution in step 1.</p><p>Using this model, we assessed novelty, transience, and precocity by measuring the K-L divergence between texts. K-L divergence is an asymmetric measure; we took the document being characterized as the reference probability distribution, and compared both past and future documents to that reference point.</p></div>
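The divergence computation described above can be sketched as follows. This is a minimal illustration with hypothetical function names: the document being characterized supplies the reference distribution p, and each past or future document supplies q. How novelty and transience are combined into precocity is not restated here; related work (e.g., the "resonance" of Barron et al. [2]) subtracts transience from novelty, but the paper's exact aggregation may differ.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Asymmetric K-L divergence D(p || q). Per the appendix, the
    document being characterized is the reference distribution p."""
    p = np.asarray(p, dtype=float) + eps   # smooth to avoid log(0)
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def novelty_and_transience(doc_topics, past_docs, future_docs):
    """Average divergence of past and of future documents from the
    reference document's topic distribution."""
    novelty = float(np.mean([kl_divergence(doc_topics, d) for d in past_docs]))
    transience = float(np.mean([kl_divergence(doc_topics, d) for d in future_docs]))
    return novelty, transience
```

A document whose topic mixture diverges sharply from its past but is echoed by its future would score high novelty and low transience.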
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Embeddings</head><p>We began by testing off-the-shelf GTE embeddings <ref type="bibr" target="#b9">[9]</ref>. When these performed poorly, we realized that embeddings are trained mostly on twenty-first-century material, and fine-tuning would be needed to give them a better chance of representing an earlier period.</p></div>			</div>
			<div type="references">

				<listBibl>


<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Individuals, Institutions, and Innovation in the Debates of the French Revolution</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">T J</forename><surname>Barron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Spang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dedeo</surname></persName>
		</author>
		<idno type="DOI">10.1073/pnas.1717729115</idno>
		<ptr target="https://doi.org/10.1073/pnas.1717729115" />
	</analytic>
	<monogr>
		<title level="j">Proceedings of the National Academy of Sciences</title>
		<editor>D. S. Bassett</editor>
		<imprint>
			<biblScope unit="volume">115</biblScope>
			<biblScope unit="page" from="4607" to="4612" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Latent Dirichlet Allocation</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
		<ptr target="https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf" />
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="993" to="1022" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Do We Measure Novelty When We Analyze Unusual Combinations of Cited References? A Validation Study of Bibliometric Novelty Indicators Based on F1000Prime Data</title>
		<author>
			<persName><forename type="first">L</forename><surname>Bornmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tekles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">Y</forename><surname>Ye</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1910.03233 [cs.DL]</idno>
		<ptr target="https://arxiv.org/abs/1910.03233" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">JSTOR -Data for Research</title>
		<author>
			<persName><forename type="first">J</forename><surname>Burns</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Brenner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Llewellyn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Snyder</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Research and Advanced Technology for Digital Libraries</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Agosti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Borbinha</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Kapidakis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Papatheodorou</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Tsakonas</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin, Heidelberg; Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="416" to="419" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Comparing Measures of Textual Innovation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Griebel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Perkins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">E</forename><surname>Underwood</surname></persName>
		</author>
		<idno type="DOI">10.17605/osf.io/a3g6e</idno>
		<ptr target="https://osf.io/a3g6e" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Efficient Natural Language Response Suggestion for Smart Reply</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Henderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Al-Rfou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Strope</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lukács</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Miklos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kurzweil</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1705.00652</idno>
		<ptr target="http://arxiv.org/abs/1705.00652" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Jett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Capitanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kudeki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Cole</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Organisciak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Underwood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dickson Koehl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Dubnicek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Downie</surname></persName>
		</author>
		<idno type="DOI">10.13012/r2te-c227</idno>
		<ptr target="https://doi.org/10.13012/R2TE-C227" />
		<title level="m">The HathiTrust Research Center Extracted Features Dataset</title>
				<imprint>
			<publisher>HathiTrust Research Center</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>2.0</note>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">The Semantic Scholar Open Data Platform</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Kinney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Anastasiades</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Authur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Beltagy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bragg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Buraczynski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Cachola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Candra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chandrasekhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cohan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Crawford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Downey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dunkelberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Evans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Feldman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gorney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">W</forename><surname>Graham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Huff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>King</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kohlmeier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kuehl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Langan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lochner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Macmillan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">C</forename><surname>Murray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Newell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Rao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rohatgi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sayre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Soldaini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Subramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tanaka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Wade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Wagner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wilhelm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zamarron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Zuylen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Weld</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2301.10140</idno>
		<ptr target="https://api.semanticscholar.org/CorpusID:256194545" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Towards General Text Embeddings with Multi-Stage Contrastive Learning</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Long</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2308.03281</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">RoBERTa: A Robustly Optimized BERT Pretraining Approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<ptr target="http://arxiv.org/abs/1907.11692" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">MALLET: A Machine Learning for Language Toolkit</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Mccallum</surname></persName>
		</author>
		<ptr target="http://mallet.cs.umass.edu" />
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Language Acquisition and Change: A Morphosyntactic Perspective</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Meisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Elsig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rinke</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<publisher>Edinburgh University Press</publisher>
			<pubPlace>Edinburgh</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The Matthew Effect in Science</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Merton</surname></persName>
		</author>
		<idno type="DOI">10.1126/science.159.3810.56</idno>
		<ptr target="https://www.science.org/doi/10.1126/science.159.3810.56" />
	</analytic>
	<monogr>
		<title level="j">Science</title>
		<imprint>
			<biblScope unit="volume">159</biblScope>
			<biblScope unit="page" from="56" to="63" />
			<date type="published" when="1968">1968</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">The New American Voter</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">E</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Shanks</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1996">1996</date>
			<publisher>Harvard University Press</publisher>
			<pubPlace>Cambridge, MA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Training Language Models to Follow Instructions with Human Feedback</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ouyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Almeida</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Wainwright</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mishkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Slama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schulman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hilton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Kelton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Simens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Welinder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Christiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leike</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lowe</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2203.02155</idno>
		<ptr target="https://arxiv.org/abs/2203.02155" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note>cs.CL</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks</title>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/1908.10084" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3982" to="3992" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Measuring Novelty in Science with Word Embedding</title>
		<author>
			<persName><forename type="first">S</forename><surname>Shibayama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Matsumoto</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0254034</idno>
		<ptr target="https://doi.org/10.1371/journal.pone.0254034" />
	</analytic>
	<monogr>
		<title level="j">PLOS ONE</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="1" to="16" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Computational Thematics: Comparing Algorithms for Clustering the Genres of Literary Fiction</title>
		<author>
			<persName><forename type="first">O</forename><surname>Sobchuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Šeļa</surname></persName>
		</author>
		<idno type="DOI">10.1057/s41599-024-02933-6</idno>
		<ptr target="https://doi.org/10.1057/s41599-024-02933-6" />
	</analytic>
	<monogr>
		<title level="j">Humanities and Social Sciences Communications</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">438</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Cohort Succession Explains Most Change in Literary Culture</title>
		<author>
			<persName><forename type="first">T</forename><surname>Underwood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kiley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Shang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Vaisey</surname></persName>
		</author>
		<idno type="DOI">10.15195/v9.a8</idno>
		<ptr target="https://doi.org/10.15195/v9.a8" />
	</analytic>
	<monogr>
		<title level="j">Sociological Science</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="184" to="205" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">A Deep-Learning Model of Prescient Ideas Demonstrates That They Emerge from the Periphery</title>
		<author>
			<persName><forename type="first">P</forename><surname>Vicinanza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Goldberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">B</forename><surname>Srivastava</surname></persName>
		</author>
		<idno type="DOI">10.1093/pnasnexus/pgac275</idno>
		<ptr target="https://doi.org/10.1093/pnasnexus/pgac275" />
	</analytic>
	<monogr>
		<title level="j">PNAS Nexus</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">pgac275</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Identify Novel Elements of Knowledge with Word Embedding</title>
		<author>
			<persName><forename type="first">D</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yokota</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Matsumoto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shibayama</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0284567</idno>
		<ptr target="https://doi.org/10.1371/journal.pone.0284567" />
	</analytic>
	<monogr>
		<title level="j">PLOS ONE</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="1" to="16" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Measuring the Impact of Novelty, Bibliometric, and Academic-Network Factors on Citation Count Using a Neural Network</title>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Song</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.joi.2021.101140</idno>
		<ptr target="https://www.sciencedirect.com/science/article/pii/S1751157721000110" />
	</analytic>
	<monogr>
		<title level="j">Journal of Informetrics</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">101140</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
