<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Locating the Leading Edge of Cultural Change</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sarah Griebel</string-name>
          <email>sarahg8@illinois.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Becca Cohen</string-name>
          <email>rscohen2@illinois.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucian Li</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaihyun Park</string-name>
          <email>jay.park2@ntu.edu.sg</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiayu Liu</string-name>
          <email>jiayu13@illinois.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jana Perkins</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ted Underwood</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Nanyang Technological University</institution>
          ,
          <country country="SG">Singapore</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Illinois</institution>
          ,
          <addr-line>Urbana-Champaign</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>232</fpage>
      <lpage>245</lpage>
      <abstract>
        <p>Measures of textual similarity and divergence are increasingly used to study cultural change. But which measures align, in practice, with social evidence about change? We apply three different representations of text (topic models, document embeddings, and word-level perplexity) to three different corpora (literary studies, economics, and fiction). In every case, works by highly-cited authors and younger authors are textually ahead of the curve. We don't find clear evidence that one representation of text is to be preferred over the others. But alignment with social evidence is strongest when texts are represented through the top quartile of passages, suggesting that a text's impact may depend more on its most forward-looking moments than on sustaining a high level of innovation throughout.</p>
      </abstract>
      <kwd-group>
        <kwd>cultural change</kwd>
        <kwd>document embeddings</kwd>
        <kwd>topic modeling</kwd>
        <kwd>fiction</kwd>
        <kwd>bibliometrics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>A growing body of scholarship seeks to understand cultural change by measuring the way
individual texts precede or lag corpus-level trends.</p>
      <p>
        Different disciplines have framed this problem differently. Fields like bibliometrics measure
novelty by comparing an article to past precedent, and ask how well novelty predicts impact
as measured by citations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. By contrast, some computational social scientists are less
interested in divergence from the past than in anticipation of the future. In Vicinanza et al. 2022, for
instance, a text’s “prescience,” or anticipation of future change, is used to identify social
locations where new patterns tend to emerge [20]. It is also possible to combine both approaches,
and study a text’s relationship to past and future at once. Barron et al. 2018 measures a text’s
divergence from the past (“novelty”) and subtracts divergence from the future (“transience”),
producing a measure of durable innovation they call “resonance” [1].
      </p>
      <p>Models of textual change have also relied on radically different representations of text,
ranging from lexical topic models in [1] to a deep-learning model of sentences in [20]. Plausible
a priori arguments can be made for all of these methods. In this paper we will try to provide
empirical evidence about best practices.</p>
      <p>To empirically assess methods of measuring textual change, of course, we need some kind
of ground truth about a text’s divergence from the past (or similarity to the future). This is
not a topic where absolute ground truth is available. In fact, researchers measure innovation
textually because they have reason to suspect that social evidence will be unreliable here. So
instead of relying on a single unimpeachable source of social evidence, we may have to combine
several.</p>
      <p>
        For instance, bibliometricians have repeatedly confirmed that innovation does correlate with
publicity [
        <xref ref-type="bibr" rid="ref22 ref3">22, 3</xref>
        ]. Works that introduce new language, or cite new combinations of sources,
tend to attract more attention and receive more citations themselves. So we could use citation
frequency as one signal that a text was on the leading edge of change.
      </p>
      <p>
        But we also have reason to suspect that using publicity as a measure of innovation will
overrate already-prominent writers, who tend to receive more attention through the “Matthew
effect” [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Young writers are cited less frequently. And yet many ideas and locutions must
emerge first in young writers, since cohort succession is a major driver of cultural change [12,
        <xref ref-type="bibr" rid="ref14 ref19">14, 19</xref>
        ].
      </p>
      <p>The tension between these two forms of social evidence gives us leverage on the problem.
If we can find a measure of a text’s relation to change that aligns well with youth but also
with citation and prominence, we will have validated our measure against two independent
variables, suggesting that it describes “the leading edge of cultural change” in a relatively broad
and robust sense.</p>
      <p>The documents we consider include journal articles drawn from literary studies and
economics, as well as a collection of English-language fiction ranging from 1890 to 2000. In each
case, we have all or most of the documents in full text, so we can compare Transformer-based
models to older strategies of lexical modeling.</p>
      <p>Our experiment supports several inferences about best practices for measuring change. For
instance, is a text’s relation to the past or the future more informative? When do Transformer-based
models outperform lexical ones? Should texts always be considered as wholes, or might
it be more meaningful to represent them through their most innovative parts?</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data</title>
      <p>We modeled the impact of textual innovation using three datasets. Two datasets contained
academic journal articles from the fields of literary studies and economics—fields selected
because we expect their rhetorical and citation practices to diverge significantly. One contained
English-language fiction.</p>
      <sec id="sec-2-1">
        <title>2.1. Academic journals</title>
        <p>Journals were selected for longevity and influence in the field. Journals with longer lifespans
were prioritized, as this would ensure relative stability across the corpus.</p>
        <p>
          The literary studies dataset contains a corpus of 40,407 full text academic articles from seven
journals. The economics dataset contains 43,081 articles from eight journals. Texts were
obtained through JSTOR [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Consult Appendix E for a full list of journal titles. Citation counts
were gathered from Semantic Scholar [8]. Authors’ years of birth were inferred through a
mixture of manual checking and matching to VIAF, which gave us age at publication for 2,646
articles in literary studies (see Appendix D for our methods of inference).
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Fiction</title>
        <p>We gathered 8,918 works of English-language fiction distributed approximately evenly across
time from 1890 through 2000. The first and last 10% of each book was discarded to avoid mixing
fiction with introductions, advertisements, and other nonfiction paratext. Of our 8,918 books,
only 7,304 are in full text; since we only produced embeddings of these books, the embedding
method had a slight disadvantage on the fiction corpus [7].</p>
        <p>We drew information about authors’ years of birth from Underwood et al. 2022, which gave
us author ages for 3,272 works in the period we were analyzing [19].</p>
        <p>We also created a subset of “critically discussed” works by finding the titles and authors of
our fiction corpus in our literary studies corpus. This group of 463 books was compared to a
contrast set with the same distribution across time, but never mentioned in that corpus.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>We measured both divergence from the past (which following Barron et al. 2018 we call
“novelty”) and divergence from future documents (“transience”). But most of the results below are
based on the composite quantity they call “resonance” (novelty minus transience). To avoid
any suggestion of causality we call this quantity “precocity.” A text with high precocity simply
“looks later than” peers published in the same year. We calculate these quantities using three
different representations of texts.</p>
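      <p>As a concrete illustration, these quantities can be sketched in a few lines of Python. The helper names and the scalar toy representation below are ours, not the paper's; in practice the representations are topic distributions, embeddings, or perplexities, and the comparison function varies accordingly.</p>
      <preformat>
```python
from statistics import mean

def precocity(doc_year, doc_vec, corpus, divergence, window=20):
    """Novelty minus transience for one chunk (hypothetical helper).
    corpus is a list of (year, representation) pairs; divergence is
    any comparison function, e.g. K-L divergence or cosine distance."""
    past = [divergence(doc_vec, v) for y, v in corpus
            if doc_year > y >= doc_year - window]
    future = [divergence(doc_vec, v) for y, v in corpus
              if doc_year + window >= y > doc_year]
    novelty = mean(past)         # divergence from the preceding 20 years
    transience = mean(future)    # divergence from the following 20 years
    return novelty - transience  # positive: the chunk "looks later" than peers
```
      </preformat>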
      <sec id="sec-3-1">
        <title>3.1. Topic models</title>
        <p>
          We topic modeled our corpora using the implementation of LDA in MALLET, and divided
documents into chunks of at least 512 tokens [
          <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
          ]. For more detail, see Appendix A. We compared
documents by measuring Kullback-Leibler divergence on topic distributions, following Barron
et al. 2018 [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
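        <p>A minimal version of the comparison, where the chunk being characterized supplies the reference distribution; the smoothing constant is our own assumption to guard against zero topic proportions:</p>
        <preformat>
```python
import math

def kl_divergence(p, q, eps=1e-9):
    """D(p || q) between two topic distributions. Following Barron et
    al. 2018, p is the chunk being characterized and q is a past or
    future chunk. eps smoothing is an assumption of this sketch."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```
        </preformat>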
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Document embeddings</title>
        <p>Topic models are limited to lexical evidence. It seemed plausible that neural document
embeddings, compared via cosine distance, might capture a richer representation of text. We
experimented with several different embedding strategies. Off-the-shelf embeddings performed very
poorly, even if they were at the top of the leaderboard for contemporary applications.
Fine-tuning using the Sentence Transformers library was necessary to produce embeddings more
suited to the specialized subject matter and temporal range (1890–2017) of this experiment [16].
See Appendix B for details of our fine-tuning strategy.</p>
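      <p>Embeddings are compared via cosine distance; for reference, a plain-Python version of that comparison:</p>
      <preformat>
```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)
```
      </preformat>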
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Perplexity</title>
        <p>
          Vicinanza et al. measure a quantity they call “prescience,” which is calculated by comparing the
mean perplexity of a document’s sentences in two models—one trained via masked language
modeling on its own period, and one trained on a future period. Sentences that have lower
perplexity in the future (which become more probable in the future) will have high prescience.
In bibliometrics, a loosely similar method has been used to compute novelty [20,
          <xref ref-type="bibr" rid="ref1 ref17">1, 17</xref>
          ].
        </p>
        <p>We tested Vicinanza’s definition of prescience on our corpora, but found that we got much
greater predictive power by using an expanded version of the method that included both past
and future. Instead of subtracting future perplexity from a document’s perplexity at time of
publication, we subtract it from perplexity calculated in the past.</p>
      <p>precocity = 2 ⋅ (perplexity<sub>past</sub> − perplexity<sub>future</sub>) / (perplexity<sub>past</sub> + perplexity<sub>future</sub>) (1)</p>
        <p>This measures not just anticipation of a specific future period, but a quality of being “ahead
of the curve,” where the curve is inferred from the whole time window around publication of
a text. For further details see Appendix F.</p>
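        <p>Equation (1) is straightforward to implement; the function name is ours:</p>
        <preformat>
```python
def perplexity_precocity(perplexity_past, perplexity_future):
    """Equation (1): a chunk that is less perplexing to a model of the
    future than to a model of the past gets a positive score; the
    normalization bounds the result between -2 and 2."""
    return 2.0 * (perplexity_past - perplexity_future) / (
        perplexity_past + perplexity_future)
```
        </preformat>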
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Details of precocity calculation</title>
        <p>Documents were divided into chunks for all three of these strategies, and chunks were
characterized individually. For the first two methods this meant that each chunk was compared
to all the other document chunks in the preceding (and following) 20 years. Perplexity relied
on models that characterize a 12-year period, so direct chunk-to-chunk comparisons were not
required. The full span from the “past” model to the “future” model was 36 years, rendering the
scale of the perplexity calculation comparable to the 41-year span of the other two methods.</p>
        <p>It is certainly possible to characterize a document through the mean precocity of its chunks.
But an argument can also be made that what matters, socially, is often not the average tenor
of a document, but its most surprising and forward-looking moment. For this reason we also
tested an alternate strategy that characterized documents by selecting the top 25% of their
chunks with highest precocity, and taking the mean of those values.</p>
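        <p>A sketch of this top-quartile aggregation (the rounding rule for small documents is our assumption):</p>
        <preformat>
```python
def document_precocity(chunk_scores, top_fraction=0.25):
    """Characterize a document by the mean precocity of its top 25% of
    chunks, the representation that aligned best with social evidence."""
    ranked = sorted(chunk_scores, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # at least one chunk
    return sum(ranked[:k]) / k
```
        </preformat>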
        <p>An alert reader will anticipate that questions of circularity might emerge when texts quote
each other or were written by the same author. See Appendix C for our solution to these
problems. In practice these efects were very small; excluding or leaving in texts that quote
each other made almost no diference.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Regression strategy</title>
        <p>We assess the explanatory power of precocity through a multiple linear regression that includes
terms for precocity, precocity squared, and novelty (which gives the regression leverage to
separate the components of precocity that refer to the past or to the future). Date of publication
is also present as a control variable.</p>
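        <p>A minimal version of such a regression using ordinary least squares; the variable names are illustrative, and the actual outcome transformations and controls follow the paper:</p>
        <preformat>
```python
import numpy as np

def variance_explained(precocity, novelty, year, outcome):
    """Sketch of the section 3.5 regression: outcome (e.g. citation
    count) on precocity, precocity squared, novelty, and publication
    date. Returns the R-squared of the fit."""
    X = np.column_stack([np.ones_like(precocity), precocity,
                         precocity ** 2, novelty, year])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    resid = outcome - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((outcome - outcome.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot
```
        </preformat>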
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>We’ll begin with a quick overview of the variance explained when six different methods of text
analysis are applied to predicting five social variables.</p>
      <p>As we predicted, textual innovation is associated both with prominence and with youth (even
though a text’s prominence is anti-correlated with youth in this data). The best-performing
methods were able to explain 7-9% of the variance in prominence (e.g., citation count) simply
by identifying works that were (textually) ahead of the curve—more like the future than the
past.</p>
      <p>It is difficult to say whether explaining 7-9% of social variance is good performance, because
we don’t know how much of a work’s prominence is really determined by innovation, and
how much by factors like institutional prestige. Some research suggests that the answer varies
from one discipline to another [22]. It nevertheless seems reasonable to take social variance
explained as a heuristic to choose between methods, for while we don’t know the real effect
size, it is unclear how effects larger than the real one could be produced.</p>
      <p>So what did we learn about measuring precocity? The clearest lesson is that the signal tended
to be strongest when we measured documents “at their most forward-looking,” by averaging
the 25% of passages with the highest precocity scores. In all of the tests we ran, this method
aligned better with social evidence than a method that averaged all passages. One might
infer that citations—and more surprisingly, critical references to fiction—are often motivated by
innovations expressed in a relatively small part of a text.</p>
      <p>Second, on examining regression coefficients, transience (difference from the future)
provides slightly stronger evidence of failure than novelty (difference from the past) provides
evidence of success. The difference between these variables was not huge, however, and there
was nothing to be gained by discarding information about the past. The original perplexity
method in Vicinanza et al. 2022, which only included information about the future, achieved
r<sup>2</sup> less than half as large as the improved method we describe in the table above. Comparing
texts only to the past, or only to the future, would admittedly make it easier to use causal
language. Precocity, which characterizes a text in relation to a whole time window around its
publication, is hard to interpret causally. But if causal explanation is not being claimed, there
is no reason not to use both time arrows at once.</p>
      <p>All three representations of text (topic models, embeddings, and perplexity) performed well
in some cases. Topic models seemed to predict prominence well, while embeddings performed
well on age—but we don’t have enough data points to generalize. If any conclusion can be
drawn here, it might be “a dog that doesn’t bark.” We found no evidence that neural
models of text systematically outperformed lexical models. On the contrary, lexical topic models
displayed consistently strong performance across tasks and corpora.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>We found clear alignment between textual measures of precocity and two independent kinds of
social evidence that we expected to align with change (prominence and authorial youth). There
is no absolute ground truth in this domain, but statistically significant relationships across
three corpora do increase our confidence that text analysis can locate a leading edge of cultural
change.</p>
      <p>We also consistently found a better fit with social evidence when we represented documents
through the 25% of passages with highest precocity. It seems likely that significant innovations
are often concentrated in a small portion of an article or work of fiction.</p>
      <p>However, we also found that precocity can be measured in different ways, which do not
always agree with each other. Since change is taking place in a space that has multiple
dimensions, the same text can be at the leading edge on one axis and lagging behind on another.
Measures of similarity based on a topic model seemed to excel at predicting citations and public
reputation. Transformer-based embeddings may be better at catching tacit signals of
generational style.</p>
      <p>Since topic modeling is an older representation of text, its strong performance overall may
require discussion. We don’t have a confident answer here, but for what it’s worth, topic
models are explicitly designed to factor a corpus into latent variables. Document embeddings,
by contrast, don’t have any representational goal at the corpus level. The embeddings we used
are tuned contrastively, using the Sentence Transformers library [16]. But that process is not
guaranteed to model the corpus in a principled way—which might be a disadvantage in an
experiment that seeks to measure a document’s relation to corpus-level trends.</p>
      <p>At least for now, researchers wrestling with questions about textual change are well advised
to evaluate the performance of a principled lexical model as a baseline and confirm that
embeddings do actually improve on it before relying on embeddings alone. It is not safe to assume
that a model will perform better simply because it captures information about word order.</p>
    </sec>
    <sec id="sec-6">
      <title>Public data and code</title>
      <p>Data and code for this project are available on GitHub at https://github.com/IllinoisLiteraryLab/novelty/tree/main, and will also be archived on Zenodo.</p>
    </sec>
    <sec id="sec-7">
      <title>Contribution statement</title>
      <sec id="sec-7-1">
        <title>Authors are listed alphabetically here.</title>
        <p>Conceived and designed the analysis: Becca Cohen, Sarah Griebel, Lucian Li, Jiayu Liu,
Jaihyun Park, Jana Perkins, Ted Underwood; Wrote the paper: Becca Cohen, Sarah Griebel,
Lucian Li, Jiayu Liu, Jaihyun Park, Jana Perkins, Ted Underwood; Collected the data: Becca
Cohen, Sarah Griebel, Lucian Li, Ted Underwood; Contributed data or analysis tools: Becca
Cohen, Sarah Griebel, Lucian Li, Ted Underwood; Performed the analysis: Sarah Griebel,
Ted Underwood.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work made use of the Illinois Campus Cluster, a computing resource that is operated by
the Illinois Campus Cluster Program (ICCP) in conjunction with the National Center for
Supercomputing Applications (NCSA) and which is supported by funds from the University of
Illinois at Urbana-Champaign—specifically, through the Illinois Computes program. This work
also used the Delta system at the National Center for Supercomputing Applications through
allocation xras-ncsa-72 from the Advanced Cyberinfrastructure Coordination Ecosystem:
Services &amp; Support (ACCESS) program, which is supported by National Science Foundation grants
#2138259, #2138286, #2138307, #2137603, and #2138296. Some fiction data for this project was
provided by HathiTrust Digital Library [7].</p>
    </sec>
    <sec id="sec-9">
      <title>Appendices</title>
    </sec>
    <sec id="sec-10">
      <title>A. Topic models</title>
      <p>Topic granularity will vary if a corpus includes many more texts in some periods than
others, and this could be problematic for a project interested in comparisons across time. So our
procedure in every case was:</p>
      <sec id="sec-10-1">
        <title>Procedure</title>
        <p>1. Restrict the corpus to an even distribution across time.</p>
        <p>2. Generate a 250-topic model with MALLET, including an “inferencer.”</p>
        <p>3. Use the inferencer to generate topic distributions for documents that had to be left out of the “flat” distribution in step 1.</p>
        <p>Using this model, we assessed novelty, transience, and precocity by measuring the K-L
divergence between texts. K-L divergence is an asymmetric measure; we took the document being
characterized as the reference probability distribution, and compared both past and future
documents to that reference point.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>B. Embeddings</title>
      <p>
        We began by testing off-the-shelf GTE embeddings [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. When these performed poorly, we
realized that embeddings are trained mostly on twenty-first-century material, and fine-tuning
would be needed to give them a better chance of representing an earlier period.
      </p>
      <p>
        The tuning method we ultimately adopted relies on multiple negatives ranking loss, as
implemented in Sentence Transformers [6,
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. That is, the training dataset includes only positive
pairs of similar passages; negative pairs are created implicitly by misaligning the passages in
a batch. We created positive pairs mostly by selecting adjacent passages from the same article
(or work of fiction). But we adopted several tricks to prevent the model from learning a notion
of similarity defined purely by vocabulary overlap. First, we used GPT-3.5 to paraphrase and
condense one element of some pairs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Paraphrasing up to 18% of pairs seemed to improve
results. Second, in training embeddings for fiction, we replaced personal names in one element
of each pair—preserving first and last names, and gender signals, as much as possible. Both
of these changes made the learning task more difficult and improved alignment with social
evidence. We used these datasets to fine-tune RoBERTa [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
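      <p>The in-batch behavior of multiple negatives ranking loss can be illustrated with a small NumPy stand-in; this is a pedagogical sketch, not the Sentence Transformers implementation, and the scale constant is an assumption:</p>
      <preformat>
```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """The i-th positive is the target for the i-th anchor; every other
    positive in the batch serves as an implicit negative. Inputs are
    unit-normalized (batch, dim) arrays; returns mean cross-entropy."""
    sims = scale * anchors @ positives.T            # pairwise similarities
    sims = sims - sims.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.diagonal(log_prob).mean())     # NLL of matched pairs
```
      </preformat>
      <p>Misaligning the positives in a batch, as described above, raises this loss; training pushes matched passages together and in-batch mismatches apart.</p>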
      <p>We also explored several alternate approaches that aren’t represented in the final paper. For
the task of predicting citations, we experimented with embeddings that were trained
specifically to identify the kind of similarity between articles that produces citation. Here, positives
were sentences from articles related by citation, and negatives were pairs of sentences identified
by off-the-shelf embedding methods as sharing intellectual influence, despite no documented
citation existing between the two articles. Our hypothesis was that these pairs represent
spurious or coincidental similarities in language not necessarily associated with the type of
intellectual influence we are trying to measure. We took these pairs and fine-tuned the GTE model,
through Cosine Entropy Loss, assigning high similarity to correctly identified citation pairs
and low similarity to falsely identified pairs [9].</p>
      <p>Since we were concerned that embeddings might perform less well on long passages than
on individual sentences, we also tested a strategy where we generated embeddings on single
sentences, then clustered them, and took the cluster centroids as synthetic “document
embeddings.” This did not improve performance.</p>
      <p>An alternate approach we have not yet checked would be to train embeddings entirely from
scratch on these corpora. Some recent studies suggest that even older methods of doing that,
like doc2vec, can outperform topic models on clustering tasks [18].</p>
      <p>We embedded passages of up to 512 tokens, with the constraint that we divide passages only
at sentence breaks. Note that the chunks used for topic modeling were generally combinations
of two or more embedding chunks; this difference of size was permitted in order to emphasize
the strengths of both methods, without hindering either one.</p>
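      <p>The sentence-boundary chunking rule can be sketched as a greedy packer (our own simplified version):</p>
      <preformat>
```python
def chunk_at_sentence_breaks(tokenized_sentences, max_tokens=512):
    """Pack whole sentences (token lists) into chunks of up to
    max_tokens tokens, splitting only at sentence breaks."""
    chunks, current = [], []
    for sentence in tokenized_sentences:
        if current and len(current) + len(sentence) > max_tokens:
            chunks.append(current)   # flush the full chunk
            current = []
        current = current + sentence
    if current:
        chunks.append(current)
    return chunks
```
      </preformat>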
    </sec>
    <sec id="sec-12">
      <title>C. Text-reuse detection</title>
      <p>We avoided comparing any papers written by the same author. We also aimed to avoid
comparing chunks of text that directly quoted each other, as including these, we estimated, would
create a circularity in the precocity calculation for such chunks, directly guaranteeing that it
would correlate with citation.</p>
      <p>To avoid this circularity, we looked for either the cited author’s last name
or a string of six or more matching words inside single or double quotation marks within the
citing paper. If either is found, the chunk is not used for comparison. It is important
to note that the whole paper is not excluded from comparison; only the offending chunk.</p>
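      <p>A simplified version of this filter; the helper names are ours, and for brevity the sketch matches only double or curly quotes rather than the full set of quotation marks used in the actual pipeline:</p>
      <preformat>
```python
import re

def reuses_cited_text(chunk, cited_surname, cited_text, min_run=6):
    """Flag a chunk for exclusion if it names the cited author, or if
    any quoted span shares a run of min_run or more words with the
    cited text."""
    if cited_surname.lower() in chunk.lower():
        return True
    cited_words = cited_text.lower().split()
    quote_pattern = "[\"\u201c](.+?)[\"\u201d]"   # straight or curly double quotes
    for quoted in re.findall(quote_pattern, chunk):
        words = quoted.lower().split()
        for i in range(len(words) - min_run + 1):
            run = words[i:i + min_run]
            for j in range(len(cited_words) - min_run + 1):
                if cited_words[j:j + min_run] == run:
                    return True
    return False
```
      </preformat>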
    </sec>
    <sec id="sec-13">
      <title>D. Author age determination</title>
      <p>
        For the fiction corpus we could rely on previously published data to determine authors’ years
of birth [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>To create analogous data for literary scholars, we estimated years of birth for a sample of
1,093 authors (and 2,646 articles) through a mixture of manual research and searches on the
VIAF API. A model was trained to distinguish true VIAF matches from false ones. We
estimate that we achieved overall accuracy of greater than 90%; this estimate is based both on the
accuracy of the VIAF model and on manually checking a sample of articles.</p>
    </sec>
    <sec id="sec-14">
      <title>E. Corpus construction</title>
    </sec>
    <sec id="sec-15">
      <title>F. Timeline for perplexity calculation</title>
      <p>We calculated perplexity using RoBERTa on chunks of up to 512 tokens (the same ones we
used for embedding) [10]. We divided the timeline into overlapping 12-year periods with a
4-year offset, which ends up meaning that a text published in 1968-1971, for instance, would
be compared to a past model trained on 1952-63 and a future model trained on 1976-87. But a
text published in 1964-67 would be compared to a past model trained on 1948-59 and a future
model trained on 1972-83.</p>
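      <p>The window arithmetic implied by these examples can be sketched as follows; this is inferred from the worked examples above, and exact boundary handling is an assumption:</p>
      <preformat>
```python
def perplexity_windows(pub_year):
    """Past and future training windows: 12-year models advanced in
    4-year steps around a 4-year publication bucket."""
    base = (pub_year // 4) * 4        # start of the publication bucket
    past = (base - 16, base - 5)      # e.g. 1968 maps to (1952, 1963)
    future = (base + 8, base + 19)    # e.g. 1968 maps to (1976, 1987)
    return past, future
```
      </preformat>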
      <p>Our goal in creating 12-year models, but moving them forward 4 years at a time, was
to create sufÏciently large corpora for training while ensuring that texts were not greatly
(dis)advantaged by their position within a time step.</p>
    </sec>
    <sec id="sec-16">
      <title>G. Domain insights</title>
      <p>Our primary goal in this paper is to validate a method. But it is also easy to see how this
method could be used to illuminate substantive research questions about a genre or academic
discipline. To give a quick sense of what it might reveal, we’ve visualized the seven journals that
comprise our literary studies corpus, along with a selection of authors who have exceptionally
high precocity and/or an exceptionally high number of citations.</p>
      <p>Citation counts are already public. But precocity—vertical position in Figure 1—is new
information. Precocity does tend to correlate with citations, as is visible in the positive slope of the
journals. But journals that attract different numbers of citations (like New Literary History and
Critical Inquiry) may nevertheless be close to indistinguishable when it comes to precocity—
which suggests they have substantively equal power to predict trends in the discipline. In other
cases, journals that aren’t distinguished by citation count can be distinguished by precocity.
PMLA is the flagship journal of the Modern Language Association, and arguably the highest-prestige
venue in this group. It attracts almost as many citations as New Literary History or
ELH, but its position on the vertical axis suggests that editorial practices have sometimes been
more conservative (as perhaps befits the journal of a large professional organization).</p>
      <p>The apparent negative slope of author names is an artifact of the process we used to select
exceptional authors, which deliberately highlights names on the periphery. If we plotted all
authors, we would get a Gaussian cloud of points with the same slope and center as the journals
(but much larger, since authors are associated with fewer articles and thus aren’t pulled to the
origin as strongly by the law of averages).</p>
      <p>The names of well-known critics, like Fredric Jameson and Gayatri Spivak, tend to be found
in the upper right corner, suggesting that they were not only widely cited but prescient (or
influential—causality is impossible to determine here). Moving up and to the left we find names
that may be less familiar, but that our algorithm suggests were also ahead of the curve. Carl
E. W. L. Dahlström is an early-twentieth-century critic whose articles have almost never been
cited, although they anticipate subsequent trends.</p>
      <p>On the right side of the graph we find a few widely-cited authors who aren’t especially
distinguished by precocity. This is not necessarily a negative reflection on their work. For
instance, several authors in this region (Richard Rorty, Jacques Derrida, and Ian Hacking) are
well-known philosophers who were occasionally invited to publish in literary studies journals.
Since they can hardly expect to convert literary scholars into philosophers en masse, these
honorific late-career publications won’t stand at the beginning of a long tradition of similar
work, and therefore won’t have high precocity. In short, there can be more than one kind of
influence. Precocity measures a text’s relation to a specific corpus, and may not capture all the
intellectual influences that flow between corpora. It is nevertheless easy to see how this metric
could be used to pose questions about editorial practices and career arcs within a discipline.</p>
    </sec>
    <sec id="sec-17">
      <title>H. Preregistration and paths not taken</title>
      <p>Most of the methodological details above were preregistered in Fall 2023 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. But the experimental plan did change in some important ways afterward. In particular, our embedding
strategy changed several times, after off-the-shelf GTE embeddings proved not to be competitive. Also,
comparison to authorial age wasn’t part of our original plan. A critical reader might (correctly)
interpret these adjustments to our plan as efforts to find some method or context that would
allow Transformer-based methods to outperform a topic model, as we had originally expected.
If we had followed our original experimental plan exactly, the result would have been a simple
endorsement of topic modeling. Evidence of our struggle to avoid or complicate that
conclusion may perhaps make it even more persuasive.</p>
      <p>
        There is also a question we proposed in the preregistration, and did investigate, but haven’t
discussed above for reasons of space. Some researchers may wonder whether it really makes
sense to compare a text chunk to all the parts of all documents in the preceding and following
20 years. One could argue that mystery novels, for instance, are not really innovating relative
to science fiction, but to other mystery novels. One way of taking this into account—which
performed well in some previous work—was to compare chunks only to a subset of very similar
chunks in the past and future (say the top 5%) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. We also tested that strategy here, but it
didn’t often improve on other approaches, and so we’ve deferred discussion to this appendix.
      </p>
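      <p>As a minimal sketch of the restricted-comparison strategy just described (assuming cosine similarity over chunk embeddings; the function and variable names here are illustrative, not the authors’ actual code):</p>

```python
import numpy as np

def restricted_novelty(target_vec, target_year, chunk_vectors, chunk_years,
                       window=20, quantile=0.95):
    """Mean cosine distance from a chunk to only the top 5% most similar
    chunks published in the preceding `window` years (rather than to all
    of them). A symmetric pass over the following window would give the
    forward-looking half of the comparison."""
    # select chunks in the window before the target
    mask = (chunk_years < target_year) & (chunk_years >= target_year - window)
    past = chunk_vectors[mask]
    if len(past) == 0:
        return np.nan
    # cosine similarity of the target chunk to every past chunk
    sims = past @ target_vec / (
        np.linalg.norm(past, axis=1) * np.linalg.norm(target_vec) + 1e-12)
    # keep only the most similar 5% (the chunk's nearest neighbors)
    cutoff = np.quantile(sims, quantile)
    top = sims[sims >= cutoff]
    # novelty = mean cosine *distance* to that restricted comparison set
    return float(np.mean(1.0 - top))
```

      <p>The intuition is that a mystery novel is then scored against the mystery novels closest to it, not against the whole corpus of fiction in the window.</p>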
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. T. J.</given-names>
            <surname>Barron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Spang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>DeDeo</surname>
          </string-name>
          . “
          <article-title>Individuals, Institutions, and Innovation in the Debates of the French Revolution”</article-title>
          .
          <source>In: Proceedings of the National Academy of Sciences 115.18</source>
          (
          <year>2018</year>
          ). Ed. by D. S. Bassett, pp.
          <fpage>4607</fpage>
          -
          <lpage>4612</lpage>
          . doi: 10.1073/pnas.1717729115. eprint: https://doi.org/10.1073/pnas.1717729115.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          . “Latent Dirichlet Allocation”.
          <source>In: Journal of Machine Learning Research</source>
          <volume>3</volume>
          (
          <year>2003</year>
          ), pp.
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          . url: https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bornmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tekles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. Y.</given-names>
            <surname>Ye</surname>
          </string-name>
          .
          <source>Do We Measure Novelty When We Analyze Unusual Combinations of Cited References? A Validation Study of Bibliometric Novelty Indicators Based on F1000Prime Data</source>
          .
          <year>2019</year>
          . arXiv: 1910.03233 [cs.DL]. url: https://arxiv.org/abs/1910.03233.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Burns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brenner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Llewellyn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Snyder</surname>
          </string-name>
          . “
          <article-title>JSTOR - Data for Research”</article-title>
          . In:
          <source>Research and Advanced Technology for Digital Libraries</source>
          . Ed. by
          <string-name>
            <given-names>M.</given-names>
            <surname>Agosti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Borbinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kapidakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Papatheodorou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Tsakonas</surname>
          </string-name>
          . Berlin, Heidelberg: Springer Berlin Heidelberg,
          <year>2009</year>
          , pp.
          <fpage>416</fpage>
          -
          <lpage>419</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Griebel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Perkins</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Underwood</surname>
          </string-name>
          .
          <source>Comparing Measures of Textual Innovation</source>
          .
          <year>2023</year>
          . doi: 10.17605/osf.io/a3g6e. url: osf.io/a3g6e.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Strope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lukács</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Miklos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Kurzweil</surname>
          </string-name>
          .
          <article-title>“Efficient Natural Language Response Suggestion for Smart Reply”</article-title>
          .
          <source>In: CoRR abs/1705.00652</source>
          (
          <year>2017</year>
          ). arXiv: 1705.00652. url: http://arxiv.org/abs/1705.00652.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Capitanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kudeki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cole</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Organisciak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          ,
          <string-name>
            <surname>E. Dickson Koehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dubnicek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Downie</surname>
          </string-name>
          .
          <source>The HathiTrust Research Center Extracted Features Dataset (2.0)</source>
          . HathiTrust Research Center.
          <year>2020</year>
          . doi: 10.13012/R2TE-C227. url: https://doi.org/10.13012/R2TE-C227.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Kinney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Anastasiades</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Authur</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bragg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Buraczynski</surname>
          </string-name>
          , I. Cachola,
          <string-name>
            <given-names>S.</given-names>
            <surname>Candra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chandrasekhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Crawford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dunkelberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gorney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Graham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Huff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kohlmeier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kuehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Langan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lochner</surname>
          </string-name>
          , K. MacMillan, T. C. Murray,
          <string-name>
            <given-names>C.</given-names>
            <surname>Newell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rohatgi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sayre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Soldaini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tanaka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Wade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wilhelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zamarron</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. van Zuylen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          .
          <article-title>“The Semantic Scholar Open Data Platform”</article-title>
          .
          <source>In: ArXiv abs/2301.10140</source>
          (
          <year>2023</year>
          ). url: https://api.semanticscholar.org/CorpusID:256194545.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Xie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          . “
          <article-title>Towards General Text Embeddings with Multi-Stage Contrastive Learning”</article-title>
          .
          <source>In: arXiv preprint arXiv:2308.03281</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          . “
          <article-title>RoBERTa: A Robustly Optimized BERT Pretraining Approach”</article-title>
          .
          <source>In: CoRR abs/1907.11692</source>
          (
          <year>2019</year>
          ). arXiv: 1907.11692. url: http://arxiv.org/abs/1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>McCallum</surname>
          </string-name>
          .
          <source>MALLET: A Machine Learning for Language Toolkit</source>
          .
          <year>2002</year>
          . url: http://mallet.cs.umass.edu.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Meisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Elsig</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Rinke</surname>
          </string-name>
          .
          <source>Language Acquisition and Change: A Morphosyntactic Perspective</source>
          . Edinburgh: Edinburgh University Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Merton</surname>
          </string-name>
          . “
          <article-title>The Matthew Effect in Science”</article-title>
          .
          <source>In: Science 159.3810</source>
          (
          <year>1968</year>
          ), pp.
          <fpage>56</fpage>
          -
          <lpage>63</lpage>
          . doi: 10.1126/science.159.3810.56. url: https://www.science.org/doi/10.1126/science.159.3810.56.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Miller</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Shanks</surname>
          </string-name>
          .
          <source>The New American Voter</source>
          . Cambridge, MA: Harvard University Press,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Wainwright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mishkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Agarwal,
          <string-name>
            <given-names>K.</given-names>
            <surname>Slama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schulman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kelton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Welinder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Christiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leike</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Lowe</surname>
          </string-name>
          .
          <source>Training Language Models to Follow Instructions with Human Feedback</source>
          .
          <year>2022</year>
          . arXiv: 2203.02155 [cs.CL]. url: https://arxiv.org/abs/2203.02155.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          . “
          <article-title>Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks”</article-title>
          .
          <source>In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3982</fpage>
          -
          <lpage>3992</lpage>
          . url: https://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shibayama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Matsumoto</surname>
          </string-name>
          . “
          <article-title>Measuring Novelty in Science with Word Embedding”</article-title>
          .
          <source>In: Plos One 16.7</source>
          (
          <year>2021</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . doi: 10.1371/journal.pone.0254034. url: https://doi.org/10.1371/journal.pone.0254034.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>O.</given-names>
            <surname>Sobchuk</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Šeļa</surname>
          </string-name>
          . “
          <article-title>Computational Thematics: Comparing Algorithms for Clustering the Genres of Literary Fiction”</article-title>
          .
          <source>In: Humanities and Social Sciences Communications 11.1</source>
          (
          <year>2024</year>
          ), p.
          <fpage>438</fpage>
          . doi: 10.1057/s41599-024-02933-6. url: https://doi.org/10.1057/s41599-024-02933-6.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kiley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Vaisey</surname>
          </string-name>
          . “
          <article-title>Cohort Succession Explains Most Change in Literary Culture”</article-title>
          .
          <source>In: Sociological Science 9.8</source>
          (
          <year>2022</year>
          ), pp.
          <fpage>184</fpage>
          -
          <lpage>205</lpage>
          . doi: 10.15195/v9.a8. url: http://dx.doi.org/10.15195/v9.a8.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P.</given-names>
            <surname>Vicinanza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          .
          <article-title>“A Deep-Learning Model of Prescient Ideas Demonstrates That They Emerge from the Periphery”</article-title>
          .
          <source>In: PNAS Nexus 2.1</source>
          (
          <year>2022</year>
          ), pgac275. doi: 10.1093/pnasnexus/pgac275. url: https://doi.org/10.1093/pnasnexus/pgac275.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yokota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Matsumoto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Shibayama</surname>
          </string-name>
          . “
          <article-title>Identify Novel Elements of Knowledge with Word Embedding”</article-title>
          .
          <source>In: Plos One 18.6</source>
          (
          <year>2023</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . doi: 10.1371/journal.pone.0284567. url: https://doi.org/10.1371/journal.pone.0284567.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Song</surname>
          </string-name>
          . “
          <article-title>Measuring the Impact of Novelty, Bibliometric, and Academic-Network Factors on Citation Count Using a Neural Network”</article-title>
          .
          <source>In: Journal of Informetrics 15.2</source>
          (
          <year>2021</year>
          ), p.
          <fpage>101140</fpage>
          . doi: https://doi.org/10.1016/j.joi.2021.101140. url: https://www.sciencedirect.com/science/article/pii/S1751157721000110.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>