<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Locating the Leading Edge of Cultural Change</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sarah Griebel</string-name>
          <email>sarahg8@illinois.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Becca Cohen</string-name>
          <email>rscohen2@illinois.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucian Li</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaihyun Park</string-name>
          <email>jay.park2@ntu.edu.sg</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiayu Liu</string-name>
          <email>jiayu13@illinois.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jana Perkins</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ted Underwood</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Nanyang Technological University</institution>
          ,
          <country country="SG">Singapore</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Illinois</institution>
          ,
          <addr-line>Urbana-Champaign</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>232</fpage>
      <lpage>245</lpage>
      <abstract>
        <p>Measures of textual similarity and divergence are increasingly used to study cultural change. But which measures align, in practice, with social evidence about change? We apply three different representations of text (topic models, document embeddings, and word-level perplexity) to three different corpora (literary studies, economics, and fiction). In every case, works by highly-cited authors and younger authors are textually ahead of the curve. We don't find clear evidence that one representation of text is to be preferred over the others. But alignment with social evidence is strongest when texts are represented through the top quartile of passages, suggesting that a text's impact may depend more on its most forward-looking moments than on sustaining a high level of innovation throughout.</p>
      </abstract>
      <kwd-group>
        <kwd>cultural change</kwd>
        <kwd>document embeddings</kwd>
        <kwd>topic modeling</kwd>
        <kwd>fiction</kwd>
        <kwd>bibliometrics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>A growing body of scholarship seeks to understand cultural change by measuring the way
individual texts precede or lag corpus-level trends.</p>
      <p>
        Different disciplines have framed this problem differently. Fields like bibliometrics measure
novelty by comparing an article to past precedent, and ask how well novelty predicts impact
as measured by citations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. By contrast, some computational social scientists are less
interested in divergence from the past than in anticipation of the future. In Vicinanza et al. 2022, for
instance, a text’s “prescience,” or anticipation of future change, is used to identify social
locations where new patterns tend to emerge [20]. It is also possible to combine both approaches,
and study a text’s relationship to past and future at once. Barron et al. 2018 measures a text’s
divergence from the past (“novelty”) and subtracts divergence from the future (“transience”),
producing a measure of durable innovation they call “resonance” [1].
      </p>
      <p>Models of textual change have also relied on radically different representations of text,
ranging from lexical topic models in [1] to a deep-learning model of sentences in [20]. Plausible
a priori arguments can be made for all of these methods. In this paper we will try to provide
empirical evidence about best practices.</p>
      <p>To empirically assess methods of measuring textual change, of course, we need some kind
of ground truth about a text’s divergence from the past (or similarity to the future). This is
not a topic where absolute ground truth is available. In fact, researchers measure innovation
textually because they have reason to suspect that social evidence will be unreliable here. So
instead of relying on a single unimpeachable source of social evidence, we may have to combine
several.</p>
      <p>
        For instance, bibliometricians have repeatedly confirmed that innovation does correlate with
publicity [
        <xref ref-type="bibr" rid="ref22 ref3">22, 3</xref>
        ]. Works that introduce new language, or cite new combinations of sources,
tend to attract more attention and receive more citations themselves. So we could use citation
frequency as one signal that a text was on the leading edge of change.
      </p>
      <p>
        But we also have reason to suspect that using publicity as a measure of innovation will
overrate already-prominent writers, who tend to receive more attention through the “Matthew
effect” [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Young writers are cited less frequently. And yet many ideas and locutions must
emerge first in young writers, since cohort succession is a major driver of cultural change [12,
        <xref ref-type="bibr" rid="ref14 ref19">14, 19</xref>
        ].
      </p>
      <p>The tension between these two forms of social evidence gives us leverage on the problem.
If we can find a measure of a text’s relation to change that aligns well with youth but also
with citation and prominence, we will have validated our measure against two independent
variables, suggesting that it describes “the leading edge of cultural change” in a relatively broad
and robust sense.</p>
      <p>The documents we consider include journal articles drawn from literary studies and
economics, as well as a collection of English-language fiction ranging from 1890 to 2000. In each
case, we have all or most of the documents in full text, so we can compare Transformer-based
models to older strategies of lexical modeling.</p>
      <p>Our experiment supports several inferences about best practices for measuring change. For
instance, is a text’s relation to the past or the future more informative? When do Transformer-based
models outperform lexical ones? Should texts always be considered as wholes, or might
it be more meaningful to represent them through their most innovative parts?</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data</title>
      <p>We modeled the impact of textual innovation using three datasets. Two datasets contained
academic journal articles from the fields of literary studies and economics—fields selected
because we expect their rhetorical and citation practices to diverge significantly. One contained
English-language fiction.</p>
      <sec id="sec-2-1">
        <title>2.1. Academic journals</title>
        <p>Journals were selected for longevity and influence in the field. Journals with longer lifespans
were prioritized, as this would ensure relative stability across the corpus.</p>
        <p>
          The literary studies dataset contains a corpus of 40,407 full text academic articles from seven
journals. The economics dataset contains 43,081 articles from eight journals. Texts were
obtained through JSTOR [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Consult Appendix E for a full list of journal titles. Citation counts
were gathered from Semantic Scholar [8]. Authors’ years of birth were inferred through a
mixture of manual checking and matching to VIAF, which gave us age at publication for 2,646
articles in literary studies (see Appendix D for our methods of inference).
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Fiction</title>
        <p>We gathered 8,918 works of English-language fiction distributed approximately evenly across
time from 1890 through 2000. The first and last 10% of each book was discarded to avoid mixing
fiction with introductions, advertisements, and other nonfiction paratext. Of our 8,918 books,
only 7,304 are in full text; since we only produced embeddings of these books, the embedding
method had a slight disadvantage on the fiction corpus [7].</p>
        <p>We drew information about authors’ years of birth from Underwood et al. 2022, which gave
us author ages for 3,272 works in the period we were analyzing [19].</p>
        <p>We also created a subset of “critically discussed” works by finding the titles and authors of
our fiction corpus in our literary studies corpus. This group of 463 books was compared to a
contrast set with the same distribution across time, but never mentioned in that corpus.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>We measured both divergence from the past (which following Barron et al. 2018 we call
“novelty”) and divergence from future documents (“transience”). But most of the results below are
based on the composite quantity they call “resonance” (novelty minus transience). To avoid
any suggestion of causality we call this quantity “precocity.” A text with high precocity simply
“looks later than” peers published in the same year. We calculate these quantities using three
different representations of texts.</p>
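      <p>As a concrete illustration, these quantities can be sketched in a few lines of Python. The helper names and the scalar toy representation below are ours, not the paper's; in practice the representations are topic distributions, embeddings, or perplexities, and the comparison function varies accordingly.</p>
      <preformat>
```python
from statistics import mean

def precocity(doc_year, doc_vec, corpus, divergence, window=20):
    """Novelty minus transience for one chunk (hypothetical helper).
    corpus is a list of (year, representation) pairs; divergence is
    any comparison function, e.g. K-L divergence or cosine distance."""
    past = [divergence(doc_vec, v) for y, v in corpus
            if doc_year > y >= doc_year - window]
    future = [divergence(doc_vec, v) for y, v in corpus
              if doc_year + window >= y > doc_year]
    novelty = mean(past)         # divergence from the preceding 20 years
    transience = mean(future)    # divergence from the following 20 years
    return novelty - transience  # positive: the chunk "looks later" than peers
```
      </preformat>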
      <sec id="sec-3-1">
        <title>3.1. Topic models</title>
        <p>
          We topic modeled our corpora using the implementation of LDA in MALLET, and divided
documents into chunks of at least 512 tokens [
          <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
          ]. For more detail, see Appendix A. We compared
documents by measuring Kullback-Leibler divergence on topic distributions, following Barron
et al. 2018 [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
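        <p>A minimal version of the comparison, where the chunk being characterized supplies the reference distribution; the smoothing constant is our own assumption to guard against zero topic proportions:</p>
        <preformat>
```python
import math

def kl_divergence(p, q, eps=1e-9):
    """D(p || q) between two topic distributions. Following Barron et
    al. 2018, p is the chunk being characterized and q is a past or
    future chunk. eps smoothing is an assumption of this sketch."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```
        </preformat>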
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Document embeddings</title>
        <p>Topic models are limited to lexical evidence. It seemed plausible that neural document
embeddings, compared via cosine distance, might capture a richer representation of text. We
experimented with several different embedding strategies. Off-the-shelf embeddings performed very
poorly, even if they were at the top of the leaderboard for contemporary applications.
Fine-tuning using the Sentence Transformers library was necessary to produce embeddings more
suited to the specialized subject matter and temporal range (1890–2017) of this experiment [16].
See Appendix B for details of our fine-tuning strategy.</p>
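      <p>Embeddings are compared via cosine distance; for reference, a plain-Python version of that comparison:</p>
      <preformat>
```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)
```
      </preformat>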
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Perplexity</title>
        <p>
          Vicinanza et al. measure a quantity they call “prescience,” which is calculated by comparing the
mean perplexity of a document’s sentences in two models—one trained via masked language
modeling on its own period, and one trained on a future period. Sentences that have lower
perplexity in the future (which become more probable in the future) will have high prescience.
In bibliometrics, a loosely similar method has been used to compute novelty [20,
          <xref ref-type="bibr" rid="ref1 ref17">1, 17</xref>
          ].
        </p>
        <p>We tested Vicinanza’s definition of prescience on our corpora, but found that we got much
greater predictive power by using an expanded version of the method that included both past
and future. Instead of subtracting future perplexity from a document’s perplexity at time of
publication, we subtract it from perplexity calculated in the past.</p>
      <p>precocity = 2 ⋅ (perplexity<sub>past</sub> − perplexity<sub>future</sub>) / (perplexity<sub>past</sub> + perplexity<sub>future</sub>) (1)</p>
        <p>This measures not just anticipation of a specific future period, but a quality of being “ahead
of the curve,” where the curve is inferred from the whole time window around publication of
a text. For further details see Appendix F.</p>
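        <p>Equation (1) is straightforward to implement; the function name is ours:</p>
        <preformat>
```python
def perplexity_precocity(perplexity_past, perplexity_future):
    """Equation (1): a chunk that is less perplexing to a model of the
    future than to a model of the past gets a positive score; the
    normalization bounds the result between -2 and 2."""
    return 2.0 * (perplexity_past - perplexity_future) / (
        perplexity_past + perplexity_future)
```
        </preformat>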
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Details of precocity calculation</title>
        <p>Documents were divided into chunks for all three of these strategies, and chunks were
characterized individually. For the first two methods this meant that each chunk was compared
to all the other document chunks in the preceding (and following) 20 years. Perplexity relied
on models that characterize a 12-year period, so direct chunk-to-chunk comparisons were not
required. The full span from the “past” model to the “future” model was 36 years, rendering the
scale of the perplexity calculation comparable to the 41-year span of the other two methods.</p>
        <p>It is certainly possible to characterize a document through the mean precocity of its chunks.
But an argument can also be made that what matters, socially, is often not the average tenor
of a document, but its most surprising and forward-looking moment. For this reason we also
tested an alternate strategy that characterized documents by selecting the top 25% of their
chunks with highest precocity, and taking the mean of those values.</p>
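        <p>A sketch of this top-quartile aggregation (the rounding rule for small documents is our assumption):</p>
        <preformat>
```python
def document_precocity(chunk_scores, top_fraction=0.25):
    """Characterize a document by the mean precocity of its top 25% of
    chunks, the representation that aligned best with social evidence."""
    ranked = sorted(chunk_scores, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # at least one chunk
    return sum(ranked[:k]) / k
```
        </preformat>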
        <p>An alert reader will anticipate that questions of circularity might emerge when texts quote
each other or were written by the same author. See Appendix C for our solution to these
problems. In practice these efects were very small; excluding or leaving in texts that quote
each other made almost no diference.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Regression strategy</title>
        <p>We assess the explanatory power of precocity through a multiple linear regression that includes
terms for precocity, precocity squared, and novelty (which gives the regression leverage to
separate the components of precocity that refer to the past or to the future). Date of publication
is also present as a control variable.</p>
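        <p>A minimal version of such a regression using ordinary least squares; the variable names are illustrative, and the actual outcome transformations and controls follow the paper:</p>
        <preformat>
```python
import numpy as np

def variance_explained(precocity, novelty, year, outcome):
    """Sketch of the section 3.5 regression: outcome (e.g. citation
    count) on precocity, precocity squared, novelty, and publication
    date. Returns the R-squared of the fit."""
    X = np.column_stack([np.ones_like(precocity), precocity,
                         precocity ** 2, novelty, year])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    resid = outcome - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((outcome - outcome.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot
```
        </preformat>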
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>We’ll begin with a quick overview of the variance explained when six different methods of text
analysis are applied to predicting five social variables.</p>
      <p>As we predicted, textual innovation is associated both with prominence and with youth (even
though a text’s prominence is anti-correlated with youth in this data). The best-performing
methods were able to explain 7-9% of the variance in prominence (e.g., citation count) simply
by identifying works that were (textually) ahead of the curve—more like the future than the
past.</p>
      <p>It is difficult to say whether explaining 7-9% of social variance is good performance, because
we don’t know how much of a work’s prominence is really determined by innovation, and
how much by factors like institutional prestige. Some research suggests that the answer varies
from one discipline to another [22]. It nevertheless seems reasonable to take social variance
explained as a heuristic to choose between methods, for while we don’t know the real effect
size, it is unclear how effects larger than the real one could be produced.</p>
      <p>So what did we learn about measuring precocity? The clearest lesson is that the signal tended
to be strongest when we measured documents “at their most forward-looking,” by averaging
the 25% of passages with the highest precocity scores. In all of the tests we ran, this method
aligned better with social evidence than a method that averaged all passages. One might
infer that citations—and more surprisingly, critical references to fiction—are often motivated by
innovations expressed in a relatively small part of a text.</p>
      <p>Second, on examining regression coefficients, transience (difference from the future)
provides slightly stronger evidence of failure than novelty (difference from the past) provides
evidence of success. The difference between these variables was not huge, however, and there
was nothing to be gained by discarding information about the past. The original perplexity
method in Vicinanza et al. 2022, which only included information about the future, achieved
r<sup>2</sup> less than half as large as the improved method we describe in the table above. Comparing
texts only to the past, or only to the future, would admittedly make it easier to use causal
language. Precocity, which characterizes a text in relation to a whole time window around its
publication, is hard to interpret causally. But if causal explanation is not being claimed, there
is no reason not to use both time arrows at once.</p>
      <p>All three representations of text (topic models, embeddings, and perplexity) performed well
in some cases. Topic models seemed to predict prominence well, while embeddings performed
well on age—but we don’t have enough data points to generalize. If any conclusion can be
drawn here, it might be “a dog that doesn’t bark.” We found no evidence that neural
models of text systematically outperformed lexical models. On the contrary, lexical topic models
displayed consistently strong performance across tasks and corpora.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>We found clear alignment between textual measures of precocity and two independent kinds of
social evidence that we expected to align with change (prominence and authorial youth). There
is no absolute ground truth in this domain, but statistically significant relationships across
three corpora do increase our confidence that text analysis can locate a leading edge of cultural
change.</p>
      <p>We also consistently found a better fit with social evidence when we represented documents
through the 25% of passages with highest precocity. It seems likely that significant innovations
are often concentrated in a small portion of an article or work of fiction.</p>
      <p>However, we also found that precocity can be measured in different ways, which do not
always agree with each other. Since change is taking place in a space that has multiple
dimensions, the same text can be at the leading edge on one axis and lagging behind on another.
Measures of similarity based on a topic model seemed to excel at predicting citations and public
reputation. Transformer-based embeddings may be better at catching tacit signals of
generational style.</p>
      <p>Since topic modeling is an older representation of text, its strong performance overall may
require discussion. We don’t have a confident answer here, but for what it’s worth, topic
models are explicitly designed to factor a corpus into latent variables. Document embeddings,
by contrast, don’t have any representational goal at the corpus level. The embeddings we used
are tuned contrastively, using the Sentence Transformers library [16]. But that process is not
guaranteed to model the corpus in a principled way—which might be a disadvantage in an
experiment that seeks to measure a document’s relation to corpus-level trends.</p>
      <p>At least for now, researchers wrestling with questions about textual change are well advised
to evaluate the performance of a principled lexical model as a baseline and confirm that
embeddings do actually improve on it before relying on embeddings alone. It is not safe to assume
that a model will perform better simply because it captures information about word order.</p>
    </sec>
    <sec id="sec-6">
      <title>Public data and code</title>
      <p>Data and code for this project are available on GitHub at https://github.com/IllinoisLiteraryLab/novelty/tree/main, and will also be archived on Zenodo.</p>
    </sec>
    <sec id="sec-7">
      <title>Contribution statement</title>
      <sec id="sec-7-1">
        <title>Authors are listed alphabetically here.</title>
        <p>Conceived and designed the analysis: Becca Cohen, Sarah Griebel, Lucian Li, Jiayu Liu,
Jaihyun Park, Jana Perkins, Ted Underwood; Wrote the paper: Becca Cohen, Sarah Griebel,
Lucian Li, Jiayu Liu, Jaihyun Park, Jana Perkins, Ted Underwood; Collected the data: Becca
Cohen, Sarah Griebel, Lucian Li, Ted Underwood; Contributed data or analysis tools: Becca
Cohen, Sarah Griebel, Lucian Li, Ted Underwood; Performed the analysis: Sarah Griebel,
Ted Underwood.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work made use of the Illinois Campus Cluster, a computing resource that is operated by
the Illinois Campus Cluster Program (ICCP) in conjunction with the National Center for
Supercomputing Applications (NCSA) and which is supported by funds from the University of
Illinois at Urbana-Champaign—specifically, through the Illinois Computes program. This work
also used the Delta system at the National Center for Supercomputing Applications through
allocation xras-ncsa-72 from the Advanced Cyberinfrastructure Coordination Ecosystem:
Services &amp; Support (ACCESS) program, which is supported by National Science Foundation grants
#2138259, #2138286, #2138307, #2137603, and #2138296. Some fiction data for this project was
provided by HathiTrust Digital Library [7].</p>
    </sec>
    <sec id="sec-9">
      <title>Appendices</title>
    </sec>
    <sec id="sec-10">
      <title>A. Topic models</title>
      <p>Topic granularity will vary if a corpus includes many more texts in some periods than
others, and this could be problematic for a project interested in comparisons across time. So our
procedure in every case was:</p>
      <sec id="sec-10-1">
        <title>Procedure</title>
        <p>1. Restrict the corpus to an even distribution across time.</p>
        <p>2. Generate a 250-topic model with MALLET, including an “inferencer.”</p>
        <p>3. Use the inferencer to generate topic distributions for documents that had to be left out of the “flat” distribution in step 1.</p>
        <p>Using this model, we assessed novelty, transience, and precocity by measuring the K-L
divergence between texts. K-L divergence is an asymmetric measure; we took the document being
characterized as the reference probability distribution, and compared both past and future
documents to that reference point.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>B. Embeddings</title>
      <p>
        We began by testing off-the-shelf GTE embeddings [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. When these performed poorly, we
realized that embeddings are trained mostly on twenty-first-century material, and fine-tuning
would be needed to give them a better chance of representing an earlier period.
      </p>
      <p>
        The tuning method we ultimately adopted relies on multiple negatives ranking loss, as
implemented in Sentence Transformers [6,
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. That is, the training dataset includes only positive
pairs of similar passages; negative pairs are created implicitly by misaligning the passages in
a batch. We created positive pairs mostly by selecting adjacent passages from the same article
(or work of fiction). But we adopted several tricks to prevent the model from learning a notion
of similarity defined purely by vocabulary overlap. First, we used GPT-3.5 to paraphrase and
condense one element of some pairs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Paraphrasing up to 18% of pairs seemed to improve
results. Second, in training embeddings for fiction, we replaced personal names in one element
of each pair—preserving first and last names, and gender signals, as much as possible. Both
of these changes made the learning task more difficult and improved alignment with social
evidence. We used these datasets to fine-tune RoBERTa [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
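      <p>The in-batch behavior of multiple negatives ranking loss can be illustrated with a small NumPy stand-in; this is a pedagogical sketch, not the Sentence Transformers implementation, and the scale constant is an assumption:</p>
      <preformat>
```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """The i-th positive is the target for the i-th anchor; every other
    positive in the batch serves as an implicit negative. Inputs are
    unit-normalized (batch, dim) arrays; returns mean cross-entropy."""
    sims = scale * anchors @ positives.T            # pairwise similarities
    sims = sims - sims.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.diagonal(log_prob).mean())     # NLL of matched pairs
```
      </preformat>
      <p>Misaligning the positives in a batch, as described above, raises this loss; training pushes matched passages together and in-batch mismatches apart.</p>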
      <p>We also explored several alternate approaches that aren’t represented in the final paper. For
the task of predicting citations, we experimented with embeddings that were trained
specifically to identify the kind of similarity between articles that produces citation. Here, positives
were sentences from articles related by citation, and negatives were pairs of sentences identified
by off-the-shelf embedding methods as sharing intellectual influence, despite no documented
citation existing between the two articles. Our hypothesis was that these pairs represent
spurious or coincidental similarities in language not necessarily associated with the type of
intellectual influence we are trying to measure. We took these pairs and fine-tuned the GTE model,
through Cosine Entropy Loss, assigning high similarity to correctly identified citation pairs
and low similarity to falsely identified pairs [9].</p>
      <p>Since we were concerned that embeddings might perform less well on long passages than
on individual sentences, we also tested a strategy where we generated embeddings on single
sentences, then clustered them, and took the cluster centroids as synthetic “document
embeddings.” This did not improve performance.</p>
      <p>An alternate approach we have not yet checked would be to train embeddings entirely from
scratch on these corpora. Some recent studies suggest that even older methods of doing that,
like doc2vec, can outperform topic models on clustering tasks [18].</p>
      <p>We embedded passages of up to 512 tokens, with the constraint that we divide passages only
at sentence breaks. Note that the chunks used for topic modeling were generally combinations
of two or more embedding chunks; this difference of size was permitted in order to emphasize
the strengths of both methods, without hindering either one.</p>
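      <p>The sentence-boundary chunking rule can be sketched as a greedy packer (our own simplified version):</p>
      <preformat>
```python
def chunk_at_sentence_breaks(tokenized_sentences, max_tokens=512):
    """Pack whole sentences (token lists) into chunks of up to
    max_tokens tokens, splitting only at sentence breaks."""
    chunks, current = [], []
    for sentence in tokenized_sentences:
        if current and len(current) + len(sentence) > max_tokens:
            chunks.append(current)   # flush the full chunk
            current = []
        current = current + sentence
    if current:
        chunks.append(current)
    return chunks
```
      </preformat>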
    </sec>
    <sec id="sec-12">
      <title>C. Text-reuse detection</title>
      <p>We avoided comparing any papers written by the same author. We also aimed to avoid
comparing chunks of text that directly quoted each other, as including these, we estimated, would
create a circularity in the precocity calculation for such chunks, directly guaranteeing that it
would correlate with citation.</p>
      <p>To avoid this circularity, we looked for either the cited author’s last name
or a string of six or more matching words inside single or double quotation marks within the
citing paper. If either is found, the chunk is not used for comparison. It is important
to note that the whole paper is not excluded from comparison; only the offending chunk.</p>
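      <p>A simplified version of this filter; the helper names are ours, and for brevity the sketch matches only double or curly quotes rather than the full set of quotation marks used in the actual pipeline:</p>
      <preformat>
```python
import re

def reuses_cited_text(chunk, cited_surname, cited_text, min_run=6):
    """Flag a chunk for exclusion if it names the cited author, or if
    any quoted span shares a run of min_run or more words with the
    cited text."""
    if cited_surname.lower() in chunk.lower():
        return True
    cited_words = cited_text.lower().split()
    quote_pattern = "[\"\u201c](.+?)[\"\u201d]"   # straight or curly double quotes
    for quoted in re.findall(quote_pattern, chunk):
        words = quoted.lower().split()
        for i in range(len(words) - min_run + 1):
            run = words[i:i + min_run]
            for j in range(len(cited_words) - min_run + 1):
                if cited_words[j:j + min_run] == run:
                    return True
    return False
```
      </preformat>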
    </sec>
    <sec id="sec-13">
      <title>D. Author age determination</title>
      <p>
        For the fiction corpus we could rely on previously published data to determine authors’ years
of birth [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>To create analogous data for literary scholars, we estimated years of birth for a sample of
1,093 authors (and 2,646 articles) through a mixture of manual research and searches on the
VIAF API. A model was trained to distinguish true VIAF matches from false ones. We
estimate that we achieved overall accuracy of greater than 90%; this estimate is based both on the
accuracy of the VIAF model and on manually checking a sample of articles.</p>
    </sec>
    <sec id="sec-14">
      <title>E. Corpus construction</title>
    </sec>
    <sec id="sec-15">
      <title>F. Timeline for perplexity calculation</title>
      <p>We calculated perplexity using RoBERTa on chunks of up to 512 tokens (the same ones we
used for embedding) [10]. We divided the timeline into overlapping 12-year periods with a
4-year offset, which ends up meaning that a text published in 1968-1971, for instance, would
be compared to a past model trained on 1952-63 and a future model trained on 1976-87. But a
text published in 1964-67 would be compared to a past model trained on 1948-59 and a future
model trained on 1972-83.</p>
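      <p>The window arithmetic implied by these examples can be sketched as follows; this is inferred from the worked examples above, and exact boundary handling is an assumption:</p>
      <preformat>
```python
def perplexity_windows(pub_year):
    """Past and future training windows: 12-year models advanced in
    4-year steps around a 4-year publication bucket."""
    base = (pub_year // 4) * 4        # start of the publication bucket
    past = (base - 16, base - 5)      # e.g. 1968 maps to (1952, 1963)
    future = (base + 8, base + 19)    # e.g. 1968 maps to (1976, 1987)
    return past, future
```
      </preformat>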
      <p>Our goal in creating 12-year models, but moving them forward 4 years at a time, was
to create sufÏciently large corpora for training while ensuring that texts were not greatly
(dis)advantaged by their position within a time step.</p>
    </sec>
    <sec id="sec-16">
      <title>G. Domain insights</title>
      <p>Our primary goal in this paper is to validate a method. But it is also easy to see how this
method could be used to illuminate substantive research questions about a genre or academic
discipline. To give a quick sense of what it might reveal, we’ve visualized the seven journals that
comprise our literary studies corpus, along with a selection of authors who have exceptionally
high precocity and/or an exceptionally high number of citations.</p>
      <p>Citation counts are already public. But precocity—vertical position in Figure 1—is new
information. Precocity does tend to correlate with citations, as is visible in the positive slope of the
journals. But journals that attract different numbers of citations (like New Literary History and
Critical Inquiry) may nevertheless be close to indistinguishable when it comes to precocity—
which suggests they have substantively equal power to predict trends in the discipline. In other
cases, journals that aren’t distinguished by citation count can be distinguished by precocity.
PMLA is the flagship journal of the Modern Language Association, and arguably the highest-prestige
venue in this group. It attracts almost as many citations as New Literary History or
ELH, but its position on the vertical axis suggests that editorial practices have sometimes been
more conservative (as perhaps befits the journal of a large professional organization).</p>
      <p>The apparent negative slope of author names is an artifact of the process we used to select
exceptional authors, which deliberately highlights names on the periphery. If we plotted all
authors, we would get a Gaussian cloud of points with the same slope and center as the journals
(but much larger, since authors are associated with fewer articles and thus aren’t pulled to the
origin as strongly by the law of averages).</p>
      <p>The names of well-known critics, like Fredric Jameson and Gayatri Spivak, tend to be found
in the upper right corner, suggesting that they were not only widely cited but prescient (or
influential—causality is impossible to determine here). Moving up and to the left we find names
that may be less familiar, but that our algorithm suggests were also ahead of the curve. Carl
E. W. L. Dahlström is an early-twentieth-century critic whose articles have almost never been
cited, although they anticipate subsequent trends.</p>
      <p>On the right side of the graph we find a few widely-cited authors who aren’t especially
distinguished by precocity. This is not necessarily a negative reflection on their work. For
instance, several authors in this region (Richard Rorty, Jacques Derrida, and Ian Hacking) are
well-known philosophers who were occasionally invited to publish in literary studies journals.
Since they can hardly expect to convert literary scholars into philosophers en masse, these
honorific late-career publications won’t stand at the beginning of a long tradition of similar
work, and therefore won’t have high precocity. In short, there can be more than one kind of
influence. Precocity measures a text’s relation to a specific corpus, and may not capture all the
intellectual influences that flow between corpora. It is nevertheless easy to see how this metric
could be used to pose questions about editorial practices and career arcs within a discipline.</p>
    </sec>
    <sec id="sec-17">
      <title>H. Preregistration and paths not taken</title>
      <p>Most of the methodological details above were preregistered in Fall 2023 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. But the experimental plan did change in some important ways afterward. In particular, our embedding
strategy changed several times, after off-the-shelf GTE embeddings proved not to be competitive. Also,
comparison to authorial age wasn’t part of our original plan. A critical reader might (correctly)
interpret these adjustments to our plan as efforts to find some method or context that would
allow Transformer-based methods to outperform a topic model, as we had originally expected.
If we had followed our original experimental plan exactly, the result would have been a simple
endorsement of topic modeling. Evidence of our struggle to avoid or complicate that
conclusion may perhaps make it even more persuasive.</p>
      <p>
        There is also a question we proposed in the preregistration, and did investigate, but haven’t
discussed above for reasons of space. Some researchers may wonder whether it really makes
sense to compare a text chunk to all the parts of all documents in the preceding and following
20 years. One could argue that mystery novels, for instance, are not really innovating relative
to science fiction, but to other mystery novels. One way of taking this into account—which
performed well in some previous work—was to compare chunks only to a subset of very similar
chunks in the past and future (say the top 5%) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. We also tested that strategy here, but it
didn’t often improve on other approaches, and so we’ve deferred discussion to this appendix.
      </p>
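      <p>As a minimal sketch of the restricted-comparison strategy just described (assuming cosine similarity over chunk embeddings; the function and variable names here are illustrative, not the authors’ actual code):</p>

```python
import numpy as np

def restricted_novelty(target_vec, target_year, chunk_vectors, chunk_years,
                       window=20, quantile=0.95):
    """Mean cosine distance from a chunk to only the top 5% most similar
    chunks published in the preceding `window` years (rather than to all
    of them). A symmetric pass over the following window would give the
    forward-looking half of the comparison."""
    # select chunks in the window before the target
    mask = (chunk_years < target_year) & (chunk_years >= target_year - window)
    past = chunk_vectors[mask]
    if len(past) == 0:
        return np.nan
    # cosine similarity of the target chunk to every past chunk
    sims = past @ target_vec / (
        np.linalg.norm(past, axis=1) * np.linalg.norm(target_vec) + 1e-12)
    # keep only the most similar 5% (the chunk's nearest neighbors)
    cutoff = np.quantile(sims, quantile)
    top = sims[sims >= cutoff]
    # novelty = mean cosine *distance* to that restricted comparison set
    return float(np.mean(1.0 - top))
```

      <p>The intuition is that a mystery novel is then scored against the mystery novels closest to it, not against the whole corpus of fiction in the window.</p>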
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. T. J.</given-names>
            <surname>Barron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Spang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>DeDeo</surname>
          </string-name>
          . “
          <article-title>Individuals, Institutions, and Innovation in the Debates of the French Revolution”</article-title>
          .
          <source>In: Proceedings of the National Academy of Sciences 115.18</source>
          (
          <year>2018</year>
          ). Ed. by D. S. Bassett, pp.
          <fpage>4607</fpage>
          -
          <lpage>4612</lpage>
          . doi: 10.1073/pnas.1717729115. eprint: https://doi.org/10.1073/pnas.1717729115.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          . “Latent Dirichlet Allocation”.
          <source>In: Journal of Machine Learning Research</source>
          <volume>3</volume>
          (
          <year>2003</year>
          ), pp.
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          . url: https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bornmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tekles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. Y.</given-names>
            <surname>Ye</surname>
          </string-name>
          .
          <source>Do We Measure Novelty When We Analyze Unusual Combinations of Cited References? A Validation Study of Bibliometric Novelty Indicators Based on F1000Prime Data</source>
          .
          <year>2019</year>
          . arXiv: 1910.03233 [cs.DL]. url: https://arxiv.org/abs/1910.03233.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Burns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brenner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Llewellyn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Snyder</surname>
          </string-name>
          . “
          <article-title>JSTOR - Data for Research”</article-title>
          . In:
          <source>Research and Advanced Technology for Digital Libraries</source>
          . Ed. by
          <string-name>
            <given-names>M.</given-names>
            <surname>Agosti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Borbinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kapidakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Papatheodorou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Tsakonas</surname>
          </string-name>
          . Berlin, Heidelberg: Springer Berlin Heidelberg,
          <year>2009</year>
          , pp.
          <fpage>416</fpage>
          -
          <lpage>419</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Griebel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Perkins</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Underwood</surname>
          </string-name>
          .
          <source>Comparing Measures of Textual Innovation</source>
          .
          <year>2023</year>
          . doi: 10.17605/osf.io/a3g6e. url: osf.io/a3g6e.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Strope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lukács</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Miklos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Kurzweil</surname>
          </string-name>
          .
          <article-title>“Efficient Natural Language Response Suggestion for Smart Reply”</article-title>
          .
          <source>In: CoRR abs/1705.00652</source>
          (
          <year>2017</year>
          ). arXiv: 1705.00652. url: http://arxiv.org/abs/1705.00652.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Capitanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kudeki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cole</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Organisciak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          ,
          <string-name>
            <surname>E. Dickson Koehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dubnicek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Downie</surname>
          </string-name>
          .
          <source>The HathiTrust Research Center Extracted Features Dataset (2.0)</source>
          . HathiTrust Research Center.
          <year>2020</year>
          . doi: 10.13012/R2TE-C227. url: https://doi.org/10.13012/R2TE-C227.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Kinney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Anastasiades</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Authur</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bragg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Buraczynski</surname>
          </string-name>
          , I. Cachola,
          <string-name>
            <given-names>S.</given-names>
            <surname>Candra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chandrasekhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Crawford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dunkelberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gorney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Graham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Huff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kohlmeier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kuehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Langan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lochner</surname>
          </string-name>
          , K. MacMillan, T. C. Murray,
          <string-name>
            <given-names>C.</given-names>
            <surname>Newell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rohatgi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sayre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Soldaini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tanaka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Wade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wilhelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zamarron</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. van Zuylen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          .
          <article-title>“The Semantic Scholar Open Data Platform”</article-title>
          .
          <source>In: ArXiv abs/2301.10140</source>
          (
          <year>2023</year>
          ). url: https://api.semanticscholar.org/CorpusID:256194545.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Xie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          . “
          <article-title>Towards General Text Embeddings with Multi-Stage Contrastive Learning”</article-title>
          .
          <source>In: arXiv preprint arXiv:2308.03281</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          . “
          <article-title>RoBERTa: A Robustly Optimized BERT Pretraining Approach”</article-title>
          .
          <source>In: CoRR abs/1907.11692</source>
          (
          <year>2019</year>
          ). arXiv: 1907.11692. url: http://arxiv.org/abs/1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>McCallum</surname>
          </string-name>
          .
          <source>MALLET: A Machine Learning for Language Toolkit</source>
          .
          <year>2002</year>
          . url: http://mallet.cs.umass.edu.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Meisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Elsig</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Rinke</surname>
          </string-name>
          .
          <source>Language Acquisition and Change: A Morphosyntactic Perspective</source>
          . Edinburgh: Edinburgh University Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Merton</surname>
          </string-name>
          . “
          <article-title>The Matthew Effect in Science”</article-title>
          .
          <source>In: Science 159.3810</source>
          (
          <year>1968</year>
          ), pp.
          <fpage>56</fpage>
          -
          <lpage>63</lpage>
          . doi: 10.1126/science.159.3810.56. url: https://www.science.org/doi/10.1126/science.159.3810.56.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Miller</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Shanks</surname>
          </string-name>
          .
          <source>The New American Voter</source>
          . Cambridge, MA: Harvard University Press,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Wainwright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mishkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Agarwal,
          <string-name>
            <given-names>K.</given-names>
            <surname>Slama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schulman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kelton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Welinder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Christiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leike</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Lowe</surname>
          </string-name>
          .
          <source>Training Language Models to Follow Instructions with Human Feedback</source>
          .
          <year>2022</year>
          . arXiv: 2203.02155 [cs.CL]. url: https://arxiv.org/abs/2203.02155.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          . “
          <article-title>Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks”</article-title>
          .
          <source>In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3982</fpage>
          -
          <lpage>3992</lpage>
          . url: https://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shibayama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Matsumoto</surname>
          </string-name>
          . “
          <article-title>Measuring Novelty in Science with Word Embedding”</article-title>
          .
          <source>In: Plos One 16.7</source>
          (
          <year>2021</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . doi: 10.1371/journal.pone.0254034. url: https://doi.org/10.1371/journal.pone.0254034.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>O.</given-names>
            <surname>Sobchuk</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Šeļa</surname>
          </string-name>
          . “
          <article-title>Computational Thematics: Comparing Algorithms for Clustering the Genres of Literary Fiction”</article-title>
          .
          <source>In: Humanities and Social Sciences Communications 11.1</source>
          (
          <year>2024</year>
          ), p.
          <fpage>438</fpage>
          . doi: 10.1057/s41599-024-02933-6. url: https://doi.org/10.1057/s41599-024-02933-6.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kiley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Vaisey</surname>
          </string-name>
          . “
          <article-title>Cohort Succession Explains Most Change in Literary Culture”</article-title>
          .
          <source>In: Sociological Science 9.8</source>
          (
          <year>2022</year>
          ), pp.
          <fpage>184</fpage>
          -
          <lpage>205</lpage>
          . doi: 10.15195/v9.a8. url: http://dx.doi.org/10.15195/v9.a8.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P.</given-names>
            <surname>Vicinanza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          .
          <article-title>“A Deep-Learning Model of Prescient Ideas Demonstrates That They Emerge from the Periphery”</article-title>
          .
          <source>In: PNAS Nexus 2.1</source>
          (
          <year>2022</year>
          ), pgac275. doi: 10.1093/pnasnexus/pgac275. url: https://doi.org/10.1093/pnasnexus/pgac275.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yokota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Matsumoto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Shibayama</surname>
          </string-name>
          . “
          <article-title>Identify Novel Elements of Knowledge with Word Embedding”</article-title>
          .
          <source>In: Plos One 18.6</source>
          (
          <year>2023</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . doi: 10.1371/journal.pone.0284567. url: https://doi.org/10.1371/journal.pone.0284567.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Song</surname>
          </string-name>
          . “
          <article-title>Measuring the Impact of Novelty, Bibliometric, and Academic-Network Factors on Citation Count Using a Neural Network”</article-title>
          .
          <source>In: Journal of Informetrics 15.2</source>
          (
          <year>2021</year>
          ), p.
          <fpage>101140</fpage>
          . doi: https://doi.org/10.1016/j.joi.2021.101140. url: https://www.sciencedirect.com/science/article/pii/S1751157721000110.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>