<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">On Classification with Large Language Models in Cultural Analytics</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">David</forename><surname>Bamman</surname></persName>
							<email>dbamman@berkeley.edu</email><idno type="ORCID">0009-0003-1171-9408</idno>
							<affiliation key="aff0">
								<orgName type="department">School of Information</orgName>
								<orgName type="institution">UC Berkeley</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kent</forename><forename type="middle">K</forename><surname>Chang</surname></persName>
							<email>kentkchang@berkeley.edu</email><idno type="ORCID">0009-0008-6430-3701</idno>
							<affiliation key="aff0">
								<orgName type="department">School of Information</orgName>
								<orgName type="institution">UC Berkeley</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Li</forename><surname>Lucy</surname></persName>
							<email>lucy3_li@berkeley.edu</email><idno type="ORCID">0000-0002-6021-7370</idno>
							<affiliation key="aff0">
								<orgName type="department">School of Information</orgName>
								<orgName type="institution">UC Berkeley</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Naitian</forename><surname>Zhou</surname></persName>
							<email>naitian@berkeley.edu</email><idno type="ORCID">0009-0005-1991-2258</idno>
							<affiliation key="aff0">
								<orgName type="department">School of Information</orgName>
								<orgName type="institution">UC Berkeley</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">On Classification with Large Language Models in Cultural Analytics</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">D58530B6DA61BB0D92F7DF11E56A2292</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>classification</term>
					<term>sensemaking</term>
					<term>large language models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this work, we survey the way in which classification is used as a sensemaking practice in cultural analytics, and assess where large language models can fit into this landscape. We identify ten tasks supported by publicly available datasets on which we empirically assess the performance of LLMs compared to traditional supervised methods, and explore the ways in which LLMs can be employed for sensemaking goals beyond mere accuracy. We find that prompt-based LLMs are competitive with traditional supervised models for established tasks, but perform less well on de novo tasks. In addition, LLMs can assist sensemaking by acting as an intermediary input to formal theory testing.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>One of the core tasks in cultural analytics is the act of classification. We sort discrete thingsbooks, passages, sentences, works of art, songs-into buckets whose boundaries have either been established by long theoretical traditions (such as categories of poetic form <ref type="bibr" target="#b22">[23,</ref><ref type="bibr" target="#b44">45]</ref>), or that are newly circumscribed in the course of our research (cf. "technological strangeness" <ref type="bibr" target="#b35">[36]</ref>). We carry out this act for several reasons: to test the association of features with those categories in order to better understand the categories themselves; to recreate human judgment at a scale beyond what people are able to carry out alone; and to problematize original category boundaries, leveraging machines to help us poke holes in our current understanding of them. As models, classifiers "return us to the process …through which we construct our knowledge of phenomena that exceed our direct observation" <ref type="bibr">(Piper [29,</ref><ref type="bibr">p. 651]</ref>).</p><p>Each one of these goals requires different machinery to accomplish it, from interpretable linear models that allow us to isolate the effect of an attribute on a category, to complex models that may sacrifice such interpretability for predictive accuracy in those cases when accurate judgments are all that matter.</p><p>As large language models (LLMs) have arisen to become the state of the art for a wide range of tasks in natural language processing, a growing body of work is investigating the tradeoffs they bring for this act of classification in fields for which text is data <ref type="bibr" target="#b44">[45,</ref><ref type="bibr" target="#b48">49,</ref><ref type="bibr" target="#b30">31]</ref>. High-quality LLMs are generally far more expensive than alternatives (in compute, environmental, and financial costs), and come burdened with a Pandora's box of pre-training-their source of state-of-theart performance and general-purpose knowledge, but also the vector along which undesirable knowledge is carried. At the same time, they offer hope for dramatically reducing the amount of human effort required for meaningful humanistic inquiry <ref type="bibr" target="#b48">[49]</ref>-trading the scale of manual annotation often required by supervised models for more focused injection of human and domain knowledge into prompts.</p><p>To shed some light on the contours of this landscape, we investigate the intersection of LLMs, classification and cultural analytics along four dimensions:</p><p>• First, we survey recent work published in cultural analytics (CHR, Journal of Cultural Analytics, NLP conferences, etc.) to identify the uses to which researchers are currently employing classification. We group those uses into a typology of six distinct categories. • Second, we identify a set of ten tasks supported by publicly available datasets on which we can examine tradeoffs for some of those uses. These tasks represent a wide range of phenomena of interest to cultural inquiry and provide a sample of the real ways in which researchers now are using classification in their work. We focus on English-language datasets only in this work; our results are therefore limited to the English-language affordances of downstream applications. 
• Third, we use that suite of datasets to benchmark the performance of different classification models, including bag-of-words linear models, masked language models (such as BERT and RoBERTa) and large language models (both prompting and fine-tuning). We find that prompt-based LLMs are competitive with traditional supervised models for established English-language tasks, but lag behind supervised models for de novo tasks.</p><p>While we caution against the accuracy-maximizing incentives that benchmarks create <ref type="bibr" target="#b27">[28]</ref>, this suite of datasets allows us to assess the tradeoffs between computation, cost and accuracy for several of those use cases where classification accuracy is paramount. • Fourth, we carry out an exploratory case study bringing together LLMs with this suite of datasets for a goal that does not seek simply to maximize classification accuracy: category sensemaking. While this study can only be suggestive, we find that LLMs provide a mechanism for exploratory data analysis that complements existing methods.</p><p>We release data for others to explore these tradeoffs themselves; data and code to support this work can be found at https://github.com/bamman-group/ca-classification-data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">How are we using classification?</head><p>As distinct from mere automation, the use of classification in cultural analytics is often employed for the purpose of sensemaking. At a high level, the act of supervised classification entails learning a function ĥ(𝑥) from an input source 𝑥 (e.g., text) to a choice (or choices) 𝑦 drawn from an enumerable output space 𝒴 (e.g., for genre classification, 𝒴 = {science fiction, western, detective story, …}), though for our purposes, we also include regression problems (where 𝑦 ∈ ℝ). Classification requires the existence of a previously selected set of categories 𝒴 , which is a political choice grounded in a system of knowledge <ref type="bibr" target="#b11">[12]</ref>; as Bowker and Star note, "each category valorizes some point of view and silences another" <ref type="bibr">[4, p. 5]</ref>. This is an important subject of critique not only when the objects of study are people <ref type="bibr">[9,</ref><ref type="bibr">2]</ref> but also in recognizing that classification can only make statements with respect to the categories that are classified (and not to the gaps that exist between them).</p><p>For much work in this space, the representation of 𝑥 is critical-e.g., the choice of representing 𝑥 as a vector corresponding to word counts from a defined vocabulary, or other more specialized features (e.g., type-token ratio). To be precise, we can describe this representation as a function of its own (e.g., 𝑓 (𝑥)), which is the direct input to a classification function ( ĥ(𝑓 (𝑥))), though modern neural classifiers convolve both functions into one learned operation.</p><p>We can separate out these uses of classification into distinct yet inter-related categories:</p><p>1. Category sensemaking. One of the most widely used applications of classification in cultural analytics is understanding the characteristics of a category through the features that are predictive of it. We can view this as a sensemaking exercise, whose main degree of freedom is the choice of representation function 𝑓 (𝑥). This function can vary from a kitchensink approach (throw every feature into a model and learn what is important) to those that define specific, theory-driven feature classes. We see many examples of this kind of sensemaking. Long and So <ref type="bibr" target="#b23">[24]</ref> use a featurized model to identify the defining characteristics of stream-of-consciousness form-above all, type-token ratio-and how those defining features both converge and diverge across English and Japanese stream-of-consciousness texts. Simeone, Koundinya, Kumar, and Finn <ref type="bibr" target="#b35">[36]</ref> use classification to uncover the characteristics of technological nova-moments in science fiction texts where new technologies are introduced; using a bag-of-words featurization, they find that strangeness is characterized not merely by diction (the choice of words, such as "ship") but also by morphosyntax (presence of determiners and prepositions). Several studies use the machinery of classification to interrogate the boundaries of gender performance in English <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b41">42,</ref><ref type="bibr">7]</ref> and French <ref type="bibr" target="#b42">[43]</ref>. 
Bode <ref type="bibr">[3]</ref> uses classification in combination with topic modeling to learn that the theme of "nonmetropolitan colonial spaces" most strongly differentiates Australian fiction from British and American fiction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Top-down theory testing.</head><p>A second, but related, use of classification is explicitly theorydriven: articulating a formal test (often of association between a feature and a category) and carrying out that test. We can differentiate this use from exploratory sensemaking by the falsifiability of its claims: a theoretical claim is articulated, and a corresponding null hypothesis is either rejected or not when evaluated on data. For Piper and Bagga <ref type="bibr" target="#b29">[30]</ref>, theory provides the boundaries of operationalization (in defining the feature space through which a text is represented), and classification provides a test for whether that operationalization is sufÏciently descriptive to differentiate texts that are narrative from those that are not across a variety of contexts. The boundaries between this use case and that of category sensemaking are often fluid-even within this theory-driven choice of feature space, Piper and Bagga <ref type="bibr" target="#b29">[30]</ref> also engage in sensemaking by identifying a minimal set of features from predictive models that characterize narrativity and ground that minimal set in a more precise "distant worlds" theory of narrative. Steg, Slot, and Pianzola <ref type="bibr" target="#b36">[37]</ref> take a similar theory-driven approach, testing the degree to which measures of reader suspense, surprise and curiosity are predictive of narrativity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Challenging category boundaries.</head><p>A third use, related to use #1, employs classification to challenge the initial category boundaries of a concept. Long and So <ref type="bibr" target="#b22">[23]</ref> is an exemplary use of this; this work uses classification primarily to explore and challenge the category boundaries of haiku by examining misclassifications-cases where a model classifies a non-haiku poem as haiku and vice versa. They additionally engage in close reading of the model itself, by identifying salient features of texts that are statistically associated with haiku and find that the language of natural imagery and brevity are strong indicators of the form (beyond simple syllabic patterns). Work by Broadwell, Mimno, and Tangherlini <ref type="bibr">[5]</ref> likewise trains a folktale type classifier on Danish narratives to identify "liminal stories that shift across existing categories" <ref type="bibr" target="#b22">(23)</ref>; these stories expose the ambiguity of the initial categories and highlight the ways in which their boundaries are permeable.</p><p>4. Category coherence. While use #3 assumes a fixed category object whose boundaries are unknown, a fourth use employs classification accuracy as a proxy for the coherence of a concept itself, under the assumption that statistical regularities could only exist for concepts that are somehow real. Much work in the first three uses carries with it an implicit theoretical test in the coherence of the category; Long and So <ref type="bibr">[24, p. 350</ref>] argue that the predictability of stream-of-consciousness texts vs. realist texts provides evidence for "some of degree of formal unification"; Thompson and Mimno <ref type="bibr" target="#b38">[39]</ref> use classifier accuracy at recognizing images from the Dada art movement as evidence of its coherence as a concept ("there is substance behind the name Dada, " 193) while stressing that such coherence is not absolute: classifier mistakes illustrate "the porous boundaries of the category" (194). In a similar vein, the counterfactual provides information about the dissolution of the construct. Underwood, Bamman, and Lee <ref type="bibr" target="#b41">[42]</ref> predict character gender from the attributes and actions associated with characters, and find that decreasing predictive accuracy over time suggests an increasing "instability" in the gender construct.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Search.</head><p>We might term a fifth category, search, as "classification-assisted close reading. " This line of work uses the machinery of a classifier as a finding aid to zero in on passages that instantiate a target category. Wilson, Ardanuy, Beelen, McGillivray, and Ahnert <ref type="bibr" target="#b45">[46]</ref> is an exemplary use of this approach. This work explores the concept of "atypical animacy" (machines that are depicted as animate vs. non-animate) in a large collection of historical books, newspapers and journal articles. After designing a method to identify passages of machine animacy in order to make them accessible to close reading, this work uses classification (and especially misclassification) to identify textual sites in which to interrogate child labor, slavery and the trope of "mere machines. " This use of classification is not dissimilar from the interactive use of methods like topic modeling and other unsupervised techniques in the act of close reading (e.g., Klein <ref type="bibr" target="#b19">[20]</ref> and Walsh and Antoniak <ref type="bibr" target="#b43">[44]</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>6.</head><p>Replacing human labeling at scale. A final use case is the application of classification as an instrumental device in downstream sensemaking: recreating human judgments at a scale beyond which people are able to carry out themselves. After establishing the validity and interpretability of a model, Long and So <ref type="bibr" target="#b23">[24]</ref> then use that model as a large-scale classifier, applying it to passages from 1700 novels in order to measure the association with nationality and genre with stream-of-consciousness; in this case, classification provides a new dependent variable over a much larger scale in order to test the association of other features.</p><p>In considering the ways in which the rise of large language models have the ability to influence this landscape of use, we might characterize those use cases by the degree to which calibrated predictive accuracy is critical for accomplishing the intended goals, vs. the degree to which a model enables interpretative sensemaking internally. All categories rely on baseline levels of both accuracy (to validate that a model has learned from data) and sensemaking (for its use to connect to knowledge about its object of study). But we might see use case #6 (replacing human labeling at scale) to lean more toward predictive accuracy as a goal, while use case #1 (category sensemaking) to lean more towards internal interpretability. We consider both of these axes in turn. First, in such cases where accuracy alone is paramount, when are LLMs needed? When are the tradeoffs worth it, and when do smaller, cheaper models sufÏce? Second, most interpretation-maximizing uses of classification in cultural analytics have generally relied on linear models (e.g., logistic regression, SVMs) to directly measure the influence of a feature on a category choice without the confounding effects of non-linearities that more complex models introduce. How are LLMs able to contribute to this goal? To answer both of these sets of questions, we identify a set of ten datasets used in the classification categories above, and put a range of models to the test to explore this landscape.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Datasets</head><p>We draw datasets from public repositories for the following papers. Most of the datasets were created to test specific hypotheses about the association of features with categories (e.g. typetoken ratio with stream-of-consciousness). Across all tasks, we create standard training/development/test splits (summarized in Table <ref type="table" target="#tab_0">1</ref>) to emphasize test validity, generally decreasing the size of training data used relative to the original work. To make the classification tasks more difÏcult for models, we also stratify across groups (e.g., author, genre) to isolate the target construct, so that items from the same group (e.g., texts by the same author) only appear within a single split. These modifications generally make the results reported here incomparable with the original results.</p><p>Atypical animacy. Data from Coll Ardanuy, Nanni, Beelen, Hosseini, Ahnert, Lawrence, McDonough, Tolfo, Wilson, and McGillivray <ref type="bibr">[8]</ref> differentiating mentions of machine-related terms (engine, machine, etc.) as animate vs. non-animate. Animate machines are those that are depicted as being alive (including people referred to as machines), while inanimate machine are not. We create train/dev/test splits at random.</p><p>Emotion. Data from Kim and Klinger <ref type="bibr" target="#b18">[19]</ref>, which identifies the emotions that characters are experiencing by tying mentions of characters in a sentence to mentions of a trigger emotion word. We transform this into a multiclass classification task by extracting sentences in which a character experiences a single emotion, and ask a model to predict the emotion for that character. We create train/dev/test splits so that passages from the same work appear in only one partition.</p><p>Folktales. Data from Hagedorn and Darányi <ref type="bibr" target="#b13">[14]</ref>, which assigns the ATU type <ref type="bibr" target="#b39">[40]</ref> to texts. We select only ATU types that are attested at least nine times among the text labels, and create train/dev/test splits to maintain the same label distribution across splits (i.e., at least three instances per type across each split).</p><p>Genre. We draw inspiration from Sharmaa, Hu, Wu, Shang, Singhal, and Underwood <ref type="bibr" target="#b32">[33]</ref> in using Library of Congress subject classification as a proxy for genre, and use a subset of 5 genres that work studied (science fiction, detective and mystery stories, adventure stories, love stories, and westerns). We draw texts from Project Gutenberg, sampling 5 passages (each approximately 500 words) from 150 books for each genre. To enable a multiclass classification problem, we only consider books that are tagged with one subject classification from the set above (so that works that are tagged with both "love stories" and "westerns" are excluded). We create train/dev/test splits so that texts by the same author appear in only one partition, and we select a maximum of 5 books per author.</p><p>Haiku. Data from Long and So <ref type="bibr" target="#b22">[23]</ref>, which contrasts haiku poems with non-haiku poems. We create train/dev/test splits so that poems by the same author appear in only one partition.</p><p>Hippocorpus. 
Data from Sap, Horvitz, Choi, Smith, and Pennebaker <ref type="bibr" target="#b31">[32]</ref>, which solicits first-person stories written by workers on Amazon Mechanical Turk in three categories: recalled stories, which narrate real events transpiring within the past six months; imagined stories, fictional narratives on the same topic as a randomly selected recalled story; and retold stories, recalled stories told again 2-3 months later by the same workers. This is the only task where a human does not judge a label by inspection of a pre-existing text; accordingly, it is not possible to articulate the textual boundaries between those categories a priori. We create train/dev/test splits so that texts by the same author appear in only one partition.</p><p>Literary time. As our sole regression task, we draw data from Underwood <ref type="bibr" target="#b40">[41]</ref>, which labels the number of seconds that transpire in a fictional passage of approximately 250 words. We create train/dev/test splits so that texts from the same title appear in only one partition.</p><p>Narrativity. Data from Piper and Bagga <ref type="bibr" target="#b29">[30]</ref>, which contrasts passages from narrative genres (biography, fairy tales, novels) to passages from non-narrative genres (scientific abstracts, book reviews, Supreme Court proceedings). In this data, the genre of a text fully determines its narrativity status; in order to assess how models reason about narrativity rather than genre per se, we create train/dev/test splits so that texts of the same genre appear in only one partition (texts of the "fairy tale" genre, for instance, only appear in dev data).</p><p>Strangeness. Data from Simeone, Koundinya, Kumar, and Finn <ref type="bibr" target="#b35">[36]</ref>, which contrasts sentences mentioning "descriptions or introductions of technology and novel science" with those that do not, both drawn from Project Gutenberg texts. We create train/dev/test splits at random.</p><p>Stream-of-consciousness. Data from Long and So <ref type="bibr" target="#b23">[24]</ref>, which contrasts stream-of-consciousness passages with control passages drawn at random from realist novels. The original work sampled control passages of fixed character lengths, leading to passages that break between words; we re-sample passages from Project Gutenberg texts of the same titles breaking only across sentences, sampling passage lengths to reflect the same empirical distribution of lengths in the SoC texts. We create train/dev/test splits so that passages by the same author appear in only one partition.</p></div>
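<div xmlns="http://www.tei-c.org/ns/1.0"><p>The group-stratified splitting described above can be implemented with scikit-learn's GroupShuffleSplit, which assigns whole groups (here, authors) to a single partition; the dataframe below is a hypothetical stand-in for any of these datasets.</p><p>
# Sketch of group-aware splitting so that all texts by one author fall
# within a single partition; `df` is a hypothetical stand-in dataset.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.DataFrame({
    "text": ["passage one", "passage two", "passage three", "passage four"],
    "label": ["haiku", "haiku", "non-haiku", "non-haiku"],
    "author": ["basho", "basho", "eliot", "frost"],
})

splitter = GroupShuffleSplit(n_splits=1, train_size=0.5, random_state=0)
train_idx, rest_idx = next(splitter.split(df, groups=df["author"]))
train, rest = df.iloc[train_idx], df.iloc[rest_idx]
# A second grouped split of `rest` would then yield dev and test partitions.
</p></div>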
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Accuracy</head><p>In the goals of classification for which maximizing accuracy is important, these ten tasks encode a wide range of different phenomena that shed some light on the affordances of LLMs compared to traditional supervised classifiers. We compare the performance of the following different classes of models on these tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Supervised models</head><p>Logistic/Linear regression. We train a regularized logistic regression model for all classification tasks and ridge regression for Literary Time, using a bag-of-words representation of text with a vocabulary size of 100K. We tune the ℓ 2 regularization strength on dev data.</p><p>BERT/RoBERTa. We train base English models of BERT <ref type="bibr" target="#b10">[11]</ref> and RoBERTa <ref type="bibr" target="#b21">[22]</ref> with a maximum 512-token input size, truncating excess text. We tune the learning rate on dev data.</p><p>Llama 3 8B. To test the ability of LLMs as supervised classifiers, we fine-tune Llama 3 8B <ref type="bibr" target="#b0">[1]</ref> on the training split with a multiclass classification head (i.e., selecting a discrete category as output, not a choice of word) for all classification tasks and a regression head for Literary Time, using low-rank adaptation <ref type="bibr" target="#b14">[15]</ref>. We represent a passage as the final layer output of the last word within it. Like BERT and RoBERTa, we use a maximum 512-token input size, truncating excess text. We tune the learning rate on dev data.</p><p>For each task, we optimize hyperparameters on development data using the optuna optimization library over 50 optimization trials, training up to 100 epochs with early stopping based on lack of improvement over 10 epochs on dev data. Appendix D illustrates the relationship between learning rate and accuracy on development data across all tasks for BERT, RoBERTa and Llama 3 8B; we see the best learning rate to be highly task-and model-specific, necessitating optimization.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Prompting models</head><p>We select three LLMs to evaluate classification through prompting alone. For all models, we provide a description of the task and provide 10 shots as exemplars, with no Chain-of-Thought.</p><p>As Appendix A illustrates, we experiment with shot selection and Chain-of-Thought prompting on development data with GPT-4o, finding that while the number of shots generally improves performance (though not always significantly so), using Chain-of-Thought does not lead to any meaningful difference.</p><p>GPT-4o. We query GPT-4o through the OpenAI API. Across all tasks, inference over the test set cost $164.49.</p><p>Llama 3 70B. We run Llama 3 70B-Instruct locally across 4 L40S GPUs with a total of 192 GB of GPU memory. The Llama 3 context size is 8,192 tokens, which is shorter than the prompt length for the Folktales task (which includes descriptions for the 70 categories), so we do not assess Llama 3 on this task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Mixtral 8x22B.</head><p>We run Mixtral 8x22B-Instruct locally across 4 L40S GPUs with a total of 192 GB of GPU memory. Since Mixtral models are more memory-demanding than alternatives due to their mixture of experts, we load them with 4-bit quantization.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Discussion</head><p>We present results in Table <ref type="table" target="#tab_1">2</ref>, reporting accuracy as a metric for all classification tasks and Spearman 𝜌 for Literary time. For all supervised models, we report the total wall clock training time in minutes across all hyperparameter optimization trials. Tasks are ordered (by left to right and top to bottom) by the difference in metric between best performing prompting method and best supervised method.</p><p>Established vs. de novo tasks. The diversity of tasks on which we assess models allows us to characterize disparate performance among them. Tasks measuring concepts that are generally widely known-the animacy of objects, determinations of science fiction vs. westerns, folktale types, character emotional states, and stream-of-consciousness-offer sites where LLMs are able to excel, either beating supervised models or coming close in performance to them. For tasks that measure largely de novo concepts-the passage of time, differentiating stories that are recalled from those that are imagined or retold, "technological strangeness"-the evidence from supervision appears to be important for accurate prediction.</p><p>Memorization. The outsized performance of GPT-4o on the folktale identification task raises the question whether that model has memorized that task; as we detail in Appendix B, we assess the influence of memorization on task performance using a membership inference method inspired by Shi, Ajith, Xia, Huang, Liu, Blevins, Chen, and Zettlemoyer <ref type="bibr" target="#b34">[35]</ref>. We see little evidence of memorization, and after applying correction for multiple hypothesis correction, no evidence for downstream impact.</p><p>Sufficiency of masked LMs. Among models that are optimized for each specific task using supervised data, we do not see a strong difference between base masked language models and Llama 3, which is pre-trained on a much larger amount of data, has 65 times more parameters, and generally takes an order of magnitude longer to train.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Sensemaking</head><p>The comparative performance we see reported above only speaks to high classification accuracy as a desideratum. This leaves aside the other, perhaps more important, ways in which we see classification being used as a tool for sensemaking in cultural analytics.</p><p>If we probe deeper into the ways in which linear classification methods have been used for making sense of categories, both theory-driven approaches and exploratory analysis make use of conceptualization (deciding on specific constructs to measure) and implementation (instantiating those constructs into specific measures), to use the terminology from Piper <ref type="bibr" target="#b28">[29]</ref>. A primary difference, however, comes in the role of prior knowledge and theorization guiding those choices. Top-down approaches often design models to test theories; exploratory approaches may also incorporate theory in the design of the representation space 𝑓 (𝑥), but a starting point is often to let the structure of the data guide our understanding of the concepts themselves. Topic modeling is a widely used example of this form; an unsupervised method learns structure from data which then guides the analysis that follows. We see many examples of this approach. Goldstone and Underwood <ref type="bibr" target="#b12">[13]</ref> use topic modeling to first learn that "critical theory" is a dominant theme in literary journals; after surfacing this fact, they more formally test its rise and fall over time; Bode <ref type="bibr">[3]</ref> likewise uses topic modeling to identify a set of themes organizing a collection of Australian, British and American fiction, then uses close reading and classification to narrow in on a formal analysis of nonmetropolitan colonial spaces as a major organizing principle. Sensemaking as an activity is driven in these cases by the structure found through exploratory models. Indeed, the practice of computational grounded theory <ref type="bibr" target="#b26">[27]</ref> is premised on exactly this movement between exploration through models, close reading, and formal testing.</p><p>If we consider the ways in which LLMs are able to serve some function in this larger sensemaking process, a natural starting point is their role in exploratory analysis. What structure exists in the data that can lead to formal theory-testing? We experiment with this use case by prompting GPT-4o to reason about the characteristics that differentiate a set of data points Consider the data below, which contains a list of text/label pairs that illustrate a set of categories (0, 1):</p><p>[{"text": "Strong called to the remaining ships of his fleet Take formation K. Land and attack the enemy base according to prearranged order. ", "label": 0} {"text": "She dithered for a moment then stepped back toward the kitchen and the stairwell. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>GPT-4o output for characterizing the difference between Haiku and Non-Haiku poems through anonymized labels (only presented to the model as class "1" vs. "0"); for brevity, the first 5 characteristics are presented here; the full list can be found in Appendix C.5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Haiku</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Non-Haiku</head><p>Concise and Lyrical: Texts are often shorter, more concise, and have a lyrical quality.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Narrative or Descriptive Style:</head><p>Texts often tell a story or describe a scene in detail. Nature Imagery: Frequent use of natural imagery, such as flowers, trees, and weather.</p><p>Complex Structure: Sentences are often longer and more complex, with multiple clauses. Emotional and Sensory Language: Focus on emotions and sensory experiences.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Historical or Mythological References:</head><p>Frequent references to historical events, mythological figures, or classical literature. Simple Structure: Sentences are generally simpler and more straightforward.</p><p>Character Focus: Emphasis on characters, their actions, and their emotions. Personal and Intimate: Often feels personal or intimate, as if sharing a private moment or thought.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dialogue and Monologue: Presence of dialogue or internal monologue.</head><p>attended by anonymized category labels, as illustrated in Figure <ref type="figure" target="#fig_0">1</ref>. We include as many data points as the context length allows.</p><p>Table <ref type="table">3</ref> presents the output of this experiment for the Haiku task; the full outputs for all ten tasks can be found in Appendix C. Here we see similar affordances for sensemaking as alternative models. Just as Long and So <ref type="bibr" target="#b22">[23]</ref> challenge the initial category boundaries of haiku by identifying naturalistic language and brevity as defining characteristics of it, so too does GPT-4o identify "nature imagery" and conciseness as characteristic. An examination of the outputs in Appendix C shows that the sensemaking enabled by this form of LLM prompting intersects with that enabled by alternative methods in these tasks' original papers. The defining features of Animacy identified by GPT-4o stress "the mechanical nature of humans or the human-like qualities of machines, " ( §C.1) including the focus identified by Wilson, Ardanuy, Beelen, McGillivray, and Ahnert <ref type="bibr" target="#b45">[46]</ref> on "mere machines. " This method identifies "narrative style" vs. "lack of narrative" ( §C.8) as the defining feature of the Narrativity task but offers alternatives to the distant worlds theory of Piper and Bagga <ref type="bibr" target="#b29">[30]</ref>-offering "descriptive language" and "imagery" as additional characteristics. In many ways, prompting LLMs to find the ways in which a set of categories are differentiated from each other within the evidence of a labeled dataset can suggest meaningful directions for more formal interrogation.</p><p>At the same time, it is clear that such LLM characterizations are not to be trusted at face value; we know this from prior work on LLM hallucinations <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b46">47,</ref><ref type="bibr" target="#b17">18]</ref> and confabulations <ref type="bibr" target="#b37">[38]</ref>, but also in examining its points of failure within this set of tasks. For Emotion ( §C.2), we see clear evidence of overfitting (e.g., "references to royalty or nobility" for the anticipation class, "mentions of specific names" for joy); and its description of the Folktale task ( §C.3) fails to meaningfully differentiate categories that it performs well on when given their description. Additionally, one of the risks of using pre-trained LLMs for exploratory data analysis is the degree to which the analysis they drive is a direct result of examining the data given, or a function instead of task knowledge acquired during pre-training. While we are careful to anonymize the category labels in the experiments above and not mention the classification task by name (e.g., "haiku"), it is clear that pre-training data informs the knowledge that models bring to this exploratory task, muddying the inferences we can make about data-in-itself. Understanding the role that pre-training plays both in model performance (e.g., through disparate memorization) and in any biases that result (e.g., through disparate focus on whose language is represented <ref type="bibr" target="#b24">[25]</ref>) is an important area of current research on the limits of these models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Discussion</head><p>As an exploratory exercise, the defining characteristics of categories seen through the lens of a prompted LLM provide suggestions on potentially salient dimensions that only later formal testing can truly assess the significance of. Is it indeed the case that westerns "focus on personal experiences and emotions" more than other genres ( §C.4)? Is sentence length indeed correlated with the duration of elapsed time ( §C.7)? While not providing an answer themselves, they suggest potentially interesting directions to test with more formal means (i.e., put through the stages of conceptualization and implementation). In this sense, we might view LLMs as tools for sensemaking-helping us understand categories and the boundaries that circumscribe them-in the same way that other exploratory methods like topic modeling make possible.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In this work, we outline the major ways in which current work in cultural analytics is employing classification, primarily as a method for sensemaking distinct from mere automation, and detail the ways in which large language models fit (and do not fit) within this landscape. For uses that seek to maximize predictive accuracy, LLMs offer competitive performance through prompting alone for established tasks, while traditional supervised methods excel for newly constructed phenomena (even in scenarios with limited training data). For sensemaking applications that aim to understand category constructs better through labeled data, LLMs may have a role to play in exploratory data analysis by outlining potential characteristics that can then be subjected to more formal testing. By considering a set of ten tasks considered in the space of cultural analytics, we complement an existing body of work that probes model knowledge of culture <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b33">34]</ref>) to focus in particular on the ways in which computational models can shed light on culture itself. Data and code to support this work can be found at https://github.com/bamman-group/ca-classification-data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendices</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Shots and Chain-of-Thought</head><p>We investigate the relationship between number of shots, Chain-of-Thought reasoning and accuracy for GPT-4o in Table <ref type="table">4</ref> below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4</head><p>GPT-4o dev performance by number of shots/chain of thought. Increasing the number of shots (from 1 to 10) largely helps performance (e.g., for animacy, genre, literary time, and strangeness), even if not significantly so. CoT generally does not improve performance. We do not test CoT on Hippocorpus since it does not originate as a human labeling task (as noted in §3 above). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Memorization</head><p>The outsized performance of GPT-4o on the folktale identification task raises the question whether that model has memorized that task; indeed, we can see that the text itself is often discoverable on Google attended with its label, as illustrated in Figure <ref type="figure" target="#fig_1">2</ref> below. Prior work in cultural analytics has assessed the impact of memorization on downstream tasks, with work demonstrating memorization influence on encyclopedic knowledge (predicting date of publication <ref type="bibr">[6]</ref>) but not on text-internal qualities like poetic form <ref type="bibr" target="#b44">[45]</ref>.</p><p>We assess instance-level memorization in prompt-based LLMs using a membership inference method similar to Shi, Ajith, Xia, Huang, Liu, Blevins, Chen, and Zettlemoyer <ref type="bibr" target="#b34">[35]</ref>; while that work identifies membership through the log probabilities of the 𝑘 least probable tokens in a text (necessitating probability assessments for every token), we identify first the 𝑘 = 5 least probable tokens (relative to the complete data for that task) and ask a model to predict them given a fixed context length of 25 tokens, as in the example below: My wife and I have been living in the same family home for the last [MASK] years . It was our first family home and → ten We judge a passage to be memorized if at least four of the five masked terms are able to be predicted. We measure the influence of memorization on the task through the Pearson correlation between an instance's memorization and whether the model correctly predicted its label (in the case of regression for Literary Time, the mean absolute error between true and predicted label); for correlation between two binary variables, this is equivalent to the 𝜙 coefÏcient <ref type="bibr" target="#b47">[48]</ref>. We select this method over others due to the specific nature of these datasets, which do not always contain names (required by Chang, Cramer, Soni, and Bamman <ref type="bibr">[6]</ref>) and do not always have titles or authors notable enough to condition on (required by D'Souza and Mimno <ref type="bibr">[10]</ref>).</p><p>As Table <ref type="table">5</ref> shows, while GPT-4o on Folktales, hippocorpus and literary time all rank highest on memorization, there is no measurable effect on downstream tasks for any model/task combination when correcting for multiple hypothesis tests.</p><p>This assessment of memorization, however, only considers memorization of the text alone, and not the association of text with a label. While past work has examined this form of label memorization in controlled settings where the training data is known <ref type="bibr" target="#b25">[26]</ref>, it remains a difficult task for membership inference-and in fact may be impossible to disentangle from the understanding of the task itself.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 5</head><p>Dataset memorization by prompting models. With Bonferroni correction for the 𝑛 = 29 hypothesis tests carried out, no model shows significant correlation at 𝛼 = 0.05 between memorization and task performance. We do not assess Folktales on Llama 3 70B since the prompt for that task exceeds the Llama context size (as noted above). For cases with 0 memorization, Pearson's 𝑟 is undefined (denoted by blank values).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>GPT-4o</head><p>Llama 1. Themes of supernatural intervention: Supernatural beings like trolls or witches play a significant role. 2. Rural and pastoral settings: The stories often take place in rural settings, involving common folk and simple tasks.</p><p>Category 4 (sc. 700) Characteristics:</p><p>1. Themes of cleverness and resourcefulness: The protagonist uses cleverness to outwit others, such as a tinker finding treasure. 2. Elements of folklore and superstition: The stories often involve elements of folklore, such as dreams leading to treasure.</p><p>Category 5 (sc. 175) Characteristics:</p><p>1. Themes of trickery and cunning: Characters like Brer Rabbit use cunning to outwit stronger opponents. 2. Animal characters with human traits: The stories often feature animals with humanlike intelligence and behavior.</p><p>1. Frequent use of dialogue and conversational tone: Texts often include direct speech, with characters interacting in a casual or informal manner. 2. Presence of conflict or tension: Many passages involve arguments, disagreements, or confrontations between characters. 3. Focus on mundane or everyday events: The content often revolves around ordinary activities, such as meals, conversations, or minor incidents. 4. Detailed descriptions of characters' actions and emotions: There is a strong emphasis on what characters are doing and feeling in the moment. 5. Use of colloquial language and idiomatic expressions: The language tends to be more relaxed and may include slang or regional dialects. 6. Shorter, more fragmented sentences: The writing style often includes brief, choppy sentences that reflect the immediacy of the scene. 7. Frequent shifts in focus or perspective: The narrative may jump between different characters or scenes, creating a sense of movement and dynamism.</p><p>Pole 2 (sc. longer time) Characteristics:</p><p>1. Formal and elaborate language: The text often employs a more sophisticated and refined vocabulary, with complex sentence structures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Descriptive and expository passages:</head><p>There is a greater emphasis on setting the scene, providing background information, and describing the environment in detail.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Philosophical or reflective tone:</head><p>The content may include introspective or contemplative passages, exploring deeper themes and ideas. 4. Historical or literary references: The text may allude to historical events, classical literature, or other cultural touchstones. 5. Focus on significant or dramatic events: The narrative often centers around major occurrences, such as battles, political intrigues, or life-changing decisions. 6. Longer, more complex sentences: The writing style tends to be more fluid and continuous, with fewer breaks and a more measured pace. 7. Consistent narrative perspective: The text usually maintains a steady point of view, providing a cohesive and unified narrative voice.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.8. Narrativity</head><p>Category 0 (sc. narrative) Characteristics:</p><p>1. Narrative Style: The texts often have a narrative or storytelling style, focusing on characters, events, and dialogues. 2. Descriptive Language: Rich in descriptive language, often detailing settings, emotions, and actions. 3. Literary Elements: Use of literary elements such as metaphors, similes, and personification. 4. Character Interaction: Frequent interactions between characters, including dialogues and internal monologues.</p><p>5. Emotional Tone: Often convey emotions and personal experiences, creating an immersive atmosphere. 6. Historical or Fictional Context: Many texts are set in historical or fictional contexts, providing a backdrop for the narrative. 7. Plot Development: Presence of a clear plot or storyline, with events unfolding over time. 8. Imagery: Use of vivid imagery to paint scenes and evoke sensory experiences.</p><p>Category 1 (sc. non-narrative) Characteristics:</p><p>1. Formal and Technical Language: Use of formal, technical, and legal language, often related to business, contracts, or regulations. 2. Structured Format: Texts are often structured in a formal format, including sections, clauses, and bullet points. 3. Objective Tone: An objective, impersonal tone, focusing on facts, procedures, and instructions. 4. Specific Terminology: Use of specific terminology related to business, law, or technical fields. 5. Lack of Narrative: Absence of a narrative or storytelling style; instead, the focus is on conveying information or instructions. 6. Contractual and Legal Content: Frequent references to agreements, obligations, rights, and legal terms. 7. Professional Context: Texts are often set in a professional or corporate context, dealing with business operations, legal matters, or technical specifications. 8. Precision and Clarity: Emphasis on precision and clarity, ensuring that the information is unambiguous and easily understood.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.9. Strangeness</head><p>Category 0 (sc. strange) Characteristics:</p><p>1. Technical or Scientific Language: Texts often include technical or scientific terms, such as "proton and neutron structure, " "viewer magnification, " "synthetic gravity generator, " and "molecular started biting at the fort's ray screen. " 2. Space and Technology Themes: Many texts reference space, ships, and technology, e.g., "space navigator, " "rocket ships, " "air ship, " "space speed, " and "space tug. " 3. Descriptive of Actions and Objects: Texts frequently describe actions or objects in a detailed manner, such as "The ship plunged planetward, " "The boats slid into space, " and "The hull of the _Enterprise_ bulked huge in the guidance screen. " 4. Formal and Objective Tone: The tone is often formal and objective, focusing on factual descriptions and technical details, e.g., "The machine replenishes the energy because it is set to maintain a certain energy level in the field. " 5. Lack of Personal Interaction: There is minimal personal interaction or dialogue between characters, with a focus on events and descriptions, e.g., "The car roared through the gates of the spaceport and sped across the hard surface of the field. "</p><p>Figures 4, 5 and 6 illustrate the range of learning rates considered during hyperparameter optimization, along with the ensuing accuracy on development data; while we consider 50 optimization trials of each learning rate, optuna discards trials that are unlikely to outperform learning rates seen so far (so each task/model may have fewer than 50 points illustrated). Across all supervised methods-BERT (fig. <ref type="figure" target="#fig_2">4</ref>), RoBERTa (fig. <ref type="figure">5</ref>) and Llama 8B (fig. <ref type="figure">6</ref>)-the choice of learning rate is very dependent on the task, necessitating task-specific optimization. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Template for LLM sensemaking exercise, with anonymized integer labels, illustrated with data for the Strangeness task (0 = strange; 1 = not strange).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Searching for the text of a data point in the Folktales training set ("A father, now aged, had given over all his property to his children") brings up its label (982).</figDesc><graphic coords="19,150.54,161.10,291.70,78.07" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Learning rate sweep on development data for BERT across different tasks. The x axis is the log 10 of the learning rate.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Dataset sizes and median number of tokens per document (in training data, tokenized through NLTK).</figDesc><table><row><cell>Dataset</cell><cell cols="5"># train # dev # test # categories median tokens</cell></row><row><cell>Atypical animacy</cell><cell>197</cell><cell>198</cell><cell>198</cell><cell>2</cell><cell>41</cell></row><row><cell>Emotion</cell><cell>276</cell><cell>276</cell><cell>276</cell><cell>8</cell><cell>68</cell></row><row><cell>Folktales</cell><cell>328</cell><cell>338</cell><cell>328</cell><cell>70</cell><cell>674</cell></row><row><cell>Genre</cell><cell cols="2">1250 1250</cell><cell>1250</cell><cell>5</cell><cell>485</cell></row><row><cell>Haiku</cell><cell>368</cell><cell>363</cell><cell>363</cell><cell>2</cell><cell>85</cell></row><row><cell>Hippocorpus</cell><cell cols="2">2285 2285</cell><cell>2284</cell><cell>3</cell><cell>278</cell></row><row><cell>Literary time</cell><cell>589</cell><cell>588</cell><cell cols="2">588 (regression)</cell><cell>291</cell></row><row><cell>Narrativity</cell><cell cols="2">4191 4841</cell><cell>4380</cell><cell>2</cell><cell>127</cell></row><row><cell>Strangeness</cell><cell>609</cell><cell>610</cell><cell>610</cell><cell>2</cell><cell>17</cell></row><row><cell>Stream-of-consciousness</cell><cell>200</cell><cell>200</cell><cell>200</cell><cell>2</cell><cell>246</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Task performance, ordered by the absolute difference between the best-performing prompting model and the best-performing supervised model. Colors visualize the proximity of that value to 1.0 (blue) and 0.0 (red).</figDesc><table><row><cell></cell><cell>Folktales</cell><cell></cell><cell>Animacy</cell><cell></cell><cell>Genre</cell><cell></cell></row><row><cell></cell><cell>Accuracy</cell><cell>Time</cell><cell>Accuracy</cell><cell>Time</cell><cell>Accuracy</cell><cell>Time</cell></row><row><cell>Majority</cell><cell>0.033</cell><cell></cell><cell>0.641</cell><cell></cell><cell>0.200</cell><cell></cell></row><row><cell>Linear</cell><cell>0.457 [0.405-0.509]</cell><cell>14.0</cell><cell>0.778 [0.717-0.833]</cell><cell>0.0</cell><cell>0.528 [0.502-0.556]</cell><cell>12.5</cell></row><row><cell>BERT</cell><cell>0.494 [0.442-0.546]</cell><cell>82.6</cell><cell>0.843 [0.788-0.889]</cell><cell>12.0</cell><cell>0.614 [0.588-0.641]</cell><cell>194.4</cell></row><row><cell>RoBERTa</cell><cell>0.616 [0.567-0.665]</cell><cell>174.4</cell><cell>0.833 [0.783-0.884]</cell><cell>10.0</cell><cell>0.648 [0.623-0.674]</cell><cell>195.0</cell></row><row><cell>Llama 3 8B</cell><cell>0.366 [0.311-0.421]</cell><cell>672.5</cell><cell>0.828 [0.773-0.879]</cell><cell>57.8</cell><cell>0.741 [0.717-0.765]</cell><cell>2013.5</cell></row><row><cell>GPT-4o</cell><cell>0.838 [0.799-0.875]</cell><cell></cell><cell>0.848 [0.798-0.899]</cell><cell></cell><cell>0.710 [0.683-0.734]</cell><cell></cell></row><row><cell>Llama 3 70B</cell><cell>-</cell><cell></cell><cell>0.869 [0.818-0.914]</cell><cell></cell><cell>0.724 [0.699-0.750]</cell><cell></cell></row><row><cell>Mixtral 8x22B</cell><cell>0.488 [0.433-0.543]</cell><cell></cell><cell>0.823 [0.763-0.874]</cell><cell></cell><cell>0.380 [0.351-0.406]</cell><cell></cell></row><row><cell></cell><cell>Stream-of-consciousness</cell><cell></cell><cell>Emotion</cell><cell></cell><cell>Narrativity</cell><cell></cell></row><row><cell></cell><cell>Accuracy</cell><cell>Time</cell><cell>Accuracy</cell><cell>Time</cell><cell>Accuracy</cell><cell>Time</cell></row><row><cell>Majority</cell><cell>0.500</cell><cell></cell><cell>0.207</cell><cell></cell><cell>0.601</cell><cell></cell></row><row><cell>Linear</cell><cell>0.875 [0.830-0.920]</cell><cell>0.0</cell><cell>0.181 [0.138-0.225]</cell><cell>15.7</cell><cell>0.875 [0.866-0.884]</cell><cell>3.7</cell></row><row><cell>BERT</cell><cell>0.905 [0.865-0.945]</cell><cell>29.9</cell><cell>0.402 [0.344-0.464]</cell><cell>20.3</cell><cell>0.863 [0.853-0.873]</cell><cell>334.4</cell></row><row><cell>RoBERTa</cell><cell>0.900 [0.855-0.940]</cell><cell>24.6</cell><cell>0.362 [0.308-0.417]</cell><cell>22.7</cell><cell>0.928 [0.920-0.935]</cell><cell>442.0</cell></row><row><cell>Llama 3 8B</cell><cell>0.945 [0.910-0.975]</cell><cell>179.5</cell><cell>0.250 [0.203-0.304]</cell><cell>106.1</cell><cell>0.933 [0.926-0.940]</cell><cell>2635.2</cell></row><row><cell>GPT-4o</cell><cell>0.925 [0.885-0.960]</cell><cell></cell><cell>0.370 [0.315-0.428]</cell><cell></cell><cell>0.830 [0.818-0.840]</cell><cell></cell></row><row><cell>Llama 3 70B</cell><cell>0.875 [0.825-0.920]</cell><cell></cell><cell>0.373 [0.312-0.435]</cell><cell></cell><cell>0.875 [0.865-0.884]</cell><cell></cell></row><row><cell>Mixtral 8x22B</cell><cell>0.820 [0.765-0.870]</cell><cell></cell><cell>0.163 [0.123-0.207]</cell><cell></cell><cell>0.604 [0.591-0.618]</cell><cell></cell></row><row><cell></cell><cell>Strangeness</cell><cell></cell><cell>Haiku</cell><cell></cell><cell>Hippocorpus</cell><cell></cell></row><row><cell></cell><cell>Accuracy</cell><cell>Time</cell><cell>Accuracy</cell><cell>Time</cell><cell>Accuracy</cell><cell>Time</cell></row><row><cell>Majority</cell><cell>0.556</cell><cell></cell><cell>0.529</cell><cell></cell><cell>0.405</cell><cell></cell></row><row><cell>Linear</cell><cell>0.805 [0.772-0.834]</cell><cell>0.0</cell><cell>0.705 [0.658-0.752]</cell><cell>0.0</cell><cell>0.518 [0.497-0.539]</cell><cell>24.2</cell></row><row><cell>BERT</cell><cell>0.879 [0.852-0.903]</cell><cell>17.9</cell><cell>0.782 [0.738-0.824]</cell><cell>34.0</cell><cell>0.615 [0.598-0.636]</cell><cell>289.7</cell></row><row><cell>RoBERTa</cell><cell>0.877 [0.849-0.903]</cell><cell>23.9</cell><cell>0.986 [0.972-0.997]</cell><cell>56.9</cell><cell>0.646 [0.624-0.665]</cell><cell>511.0</cell></row><row><cell>Llama 3 8B</cell><cell>0.874 [0.848-0.900]</cell><cell>145.8</cell><cell>0.992 [0.981-1.000]</cell><cell>296.0</cell><cell>0.682 [0.662-0.699]</cell><cell>4228.8</cell></row><row><cell>GPT-4o</cell><cell>0.564 [0.523-0.602]</cell><cell></cell><cell>0.785 [0.741-0.826]</cell><cell></cell><cell>0.408 [0.388-0.429]</cell><cell></cell></row><row><cell>Llama 3 70B</cell><cell>0.726 [0.690-0.761]</cell><cell></cell><cell>0.625 [0.576-0.675]</cell><cell></cell><cell>0.412 [0.391-0.432]</cell><cell></cell></row><row><cell>Mixtral 8x22B</cell><cell>0.761 [0.726-0.793]</cell><cell></cell><cell>0.813 [0.771-0.851]</cell><cell></cell><cell>0.263 [0.245-0.279]</cell><cell></cell></row><row><cell></cell><cell>Literary time</cell><cell></cell></row><row><cell></cell><cell>𝜌</cell><cell>Time</cell></row><row><cell>Majority</cell><cell>0</cell><cell></cell></row><row><cell>Linear</cell><cell>0.640 [0.588-0.687]</cell><cell>0.2</cell></row><row><cell>BERT</cell><cell>0.764 [0.720-0.805]</cell><cell>88.6</cell></row><row><cell>RoBERTa</cell><cell>0.782 [0.738-0.817]</cell><cell>111.4</cell></row><row><cell>Llama 3 8B</cell><cell>0.772 [0.730-0.809]</cell><cell>966.3</cell></row><row><cell>GPT-4o</cell><cell>0.485 [0.406-0.560]</cell><cell></cell></row><row><cell>Llama 3 70B</cell><cell>0.371 [0.285-0.455]</cell><cell></cell></row><row><cell>Mixtral 8x22B</cell><cell>0.447 [0.378-0.508]</cell><cell></cell></row></table></figure>
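<div xmlns="http://www.tei-c.org/ns/1.0"><p>[Note: the bracketed ranges in Table 2 can be read as 95% confidence intervals; the sketch below shows one standard way of obtaining such intervals for accuracy, a nonparametric bootstrap over test items. Whether this matches the exact procedure used for Table 2 is an assumption.]</p><p>
# Bootstrap 95% confidence interval for classification accuracy.
import random

def bootstrap_accuracy_ci(gold, pred, n_boot=10_000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    n = len(gold)
    accs = []
    for _ in range(n_boot):
        # Resample test items with replacement and score the resample.
        idx = [rng.randrange(n) for _ in range(n)]
        accs.append(sum(gold[i] == pred[i] for i in idx) / n)
    accs.sort()
    # Take the empirical 2.5th and 97.5th percentiles of the resampled scores.
    return accs[int(alpha / 2 * n_boot)], accs[int((1 - alpha / 2) * n_boot) - 1]

gold = [0, 1, 1, 0, 1, 1, 0, 0, 1, 1]
pred = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
print(bootstrap_accuracy_ci(gold, pred))
</p></div>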
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc>Memorization (Mem) and its correlation with model predictions (r, p) for GPT-4o, Llama 3 70B, and Mixtral 8x22B across the ten datasets (Animacy, Emotion, Folktales, Genre, Haiku, Hippocorpus, Literary time, Narrativity, Strangeness, Stream-of-consciousness).</figDesc></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.1. Animacy</head><p>Category 0 (sc. animate) Characteristics:</p><p>1. Human or animal comparison: Texts often compare machines to humans or animals, emphasizing the mechanical nature of humans or the human-like qualities of machines ("servants must be treated, not as machines, but as human beings"). 2. Metaphorical or figurative language: Usage of machines in a metaphorical sense to describe people or actions ("a mere machine", "a teaching machine"). 3. Negative connotation: Machines are often described in a negative or dehumanizing context.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.2. Emotion</head><p>2. Expressions of emotions like sadness or happiness: Characters expressing feelings of joy or sorrow. 3. References to specific actions or events: Descriptions of particular incidents or activities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.3. Folktales</head><p>[Note: for legibility, we present the first 10 categories; all results can be viewed in our Github repository.]</p><p>Category 0 (sc. 982) Characteristics:</p><p>1. Themes of deception and trickery: The story involves a father deceiving his children by pretending to have money. 2. Moral lesson: The story teaches a lesson about gratitude and the consequences of greed.</p><p>Category 1 (sc. 650A) Characteristics:</p><p>1. Themes of laziness and cleverness: The protagonist, Jack, is initially lazy but uses cleverness to achieve his goals. 2. Transformation and success: Jack transforms from a lazy boy to a successful individual through a series of clever actions.</p><p>Category 2 (sc. 510B) Characteristics:</p><p>1. Themes of enchantment and transformation: Characters are often transformed by magical means, such as a girl being turned into a reindeer. 2. Quest for identity and love: The stories often involve a quest to regain one's true form or to find true love.</p><p>Category 3 (sc. 113A) Characteristics:</p></div>
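<div xmlns="http://www.tei-c.org/ns/1.0"><p>[Note: a minimal sketch of the kind of correlation analysis summarized in Table 3, relating a per-item memorization score to whether the model's prediction was correct; the arrays and the memorization measure shown here are illustrative assumptions, not the paper's exact quantities.]</p><p>
# Point-biserial-style correlation between memorization and correctness,
# yielding r and p values of the kind reported per dataset and model.
from scipy.stats import pearsonr

mem = [0.00, 0.10, 0.00, 0.30, 0.05, 0.00]   # memorization score per item
correct = [1, 1, 0, 1, 0, 1]                  # 1 if the prediction was right
r, p = pearsonr(mem, correct)
print(f"r={r:.3f}, p={p:.3f}")
</p></div>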
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We thank the original authors of the tasks we study for openly releasing the data on which this current work stands. The research reported in this article was supported by funding from the National Science Foundation (IIS-1942591) and the National Endowment for the Humanities (HAA-271654-20), with computing resources provided by Microsoft Azure (Accelerating Foundation Models Research) and the Accelerating Computing for Emerging Sciences (ACES) high-performance computing cluster at Texas A&amp;M University, funded by the National Science Foundation (NAIRR-240114).</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Category 6 (sc. 980) Characteristics:</p><p>1. Themes of family and respect: The story involves family dynamics and the importance of respecting elders.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Moral lesson:</head><p>The story teaches a lesson about treating family members with kindness and respect.</p><p>Category 7 (sc. 726) Characteristics:</p><p>1. Themes of supernatural and magical intervention: Supernatural beings like trolls or witches play a significant role. 2. Rural and pastoral settings: The stories often take place in rural settings, involving common folk and simple tasks.</p><p>Category 8 (sc. 1430) Characteristics:</p><p>1. Themes of laziness and cleverness: The protagonist, Heinz, is initially lazy but uses cleverness to achieve his goals. 2. Transformation and success: Heinz transforms from a lazy boy to a successful individual through a series of clever actions.</p><p>Category 9 (sc. 2250) Characteristics:</p><p>1. Themes of brevity and simplicity: The story is very short and simple, often involving a single event or action. 2. Lack of detailed plot: The story lacks a detailed plot or character development.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.4. Genre</head><p>Category 0 (sc. science fiction) Characteristics:</p><p>1. Frequent use of dialogue and conversational tone: The texts often include characters speaking to each other, which makes the narrative dynamic and engaging.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Presence of action and movement:</head><p>The texts often describe characters performing actions, moving from one place to another, or engaging in some form of physical activity. 3. Descriptive language for settings and events: The texts provide detailed descriptions of the environment, events, and actions, creating a vivid picture for the reader. 4. Suspense and tension: The texts often build suspense and tension, keeping the reader engaged and curious about what will happen next.</p><p>Category 1 (sc. western stories) Characteristics:</p><p>1. Focus on personal experiences and emotions: The texts often delve into the personal thoughts, feelings, and experiences of the characters.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Introspective and reflective tone:</head><p>The narrative often includes characters reflecting on their past actions, decisions, and emotions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Detailed character development:</head><p>The texts provide in-depth descriptions of characters' personalities, backgrounds, and motivations.</p><p>4. Use of first-person perspective: Some texts are written from a first-person point of view, giving a direct insight into the character's mind.</p><p>Category 2 (sc. detective and mystery stories) Characteristics:</p><p>1. Mystery and investigation elements: The texts often involve characters trying to solve a mystery or uncover hidden information. 2. Presence of law enforcement or detective figures: Characters such as detectives, police officers, or investigators are commonly featured. 3. Clues and evidence: The narrative includes details about clues, evidence, and logical deductions made by the characters. 4. Formal and procedural language: The texts often use formal language and describe procedural actions related to investigations or legal matters.</p><p>Category 3 (sc. adventure stories) Characteristics:</p><p>1. Historical or period settings: The texts are often set in historical or period-specific contexts, with detailed descriptions of the time period. 2. Focus on social and cultural aspects: The narrative often explores social norms, cultural practices, and historical events relevant to the time period. 3. Descriptive language for attire and environment: The texts provide detailed descriptions of clothing, architecture, and other elements specific to the historical setting. 4. Formal and elaborate language: The language used is often more formal and elaborate, reflecting the historical context.</p><p>Category 4 (sc. love stories) Characteristics:</p><p>1. Romantic and emotional themes: The texts often focus on romantic relationships, emotional experiences, and personal connections between characters. 2. Descriptive language for emotions and interactions: The narrative includes detailed descriptions of characters' emotions, facial expressions, and interactions. 3. Use of dialogue to convey feelings: Dialogue is often used to express characters' feelings, thoughts, and relationship dynamics. 4. Focus on personal and intimate moments: The texts highlight personal and intimate moments between characters, emphasizing their emotional significance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.5. Haiku</head><p>Category 0 (sc. non-haiku) Characteristics:</p><p>1. Narrative or Descriptive Style: Texts often tell a story or describe a scene in detail.</p><p>2. Complex Structure: Sentences are often longer and more complex, with multiple clauses. 3. Historical or Mythological References: Frequent references to historical events, mythological figures, or classical literature. 4. Character Focus: Emphasis on characters, their actions, and their emotions. 5. Dialogue and Monologue: Presence of dialogue or internal monologue.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.6. Hippocorpus</head><p>4. Emotional but not overly dramatic: While there is some emotional content, it is not as intense or dramatic as in Category 2.</p><p>Category 2 (sc. recalled) Characteristics:</p><p>1. Emotional depth and intensity: Entries in this category are characterized by a high level of emotional content, often describing feelings of sadness, joy, anxiety, or relief. 2. Personal and introspective: These entries often include personal reflections and introspection, exploring the emotional impact of events on the writer. 3. Detailed and elaborate: The descriptions are often more elaborate and detailed, providing a comprehensive view of the events and their emotional significance. 4. Significant life events: Entries frequently describe major life events, such as births, deaths, illnesses, or significant personal challenges. 5. Narrative style: The writing style is often more narrative, telling a story with a clear beginning, middle, and end, and focusing on the emotional journey.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.7. Literary time</head><p>[Note: for this regression task we used the following prompt.]</p><p>Consider the data below, which contains a list of text/label pairs that illustrate a regression problem:</p><p>[{"text": "Carlyle disappeared. Barbara had taken off her own warm night-gown then, and put it upon William in place of his cold one-had struck a light and was busily dressing herself. …", "label": -4.454347296253507}</p><p>{"text": "At first he lived up above entirely, reading a good deal, and strumming upon an old harp …", "label": 4.033719940405666} …]</p><p>Using this data, and this data alone, what are the textual characteristics that differentiate the two ends of this scale (let us call those ends "Pole 1" and "Pole 2")? The label "-1" is closer to Pole 1 than the label "1" is; the label "1", likewise, is closer to Pole 2 than the label "-1" is. Provide a list of bullet points of those textual features in the following format: </p></div>			</div>
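<div xmlns="http://www.tei-c.org/ns/1.0"><p>[Note: a minimal sketch of how the pole-characterization prompt in C.7 can be assembled from text/label pairs; the function name and truncation length are illustrative assumptions, not the paper's code.]</p><p>
# Build the C.7 regression prompt from (text, label) pairs.
import json

def build_pole_prompt(pairs, max_chars=300):
    # Serialize the in-context examples exactly as the prompt displays them.
    data = json.dumps([{"text": t[:max_chars], "label": y} for t, y in pairs])
    return (
        "Consider the data below, which contains a list of text/label pairs "
        "that illustrate a regression problem:\n\n" + data + "\n\n"
        "Using this data, and this data alone, what are the textual "
        "characteristics that differentiate the two ends of this scale "
        '(let us call those ends "Pole 1" and "Pole 2")? '
        "Provide a list of bullet points of those textual features."
    )

pairs = [("Barbara had taken off her own warm night-gown.", -4.45),
         ("At first he lived up above entirely, reading a good deal.", 4.03)]
print(build_pole_prompt(pairs))
</p></div>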
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Llama 3 Model Card</title>
		<author>
			<persName><surname>Aimeta</surname></persName>
		</author>
		<ptr target="https://github.com/meta-llama/llama3/blob/main/MODEL%5C%5FCARD.md" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Racial categories in machine learning</title>
		<author>
			<persName><forename type="first">S</forename><surname>Benthall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">D</forename><surname>Haynes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Conference on Fairness, Accountability, and Transparency</title>
				<meeting>the Conference on Fairness, Accountability, and Transparency</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="289" to="298" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Man people woman life&quot; / &quot;Creek sheep cattle horses&quot;: Influence, Distinction, and Literary Traditions</title>
		<author>
			<persName><forename type="first">K</forename><surname>Bode</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">A World of Fiction: Digital Collections and the Future of Literary History</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Sorting things out: Classification and its consequences</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">C</forename><surname>Bowker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">L</forename><surname>Star</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2000">2000</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The tell-tale hat: Surfacing the uncertainty in folklore classification</title>
		<author>
			<persName><forename type="first">P</forename><surname>Broadwell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mimno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Tangherlini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Cultural Analytics</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4</title>
		<author>
			<persName><forename type="first">K</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cramer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bamman</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.emnlp-main.453</idno>
		<ptr target="https://aclanthology.org/2023.emnlp-main.453" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Bouamor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Pino</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Bali</surname></persName>
		</editor>
		<editor>
			<persName><surname>Singa</surname></persName>
		</editor>
		<meeting>the 2023 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="7312" to="7327" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Fleshing out models of gender in English-language novels (1850-2000)</title>
		<author>
			<persName><forename type="first">J</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Cultural Analytics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Living Machines: A study of atypical animacy</title>
		<author>
			<persName><forename type="first">M</forename><surname>Coll Ardanuy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Nanni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Beelen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hosseini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ahnert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lawrence</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mcdonough</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Tolfo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">C</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mcgillivray</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.coling-main.400</idno>
		<ptr target="https://aclanthology.org/2020.coling-main.400" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th International Conference on Computational Linguistics</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Scott</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Bel</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Zong</surname></persName>
		</editor>
		<meeting>the 28th International Conference on Computational Linguistics<address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="4534" to="4545" />
		</imprint>
	</monogr>
	<note>International Committee on Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">What Gets Counted Counts</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Ignazio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Klein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Data Feminism</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The Chatbot and the Canon: Poetry Memorization in LLMs</title>
		<author>
			<persName><forename type="first">L</forename><surname>Souza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mimno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Computational Humanities Research Conference (CHR)</title>
				<meeting>the Computational Humanities Research Conference (CHR)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Long and Short Papers</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">The Order of Things: An Archaeology of the Human Sciences</title>
		<author>
			<persName><forename type="first">M</forename><surname>Foucault</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1994">1994</date>
			<pubPlace>Vintage</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">The quiet transformations of literary studies: What thirteen thousand scholars could tell us</title>
		<author>
			<persName><forename type="first">A</forename><surname>Goldstone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Underwood</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">New Literary History</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="359" to="384" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Bearing a bag-of-tales: An open corpus of annotated folktales for reproducible research</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hagedorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Darányi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Open Humanities Data</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page">16</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">LoRA: Low-Rank Adaptation of Large Language Models</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Allen-Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions</title>
		<author>
			<persName><forename type="first">L</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Qin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2311.05232</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Understanding gender and character agency in the 19th century novel</title>
		<author>
			<persName><forename type="first">M</forename><surname>Jockers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kirilloff</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Cultural Analytics</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Calibrated language models must hallucinate</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">T</forename><surname>Kalai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Vempala</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 56th Annual ACM Symposium on Theory of Computing</title>
				<meeting>the 56th Annual ACM Symposium on Theory of Computing</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="160" to="171" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Who Feels What and Why? Annotation of a Literature Corpus with Semantic Roles of Emotions</title>
		<author>
			<persName><forename type="first">E</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Klinger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics</title>
				<meeting>COLING 2018, the 27th International Conference on Computational Linguistics<address><addrLine>Santa Fe, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Dimensions of Scale: Invisible Labor, Editorial Work, and the Future of Quantitative Literary Studies</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">F</forename><surname>Klein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pmla</title>
		<imprint>
			<biblScope unit="volume">135</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="23" to="39" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings</title>
		<author>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Koto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Baldwin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2024.naacl-long.112" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long Papers</title>
		<editor>
			<persName><forename type="first">K</forename><surname>Duh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Gomez</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Bethard</surname></persName>
		</editor>
		<meeting>the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Mexico City, Mexico</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="2016" to="2039" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">RoBERTa: A robustly optimized BERT pretraining approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Literary pattern recognition: Modernism between close reading and machine learning</title>
		<author>
			<persName><forename type="first">H</forename><surname>Long</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>So</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Critical inquiry</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="235" to="267" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Turbulent flow: A computational model of world literature</title>
		<author>
			<persName><forename type="first">H</forename><surname>Long</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>So</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Modern Language Quarterly</title>
		<imprint>
			<biblScope unit="volume">77</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="345" to="367" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters</title>
		<author>
			<persName><forename type="first">L</forename><surname>Lucy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gururangan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Soldaini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Strubell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bamman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dodge</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2024.acl-long.400</idno>
		<ptr target="https://aclanthology.org/2024.acl-long.400" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<editor>
			<persName><forename type="first">L.-W</forename><surname>Ku</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Martins</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Srikumar</surname></persName>
		</editor>
		<meeting>the 62nd Annual Meeting of the Association for Computational Linguistics<address><addrLine>Bangkok, Thailand</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="7393" to="7420" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Data Contamination: From Memorization to Exploitation</title>
		<author>
			<persName><forename type="first">I</forename><surname>Magar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Schwartz</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.acl-short.18</idno>
		<ptr target="https://aclanthology.org/2022.acl-short.18" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Muresan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Villavicencio</surname></persName>
		</editor>
		<meeting>the 60th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="157" to="165" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Computational grounded theory: A methodological framework</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">K</forename><surname>Nelson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sociological Methods &amp; Research</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="3" to="42" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">AI as a Sport: On the Competitive Epistemologies of Benchmarking</title>
		<author>
			<persName><forename type="first">W</forename><surname>Orr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">B</forename><surname>Kang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The 2024 ACM Conference on Fairness, Accountability, and Transparency</title>
				<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="1875" to="1884" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Think small: on literary modeling</title>
		<author>
			<persName><forename type="first">A</forename><surname>Piper</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pmla</title>
		<imprint>
			<biblScope unit="volume">132</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="651" to="658" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Toward a data-driven theory of narrativity</title>
		<author>
			<persName><forename type="first">A</forename><surname>Piper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bagga</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">New Literary History</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="879" to="901" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Comparing ChatGPT to Human Raters and Sentiment Analysis Tools for German Children&apos;s Literature</title>
		<author>
			<persName><forename type="first">S</forename><surname>Rebora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Heumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lauer</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org1613" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Computational Humanities Research conference</title>
				<meeting>the Computational Humanities Research conference</meeting>
		<imprint>
			<date type="published" when="1973">2023. 0073</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Recollection versus Imagination: Exploring Human Memory and Cognition via Neural Language Models</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Horvitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pennebaker</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.178</idno>
		<ptr target="https://www.aclweb.org/anthology/2020.acl-main.178" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1970" to="1978" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">The rise and fall of genre differentiation in English-language fiction</title>
		<author>
			<persName><forename type="first">A</forename><surname>Sharmaa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Shang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singhal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Underwood</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Computational Humanities Research Conference (CHR)</title>
				<meeting>the Computational Humanities Research Conference (CHR)</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense</title>
		<author>
			<persName><forename type="first">S</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Logeswaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Poria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mihalcea</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2024.naacl-long.316" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long Papers</title>
		<editor>
			<persName><forename type="first">K</forename><surname>Duh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Gomez</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Bethard</surname></persName>
		</editor>
		<meeting>the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Mexico City, Mexico</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="5668" to="5680" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Detecting Pretraining Data from Large Language Models</title>
		<author>
			<persName><forename type="first">W</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ajith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Blevins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Twelfth International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Towards a Poetics of Strangeness: Experiments in Classifying Language of Technological Novelty</title>
		<author>
			<persName><forename type="first">M</forename><surname>Simeone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G V</forename><surname>Koundinya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Finn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Cultural Analytics</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Computational Detection of Narrativity: A Comparison Using Textual Features and Reader Response</title>
		<author>
			<persName><forename type="first">M</forename><surname>Steg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Slot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Pianzola</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2022.latechclfl-1.13" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Degaetano</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Kazantseva</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Reiter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Szpakowicz</surname></persName>
		</editor>
		<editor>
			<persName><surname>Gyeongju</surname></persName>
		</editor>
		<meeting>the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="105" to="114" />
		</imprint>
	</monogr>
	<note>Republic of Korea: International Conference on Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Confabulation: The Surprising Value of Large Language Model Hallucinations</title>
		<author>
			<persName><forename type="first">P</forename><surname>Sui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duede</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>So</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2024.acl-long.770</idno>
		<ptr target="https://aclanthology.org/2024.acl-long.770" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<editor>
			<persName><forename type="first">L.-W</forename><surname>Ku</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Martins</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Srikumar</surname></persName>
		</editor>
		<meeting>the 62nd Annual Meeting of the Association for Computational Linguistics<address><addrLine>Bangkok, Thailand</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="14274" to="14284" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Computational cut-ups: The influence of Dada</title>
		<author>
			<persName><forename type="first">L</forename><surname>Thompson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mimno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Modern Periodical Studies</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="179" to="195" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">Motif-index of folk-literature; a classification of narrative elements in folktales</title>
		<author>
			<persName><forename type="first">S</forename><surname>Thompson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ballads, myths, fables, mediaeval romances, exempla, fabliaux, jest-books, and local legends</title>
				<meeting><address><addrLine>Bloomington</addrLine></address></meeting>
		<imprint>
			<publisher>Indiana University Press</publisher>
			<date type="published" when="1955">1955</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Why literary time is measured in minutes</title>
		<author>
			<persName><forename type="first">T</forename><surname>Underwood</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Elh</title>
		<imprint>
			<biblScope unit="volume">85</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="341" to="365" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">The Transformation of Gender in English-Language Fiction</title>
		<author>
			<persName><forename type="first">T</forename><surname>Underwood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bamman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lee</surname></persName>
		</author>
		<idno type="DOI">10.22148/16.019</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Cultural Analytics</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2018-02-13">Feb. 13, 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<analytic>
		<title level="a" type="main">Gender Bias in French Literature</title>
		<author>
			<persName><forename type="first">L</forename><surname>Vianne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dupont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Barré</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Humanities Research conference</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b43">
	<analytic>
		<title level="a" type="main">The Goodreads &quot;classics&quot;: a computational study of readers, Amazon, and crowdsourced amateur criticism</title>
		<author>
			<persName><forename type="first">M</forename><surname>Walsh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Antoniak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Cultural Analytics</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="243" to="287" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<analytic>
		<title level="a" type="main">Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets</title>
		<author>
			<persName><forename type="first">M</forename><surname>Walsh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Preus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Antoniak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP</title>
				<imprint>
			<date type="published" when="2024">2024. 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b45">
	<analytic>
		<title level="a" type="main">The Living Machine: A Computational Approach to the Nineteenth-Century Language of Technology</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">C</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Ardanuy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Beelen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mcgillivray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ahnert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Technology and Culture</title>
		<imprint>
			<biblScope unit="volume">64</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="875" to="902" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b46">
	<monogr>
		<title level="m" type="main">Hallucination is inevitable: An innate limitation of large language models</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kankanhalli</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2401.11817</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b47">
	<analytic>
		<title level="a" type="main">On the methods of measuring association between two attributes</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">U</forename><surname>Yule</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the Royal Statistical Society</title>
		<imprint>
			<biblScope unit="volume">75</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="579" to="652" />
			<date type="published" when="1912">1912</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b48">
	<analytic>
		<title level="a" type="main">Can large language models transform computational social science?</title>
		<author>
			<persName><forename type="first">C</forename><surname>Ziems</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Held</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Shaikh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">50</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="237" to="291" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b49">
	<monogr>
		<author>
			<persName><forename type="first">C</forename></persName>
		</author>
		<title level="m">Animacy Category 0 (sc. animate) Characteristics: 1. Human or animal comparison: Texts often compare machines to humans or animals, emphasizing the mechanical nature of humans or the human-like qualities of machines</title>
				<imprint/>
	</monogr>
	<note>servants must be treated, not as machines, but as human beings</note>
</biblStruct>

<biblStruct xml:id="b50">
	<monogr>
		<title level="m">Metaphorical or figurative language: Usage of machines in a metaphorical sense to describe people or actions</title>
				<imprint/>
	</monogr>
	<note>a mere machine. a teaching machine</note>
</biblStruct>

<biblStruct xml:id="b51">
	<monogr>
		<title level="m">Negative connotation: Machines are often described in a negative or dehumanizing context</title>
				<imprint/>
	</monogr>
	<note>mere machines. living machine</note>
</biblStruct>

<biblStruct xml:id="b52">
	<analytic>
		<title level="a" type="main">about the machines or their operations</title>
	</analytic>
	<monogr>
		<title level="m">Lack of technical detail: Texts do not provide specific technical details</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b53">
	<monogr>
		<title level="m">Focus on human experience: Emphasis on human emotions, experiences, or societal roles in relation to machines</title>
				<imprint/>
	</monogr>
	<note>is, of all descriptions of men, the most completely a machine</note>
</biblStruct>

<biblStruct xml:id="b54">
	<monogr>
		<title level="m">Texts often describe machines in a technical or operational context, detailing their functions or uses</title>
				<imprint/>
	</monogr>
	<note>printing machines of automatic construction. engines employed in pumping</note>
</biblStruct>

<biblStruct xml:id="b55">
	<monogr>
		<title level="m">Specific machine types: Mention of specific types of machines or apparatus</title>
				<imprint/>
	</monogr>
	<note>locomotive. dynamo</note>
</biblStruct>

<biblStruct xml:id="b56">
	<monogr>
		<title level="m">Positive or neutral connotation: Machines are described in a functional, neutral, or positive light, often highlighting their utility or innovation</title>
				<imprint/>
	</monogr>
	<note>valuable improvement on Lamb&apos;s knitting machine. self-feeders</note>
</biblStruct>

<biblStruct xml:id="b57">
	<monogr>
		<title level="m">Historical or factual description: Texts provide historical or factual descriptions of machines and their development or usage</title>
				<imprint/>
	</monogr>
	<note>The first steam engine for grinding burnt flint</note>
</biblStruct>

<biblStruct xml:id="b58">
	<monogr>
		<title level="m">The drum of the King pit engine on the Wigan Four-feet rope side is 15 feet 2 inches in diameter</title>
				<imprint/>
	</monogr>
	<note>Detailed descriptions: Detailed descriptions of the machines&apos; components,</note>
</biblStruct>

<biblStruct xml:id="b59">
	<monogr>
		<author>
			<persName><forename type="first">C</forename></persName>
		</author>
		<title level="m">Emotion Category 0 (sc. anticipation) Characteristics: 1. References to royalty or nobility: Mentions of princes, kings, or other noble titles</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b60">
	<monogr>
		<title level="m" type="main">Dialogue involving commands or requests: Characters giving orders or making requests</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b61">
	<monogr>
		<title level="m">) Characteristics: 1. Philosophical or reflective tone: Discussions about life, existence, or deep thoughts</title>
				<imprint/>
	</monogr>
	<note>Descriptions of physical appearance: Comments on someone&apos;s looks, such as &quot;beautiful boots</note>
</biblStruct>

<biblStruct xml:id="b62">
	<monogr>
		<title level="m" type="main">Expressions of doubt or uncertainty: Characters questioning or reflecting on their actions or beliefs</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b63">
	<analytic>
		<title level="a" type="main">Mentions of emotions like sadness or irritation: Characters expressing feelings of melancholy or frustration</title>
	</analytic>
	<monogr>
		<title level="m">) Characteristics: 1. Mentions of specific names: Characters like Dora</title>
				<imprint>
			<publisher>or Miss Stackpole</publisher>
		</imprint>
	</monogr>
	<note>Category 2</note>
</biblStruct>

<biblStruct xml:id="b64">
	<analytic>
		<title level="a" type="main">family, or acquaintances</title>
	</analytic>
	<monogr>
		<title level="m">Dialogue involving personal relationships: Conversations about friends</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b65">
	<monogr>
		<title level="m">of groups or collective actions: References to groups of people, such as &quot;the crowd&quot; or</title>
				<imprint/>
	</monogr>
	<note>: Comments on someone&apos;s looks or attractiveness. the little garrison</note>
</biblStruct>

<biblStruct xml:id="b66">
	<monogr>
		<title level="m" type="main">Descriptions of conflict or tension: Situations involving arguments, fights, or other forms of conflict</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b67">
	<monogr>
		<title level="m">) Characteristics: 1. Mentions of nature or the environment: Descriptions of natural settings, weather, or animals</title>
				<imprint/>
	</monogr>
	<note>References to specific locations or settings: Mentions of places like military prisons or specific rooms</note>
</biblStruct>

<biblStruct xml:id="b68">
	<monogr>
		<title level="m" type="main">Expressions of fear or danger: Characters feeling threatened or in peril</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b69">
	<monogr>
		<title level="m">or conditions: Descriptions of physical states, such as being cold or in pain</title>
				<imprint/>
	</monogr>
	<note>of family or domestic settings: References to family members or home life</note>
</biblStruct>

<biblStruct xml:id="b70">
	<monogr>
		<title level="m">Expressions of surprise or disbelief: Characters reacting with astonishment or incredulity</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b71">
	<monogr>
		<title level="m">References to specific actions or events: Descriptions of particular incidents or activities</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b72">
	<monogr>
		<title level="m">) Characteristics: 1. Mentions of specific names: Characters like Sister Martha</title>
				<imprint>
			<publisher>Henriot, or Charles</publisher>
		</imprint>
	</monogr>
	<note>Category 6</note>
</biblStruct>

<biblStruct xml:id="b73">
	<monogr>
		<title level="m">Descriptions of medical or health-related situations: References to doctors, patients, or health conditions</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b74">
	<monogr>
		<title level="m">) Characteristics: 1. Dialogue involving personal relationships: Conversations about friends, family, or acquaintances</title>
				<imprint/>
	</monogr>
	<note>Expressions of fear or danger: Characters feeling threatened or in peril</note>
</biblStruct>

<biblStruct xml:id="b75">
	<monogr>
		<title level="m">Detailed Imagery: Rich, detailed imagery that paints a vivid picture</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b76">
	<monogr>
		<title level="m" type="main">Formal Tone: Often a more formal or archaic tone</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b77">
	<monogr>
		<title level="m">Philosophical or Reflective: Texts may include philosophical musings or reflections on life and existence</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b78">
	<monogr>
		<title level="m" type="main">Narrative Progression: Clear progression of events or thoughts</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b79">
	<analytic>
		<title level="a" type="main">Use of Direct Address: Often addresses the reader or another character directly</title>
	</analytic>
	<monogr>
		<title level="m">Concise and Lyrical: Texts are often shorter, more concise, and have a lyrical quality</title>
				<imprint/>
	</monogr>
	<note>Category 1</note>
</biblStruct>

<biblStruct xml:id="b80">
	<monogr>
		<title level="m" type="main">Nature Imagery: Frequent use of natural imagery, such as flowers, trees, and weather</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b81">
	<monogr>
		<title level="m" type="main">Emotional and Sensory Language: Focus on emotions and sensory experiences</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b82">
	<monogr>
		<title level="m" type="main">Simple Structure: Sentences are generally simpler and more straightforward</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b83">
	<analytic>
		<title level="a" type="main">as if sharing a private moment or thought</title>
	</analytic>
	<monogr>
		<title level="m">Personal and Intimate: Often feels personal or intimate</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b84">
	<monogr>
		<title level="m" type="main">Symbolism and Metaphor: Heavy use of symbolism and metaphor to convey deeper meanings</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b85">
	<monogr>
		<title level="m" type="main">Reflective and Meditative: Texts often reflect on a single idea or moment</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b86">
	<monogr>
		<title level="m" type="main">Visual and Sensory Details: Emphasis on visual and sensory details to create a mood or atmosphere</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b87">
	<monogr>
		<title level="m">Ephemeral and Transient Themes: Themes of impermanence, change, and fleeting moments</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b88">
	<monogr>
		<title level="m">specific events or actions: Entries in this category often describe a particular event or action in detail, such as a specific activity, task</title>
				<imprint/>
	</monogr>
	<note>Minimalist Style: A minimalist approach with fewer words and more impact. or incident</note>
</biblStruct>

<biblStruct xml:id="b89">
	<monogr>
		<title level="m" type="main">Lack of emotional depth: These entries tend to be more factual and less emotionally charged, focusing on the sequence of events rather than the emotional impact</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b90">
	<monogr>
		<title level="m" type="main">Shorter and more concise: The descriptions are often brief and to the point, without extensive elaboration or reflection</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b91">
	<monogr>
		<title level="m">Category 1 (sc. imagined) Characteristics: 1. Personal achievements or milestones: Entries often describe personal accomplishments, milestones, or significant life changes</title>
				<imprint/>
	</monogr>
	<note>Less personal reflection: There is minimal introspection or personal reflection on the significance of the event</note>
</biblStruct>

<biblStruct xml:id="b92">
	<monogr>
		<title level="m">Positive or neutral tone: The tone is generally positive or neutral, focusing on achievements, celebrations, or positive changes</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b93">
	<monogr>
		<title level="m">and Emotional Language: Texts often include personal and emotional language, such as &quot;She felt her chest tighten and her face become stiff</title>
				<imprint/>
	</monogr>
	<note>He liked young people</note>
</biblStruct>

<biblStruct xml:id="b94">
	<monogr>
		<title level="m">Dialogue and Interactions: Many texts feature dialogue and interactions between characters</title>
				<imprint/>
	</monogr>
	<note>What do you mean inquired Mr. Roumann</note>
</biblStruct>

<biblStruct xml:id="b95">
	<monogr>
		<title level="m">Everyday Activities and Settings: Texts frequently describe everyday activities and settings, such as &quot;She fished in the voluminous pockets of her jeans</title>
				<imprint/>
	</monogr>
	<note>Alan slurped down the rest of his coffee. She rattled around in the galley for some time</note>
</biblStruct>

<biblStruct xml:id="b96">
	<monogr>
		<title level="m">Informal and Conversational Tone: The tone is often informal and conversational, focusing on personal experiences and interactions</title>
				<imprint/>
	</monogr>
	<note>Well that&apos;s awfully nice of you Nothing at all. I m hungry</note>
</biblStruct>

<biblStruct xml:id="b97">
	<monogr>
		<title level="m">Presence of Personal Pronouns: There is a higher usage of personal pronouns, indicating a focus on individuals and their experiences</title>
				<imprint/>
	</monogr>
	<note>) Characteristics: 1. Dialogue-Driven: Texts often contain significant amounts of dialogue between characters,. which drives the narrative forward</note>
</biblStruct>

<biblStruct xml:id="b98">
	<monogr>
		<title level="m">Everyday Situations: The scenarios described are often mundane or everyday occurrences, such as conversations about family, meals, or simple activities</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b99">
	<monogr>
		<title level="m">Realistic Tone: The tone tends to be straightforward and realistic, focusing on practical matters and direct interactions</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b100">
	<monogr>
		<title level="m" type="main">Character Interactions: Emphasis on interactions between characters, often involving social or familial relationships</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b101">
	<monogr>
		<title level="m">Descriptive but Practical: Descriptions are present but are more practical and less ornate, focusing on the immediate environment or actions</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b102">
	<monogr>
		<title level="m">Conflict and Resolution: Often includes a clear conflict and resolution within the passage, typically involving personal or social issues</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b103">
	<monogr>
		<title level="m">) Characteristics: 1. Introspective and Reflective: Texts often delve into the inner thoughts and emotions of characters, providing a deep introspective look</title>
				<imprint/>
	</monogr>
	<note>Historical or Social Context: Some texts provide a historical or social context, often reflecting on societal norms or personal histories</note>
</biblStruct>

<biblStruct xml:id="b104">
	<monogr>
		<title level="m" type="main">Descriptive and Poetic: Rich, ornate descriptions that create vivid imagery and evoke strong emotions</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b105">
	<monogr>
		<title level="m">Philosophical and Abstract: Themes can be more abstract, philosophical, or existential, exploring deeper meanings and human conditions</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b106">
	<monogr>
		<title level="m">Stream-of-Consciousness: Some texts use a stream of consciousness style, reflecting the continuous flow of thoughts and feelings</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b107">
	<monogr>
		<title level="m" type="main">Symbolism and Metaphor: Frequent use of symbolism and metaphor to convey complex ideas and emotions</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b108">
	<monogr>
		<title level="m">Emotional Depth: High emotional intensity, often focusing on personal struggles, desires, and inner conflicts</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b109">
	<monogr>
		<title level="m" type="main">Literary and Artistic References: References to literature, art, or historical events are more common, adding layers of meaning to the text</title>
		<imprint/>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
