<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Literary Canonicity and Algorithmic Fairness: The Effect of Author Gender on Classification Models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ida</forename><forename type="middle">Marie S</forename><surname>Lassen</surname></persName>
							<email>idamarie@cas.au.dk</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Humanities Computing</orgName>
								<orgName type="institution">Aarhus University</orgName>
								<address>
									<country key="DK">Denmark</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pascale</forename><forename type="middle">Feldkamp</forename><surname>Moreira</surname></persName>
							<email>pascale.moreira@cc.au.dk</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Humanities Computing</orgName>
								<orgName type="institution">Aarhus University</orgName>
								<address>
									<country key="DK">Denmark</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yuri</forename><surname>Bizzoni</surname></persName>
							<email>yuri.bizzoni@cc.au.dk</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Humanities Computing</orgName>
								<orgName type="institution">Aarhus University</orgName>
								<address>
									<country key="DK">Denmark</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kristoffer</forename><surname>Nielbo</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Center for Humanities Computing</orgName>
								<orgName type="institution">Aarhus University</orgName>
								<address>
									<country key="DK">Denmark</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Literary Canonicity and Algorithmic Fairness: The Effect of Author Gender on Classification Models</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">93D2C851F4117F434D07DA6BA4D1F31B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>bias</term>
					<term>algorithmic fairness</term>
					<term>gender bias</term>
					<term>computational literary studies</term>
					<term>canonicity</term>
					<term>0000-0001-6905-5665 (I. M. S. Lassen)</term>
					<term>0000-0002-2434-4268 (P. F. Moreira)</term>
					<term>0000-0002-6981-7903 (Y. Bizzoni)</term>
					<term>0000-0002-5116-5070 (K. Nielbo)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This study examines gender biases in machine learning models that predict literary canonicity. Using algorithmic fairness metrics like equality of opportunity, equalised odds, and calibration within groups, we show that models violate the fairness metrics, especially by misclassifying non-canonical books by men as canonical. Feature importance analysis shows that text-intrinsic differences between books by men and women authors contribute to these biases. Men have historically dominated canonical literature, which may bias models towards associating men-authored writing styles with literary canonicity. Our study highlights how these biased models can lead to skewed interpretations of literary history and canonicity, potentially reinforcing and perpetuating existing gender disparities in our understanding of literature. This underscores the need to integrate algorithmic fairness in computational literary studies and digital humanities more broadly to foster equitable computational practices.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In recent years, computational literary studies have increasingly utilised machine learning (ML) models to analyse and classify literary texts, e.g. to predict reader appreciation <ref type="bibr" target="#b30">[31,</ref><ref type="bibr" target="#b33">34]</ref> or literary success <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b44">45,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b8">9]</ref> with uptake in applications in the publishing industry. 1 Models often rely on text-intrinsic features, contributing to the study of which text characteristics serve as predictors for a given classification. While other studies have shown that literature assessment can be biased by gender <ref type="bibr" target="#b42">[43,</ref><ref type="bibr" target="#b28">29]</ref> and ethnicity <ref type="bibr" target="#b14">[15]</ref>, focusing on text-intrinsic characteristics might seem like a way to avoid such biases as it concentrates solely on the text.</p><p>However, seemingly objective features can harbour social biases, reflecting disparities in the underlying data. The present work examines gender biases in ML models that predict literary canonicity, demonstrating how the uncritical use of ML models in humanities research can lead to biased knowledge production, potentially skewing our understanding of literary history and the phenomenon of canonicity. This has implications beyond academic research, as these models could influence real-world applications, including the assessment of new manuscripts by publishers based on predicted success or likeness to existing canon. By integrating insights from algorithmic fairness into our analysis of predictive models, we aim to highlight the potential for hidden biases in seemingly objective computational methods. 
Our analysis demonstrates how these biases can affect our interpretation of literary history and canon formation, and we emphasise the importance of critical reflection on ML methodologies in DH research.</p><p>Our findings underscore that the significance of this work lies not only in the practical application of prediction models but also in exposing the epistemic consequences of using biased ML models to study literary phenomena. This approach invites researchers to consider how computational methods may inadvertently reproduce or amplify existing biases in literary history.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related works</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Predicting canonicity</head><p>This study builds on prior research demonstrating the potential of ML classifiers to predict various literary attributes, such as whether a book belongs to the literary canon, is written by a Nobel laureate, is a bestseller, is longlisted for given awards, or receives a high rating on GoodReads <ref type="bibr" target="#b7">[8]</ref>. To narrow the scope, we will focus on the attempt to predict canonicity. While various studies focus on classifying canonical works and gauging their textual profile <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b32">33]</ref>, the limited resources in the literary field are rarely openly available. We thus focus on one newly published dataset <ref type="bibr" target="#b9">[10]</ref>, which served as the foundation for Bizzoni, Feldkamp, Jacobsen, Thomsen, and Nielbo <ref type="bibr" target="#b7">[8]</ref> and provides a rich and diverse collection of features of literary works.</p><p>In <ref type="bibr" target="#b7">[8]</ref>, the focus extended beyond classification accuracy to provide insights into the textual features important for the classification models, seeking to understand the characteristics that differentiate canonical from non-canonical books. The study found that "canonical texts have the most distinctive profile across all dimensions and are therefore the easiest to classify in the binary classification task" due to their denser nominal style, lower readability, less predictable sentiment arcs, and higher perplexity.</p><p>However, it is well-known that canonical literature - like the literary field more broadly - has historically been dominated by men <ref type="bibr" target="#b36">[37,</ref><ref type="bibr" target="#b29">30,</ref><ref type="bibr" target="#b35">36]</ref>. 
Still, studies that seek to predict some form of canonicity or perceived literary quality rarely include reflections on how biases in their data inform their results, and the cultural, temporal, or gendered dimensions of texts are rarely mentioned. While Algee-Hewitt and McGurl <ref type="bibr" target="#b0">[1]</ref> show how the "canon" significantly changes depending on the approach taken, our study highlights the critical oversight of gender imbalances inherent in literary datasets, which can inadvertently bias model outcomes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Gender differences in literary texts</head><p>Previous research on gender differences in literary texts highlights key issues to avoid. One concern is treating these differences as fixed and universal markers of men's and women's writing. For example, Burrows <ref type="bibr" target="#b12">[13]</ref> shows that gendered patterns in writing styles changed over time, with distinct differences found before 1860 but not after, indicating that gendered styles are historically contingent.</p><p>A second concern is the assumption of a binary gender model, where men's writing is seen as the default. Land <ref type="bibr" target="#b25">[26]</ref> critiques such approaches for framing women's writing as deviant, as seen in studies like <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b24">25]</ref>, which rely on essentialist assumptions and risk reinforcing biased interpretations of literary styles.<ref type="foot" target="#foot_0">2</ref> With that being said, studies have found linguistic and stylistic differences between texts by men and women that are independent of topic and genre <ref type="bibr" target="#b40">[41]</ref>. In the literary domain, Argamon, Koppel, Fine, and Shimoni <ref type="bibr" target="#b2">[3]</ref> show that a high frequency of pronouns is a "strong female marker", which is supported by Newman, Groom, Handelman, and Pennebaker <ref type="bibr" target="#b34">[35]</ref>, who also found that women's language more frequently includes pronouns, social words, various psychological process references, and verbs, as well as negations and home-related terms. Men, on the other hand, used longer words, more numbers, articles, and prepositions than women (p. 223).</p><p>Hiatt <ref type="bibr" target="#b19">[20]</ref> examined contemporary (1978) American prose and found that women use twice as many emotional adverbs compared to men, while men use nearly twice as many pace adverbs. 
She concludes that while there is a distinct feminine writing style, there is "far less basis for labelling the feminine styles as hyperemotional than for labelling the masculine style hypo-emotional" <ref type="bibr">[20, p. 226]</ref>.</p><p>Hayward <ref type="bibr" target="#b18">[19]</ref> tests whether readers can identify an author's gender and concludes that gender differences are subtler than genre differences. Koolen <ref type="bibr" target="#b23">[24]</ref> goes deeper into the question of genre and examines the interaction of gender and genre, especially with regard to "false labelling", i.e., that works by women are more often labelled as "women's books" regardless of genre <ref type="bibr" target="#b39">[40]</ref>. The findings suggest that while some romantic novels have distinct styles, novels by women are heterogeneous and not distinguishable from those by men. Considering the prevalence of biased mechanisms in the literary field (e.g., false labelling), it is possible that readers focus on similarities among women authors and differences among men authors rather than the reverse.</p><p>The literature reviewed here highlights the complexity of considering gender differences in literary texts and not reducing these differences to essentialist notions about "how women write." In the following, we will use methods from algorithmic fairness to examine biases in models used to predict canonicity. We do not claim to establish definitive conclusions about the general differences between men's and women's writing; rather, we emphasise how modelling a literary phenomenon inevitably mirrors the underlying data and that results could differ if other datasets were used.</p><p>While questions about bias and fairness are increasingly discussed in ML development, and ML is increasingly applied in DH, insights from algorithmic fairness are rarely integrated into computational literary studies. 
Although Bagga and Piper <ref type="bibr" target="#b3">[4]</ref> explored the impact of bias on predictive accuracy and positive prediction balance in literary data, our study presents a more comprehensive bias analysis informed by the methodologies of algorithmic fairness. We aim to answer the following research questions:</p><p>• RQ1: To what extent do ML models trained on (imbalanced) literary corpora exhibit biases related to author gender in classification tasks, particularly in predicting canonicity?</p><p>• RQ2: Which features in the dataset significantly differ between books by women and men authors, and how do these features impact the bias in classification models?</p><p>These questions are, of course, contingent on the data analysed. Therefore, we zoom out and include a question that addresses a broader concern:</p><p>• RQ3: How does the use of biased ML models affect the knowledge produced in computational literary studies?</p><p>This study focuses on binary gender categories, including only men and women authors. We acknowledge that this does not capture the full spectrum of gender identities and that gender is performative and shaped by discursive practices <ref type="bibr" target="#b13">[14]</ref>. However, this approach aligns with historical perspectives and addresses existing biases between men and women in literary canonicity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data</head><p>The dataset used in this work is the Chicago Corpus, which consists of 9,089 novels from diverse genres published in the US between 1880 and 2000. The corpus was compiled based on the number of libraries holding each novel, with a preference for more widely circulated works. The dataset was made available with a recent paper <ref type="bibr" target="#b9">[10]</ref>. <ref type="foot" target="#foot_1">3</ref> The canon category is compiled from books by authors in the Norton Anthology, the Penguin Classics series, and the top 1000 authors mentioned in English syllabi (collected by the OpenSyllabus project), as shown in Table <ref type="table" target="#tab_0">1</ref>.</p><p>A diverse set of stylistic, syntactic and narrative features was used in <ref type="bibr" target="#b7">[8]</ref>, which found that "[t]he highest F1 score was achieved when all proposed features were included". In addition to these features, we have included normalised frequencies of part-of-speech (PoS) features, as they have been highlighted as different in the writings of men and women (see Section 2.2).<ref type="foot" target="#foot_2">4</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Modelling</head><p>To replicate the experiments in <ref type="bibr" target="#b7">[8]</ref>, we employed Random Forest (RF) models for the classification task. RF models are known for their robustness to overfitting and ability to handle nonlinear relationships. For fairness analysis, we utilised the Dalex library <ref type="foot" target="#foot_3">5</ref> , which provides tools to explain, explore, and mitigate biases in ML models.</p></div>
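Dalex's feature-importance tooling (used in Section 4.2) is permutation-based: a feature column is shuffled and the resulting drop in model performance is recorded. A minimal stdlib-Python sketch of the idea follows; the toy model, data, and function names are our own illustration, not Dalex's API.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, rounds=5, seed=0):
    """Average drop in accuracy when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(rounds):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the feature's link to the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(baseline - accuracy(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / rounds)
    return importances

# Toy classifier: predicts "canon" (1) purely from feature 0; feature 1 is noise.
model = lambda row: 1 if row[0] > 0 else 0
X = [[1.0, 0.0], [2.0, 1.0], [-1.0, 0.0], [-2.0, 1.0]] * 10
y = [1, 1, 0, 0] * 10
imp = permutation_importance(model, X, y)
```

Shuffling the informative feature degrades accuracy, while shuffling the ignored feature leaves it unchanged, so the importance scores separate the two.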
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Algorithmic Fairness</head><p>In this work, bias is defined as systematic deviations in predictions that favour or disadvantage one group - here, authors - based on sensitive features (such as gender, ethnicity, religion, etc.). To address this, we incorporate fairness analyses to identify and examine such biases.</p><p>Group fairness is particularly relevant in our context as it seeks equitable treatment across different groups of authors. This approach balances the distribution of treatments and resources between groups to ensure that predictions do not disproportionately favour or disadvantage one or multiple social groups <ref type="bibr" target="#b15">[16]</ref>. Equality of opportunity, equalised odds, and calibration within groups are metrics used to estimate group fairness in predictive models. Integrating these fairness considerations into DH research is crucial, as biased tools can lead to the misrepresentation of corpora and minority groups, as highlighted in <ref type="bibr" target="#b26">[27]</ref>.</p><p>Equality of opportunity ensures that the opportunity to be classified as a true positive instance is equal for all groups; the true positive rate (TPR) should be equal for all social groups considered:</p><formula xml:id="formula_0">TP_a / (TP_a + FN_a) = TP_b / (TP_b + FN_b) for all groups a, b<label>(1)</label></formula><p>In relation to the binary classifier for canonicity, equality of opportunity ensures that the likelihood of correctly recognising a canon book is equal regardless of whether the book is written by a man or a woman.</p><p>Equalised odds extends beyond equality of opportunity by ensuring equality of the true negative rate (TNR) and the false positive rate (FPR) for all the specified groups:</p><formula xml:id="formula_1">TN_a / (TN_a + FP_a) = TN_b / (TN_b + FP_b) for all groups a, b<label>(2)</label></formula><p>For the canonicity classifier, equalised odds ensures that the 
likelihood of incorrectly classifying a non-canon book as canon is equal regardless of whether the book is written by a man or a woman.</p><p>Calibration within groups ensures that the precision of the classifier is balanced, meaning the proportion of correct positive predictions (true positives) out of all positive predictions is the same for all groups:</p><formula xml:id="formula_2">TP_a / (TP_a + FP_a) = TP_b / (TP_b + FP_b) for all groups a, b<label>(3)</label></formula><p>For the canonicity classifier, this means that the books classified as 'canon' are actually canon and that the accuracy of these predictions is consistent across books written by both men and women. Dalex reports various classification outcomes and calculates the fairness metrics outlined above. The criteria are evaluated using the following:</p><formula xml:id="formula_3">ε ≤ (metric for non-privileged group) / (metric for privileged group) ≤ 1/ε<label>(4)</label></formula><p>with ε = 0.8, following the four-fifths rule <ref type="bibr" target="#b4">[5]</ref>. This threshold is widely used to detect significant disparities in treatment between groups. The benefit of this approach is that it offers a clear and standardised benchmark for assessing fairness, while its limitation is that it may not detect subtle biases and could oversimplify complex fairness issues <ref type="bibr" target="#b38">[39]</ref>. The groups considered in our experiments are women and men authors, with men authors being the privileged group. The outlined criteria have been shown to be impossible to satisfy simultaneously, except for trivial cases <ref type="bibr" target="#b31">[32,</ref><ref type="bibr" target="#b22">23]</ref>. This is a challenging finding because it is difficult to justify sacrificing any of these criteria in a fair classifier. It emphasises the importance of conducting fairness analysis and interpretation within the specific context of use, considering the underlying data foundation. 
We prioritise equalised odds in the canonicity classifier to ensure fair treatment of men and women authors by balancing FPR and TNR across genders. Without this, one group could disproportionately influence what is deemed canonical. See Section 5 for further discussion.</p><p>Dalex was also used to estimate feature importance for the canon classifiers, employing a permutation-based approach to compute feature importance. This assesses the contribution of each feature to classification outcomes by systematically permuting them and calculating their impact on model performance.</p></div>
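The three group-fairness criteria and the four-fifths check in Eq. (4) reduce to ratio comparisons of confusion-matrix counts. A minimal sketch in plain Python (the confusion counts below are invented for illustration; Dalex computes the same ratios internally):

```python
def group_rates(counts):
    """TPR (Eq. 1), TNR (Eq. 2) and precision (Eq. 3) from one group's confusion counts."""
    tp, fp, tn, fn = counts["TP"], counts["FP"], counts["TN"], counts["FN"]
    return {
        "TPR": tp / (tp + fn),        # equality of opportunity
        "TNR": tn / (tn + fp),        # equalised odds (FPR = 1 - TNR)
        "precision": tp / (tp + fp),  # calibration within groups
    }

def four_fifths_check(privileged, unprivileged, epsilon=0.8):
    """Eq. (4): each unprivileged/privileged ratio must lie in [epsilon, 1/epsilon]."""
    priv = group_rates(privileged)
    unpriv = group_rates(unprivileged)
    return {m: epsilon <= unpriv[m] / priv[m] <= 1 / epsilon for m in priv}

# Hypothetical counts mirroring the pattern we observe: a higher FPR
# (i.e. lower TNR) for men authors than for women authors.
men = {"TP": 40, "FN": 10, "FP": 20, "TN": 30}    # privileged group
women = {"TP": 45, "FN": 5, "FP": 10, "TN": 40}   # non-privileged group
verdict = four_fifths_check(men, women)
```

With these counts the TPR and precision ratios stay inside [0.8, 1.25], but the TNR ratio (0.8/0.6 ≈ 1.33) falls outside it, so the model would be flagged as unfair on equalised odds.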
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>In the first round of the experiments, we used the same sampling methods as in <ref type="bibr" target="#b7">[8]</ref> to ensure balance between the positive and negative classes: all 618 canon books are used with a random sub-sample of 618 non-canon books. This process was repeated 20 times, and the average accuracy was 0.72 - somewhat reproducing the accuracy of 0.75 reported in <ref type="bibr" target="#b7">[8]</ref>. However, as the gender distribution is not equal for either the positive or negative class, we cannot rule out the effect of class imbalance when examining the fairness results, as models trained on imbalanced datasets often develop a bias favouring the majority class <ref type="bibr" target="#b15">[16]</ref>.</p><p>Considering this, we bootstrapped a 50-50 gender distribution to conduct a more meaningful bias analysis. Since the canon group contains few books by women authors (166 vs. 452 by men authors), we randomly sampled 166 books by men authors from the canon group to achieve gender balance, alongside 166 books each by men and women authors from the non-canon group, resulting in a total of 𝑛 = 664. Each sampling selects a random subset of canon books by men authors and non-canon books, and the entire process, including model training and fairness analysis, is repeated 20 times. Sampling is conducted with replacement between rounds to ensure variability between iterations. Hence, in each run, all 166 canon books by women authors are used together with a random subset of canon books by men authors.</p><p>When training on a 50-50 gender distribution, the average accuracy over the 20 runs remains approximately the same, 0.71. 
One potential reason the accuracy is not affected by a smaller data sample is that the balanced gender distribution may enhance the model's ability to generalise across different author groups, counteracting any potential loss of information from the reduced sample size.</p></div>
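The bootstrapped sampling scheme described above can be sketched as follows; the pool names and toy sizes are our own illustration (in the actual experiment each pool holds Chicago Corpus book records, n = 166, and each round feeds model training and fairness analysis).

```python
import random

def balanced_round(canon_w, canon_m, noncanon_m, noncanon_w, n, rng):
    """One sampling round: all canon books by women (assumed to number n),
    plus random subsets of size n from the three remaining pools."""
    sample = (list(canon_w)
              + rng.sample(canon_m, n)
              + rng.sample(noncanon_m, n)
              + rng.sample(noncanon_w, n))
    rng.shuffle(sample)
    return sample

rng = random.Random(0)
# Toy pools of book ids; in the paper the canon-women pool has exactly 166 books.
canon_w = [f"cw{i}" for i in range(4)]
canon_m = [f"cm{i}" for i in range(12)]
noncanon_m = [f"nm{i}" for i in range(20)]
noncanon_w = [f"nw{i}" for i in range(20)]

# Repeat the round 20 times; pools are resampled afresh each round, so subsets
# vary between iterations while every canon book by a woman appears in every round.
rounds = [balanced_round(canon_w, canon_m, noncanon_m, noncanon_w, n=4, rng=rng)
          for _ in range(20)]
```

Each round yields a gender-balanced, class-balanced sample of size 4n (664 in the paper).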
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Fairness</head><p>Out of the 20 models trained on a 50-50 gender distribution for both the positive and negative classes, 16 models are unfair according to the fairness criteria. Specifically, for 9 of the models, the FPR is lower for women authors than for men authors, and for 7 models, the TPR is higher for women authors than for men authors. The FPR results indicate that the models have a greater tendency to classify non-canon books by men authors as canon, compared to non-canon books by women authors, violating the equalised odds metric. The higher TPR for women shows that the proportion of correctly recognised canon books is greater for women authors, violating the equality of opportunity metric. To gain insights into these results, in the following section, we summarise the feature distributions in the underlying data and feature importance of the models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Feature Importance</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Consistently Statistically Different Features</head><p>Before examining the predictive models' feature importance, we first tested whether the included features differed between books by women and men authors. To do so, we conducted a Mann-Whitney U test with Bonferroni correction to account for multiple comparisons. This was done for each sampling round to ensure that the findings were robust and not related to the random sample. Conducting the test on a 50-50 gender distribution sample rather than the full (imbalanced) dataset minimises the influence of unequal group sizes, providing a clearer understanding of each feature without the confounding effects of gender imbalance. The following features are reported as statistically significant between books by men and women in the canon set in more than half of the sampling rounds:</p><p>• Narrative features: the mean sentiment of all sentences in the book, as well as the mean sentiment of the first and last 10% of the book. • The normalised frequencies of negation modifiers, auxiliaries, pronouns, verbs, and nominal subjects. • The ratio between verbs and nouns.</p><p>Thus, at least some of the 36 text-intrinsic characteristics differ between the canon books by men and women authors, suggesting that there may be a distinct profile for women and men canon authors. When performing a Mann-Whitney U test with Bonferroni correction for multiple comparisons on the whole corpus of 9,089 novels, 31 out of 36 features exhibit a statistically significant difference between books by men vs. books by women authors. Hence, there are larger differences between books written by women and men in the whole corpus than there are in the canon set. Next, we examined each model's feature importance to see if the differences in features between men and women drive the observed biases.</p></div>
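The test above can be sketched with the standard library alone. This is a simplified normal-approximation version without a tie correction in the variance (in practice one would use a statistics package such as scipy.stats.mannwhitneyu); the function names are ours.

```python
import math

def average_ranks(values):
    """1-based ranks of values, averaging over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the tie group
        avg = (i + j) / 2 + 1           # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """U statistic for sample a and a two-sided p-value (normal approximation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean) / sd if sd else 0.0
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided tail probability
    return u1, p

def bonferroni(p, n_tests):
    """Bonferroni adjustment: scale the p-value by the number of comparisons."""
    return min(1.0, p * n_tests)
```

With 36 features tested per sampling round, each raw p-value would be multiplied by 36 before being compared to the significance threshold.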
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Feature Importance in Fair and Unfair Models</head><p>For each model, we analysed feature importance and counted the presence of each feature in both fair and unfair models, respectively. Using the Dalex library, which identifies the top 10 most influential features, we counted the presence of these features across all models. Fig. <ref type="figure" target="#fig_1">2</ref> presents an overview of the important features in both fair and unfair models, as well as the features that are reported as statistically significantly different between author genders within the canon set.</p><p>The frequency of negation modifiers, type-token ratio, perplexity and approximate entropy are often reported among the top ten features regardless of whether the classifier is fair or unfair (w.r.t. the considered fairness criteria). Recalling the findings from <ref type="bibr" target="#b7">[8]</ref>, our results confirm the discriminating power of the textual metric perplexity.</p><p>The frequency of negation modifiers and auxiliaries is statistically significant between canon books by men and women authors in all sample runs, and these frequencies are important features in all models and in most fair models, respectively. This might suggest that the canon vs. non-canon signal for these features is stronger than the gender difference. This may also be the case for the type-token ratio, which we report to be different between canon books by men and women in 30% of the sample runs, but important for all models.</p><p>Furthermore, the frequency of relative clause modifiers and the compressibility of the text are also important features for distinguishing canon books from non-canon books. 
Both features are reported more often for the fair models, indicating that despite compressibility being reported as different for men and women authors in the canon group (in 25% of the sample runs), this does not explain the observed bias.</p><p>For the unfair models specifically, we find that stop-word and verb frequencies are more important than in the fair models. Verb frequency is reported as statistically significant between books by men and women canon authors in 60% of the sample runs. It is reported as important only in unfair models, indicating that relying on this feature might contribute to the observed biases. Similarly, although the frequency of stop words is only reported as statistically different in books by women and men canon authors in 5% of the sample runs, it might still add to the observed biases when combined with other features.</p><p>The mean sentiment of the first and last 10% of the books is only important for the unfair models, while it is reported as statistically significantly different for books by men and women canon authors. This indicates that these features might contribute to the observed biases. The mean sentiment of all sentences, which is reported as statistically significantly different between men and women authors in all runs, is an important feature for 20% of both the fair and the unfair models, and we can, therefore, not conclude how it contributes to biases.</p><p>Moreover, the frequencies of nominal subjects and the verb-noun ratio are reported as different between canon books by men and women authors. However, these are not important for the classifiers to tell non-canon from canon books. This suggests that while women canon authors and men canon authors differ in these features, they are not important predictors for the canon category as such. 
On the other hand, features such as approximate entropy, perplexity, relative clause modifiers, use of stop words, and type-token ratio appear crucial for determining canonicity. Notably, there are no substantial differences between men and women authors regarding these features within the canon group, suggesting a shared canon style among men and women canon writers w.r.t. these features.</p></div>
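The tallying described in this section amounts to two counters over the models' top-10 lists, split by the fairness verdict. A small sketch (the truncated feature lists and flags below are invented placeholders, not our actual results):

```python
from collections import Counter

def tally_top_features(top10_lists, fair_flags):
    """Count how often each feature enters a model's top-10, split by fairness verdict."""
    fair, unfair = Counter(), Counter()
    for top10, is_fair in zip(top10_lists, fair_flags):
        (fair if is_fair else unfair).update(top10)
    return fair, unfair

# Toy top-10 lists (truncated) for three models, two of which violated a fairness metric.
tops = [["perplexity", "type_token_ratio", "neg_modifiers"],
        ["perplexity", "verb_freq", "stopword_freq"],
        ["perplexity", "verb_freq", "neg_modifiers"]]
flags = [True, False, False]
fair, unfair = tally_top_features(tops, flags)
```

Features that appear only in the unfair counter (here, the placeholder `verb_freq`) are the candidates for driving the observed bias.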
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>The results presented in this paper show that while it is possible to predict canonicity based on text-intrinsic features, it is crucial to consider social biases in these models, such as the effect of author gender. Moreover, our results show that research in DH and computational literary studies can benefit from insights from algorithmic fairness to increase awareness of social biases ingrained into methods and datasets. In the following, we outline our main findings and discuss them in relation to earlier work and fairness considerations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Features</head><p>Regarding the feature importance results, it is important to note that with the 50-50 gender distribution in this work and the inclusion of PoS frequencies, we do not reproduce the same feature importances as reported in <ref type="bibr" target="#b7">[8]</ref>. While perplexity is confirmed as having discriminative power in our results, nominal style, readability, and predictability of sentiment arcs do not appear to be significant predictors of canonicity.</p><p>As the experiments in <ref type="bibr" target="#b7">[8]</ref> did not take gender into account, their models have been exposed to more men authors than women authors - both in the canon and the non-canon group. In contrast, our bootstrapped sampling process ensures that our models are exposed to an equal number of texts by men and women authors. Predictability and nominal style were reported as statistically significantly different in canon books by men and women authors, but these features did not emerge as predictors of canonicity in our experiments. Therefore, it seems plausible that by highlighting these exact features, the models in <ref type="bibr" target="#b7">[8]</ref> might have picked up on a style associated with men (canon) authors rather than canonicity itself. However, keep in mind that our inclusion of PoS features in the analysis may also influence which features are reported as most important. It is possible that these features remain important but appear further down the list in our models.</p><p>These results underscore the necessity of a careful sampling process when dealing with imbalanced data. Our bootstrapping method, while straightforward, has the limitation of reducing the number of data points. A more refined approach would involve up-sampling texts by women authors to match the distribution of existing women-authored books.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">False Positives</head><p>11 of the 16 unfair models have a higher FPR for men authors than for women authors, showing a tendency to over-include non-canon men, rather than non-canon women, in the canon. At the same time, the TPR is higher for women authors, indicating that the models recognise canon works by women more easily than canon works by men. Overall, this suggests that the space of men-authored books is harder to divide into canonical and non-canonical works. One potential explanation is that the distance from the canon group might be larger for non-canon women than for non-canon men. The hypothesis is sketched in Fig. <ref type="figure" target="#fig_2">3</ref>. Further work is needed to test whether this is the case, potentially through techniques like embedding-based clustering of books based on the text-intrinsic features used in the present study. A closer examination of how genre plays into the observed effect is also needed, especially as a larger distance between canon and non-canon women authors may be due to other effects related to gender disparity. Women authors have been shown to write predominantly in genres such as romance, children's literature, and young adult fiction <ref type="bibr" target="#b41">[42,</ref><ref type="bibr" target="#b27">28]</ref>. If genres like romance are dominated by women authors and are less represented in canonical compilations <ref type="bibr" target="#b16">[17]</ref>, and if genres are closely related to writing style <ref type="bibr" target="#b21">[22]</ref>, the disparity between canon and non-canon books (such as romance novels) by women authors may be larger.</p><p>To avoid naturalising these findings, caution is required when speculating that non-canon women authors align less with the canon style; our study does not draw definitive conclusions about the intrinsic qualities of men's versus women's writing. 
Previous studies have identified gender differences in texts (see Section 2.2), but these findings do not always generalise well <ref type="bibr" target="#b10">[11]</ref>. Our analysis reflects the underlying data of the Chicago Corpus, which prioritises widely circulated books. If there is a greater disparity between canon and non-canon women authors than men authors, it could result from differential reception <ref type="bibr" target="#b23">[24]</ref> and "false labelling" of women's works. This highlights how seemingly objective text-intrinsic features can embody social biases, as extensively discussed in <ref type="bibr" target="#b10">[11]</ref>.</p></div>
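The per-group error rates discussed above are simple to compute. The following sketch (illustrative only; group labels and inputs are hypothetical) shows the TPR and FPR comparison that underlies the findings in this section:

```python
def group_rates(y_true, y_pred, groups):
    """True- and false-positive rates per group (e.g. author gender).

    y_true: gold canon labels (1 = canon), y_pred: model predictions,
    groups: a group label per instance.
    """
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gr in enumerate(groups) if gr == g]
        pos = [i for i in idx if y_true[i] == 1]
        neg = [i for i in idx if y_true[i] == 0]
        rates[g] = {
            'TPR': sum(y_pred[i] for i in pos) / len(pos),  # canon correctly recognised
            'FPR': sum(y_pred[i] for i in neg) / len(neg),  # non-canon wrongly included
        }
    return rates
```

A pattern like the one reported here would surface as a higher 'FPR' for the men-author group alongside a higher 'TPR' for the women-author group.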
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Impossibility considerations</head><p>As discussed in Section 3.3, the impossibility theorem of algorithmic fairness <ref type="bibr" target="#b31">[32]</ref> shows that different metrics for group fairness are incompatible with each other if the distribution of positives varies between groups, a situation known as unequal base rates. In our experiments, we ensured equal base rates through the 50-50 gender distribution for both the positive and negative classes. Despite this, the majority of the models displayed biased predictions based on author gender. Specifically, the FPR is higher for men authors than for women authors in 11 out of the 16 unfair models, leading to a violation of equalised odds.</p><p>In real-world scenarios, base rates are rarely equal. Addressing such unfairness therefore often involves accepting lower accuracy, a trade-off known as the parity-accuracy trade-off <ref type="bibr" target="#b22">[23]</ref>. To balance accuracy and fairness and to choose which fairness metric to prioritise, it is essential to carefully consider the context of use and the intended goals. For a canonicity classifier aimed at understanding canonical literature, it is arguably important to avoid unequal false positives, as this would give one social group disproportionate (false) influence over what represents canonical literature. This consideration supports prioritising equalised odds, which addresses fairness in terms of error rates across groups.</p><p>The publishing industry presents another potential use case for binary classifiers predicting categories such as 'bestseller' or 'quality' <ref type="bibr" target="#b44">[45,</ref><ref type="bibr" target="#b1">2]</ref>. If an ML classifier predicts the success of new manuscripts, it is still preferable to avoid favouring one group over another, thus supporting equalised odds. 
However, if human experts later sort the manuscripts, over-including false positives is not as harmful as violating equal opportunity (where one group's positive instances are more likely to be disregarded). In such a use case, the cost of being falsely disregarded is higher than that of being falsely recognised. Equality of opportunity is therefore crucial to ensure that manuscripts with high potential are equally likely to be recognised, regardless of the author's group (e.g., gender, ethnicity).</p><p>For a binary classifier used in the publishing industry, it is also crucial to consider the fairness criterion of calibration within groups, which ensures consistency between predicted probabilities and actual outcomes within each group. Hence, if the classifier consistently predicts a 10% likelihood of bestseller status for manuscripts written by men, then roughly 10% of those manuscripts should indeed turn out to be bestsellers when checked against the actual data, and similarly for other groups. Lack of calibration within groups could lead to systematically overconfident or underconfident predictions for certain groups.</p><p>The sections above show how biases can be embedded within ML models used to predict literary phenomena. While the field of algorithmic fairness can help identify and address such skewness, it is worth asking some more fundamental questions about the existing approach of using imbalanced literary corpora to classify literary works. One thing to keep in mind is that developing predictive ML models relies on the assumption that a classification schema exists which can serve as a ground truth. In other words, justifying a canonicity classifier through its accuracy relies on accepting the distinction between the canon novels and the non-canon novels and the canon's historical profile. 
Such considerations should not be seen as a dismissal of the idea of a literary canon per se; rather, we aim to encourage reflection on what happens when a contested classification schema is operationalised into predictive models. Similar points are addressed by Piper <ref type="bibr" target="#b37">[38]</ref>: "[W]hile statistical tests can measure the functioning of the model ("the extent to which what we are observing exceeds the boundaries of chance"), they cannot confirm "whether the model is an appropriate approximation of the phenomenon that one is claiming to observe"" (quoted in <ref type="bibr" target="#b10">[11]</ref>).</p></div>
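The two group-fairness criteria discussed in this section can be operationalised as simple checks. This is a sketch under stated assumptions: the tolerance `tol` and the probability bin are illustrative choices, not thresholds used in the paper, and the `rates` structure is a hypothetical per-group dict of TPR/FPR values.

```python
def violates_equalised_odds(rates, tol=0.1):
    """Equalised odds requires (near-)equal TPR *and* FPR across groups.

    rates: {group: {'TPR': float, 'FPR': float}}; tol is an illustrative
    threshold for how much inter-group disparity is tolerated.
    """
    tprs = [r['TPR'] for r in rates.values()]
    fprs = [r['FPR'] for r in rates.values()]
    return (max(tprs) - min(tprs) > tol) or (max(fprs) - min(fprs) > tol)

def calibration_gap(probs, outcomes, lo=0.0, hi=1.0):
    """Within one group: mean predicted probability minus the observed
    positive rate for predictions falling in the bin [lo, hi].
    A large positive gap means overconfident predictions for that group."""
    binned = [(p, y) for p, y in zip(probs, outcomes) if lo <= p <= hi]
    if not binned:
        return 0.0
    mean_pred = sum(p for p, _ in binned) / len(binned)
    observed = sum(y for _, y in binned) / len(binned)
    return mean_pred - observed
```

Running `calibration_gap` per group and per probability bin corresponds to the calibration-within-groups criterion: a classifier predicting a 10% bestseller likelihood for a group should see roughly 10% of those manuscripts become bestsellers.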
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion and Future Works</head><p>This study emphasises the critical role of algorithmic fairness in computational literary studies, especially in addressing gender biases in classification models. Despite balanced training data, our findings show that ML models still exhibit significant gender biases, misclassifying non-canon books by men as canon more frequently than those by women, thus violating equalised odds. This suggests that ignoring gender distribution in literary datasets can bias models towards associating men-authored writing styles with canonicity and, relatedly, can lead to misguided ideas about the textual characteristics of categories like canonical literature.</p><p>Our results reveal that seemingly objective text-intrinsic features can harbour social biases, highlighting the need to critically reflect on potential biases in datasets and corpora. By integrating fairness considerations into ML model development and application in computational literary studies, we can not only improve the reliability of research results but also foster inclusivity by ensuring the representation of all social groups, not just those historically included in established canons.<ref type="foot" target="#foot_4">6</ref></p><p>Further research is needed to understand feature distributions across author genders and their impact on biases. One approach is to use embedding-based clustering to analyse how different author genders are located and distributed within and outside of the canon category.</p><p>As pointed out in Section 4.3, some features (approximate entropy, perplexity, relative clause modifiers, use of stop words, and type-token ratio) appear crucial for determining canonicity while showing no substantial differences between men and women authors. 
This suggests a shared style among canon writers, and future work could examine whether these features are consistent across different genres or literary movements within the canon and how they evolve over time.</p><p>A limitation of our experiments is that genres were not considered. Future research should incorporate genre distinctions to ensure that significant features of canon literature are not conflated with genre-specific ones. This is particularly crucial in the sampling process to avoid comparing canon books against genre literature.</p><p>Another limitation is the influence of pressures from the publishing industry. Research has shown that women writers often face constraints from publishers regarding their writing style and subject matter <ref type="bibr" target="#b43">[44,</ref><ref type="bibr" target="#b6">7]</ref>. While this requires further investigation, it highlights the social context shaping how literature is written, published, and distributed, factors that inevitably influence literary data and the resulting predictive models.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: For this model, bias was detected in 3 metrics: TPR, ACC, and FPR. When the FPR is too unequal between men and women authors, the fairness criterion of equalised odds is violated; this is true for 11 of the 20 models. When the TPR is too unequal between men and women authors, the fairness criterion of equal opportunity is violated; this holds for 5 of the 20 models. The accuracy for men and women authors is too unequal in 2 out of 20 models.</figDesc><graphic coords="7,120.53,84.17,354.21,216.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: For each feature, the coloured bars show how large a proportion of the fair models (green) and the unfair models (pink) report this feature among their ten most important features. The grey bars show in how large a proportion of the sampling rounds the feature is statistically significant between canon books by men and women. Hence, the grey bars are not linked to the classifiers but are descriptive statistics of the underlying data of canon books; they can be used to interpret whether the observed biases can be linked to differences in feature distributions. (F) = frequencies, normalised by word count.</figDesc><graphic coords="9,89.28,84.17,416.72,149.99" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Sketch of one potential explanation for the higher FPR for men authors: non-canon books written by women might differ more from the canon group than non-canon books written by men do.</figDesc><graphic coords="11,141.37,84.17,312.54,162.05" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Women/men authors (bottom row) represent the number of works written by women or men authors in the canon/non-canon, and in subcategories of the canon.</figDesc><table><row><cell></cell><cell>Chicago corpus</cell><cell cols="4">Canon OpenSyllabus Norton Penguin classics</cell></row><row><cell>Texts</cell><cell>9,089</cell><cell>618</cell><cell>476</cell><cell>401</cell><cell>77</cell></row><row><cell>Women / men</cell><cell cols="2">3,289 / 5,800 166 / 452</cell><cell cols="2">132 / 344 93 / 307</cell><cell>7 / 70</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">We use the terms "women and men authors" instead of the more commonly used "female and male authors" to distinguish cultural gender (which is examined in this paper) from biological sex.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">The textual features, including reception categories like 'canon', are described on GitHub.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">Including features that are potentially strong markers of gender is important because other features can act as 'proxies' for these. Ignoring them might not reduce bias, as the model could still pick up on these proxies. Including them allows for a more comprehensive analysis of potential biases<ref type="bibr" target="#b4">[5]</ref>.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">https://dalex.drwhy.ai/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">Distribution plots for features statistically significant between men and women canon authors are provided in the appendix.</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Online Resources</head><p>See https://zenodo.org/records/12699037 for code.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendix Table 2</head><p>Number of (sampling) runs where the feature levels are statistically significant between canon books by men and canon books by women. A statistically significant difference is defined as p&lt;0.05, with Bonferroni correction for multiple comparisons. (F) = frequencies, normalised by word count. Figure <ref type="figure">4</ref>: Distribution plots for features reported as statistically significant between men and women canon authors in more than half of the sampling rounds.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Between Canon and Corpus: Six Perspectives on 20th-Century Novels</title>
		<author>
			<persName><forename type="first">M</forename><surname>Algee-Hewitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>McGurl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Literary Lab Pamphlet</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<date type="published" when="2015">2015</date>
			<publisher>Stanford Literary Lab</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Archer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Jockers</surname></persName>
		</author>
		<idno type="DOI">10.5555/3098683</idno>
		<title level="m">The bestseller code: Anatomy of the blockbuster novel</title>
				<meeting><address><addrLine>USA</addrLine></address></meeting>
		<imprint>
			<publisher>St. Martin&apos;s Press</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Gender, genre, and writing style in formal written texts</title>
		<author>
			<persName><forename type="first">S</forename><surname>Argamon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Koppel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Shimoni</surname></persName>
		</author>
		<idno type="DOI">10.1515/text.2003.014</idno>
	</analytic>
	<monogr>
		<title level="j">Text &amp; talk</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="321" to="346" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Measuring the effects of bias in training data for literary classification</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bagga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Piper</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences</title>
				<meeting>the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences</meeting>
		<imprint>
			<publisher>Humanities and Literature</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="74" to="84" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Big data&apos;s disparate impact</title>
		<author>
			<persName><forename type="first">S</forename><surname>Barocas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Selbst</surname></persName>
		</author>
		<idno>doi: 24758720</idno>
	</analytic>
	<monogr>
		<title level="j">Calif. L. Rev</title>
		<imprint>
			<biblScope unit="volume">104</biblScope>
			<biblScope unit="page">671</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Operationalizing Canonicity: A Quantitative Study of French 19th and 20th Century Literature</title>
		<author>
			<persName><forename type="first">J</forename><surname>Barré</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-B</forename><surname>Camps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Poibeau</surname></persName>
		</author>
		<idno type="DOI">10.22148/001c.88113</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Cultural Analytics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">3</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Authors of Slender Means? Female Authorship in Mid-Twentieth-Century British Fiction</title>
		<author>
			<persName><forename type="first">I</forename><surname>Berensmeyer</surname></persName>
		</author>
		<idno type="DOI">10.1515/zaa-2022-2073</idno>
	</analytic>
	<monogr>
		<title level="j">Zeitschrift für Anglistik und Amerikanistik</title>
		<imprint>
			<biblScope unit="volume">70</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="385" to="402" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Good Books are Complex Matters: Gauging Complexity Profiles Across Diverse Categories of Perceived Literary Quality</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bizzoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Feldkamp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jacobsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Thomsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Nielbo</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2404.04022</idno>
		<idno type="arXiv">arXiv:2404.04022</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">The fractality of sentiment arcs for literary quality assessment: The case of Nobel laureates</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bizzoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Moreira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Thomsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Data Mining &amp; Digital Humanities</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A Matter of Perspective: Building a Multi-Perspective Annotated Dataset for the Study of Literary Quality</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bizzoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Moreira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M S</forename><surname>Lassen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Thomsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Nielbo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING</title>
				<meeting>the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING</meeting>
		<imprint>
			<date type="published" when="2024">2024. 2024</date>
			<biblScope unit="page" from="789" to="800" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Why you can&apos;t model away bias</title>
		<author>
			<persName><forename type="first">K</forename><surname>Bode</surname></persName>
		</author>
		<idno type="DOI">10.1215/00267929-7933102</idno>
	</analytic>
	<monogr>
		<title level="j">Modern Language Quarterly</title>
		<imprint>
			<biblScope unit="volume">81</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="95" to="124" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Predicting Canonization: Comparing Canonization Scores Based on Text-Extrinsic and -Intrinsic Features</title>
		<author>
			<persName><forename type="first">J</forename><surname>Brottrager</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Stahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Arslan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<meeting><address><addrLine>Antwerp, Belgium</addrLine></address></meeting>
		<imprint>
			<publisher>Ceur</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="195" to="205" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Textual Analysis</title>
		<author>
			<persName><forename type="first">J</forename><surname>Burrows</surname></persName>
		</author>
		<idno type="DOI">10.1002/9780470999875</idno>
	</analytic>
	<monogr>
		<title level="m">A Companion to Digital Humanities</title>
				<imprint>
			<publisher>John Wiley &amp; Sons, Ltd</publisher>
			<date type="published" when="2004">2004</date>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="323" to="347" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Gender Trouble: Feminism and the Subversion of Identity</title>
		<author>
			<persName><forename type="first">J</forename><surname>Butler</surname></persName>
		</author>
		<idno type="DOI">10.4324/9780203824979</idno>
		<imprint>
			<date type="published" when="2006">2006</date>
			<publisher>Routledge</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Reading difference: How race and ethnicity function as tools for critical appraisal</title>
		<author>
			<persName><forename type="first">P</forename><surname>Chong</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.poetic.2010.11.003</idno>
		<ptr target="https://doi.org/10.1016/j.poetic.2010.11.003" />
	</analytic>
	<monogr>
		<title level="j">Poetics</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="64" to="84" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Algorithmic fairness</title>
		<author>
			<persName><forename type="first">S</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Stanton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Wallace</surname></persName>
		</author>
		<idno type="DOI">10.1146/annurev-financial-110921-125930</idno>
	</analytic>
	<monogr>
		<title level="j">Annual Review of Financial Economics</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="565" to="593" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Measuring Literary Quality. Proxies and Perspectives</title>
		<author>
			<persName><forename type="first">P</forename><surname>Feldkamp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bizzoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Thomsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</author>
		<idno type="DOI">10.26083/tuprints-00027391</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
			<pubPlace>Darmstadt</pubPlace>
		</imprint>
	</monogr>
	<note type="report_type">Report</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Success with Style: Using Writing Style to Predict the Success of Novels</title>
		<author>
			<persName><forename type="first">V</forename><surname>Ganjigunte Ashok</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Choi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2013 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Seattle, Washington, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1753" to="1764" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Are texts recognizably gendered? An experiment and analysis</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hayward</surname></persName>
		</author>
		<idno type="DOI">10.1016/s0304-422x(03)00005-6</idno>
	</analytic>
	<monogr>
		<title level="j">Poetics</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="87" to="101" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">The feminine style: Theory and fact</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">P</forename><surname>Hiatt</surname></persName>
		</author>
		<idno type="DOI">10.2307/356931</idno>
	</analytic>
	<monogr>
		<title level="j">College Composition &amp; Communication</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="222" to="226" />
			<date type="published" when="1978">1978</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">A Study on Using Semantic Word Associations to Predict the Success of a Novel</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Jannatus</forename><surname>Saba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">S</forename><surname>Bijoy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gorelick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ismail</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Islam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Amin</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.starsem-1.4</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics. Online: Association for Computational Linguistics</title>
				<meeting>*SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics. Online: Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="38" to="51" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Macroanalysis: Digital Methods and Literary History</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Jockers</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Topics in the digital humanities</title>
				<meeting><address><addrLine>Urbana</addrLine></address></meeting>
		<imprint>
			<publisher>University of Illinois Press</publisher>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Inherent Trade-Offs in the Fair Determination of Risk Scores</title>
		<author>
			<persName><forename type="first">J</forename><surname>Kleinberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mullainathan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Raghavan</surname></persName>
		</author>
		<idno type="DOI">10.4230/LIPIcs.ITCS.2017.43</idno>
	</analytic>
	<monogr>
		<title level="m">Leibniz International Proceedings in Informatics (LIPIcs)</title>
		<title level="s">Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik</title>
		<editor>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Papadimitriou</surname></persName>
		</editor>
		<meeting><address><addrLine>Dagstuhl, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="page">23</biblScope>
		</imprint>
	</monogr>
	<note>8th Innovations in Theoretical Computer Science Conference (ITCS 2017)</note>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Women&apos;s books versus books by women</title>
		<author>
			<persName><forename type="first">C</forename><surname>Koolen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Automatically categorizing written texts by author gender</title>
		<author>
			<persName><forename type="first">M</forename><surname>Koppel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Argamon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Shimoni</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/17.4.401</idno>
	</analytic>
	<monogr>
		<title level="j">Literary and linguistic computing</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="401" to="412" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Predicting author gender using machine learning algorithms: Looking beyond the binary</title>
		<author>
			<persName><forename type="first">K</forename><surname>Land</surname></persName>
		</author>
		<idno type="DOI">10.16995/dscn.362</idno>
	</analytic>
	<monogr>
		<title level="j">Digital Studies/Le champ numérique</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Epistemic consequences of unfair tools</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M S</forename><surname>Lassen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">D</forename><surname>Kristensen-Mclachlan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Almasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Enevoldsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/fqad091</idno>
	</analytic>
	<monogr>
		<title level="j">Digital Scholarship in the Humanities</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="198" to="214" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Persistence of Gender Asymmetries in Book Reviews Within and Across Genres</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M S</forename><surname>Lassen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Moreira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bizzoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Thomsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">3558</biblScope>
			<biblScope unit="page">14</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Reviewer Preferences and Gender Disparities in Aesthetic Judgments</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M S</forename><surname>Lassen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bizzoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Peura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Thomsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">3290</biblScope>
			<biblScope unit="page" from="280" to="290" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Race and Gender in the Shaping of the American Literary Canon A Case Study from the Twenties</title>
		<author>
			<persName><forename type="first">P</forename><surname>Lauter</surname></persName>
		</author>
		<idno type="DOI">10.1093/oso/9780195055931.003.0007</idno>
	</analytic>
	<monogr>
		<title level="m">Canons and Contexts</title>
				<imprint>
			<publisher>Oxford University Press</publisher>
			<date type="published" when="1991">1991</date>
			<biblScope unit="page" from="22" to="47" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">A Multi-task Approach to Predict Likability of Books</title>
		<author>
			<persName><forename type="first">S</forename><surname>Maharjan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Arevalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">A</forename><surname>González</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<editor>
			<persName><forename type="first">M</forename><surname>Lapata</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Blunsom</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Koller</surname></persName>
		</editor>
		<meeting>the 15th Conference of the European Chapter of the Association for Computational Linguistics<address><addrLine>Valencia, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1217" to="1227" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<title level="m" type="main">The impossibility of &quot;fairness&quot;: a generalized impossibility result for decisions</title>
		<author>
			<persName><forename type="first">T</forename><surname>Miconi</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1707.01195</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv</note>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Approximate Entropy in Canonical and Non-Canonical Fiction</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mohseni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Redies</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Gast</surname></persName>
		</author>
		<idno type="DOI">10.3390/e24020278</idno>
	</analytic>
	<monogr>
		<title level="j">Entropy</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">278</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Modeling Readers&apos; Appreciation of Literary Narratives Through Sentiment Arcs and Semantic Profiles</title>
		<author>
			<persName><forename type="first">P</forename><surname>Moreira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bizzoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Nielbo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M</forename><surname>Lassen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Thomsen</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.wnu-1.5</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th Workshop on Narrative Understanding</title>
				<meeting>the 5th Workshop on Narrative Understanding</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="25" to="35" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Gender differences in language use: An analysis of 14,000 text samples</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Newman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Groom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">D</forename><surname>Handelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Pennebaker</surname></persName>
		</author>
		<idno type="DOI">10.1080/01638530802073712</idno>
	</analytic>
	<monogr>
		<title level="j">Discourse processes</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="211" to="236" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Kønshierarki i kanonlitteratur: En kvantitativ undersøgelse af køn</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">L</forename><surname>Overgaard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M</forename><surname>Granum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Dansknoter</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="46" to="49" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">The Textbook Canon: Genre, Gender, and Race in US Literature Anthologies</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">G</forename><surname>Pace</surname></persName>
		</author>
		<idno type="DOI">10.2307/819892</idno>
	</analytic>
	<monogr>
		<title level="j">The English Journal</title>
		<imprint>
			<biblScope unit="volume">81</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="33" to="38" />
			<date type="published" when="1992">1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Think small: on literary modeling</title>
		<author>
			<persName><forename type="first">A</forename><surname>Piper</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PMLA</title>
		<imprint>
			<biblScope unit="volume">132</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="651" to="658" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Modeling the behavior of the 4/5ths rule for determining adverse impact: Reasons for caution</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">L</forename><surname>Roth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bobko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">S</forename><surname>Switzer</surname><genName>III</genName></persName>
		</author>
		<idno type="DOI">10.1037/0021-9010.91.3.507</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Applied Psychology</title>
		<imprint>
			<biblScope unit="volume">91</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">507</biblScope>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<monogr>
		<title level="m" type="main">How to Suppress Women&apos;s Writing</title>
		<author>
			<persName><forename type="first">J</forename><surname>Russ</surname></persName>
		</author>
		<idno type="DOI">10.7560/316252</idno>
		<imprint>
			<date type="published" when="1983">1983</date>
			<publisher>University of Texas Press</publisher>
			<pubPlace>Austin, Texas, USA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Gender Attribution: Tracing Stylometric Evidence Beyond Topic and Genre</title>
		<author>
			<persName><forename type="first">R</forename><surname>Sarawgi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gajulapalli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Choi</surname></persName>
		</author>
		<idno type="DOI">10.5555/2018936.2018946</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fifteenth Conference on Computational Natural Language Learning</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Goldwater</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Manning</surname></persName>
		</editor>
		<meeting>the Fifteenth Conference on Computational Natural Language Learning<address><addrLine>Portland, Oregon, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="78" to="86" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">Book genre and author gender: Romance&gt;Paranormal-Romance to Autobiography&gt;Memoir</title>
		<author>
			<persName><forename type="first">M</forename><surname>Thelwall</surname></persName>
		</author>
		<idno type="DOI">10.1002/asi.23768</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the Association for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">68</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1212" to="1223" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<analytic>
		<title level="a" type="main">Gender and sentiment, critics and authors: a dataset of Norwegian book reviews</title>
		<author>
			<persName><forename type="first">S</forename><surname>Touileb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Øvrelid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Velldal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Second Workshop on Gender Bias in Natural Language Processing</title>
				<meeting>the Second Workshop on Gender Bias in Natural Language Processing<address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="125" to="138" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b43">
	<monogr>
		<title level="m" type="main">Edging women out: Victorian novelists, publishers and social change</title>
		<author>
			<persName><forename type="first">G</forename><surname>Tuchman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">E</forename><surname>Fortin</surname></persName>
		</author>
		<meeting><address><addrLine>Oxfordshire, England, UK</addrLine></address></meeting>
		<imprint>
			<publisher>Routledge</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">13</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<analytic>
		<title level="a" type="main">Success in books: predicting book sales before publication</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Yucesoy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Varol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Eliassi-Rad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-L</forename><surname>Barabási</surname></persName>
		</author>
		<idno type="DOI">10.1140/epjds/s13688-019-0208-6</idno>
	</analytic>
	<monogr>
		<title level="j">EPJ Data Science</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">31</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
